# Electrical Engineering and Systems Science

## New submissions

### New submissions for Tue, 17 May 22

[1]
Title: Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

High-resolution (HR) MRI is critical in assisting the doctor's diagnosis and image-guided treatment, but is hard to obtain in a clinical setting due to long acquisition time. Therefore, the research community investigated deep learning-based super-resolution (SR) technology to reconstruct HR MRI images with shortened acquisition time. However, training such neural networks usually requires paired HR and low-resolution (LR) in-vivo images, which are difficult to acquire due to patient movement during and between image acquisitions. Rigid movements of hard tissues can be corrected with image registration, whereas the alignment of deformed soft tissues is challenging, making it impractical to train the neural network with such authentic HR and LR image pairs. Therefore, most of the previous studies proposed SR reconstruction by employing authentic HR images and synthetic LR images downsampled from the HR images, yet the difference in degradation representations between synthetic and authentic LR images suppresses the performance of SR reconstruction from authentic LR images. To mitigate the aforementioned problems, we propose a novel Unsupervised DEgradation Adaptation Network (UDEAN). Our model consists of two components: the degradation learning network and the SR reconstruction network. The degradation learning network downsamples the HR images by addressing the degradation representation of the misaligned or unpaired LR images, and the SR reconstruction network learns the mapping from the downsampled HR images to their original HR images. As a result, the SR reconstruction network can generate SR images from the LR images and achieve comparable quality to the HR images. Experimental results show that our method outperforms the state-of-the-art models and can potentially be applied in real-world clinical settings.
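
As an illustration of the conventional baseline the abstract contrasts against (synthetic LR images downsampled from HR images), here is a minimal sketch. It assumes a 2D slice stored as a list of rows and a fixed box-filter (average-pooling) degradation; the function name `downsample_average` is hypothetical, and this is not UDEAN's learned degradation network.

```python
def downsample_average(image, factor):
    """Create a synthetic LR image by averaging non-overlapping factor x factor blocks.

    Assumption: a fixed box filter stands in for the degradation; authentic LR
    scans may degrade differently, which is the gap the abstract describes.
    """
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")
    low_res = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect the pixels of one block and replace them by their mean.
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            out_row.append(sum(block) / len(block))
        low_res.append(out_row)
    return low_res


hr_slice = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 6, 6],
    [2, 2, 6, 6],
]
lr_slice = downsample_average(hr_slice, 2)
```

A network trained only on pairs like `(lr_slice, hr_slice)` sees one hand-chosen degradation, which may explain why such models can struggle on authentic LR inputs.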

[2]
Title: A rigorous multi-population multi-lane hybrid traffic model and its mean-field limit for dissipation of waves via autonomous vehicles
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

In this paper, a multi-lane multi-population microscopic model, which presents stop and go waves, is proposed to simulate traffic on a ring-road. Vehicles are divided between human-driven and autonomous vehicles (AV). Control strategies are designed with the ultimate goal of using a small number of AVs (less than 5% penetration rate) to represent Lagrangian control actuators that can smooth the multilane traffic flow and dissipate the stop-and-go waves. This in turn may reduce fuel consumption and emissions.
The lane-changing mechanism is based on three components that we treat as parameters in the model: safety, incentive and cool-down time. The choice of these parameters in the lane-change mechanism is critical to modeling traffic accurately, because different parameter values can lead to drastically different traffic behaviors. In particular, the number of lane-changes and the speed variance are highly affected by the choice of parameters. Despite this modeling issue, when using sufficiently simple and robust controllers for AVs, the stabilization of uniform flow steady-state is effective for any realistic value of the parameters, and ultimately bypasses the observed modeling issue. Our approach is based on accurate and rigorous mathematical models, which allow a limit procedure that is termed, in gas dynamic terminology, mean-field. Put simply, as the human-driven population tends to infinity, a system of coupled ordinary and partial differential equations is obtained. Moreover, control problems also pass to the limit, allowing the design to be tackled at different scales.

[3]
Title: Task splitting for DNN-based acoustic echo and noise removal
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Neural networks have led to tremendous performance gains for single-task speech enhancement, such as noise suppression and acoustic echo cancellation (AEC). In this work, we evaluate whether it is more useful to use a single joint module or separate modules to tackle these problems. We describe different possible implementations and give insights into their performance and efficiency. We show that using a separate echo cancellation module and a module for noise and residual echo removal results in less near-end speech distortion and better echo suppression, especially for double-talk.

[4]
Title: BronchusNet: Region and Structure Prior Embedded Representation Learning for Bronchus Segmentation and Classification
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

CT-based bronchial tree analysis plays an important role in the computer-aided diagnosis for respiratory diseases, as it could provide structured information for clinicians. The basis of airway analysis is bronchial tree reconstruction, which consists of bronchus segmentation and classification. However, there remains a challenge for accurate bronchial analysis due to the individual variations and the severe class imbalance. In this paper, we propose a region and structure prior embedded framework named BronchusNet to achieve accurate segmentation and classification of bronchial regions in CT images. For bronchus segmentation, we propose an adaptive hard region-aware UNet that incorporates multi-level prior guidance of hard pixel-wise samples in the general UNet segmentation network to achieve better hierarchical feature learning. For the classification of bronchial branches, we propose a hybrid point-voxel graph learning module to fully exploit bronchial structure priors and to support simultaneous feature interactions across different branches. To facilitate the study of bronchial analysis, we contribute BRSC: an open-access benchmark of BRonchus imaging analysis with high-quality pixel-wise Segmentation masks and the Class of bronchial segments. Experimental results on BRSC show that our proposed method not only achieves the state-of-the-art performance for binary segmentation of bronchial region but also exceeds the best existing method on bronchial branches classification by 6.9%.

[5]
Title: SVR-based Observer Design for Unknown Linear Systems: Complexity and Performance
Subjects: Systems and Control (eess.SY)

In this paper we consider estimating the system parameters and designing a stable observer for unknown noisy linear time-invariant (LTI) systems. We propose a Support Vector Regression (SVR) based estimator to provide an adjustable asymmetric error interval for estimations. This estimator can trade off the bias and variance of the estimation error by tuning the parameter $\gamma > 0$ in the loss function. This method enjoys the same sample complexity of $\mathcal{O}(1/\sqrt{N})$ as the Ordinary Least Squares (OLS) based methods but achieves an $\mathcal{O}(1/(\gamma+1))$ smaller variance. Then, a stable observer gain design procedure based on the estimations is proposed. The observation performance bound based on the estimations is evaluated by the mean square observation error, which is shown to be adjustable by tuning the parameter $\gamma$, thus achieving higher scalability than the OLS methods. The advantages of the estimation error bias-variance trade-off for observer design are also demonstrated through matrix spectrum and observation performance optimality analysis. Extensive simulation validations are conducted to verify the computed estimation error and performance optimality with different $\gamma$ and noise settings. The variances of the estimation error and the fluctuations in performance are smaller with a properly designed parameter $\gamma$ compared with the OLS methods.

[6]
Title: Self-supervised Assisted Active Learning for Skin Lesion Segmentation
Comments: Accepted by the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2022)
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Label scarcity has been a long-standing issue for biomedical image segmentation, due to high annotation costs and professional requirements. Recently, active learning (AL) strategies, which strive to reduce annotation costs by querying a small portion of data for annotation, have gained much traction in the field of medical imaging. However, most of the existing AL methods have to initialize models with some randomly selected samples followed by active selection based on various criteria, such as uncertainty and diversity. Such random-start initialization methods inevitably introduce low-value, redundant samples and unnecessary annotation costs. To address this issue, we propose a novel self-supervised assisted active learning framework in the cold-start setting, in which the segmentation model is first warmed up with self-supervised learning (SSL), and then SSL features are used for sample selection via latent feature clustering without accessing labels. We assess our proposed methodology on a skin lesion segmentation task. Extensive experiments demonstrate that our approach is capable of achieving promising performance with substantial improvements over existing baselines.

[7]
Title: Performance Analysis of Irregular Repetition Slotted Aloha with Multi-Cell Interference
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Irregular repetition slotted aloha (IRSA) is a massive random access protocol in which users transmit several replicas of their packet over a frame to a base station. Existing studies have analyzed IRSA in the single-cell (SC) setup, which does not extend to the more practically relevant multi-cell (MC) setup due to the inter-cell interference. In this work, we analyze MC IRSA, accounting for pilot contamination and multiuser interference. Via numerical simulations, we illustrate that, in practical settings, MC IRSA can have a drastic loss of throughput, up to $70\%$, compared to SC IRSA. Further, MC IRSA requires a significantly higher training length (about 4-5x compared to SC IRSA), in order to support the same user density and achieve the same throughput. We also provide insights into the impact of the pilot length, number of antennas, and signal to noise ratio on the performance of MC IRSA.
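
To make the protocol concrete, the following is a minimal single-cell sketch of IRSA decoding by successive interference cancellation. It assumes an idealized collision channel (a slot is decodable only when exactly one undecoded replica remains in it), perfect cancellation, and a fixed repetition count per user; it ignores pilot contamination and inter-cell interference, which are the effects the paper analyzes. The function name `simulate_irsa` and its parameters are hypothetical.

```python
import random


def simulate_irsa(num_users, num_slots, repetitions=3, seed=0):
    """Return how many users are decoded in one frame (idealized collision channel)."""
    rng = random.Random(seed)
    slots = [set() for _ in range(num_slots)]  # users with a replica in each slot
    user_slots = []
    for user in range(num_users):
        # Each user places its replicas in distinct, uniformly chosen slots.
        chosen = rng.sample(range(num_slots), repetitions)
        user_slots.append(chosen)
        for s in chosen:
            slots[s].add(user)

    decoded = set()
    progress = True
    while progress:
        progress = False
        for s in range(num_slots):
            if len(slots[s]) == 1:
                # A singleton slot is decodable; cancel all replicas of that user.
                (user,) = slots[s]
                decoded.add(user)
                for t in user_slots[user]:
                    slots[t].discard(user)
                progress = True
    return len(decoded)


decoded_count = simulate_irsa(num_users=40, num_slots=100)
throughput = decoded_count / 100  # decoded packets per slot
```

In this toy model the only failure mode is a set of users whose replicas collide with each other in every slot they use; the multi-cell setting adds interference that this sketch does not capture.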

[8]
Title: A Unifying Multi-sampling-ratio CS-MRI Framework With Two-grid-cycle Correction and Geometric Prior Distillation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Compressed sensing (CS) is an efficient method to accelerate the acquisition of MR images from under-sampled k-space data. Although existing deep learning CS-MRI methods have achieved impressive performance, explainability and generalizability remain challenging for such methods: most of them are not flexible enough to handle multi-sampling-ratio reconstruction tasks, and the transition from mathematical analysis to network design is often not natural enough. In this work, to tackle explainability and generalizability, we propose a unifying deep unfolding multi-sampling-ratio CS-MRI framework that merges the advantages of model-based and deep learning-based methods. The combined approach offers more generalizability than previous works, while deep learning gains explainability through a geometric prior module. Inspired by the multigrid algorithm, we first embed the CS-MRI-based optimization algorithm into a correction-distillation scheme that consists of three ingredients: a pre-relaxation module, a correction module and a geometric prior distillation module. Furthermore, we employ a condition module to adaptively learn the step length and noise level from the compressive sampling ratio in every stage, which enables the proposed framework to jointly train multi-ratio tasks through a single model. The proposed model can not only compensate for the lost contextual information of the reconstructed image, which is refined from low frequency error in geometric characteristic k-space, but also integrate the theoretical guarantee of model-based methods and the superior reconstruction performance of deep learning-based methods. All physical-model parameters are learnable, and numerical experiments show that our framework outperforms state-of-the-art methods in terms of qualitative and quantitative evaluations.

[9]
Title: A Learning Approach for Joint Design of Event-triggered Control and Power-Efficient Resource Allocation
Comments: 14 pages, 12 figures, in IEEE Transactions on Vehicular Technology
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

In emerging Industrial Cyber-Physical Systems (ICPSs), the joint design of communication and control sub-systems is essential, as these sub-systems are interconnected. In this paper, we study the joint design problem of an event-triggered control and an energy-efficient resource allocation in a fifth generation (5G) wireless network. We formally state the problem as a multi-objective optimization one, aiming to minimize the number of updates on the actuators' input and the power consumption in the downlink transmission. To address the problem, we propose a model-free hierarchical reinforcement learning approach with uniformly ultimate boundedness stability guarantee that learns four policies simultaneously. These policies contain an update time policy on the actuators' input, a control policy, and energy-efficient sub-carrier and power allocation policies. Our simulation results show that the proposed approach can properly control a simulated ICPS and significantly decrease the number of updates on the actuators' input as well as the downlink power consumption.

[10]
Title: Pretraining Approaches for Spoken Language Recognition: TalTech Submission to the OLR 2021 Challenge
Comments: Accepted to Speaker Odyssey 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)

This paper investigates different pretraining approaches to spoken language identification. The paper is based on our submission to the Oriental Language Recognition 2021 Challenge. We participated in two tracks of the challenge: constrained and unconstrained language recognition. For the constrained track, we first trained a Conformer-based encoder-decoder model for multilingual automatic speech recognition (ASR), using the provided training data that had transcripts available. The shared encoder of the multilingual ASR model was then finetuned for the language identification task. For the unconstrained task, we relied on both externally available pretrained models as well as external data: the multilingual XLSR-53 wav2vec2.0 model was finetuned on the VoxLingua107 corpus for the language recognition task, and finally finetuned on the provided target language training data, augmented with CommonVoice data. Our primary metric $C_{\rm avg}$ values on the Test set are 0.0079 for the constrained task and 0.0119 for the unconstrained task which resulted in the second place in both rankings. In post-evaluation experiments, we study the amount of target language data needed for training an accurate backend model, the importance of multilingual pretraining data, and compare different models as finetuning starting points.

[11]
Title: Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech
Comments: Accepted to Speaker Odyssey 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

In this paper, we present a novel training method for speaker change detection models. Speaker change detection is often viewed as a binary sequence labelling problem. The main challenges with this approach are the vagueness of annotated change points caused by the silences between speaker turns and imbalanced data due to the majority of frames not including a speaker change. Conventional training methods tackle these by artificially increasing the proportion of positive labels in the training data. Instead, the proposed method uses an objective function which encourages the model to predict a single positive label within a specified collar. This is done by marginalizing over all possible subsequences that have exactly one positive label within the collar. Experiments on English and Estonian datasets show large improvements over the conventional training method. Additionally, the model outputs have peaks concentrated to a single frame, removing the need for post-processing to find the exact predicted change point which is particularly useful for streaming applications.
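
The marginalization over sequences with exactly one positive label can be illustrated with a small sketch. It assumes (this is not stated in the abstract) that the model emits independent per-frame change probabilities $p_i$ inside the collar, in which case the marginal is $\sum_i p_i \prod_{j \neq i} (1 - p_j)$; the names `exactly_one_positive_prob` and `collar_loss` are hypothetical, and the paper's actual objective may differ in detail.

```python
import math


def exactly_one_positive_prob(probs):
    """Probability that exactly one frame in the collar is labelled positive.

    Assumption: frames are independent Bernoulli variables with the given
    per-frame probabilities, so we sum over which single frame is positive.
    """
    total = 0.0
    for i, p_i in enumerate(probs):
        term = p_i
        for j, p_j in enumerate(probs):
            if j != i:
                term *= 1.0 - p_j
        total += term
    return total


def collar_loss(probs):
    """Negative log-likelihood of 'exactly one change point within the collar'."""
    return -math.log(exactly_one_positive_prob(probs))


peaked = [0.9, 0.05, 0.05]   # mass concentrated on a single frame
flat = [1 / 3, 1 / 3, 1 / 3]  # mass spread evenly across the collar
peaked_loss = collar_loss(peaked)
flat_loss = collar_loss(flat)
```

Under this assumption the loss is lower for the peaked output than for the flat one, which is consistent with the abstract's observation that model outputs concentrate on a single frame.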

[12]
Title: Nearly optimal resolution estimate for the two-dimensional super-resolution and a new algorithm for direction of arrival estimation with uniform rectangular array
Subjects: Image and Video Processing (eess.IV)

In this paper, we develop a new technique to obtain nearly optimal estimates of the computational resolution limits introduced in Appl. Comput. Harmon. Anal. 56 (2022) 402-446; IEEE Trans. Inf. Theory 67(7) (2021) 4812-4827; Inverse Probl. 37(10) (2021) 104001 for two-dimensional super-resolution problems. Our main contributions are fivefold: (i) Our work improves the resolution estimate for number detection and location recovery in two-dimensional super-resolution problems to nearly optimal; (ii) As a consequence, we derive a stability result for a sparsity-promoting algorithm in two-dimensional super-resolution problems (or Direction of Arrival problems (DOA)). The stability result exhibits the optimal performance of sparsity promoting in solving such problems; (iii) Our techniques pave the way for improving the estimate for resolution limits in higher-dimensional super-resolutions to nearly optimal; (iv) Inspired by these new techniques, we propose a new coordinate-combination-based model order detection algorithm for two-dimensional DOA estimation and theoretically demonstrate its optimal performance, and (v) we also propose a new coordinate-combination-based MUSIC algorithm for super-resolving sources in two-dimensional DOA estimation. It has excellent performance and enjoys many advantages compared to the conventional DOA algorithms. The coordinate-combination idea seems to be a promising way for multi-dimensional DOA estimation.

[13]
Title: Interpretable Stochastic Model Predictive Control using Distributional Reinforced Estimation for Quadrotor Tracking Systems
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

This paper presents a novel trajectory tracker for autonomous quadrotor navigation in dynamic and complex environments. The proposed framework integrates a distributional Reinforcement Learning (RL) estimator for unknown aerodynamic effects into a Stochastic Model Predictive Controller (SMPC) for trajectory tracking. Aerodynamic effects derived from drag forces and moment variations are difficult to model directly and accurately. Most current quadrotor tracking systems therefore treat them as simple `disturbances' in conventional control approaches. We propose Quantile-approximation-based Distributional Reinforced-disturbance-estimator, an aerodynamic disturbance estimator, to accurately identify disturbances, i.e., uncertainties between the true and estimated values of aerodynamic effects. Simplified Affine Disturbance Feedback is employed for control parameterization to guarantee convexity, which we then integrate with a SMPC to achieve sufficient and non-conservative control signals. We demonstrate our system to improve the cumulative tracking errors by at least 66% with unknown and diverse aerodynamic forces compared with recent state-of-the-art. Concerning traditional Reinforcement Learning's non-interpretability, we provide convergence and stability guarantees of Distributional RL and SMPC, respectively, with non-zero mean disturbances.

[14]
Title: Exponentially Stable Observer-based Controller for VTOL-UAV without Velocity Measurements
Authors: Hashim A. Hashim
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

There is a great demand for vision-based robotics solutions that can operate using Global Positioning Systems (GPS), but are also robust against GPS signal loss and gyroscope failure. This paper investigates the estimation and tracking control in application to a Vertical Take-Off and Landing (VTOL) Unmanned Aerial Vehicle (UAV) in six degrees of freedom (6 DoF). A full state observer for the estimation of VTOL-UAV motion parameters (attitude, angular velocity, position, and linear velocity) is proposed on the Lie Group of $\mathbb{SE}_{2}\left(3\right)\times\mathbb{R}^{3}=\mathbb{SO}\left(3\right)\times\mathbb{R}^{9}$ with almost globally exponentially stable closed loop error signals. Thereafter, a full state observer-based controller for the VTOL-UAV motion parameters is proposed on the Lie Group with a guaranteed almost global exponential stability. The proposed approach produces good results without the need for angular and linear velocity measurements (without a gyroscope and GPS signals) utilizing only a set of known landmarks obtained by a vision-aided unit (monocular or stereo camera). The equivalent quaternion representation on $\mathbb{S}^{3}\times\mathbb{R}^{9}$ is provided in the Appendix. The observer-based controller is presented in a continuous form while its discrete version is tested using a VTOL-UAV simulation that incorporates large initial error and uncertain measurements. The proposed observer is additionally tested experimentally on a real-world UAV flight dataset. Keywords: Unmanned aerial vehicle, nonlinear filter algorithm, autonomous navigation, tracking control, feature measurement, observer-based controller, localization, exponential stability, asymptotic stability, inertial measurement unit (IMU), Global Positioning Systems (GPS), vision aided inertial navigation system.

[15]
Title: Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

This paper investigates self-supervised pre-training for audio-visual speaker representation learning where a visual stream showing the speaker's mouth area is used alongside speech as inputs. Our study focuses on the Audio-Visual Hidden Unit BERT (AV-HuBERT) approach, a recently developed general-purpose audio-visual speech pre-training framework. We conducted extensive experiments probing the effectiveness of pre-training and visual modality. Experimental results suggest that AV-HuBERT generalizes decently to speaker related downstream tasks, improving label efficiency by roughly ten fold for both audio-only and audio-visual speaker verification. We also show that incorporating visual information, even just the lip area, greatly improves the performance and noise robustness, reducing EER by 38% in the clean condition and 75% in noisy conditions. Our code and models will be publicly available.

[16]
Title: Nonconvex $L_{1/2}$-Regularized Nonlocal Self-similarity Denoiser for Compressive Sensing based CT Reconstruction
Authors: Yunyi Li (1), Yiqiu Jiang (2), Hengmin Zhang (3), Jianxun Liu (1), Xiangling Ding (1), Guan Gui (4) ((1) School of Computer Science and Engineering, Hunan University of Science and Technology (2) Department of Sports Medicine and Joint Surgery, Nanjing First Hospital, Nanjing Medical University (3) Department of Computer and Information Science, University of Macau (4) College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications)
Comments: Preprint submitted to Journal of The Franklin Institute.
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Compressive sensing (CS) based computed tomography (CT) image reconstruction aims at reducing the radiation risk through sparse-view projection data. It is usually challenging to achieve satisfying image quality from incomplete projections. Recently, the nonconvex $L_{1/2}$-norm has achieved promising performance in sparse recovery, while its applications to imaging remain unsatisfactory due to its nonconvexity. In this paper, we develop an $L_{1/2}$-regularized nonlocal self-similarity (NSS) denoiser for the CT reconstruction problem, which integrates low-rank approximation with the group sparse coding (GSC) framework. Concretely, we first split the CT reconstruction problem into two subproblems, and then further improve the CT image quality using our $L_{1/2}$-regularized NSS denoiser. Instead of optimizing the nonconvex problem from the perspective of GSC, we reconstruct the CT image via low-rank minimization based on two simple yet essential schemes, which establish the equivalence between the GSC-based denoiser and low-rank minimization. Further, the weighted singular value thresholding (WSVT) operator is utilized to optimize the resulting nonconvex $L_{1/2}$ minimization problem. Following this, our proposed denoiser is integrated with the CT reconstruction problem via the alternating direction method of multipliers (ADMM) framework. Extensive experimental results on typical clinical CT images demonstrate that our approach achieves better performance than popular approaches.

[17]
Title: GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

Style transfer for out-of-domain (OOD) speech synthesis aims to generate speech samples with unseen style (e.g., speaker identity, emotion, and prosody) derived from an acoustic reference, while facing the following challenges: 1) The highly dynamic style features in expressive voice are difficult to model and transfer; and 2) the TTS models should be robust enough to handle diverse OOD conditions that differ from the source data. This paper proposes GenerSpeech, a text-to-speech model towards high-fidelity zero-shot style transfer of OOD custom voice. GenerSpeech decomposes the speech variation into the style-agnostic and style-specific parts by introducing two components: 1) a multi-level style adaptor to efficiently model a large range of style conditions, including global speaker and emotion characteristics, and the local (utterance, phoneme, and word-level) fine-grained prosodic representations; and 2) a generalizable content adaptor with Mix-Style Layer Normalization to eliminate style information in the linguistic content representation and thus improve model generalization. Our evaluations on zero-shot style transfer demonstrate that GenerSpeech surpasses the state-of-the-art models in terms of audio quality and style similarity. The extension studies to adaptive style transfer further show that GenerSpeech performs robustly in the few-shot data setting. Audio samples are available at https://GenerSpeech.github.io/

[18]
Title: Improved Multi-step FCS-MPCC with Disturbance Compensation for PMSM Drives -- Methods and Experimental Validation
Subjects: Systems and Control (eess.SY)

In this paper, an improved multi-step finite control set model predictive current control (FCS-MPCC) strategy with speed loop disturbance compensation is proposed for permanent magnet synchronous machine (PMSM) drive systems. A multi-step prediction mechanism can significantly improve the steady-state performance of the motor system. Because conventional multi-step prediction carries a heavy computational burden, an improved multi-step finite control set model predictive current control (IM MPCC) strategy is proposed by developing a new multi-step prediction mechanism. Furthermore, in order to improve the dynamic response of the system, a disturbance compensation (DC) mechanism based on an extended state observer (ESO) is proposed to estimate and compensate the total disturbance in the speed loop of the PMSM system. Both simulation and experimental results validate the effectiveness of the proposed control strategy.
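
The computational burden of conventional multi-step finite control set prediction can be seen in a minimal sketch. It assumes a scalar first-order model `x[k+1] = a*x[k] + b*u[k]` as a stand-in for the current dynamics, with hypothetical coefficients `a` and `b` and a hypothetical function name `fcs_mpc_step`; it shows plain exhaustive enumeration, not the improved mechanism proposed in the paper.

```python
from itertools import product


def fcs_mpc_step(x, reference, control_set, horizon, a=0.9, b=0.1):
    """Pick the first input of the best input sequence by exhaustive search.

    Assumption: scalar model x_next = a * x + b * u with a quadratic
    tracking cost. The search visits len(control_set) ** horizon sequences.
    """
    best_cost = float("inf")
    best_first_input = None
    for sequence in product(control_set, repeat=horizon):
        state, cost = x, 0.0
        for u in sequence:
            state = a * state + b * u          # predict one step ahead
            cost += (reference - state) ** 2   # accumulate tracking error
        if cost < best_cost:
            best_cost, best_first_input = cost, sequence[0]
    return best_first_input


control_set = (-1.0, 0.0, 1.0)
first_input = fcs_mpc_step(x=0.0, reference=1.0,
                           control_set=control_set, horizon=2)
candidates_evaluated = len(control_set) ** 2
```

Because the number of candidate sequences grows exponentially with the horizon, exhaustive search of this kind quickly becomes expensive, which is the motivation the abstract gives for a cheaper multi-step mechanism.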

[19]
Title: Combating COVID-19 using Generative Adversarial Networks and Artificial Intelligence for Medical Images: A Scoping Review
Journal-ref: JMIR Medical Informatics, 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This review presents a comprehensive study on the role of GANs in addressing the challenges related to COVID-19 data scarcity and diagnosis. It is the first review that summarizes the different GAN methods and the lung image datasets for COVID-19. It attempts to answer the questions related to applications of GANs, popular GAN architectures, frequently used image modalities, and the availability of source code. This review included 57 full-text studies that reported the use of GANs for different applications in COVID-19 lung image data. Most of the studies (n=42) used GANs for data augmentation to enhance the performance of AI techniques for COVID-19 diagnosis. Other popular applications of GANs were segmentation of lungs and super-resolution of the lung images. The cycleGAN and the conditional GAN were the most commonly used architectures, each appearing in nine studies. Chest X-ray images were used in 29 studies and CT images in 21 studies for the training of GANs. For the majority of the studies (n=47), the experiments were done and results were reported using publicly available data. A secondary evaluation of the results by radiologists/clinicians was reported by only two studies. Conclusion: Studies have shown that GANs have great potential to address the data scarcity challenge for lung images of COVID-19. Data synthesized with GANs have helped improve the training of the Convolutional Neural Network (CNN) models used for the diagnosis of COVID-19. In addition, GANs have contributed to enhancing CNN performance through super-resolution and segmentation of the images. This review also identified key limitations of the potential transformation of GAN-based methods in clinical applications.

[20]
Title: Chetaev Instability Framework for Kinetostatic Compliance-Based Protein Unfolding
Comments: Accepted for Publication in IEEE Control Systems Letters (L-CSS)
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC); Biomolecules (q-bio.BM)

Understanding the process of protein unfolding plays a crucial role in various applications such as design of folding-based protein engines. Using the well-established kinetostatic compliance (KCM)-based method for modeling of protein conformation dynamics and a recent nonlinear control theoretic approach to KCM-based protein folding, this paper formulates protein unfolding as a destabilizing control analysis/synthesis problem. In light of this formulation, it is shown that the Chetaev instability framework can be used to investigate the KCM-based unfolding dynamics. In particular, a Chetaev function for analysis of unfolding dynamics under the effect of optical tweezers and a class of control Chetaev functions for synthesizing control inputs that elongate protein strands from their folded conformations are presented. Based on the presented control Chetaev function, an unfolding input is derived from the Artstein-Sontag universal formula and the results are compared against optical tweezer-based unfolding.

[21]
Title: Learning Representations for New Sound Classes With Continual Self-Supervised Learning
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)

In this paper, we present a self-supervised learning framework for continually learning representations for new sound classes. The proposed system relies on a continually trained neural encoder that is trained with similarity-based learning objectives without using labels. We show that representations learned with the proposed method generalize better and are less susceptible to catastrophic forgetting than fully-supervised approaches. Remarkably, our technique does not store past data or models and is more computationally efficient than distillation-based methods. To accurately assess the system performance, in addition to using existing protocols, we propose two realistic evaluation protocols that use only a small amount of labeled data to simulate practical use cases.

[22]
Title: Statistical Modeling and Forecasting of Automatic Generation Control Signals
Subjects: Systems and Control (eess.SY)

The performance of frequency regulating units for automatic generation control (AGC) of power systems depends on their ability to track the AGC signal accurately. In addition, representative models and advanced analysis and analytics can yield forecasts of the AGC signal that aid in controller design. In this paper, time-series analyses are conducted on an AGC signal, specifically the PJM Reg-D, and using the results, a statistical model is derived that fairly accurately captures its second moments and saturated nature, as well as a time-series-based predictive model to provide forecasts. As an application, the predictive model is used in a model predictive control framework to ensure optimal tracking performance of a down ramp-limited distributed energy resource coordination scheme. The results provide valuable insight into the properties of the AGC signal and indicate the effectiveness of these models in replicating its behavior.

[23]
Title: A Tutorial on Decoding Techniques of Sparse Code Multiple Access
Subjects: Signal Processing (eess.SP)

Sparse Code Multiple Access (SCMA) is a disruptive code-domain non-orthogonal multiple access (NOMA) scheme to enable future massive machine-type communication networks. As an evolved variant of code division multiple access (CDMA), multiple users in SCMA are separated by assigning distinctive sparse codebooks (CBs). Efficient multiuser detection is carried out at the receiver by employing the message passing algorithm (MPA) that exploits the sparsity of CBs to achieve error performance approaching that of the maximum likelihood receiver. In spite of numerous research efforts in recent years, a comprehensive one-stop tutorial of SCMA covering the background, the basic principles, and new advances, is still missing, to the best of our knowledge. To fill this gap and to stimulate more forthcoming research, we provide a holistic introduction to the principles of SCMA encoding, CB design, and MPA based decoding in a self-contained manner. As an ambitious paper aiming to push the limits of SCMA, we present a survey of advanced decoding techniques with brief algorithmic descriptions as well as several promising directions.

[24]
Title: Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction
Journal-ref: the 31st International Joint Conference on Artificial Intelligence 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Inspired by the great success of deep neural networks, learning-based methods have achieved promising performance for metal artifact reduction (MAR) in computed tomography (CT) images. However, most of the existing approaches put less emphasis on modelling and embedding the intrinsic prior knowledge underlying this specific MAR task into their network designs. To address this issue, we propose an adaptive convolutional dictionary network (ACDNet), which leverages both model-based and learning-based methods. Specifically, we explore the prior structures of metal artifacts, e.g., non-local repetitive streaking patterns, and encode them as an explicit weighted convolutional dictionary model. Then, a simple-yet-effective algorithm is carefully designed to solve the model. By unfolding every iterative substep of the proposed algorithm into a network module, we explicitly embed the prior structure into a deep network, i.e., a clear interpretability for the MAR task. Furthermore, our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image based on its content. Hence, our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods. Comprehensive experiments executed on synthetic and clinical datasets show the superiority of our ACDNet in terms of effectiveness and model generalization. Code is available at https://github.com/hongwang01/ACDNet.

[25]
Title: Learning-Based sensitivity analysis and feedback design for drug delivery of mixed therapy of cancer in the presence of high model uncertainties
Authors: Mazen Alamir
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

In this paper, a methodology is proposed that enables analysis of the sensitivity of the outcome of a therapy to the unavoidable high dispersion of the patient-specific parameters on one hand and to the choice of the parameters that define the drug delivery feedback strategy on the other hand. More precisely, a method is given that enables extracting and ranking the most influential parameters that determine the probability of success/failure of a given feedback therapy for a given set of initial conditions over a cloud of realizations of uncertainties. Moreover, predictors of the expectations of the amounts of drugs being used can also be derived. This enables the design of an efficient stochastic optimization framework that guarantees safe contraction of the tumor while minimizing a weighted sum of the quantities of the different drugs being used. The framework is illustrated and validated using the example of a mixed therapy of cancer involving three combined drugs, namely a chemotherapy drug, an immunology vaccine and an immunotherapy drug. Finally, in this specific case, it is shown that dashboards can be built in the 2D space of the most influential state components that summarize the outcomes' probabilities and the associated drug usage as iso-value curves in the reduced state space.

[26]
Title: Double-Sided Information Aided Temporal-Correlated Massive Access
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

This letter considers temporal-correlated massive access, where each device, once activated, is likely to transmit continuously over several consecutive frames. Motivated by the fact that the device activity at each frame is correlated with not only its previous frame but also its next frame, we propose a double-sided information (DSI) aided joint activity detection and channel estimation algorithm based on the approximate message passing (AMP) framework. The DSI is extracted from the estimation results in a sliding window that contains the target detection frame and its previous and next frames. The proposed algorithm demonstrates superior performance over the state-of-the-art methods.

[27]
Title: State of Health Estimation of Lithium-Ion Batteries in Vehicle-to-Grid Applications Using Recurrent Neural Networks for Learning the Impact of Degradation Stress Factors
Subjects: Systems and Control (eess.SY)

This work presents an effective state of health indicator to indicate lithium-ion battery degradation based on a long short-term memory (LSTM) recurrent neural network (RNN) coupled with a sliding window. The developed LSTM RNN is able to capture the underlying long-term dependencies of degraded cell capacity on battery degradation stress factors. The learning performance was robust when there was sufficient training data, with an error of < 5% if more than 1.15 years' worth of data was supplied for training.
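
To illustrate the sliding-window idea mentioned above, the following is a minimal sketch of how a time series can be cut into fixed-length input windows with a target value a few steps ahead, which is a common way to prepare data for a recurrent network. The function name `sliding_windows` and the parameters `window` and `horizon` are hypothetical; the abstract does not specify the window length, features, or targets actually used.

```python
def sliding_windows(series, window, horizon=1):
    """Split a 1-D sequence into (input window, target) pairs.

    Each input holds `window` consecutive samples; the target is the
    sample `horizon` steps after the end of that window.
    All names here are illustrative assumptions, not the paper's setup.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    count = len(series) - window - horizon + 1
    if count <= 0:
        raise ValueError("series is too short for this window and horizon")
    inputs = [list(series[i:i + window]) for i in range(count)]
    targets = [series[i + window + horizon - 1] for i in range(count)]
    return inputs, targets


if __name__ == "__main__":
    capacity = [1.00, 0.99, 0.98, 0.96, 0.95, 0.93]  # made-up values
    X, y = sliding_windows(capacity, window=3)
    for xi, yi in zip(X, y):
        print(xi, "->", yi)
```

Each input row could then be fed to a sequence model as one training sample, with the paired target as its label.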

[28]
Title: A Framework to Map VMAF with the Probability of Just Noticeable Difference between Video Encoding Recipes
Subjects: Image and Video Processing (eess.IV)

The Just Noticeable Difference (JND) model, developed based on the Human Vision System (HVS) through subjective studies, is valuable for many multimedia use cases. In the streaming industries, it is commonly applied to reach a good balance between compression efficiency and perceptual quality when selecting video encoding recipes. Nevertheless, recent state-of-the-art deep learning based JND prediction models rely on large-scale JND ground truth that is expensive and time consuming to collect. Most of the existing JND datasets contain a limited number of contents and are limited to a certain codec (e.g., H264). As a result, JND prediction models that were trained on such datasets are normally not agnostic to the codecs. To this end, in order to decouple encoding recipes and JND estimation, we propose a novel framework to map the difference of objective Video Quality Assessment (VQA) scores, i.e., VMAF, between two given videos encoded with different encoding recipes from the same content to the probability of having just noticeable difference between them. The proposed probability mapping model learns from DCR test data, which is significantly cheaper to collect than data from a standard JND subjective test. As we utilize an objective VQA metric (e.g., VMAF, which is trained with contents encoded with different codecs) as a proxy to estimate JND, our model is agnostic to codecs and computationally efficient. Through extensive experiments, it is demonstrated that the proposed model is able to estimate JND values efficiently.

[29]
Title: DMRF-UNet: A Two-Stage Deep Learning Scheme for GPR Data Inversion under Heterogeneous Soil Conditions
Subjects: Signal Processing (eess.SP); Image and Video Processing (eess.IV)

Traditional ground-penetrating radar (GPR) data inversion leverages iterative algorithms which suffer from high computation costs and low accuracy when applied to complex subsurface scenarios. Existing deep learning-based methods focus on the ideal homogeneous subsurface environments and ignore the interference due to clutters and noise in real-world heterogeneous environments. To address these issues, a two-stage deep neural network (DNN), called DMRF-UNet, is proposed to reconstruct the permittivity distributions of subsurface objects from GPR B-scans under heterogeneous soil conditions. In the first stage, a U-shape DNN with multi-receptive-field convolutions (MRF-UNet1) is built to remove the clutters due to inhomogeneity of the heterogeneous soil. Then the denoised B-scan from the MRF-UNet1 is combined with the noisy B-scan to be inputted to the DNN in the second stage (MRF-UNet2). The MRF-UNet2 learns the inverse mapping relationship and reconstructs the permittivity distribution of subsurface objects. To avoid information loss, an end-to-end training method combining the loss functions of two stages is introduced. A wide range of subsurface heterogeneous scenarios and B-scans are generated to evaluate the inversion performance. The test results in the numerical experiment and the real measurement show that the proposed network reconstructs the permittivities, shapes, sizes, and locations of subsurface objects with high accuracy. The comparison with existing methods demonstrates the superiority of the proposed methodology for the inversion under heterogeneous soil conditions.

[30]
Title: Weakly-supervised Biomechanically-constrained CT/MRI Registration of the Spine
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

CT and MRI are two of the most informative modalities in spinal diagnostics and treatment planning. CT is useful when analysing bony structures, while MRI gives information about the soft tissue. Thus, fusing the information of both modalities can be very beneficial. Registration is the first step for this fusion. While the soft tissues around the vertebra are deformable, each vertebral body is constrained to move rigidly. We propose a weakly-supervised deep learning framework that preserves the rigidity and the volume of each vertebra while maximizing the accuracy of the registration. To achieve this goal, we introduce anatomy-aware losses for training the network. We specifically design these losses to depend only on the CT label maps since automatic vertebra segmentation in CT gives more accurate results than in MRI. We evaluate our method on an in-house dataset of 167 patients. Our results show that adding the anatomy-aware losses increases the plausibility of the inferred transformation while keeping the accuracy untouched.

[31]
Title: A Nonlinear Lateral Controller Design for Vehicle Path-following with an Arbitrary Sensor Location
Comments: 11 pages, 9 figures, 1 table, submitted to IEEE Transactions on Intelligent Vehicles
Subjects: Systems and Control (eess.SY)

This paper investigates the lateral control problem in vehicular path-following when the feedback sensor(s) are mounted at an arbitrary location on the longitudinal symmetric axis. We point out that some existing literature has abused the kinematic bicycle model describing the motion of the rear axle center for other locations, which may lead to poor performance in practical implementations. A new nonlinear controller with low complexity and high maneuverability is then designed that takes into account sensor mounting location, driving comfort and transient response with large initial errors. Design insights and intuitions are also provided in detail. Furthermore, the stability and tracking performance of the closed-loop system are analyzed, and conditions and guidelines are provided on the selection of control parameters. Comprehensive simulations are performed to demonstrate the efficacy of the proposed nonlinear controller for arbitrary sensor locations. Meanwhile, we also show that designing controllers ignoring the sensor location may lead to unexpected vehicular sway motion in non-straight paths.

[32]
Title: JR2net: A Joint Non-Linear Representation and Recovery Network for Compressive Spectral Imaging
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

Deep learning models are state-of-the-art in compressive spectral imaging (CSI) recovery. These methods use a deep neural network (DNN) as an image generator to learn non-linear mapping from compressed measurements to the spectral image. For instance, the deep spectral prior approach uses a convolutional autoencoder network (CAE) in the optimization algorithm to recover the spectral image by using a non-linear representation. However, the CAE training is detached from the recovery problem, which does not guarantee optimal representation of the spectral images for the CSI problem. This work proposes a joint non-linear representation and recovery network (JR2net), linking the representation and recovery task into a single optimization problem. JR2net consists of an optimization-inspired network following an ADMM formulation that learns a non-linear low-dimensional representation and simultaneously performs the spectral image recovery, trained via the end-to-end approach. Experimental results show the superiority of the proposed method with improvements up to 2.57 dB in PSNR and performance around 2000 times faster than state-of-the-art methods.

[33]
Title: Switch as a Verifier: Toward Scalable Data Plane Checking via Distributed, On-Device Verification
Subjects: Systems and Control (eess.SY)

Data plane verification (DPV) is important for finding network errors. Current DPV tools employ a centralized architecture, where a server collects the data planes of all devices and verifies them. Despite substantial efforts on accelerating DPV, this centralized architecture is inherently unscalable. In this paper, to tackle the scalability challenge of DPV, we circumvent the scalability bottleneck of the centralized design with Coral, a distributed, on-device DPV framework. The key insight of Coral is that DPV can be transformed into a counting problem on a directed acyclic graph, which can be naturally decomposed into lightweight tasks executed at network devices, enabling scalability. Coral consists of (1) a declarative requirement specification language, (2) a planner that employs a novel data structure DVNet to systematically decompose global verification into on-device counting tasks, and (3) a distributed verification (DV) protocol that specifies how on-device verifiers communicate task results efficiently to collaboratively verify the requirements. We implement a prototype of Coral. Extensive experiments with real-world datasets (WAN/LAN/DC) show that Coral consistently achieves scalable DPV under various networks and DPV scenarios, i.e., a speedup of up to 1250 times in the burst update scenario and up to 202 times at the 80% quantile of incremental verification over state-of-the-art DPV tools, with little overhead on commodity network devices.
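
As a rough illustration of why a counting problem on a directed acyclic graph decomposes into per-node tasks, the sketch below counts the paths from a source to a destination. Each node's count is just the sum of its successors' counts, so every node only needs results from its direct neighbors. This is a generic textbook computation, not Coral's DVNet or DV protocol; the function name `count_paths` and the toy topology are assumptions made for the example.

```python
from functools import lru_cache


def count_paths(graph, src, dst):
    """Count distinct paths from src to dst in a DAG.

    `graph` maps each node to a list of its successors. The input is
    assumed to be acyclic; a cycle would cause unbounded recursion.
    """

    @lru_cache(maxsize=None)
    def count(node):
        # A node's count depends only on the counts of its successors,
        # which is what makes the task local to each node.
        if node == dst:
            return 1
        return sum(count(nxt) for nxt in graph.get(node, ()))

    return count(src)


if __name__ == "__main__":
    # Hypothetical four-node topology with two disjoint routes A -> D.
    topology = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    print(count_paths(topology, "A", "D"))
```

A requirement such as "at least one path reaches the destination" then becomes a check on the count computed at the source node.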

[34]
Title: Multi-ship cooperative air defense model based on queuing theory
Subjects: Systems and Control (eess.SY)

The study of the multi-ship air defense model is of great significance in the simulation and evaluation of the actual combat process, the demonstration of air defense tactics, and the improvement of the security of important targets. The traditional multi-ship air defense model does not consider the coordination between ships, and the model assumptions are often too simple to effectively describe the capabilities of the multi-ship cooperative air defense system in realistic combat scenarios. In response to the above problems, this paper proposes a multi-ship cooperative air defense model, which effectively integrates the attack and defense parameters of both sides such as missile launch rate, missile flight speed, missile launch direction, ship interception rate, ship interception range, and the number of ship interception fire units. Then, the cooperative interception capability among ships is modeled by the method of task assignment. Based on queuing theory, this paper rigorously derives the penetration probability of the cooperative air defense system, and provides an analytical calculation model for the analysis and design of the cooperative air defense system. Finally, through simulation experiments in typical scenarios, this paper studies and compares the air defense capabilities of the system in two different modes, with and without coordination, and verifies the superiority of the multi-ship cooperative air defense model in reducing the probability of missile penetration. Further, the changes in the capability of the defense system under different parameters such as missile launch rate, flight speed, and launch direction, ship interception rate and range, and number of fire units are studied, and the weak points of the defense formation, defense range settings, and interception settings are obtained.
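
To give a feel for how queuing theory can yield a penetration probability, the sketch below uses the textbook M/M/c/c loss system (Erlang B formula) as a stand-in. The assumptions are entirely ours: missiles arrive as a Poisson stream, each of `servers` fire units is busy for an exponentially distributed time, and a missile that arrives while all units are busy penetrates. The paper's cooperative model, with task assignment and interception ranges, is richer than this; the function name `erlang_b` and the numbers are illustrative only.

```python
def erlang_b(offered_load, servers):
    """Blocking probability of an M/M/c/c loss system (Erlang B).

    offered_load: arrival rate divided by service rate (dimensionless).
    servers: number of parallel servers (fire units in this analogy).
    Uses the standard numerically stable recursion
    B(0) = 1, B(c) = A * B(c-1) / (c + A * B(c-1)).
    """
    if offered_load < 0 or servers < 0:
        raise ValueError("offered_load and servers must be non-negative")
    blocking = 1.0
    for c in range(1, servers + 1):
        blocking = offered_load * blocking / (c + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # Hypothetical load: 2 missiles arrive per mean engagement time.
    for units in (1, 2, 4, 8):
        print(units, "fire units ->", round(erlang_b(2.0, units), 4))
```

Under these assumptions, pooling fire units across ships corresponds to raising `servers` for the same offered load, which lowers the blocking probability.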

### Cross-lists for Tue, 17 May 22

[35]  arXiv:2205.06814 (cross-list from cs.IT) [pdf, other]
Title: Deep Reinforcement Learning in mmW-NOMA: Joint Power Allocation and Hybrid Beamforming
Comments: 20 pages (single Column), 9 figures. arXiv admin note: text overlap with arXiv:2205.06489
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

The high data-rate demand of next-generation wireless communication could be met by the Non-Orthogonal Multiple Access (NOMA) approach in the millimetre-wave (mmW) frequency band. Decreasing the interference on the other users while maintaining the bit rate via joint power allocation and beamforming is mandatory to guarantee the high bit-rate demand. Furthermore, mmW frequency bands dictate a hybrid structure for beamforming because of the trade-off between implementation and performance. In this paper, joint power allocation and hybrid beamforming of mmW-NOMA systems is addressed via Deep Reinforcement Learning (DRL), a recent advance in machine learning and control theory. The actor-critic approach is exploited to measure the immediate reward and provide the new action to maximize the overall Q-value of the network. Additionally, to improve the stability of the approach, we have utilized the Soft Actor-Critic (SAC) approach, where the overall reward and action entropy are maximized simultaneously. The immediate reward has been defined based on the soft weighted summation of the rate of all the users. The soft weighting is based on the achieved rate and allocated power of each user. Furthermore, the channel responses between the users and base station (BS) are defined as the state of the environment, while the action space consists of the digital and analog beamforming weights and the power allocated to each user. The simulation results show the superiority of the proposed approach over Time-Division Multiple Access (TDMA) and Non-Line of Sight (NLOS)-NOMA in terms of the sum-rate of the users. Its superior performance is due to the joint optimization and the independence of the proposed approach from the channel responses.

[36]  arXiv:2205.06861 (cross-list from cs.IT) [pdf, ps, other]
Title: QoS-Aware User Scheduling in Crowded XL-MIMO Systems Under Non-Stationary Multi-State LoS/NLoS Channels
Comments: 12 pages, 12 figures, 1 table
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

Providing minimum quality-of-service (QoS) in crowded wireless communications systems, with high user density, is challenging due to the network structure with a limited transmit power budget and resource blocks. Smart resource allocation methods, such as user scheduling, power allocation, and modulation and coding scheme selection, must be implemented to cope with the challenge. Aiming to enhance the number of served users with minimum QoS in the downlink (DL) channel of crowded extra-large scale massive multiple-input multiple-output (XL-MIMO) systems, in this paper we propose a QoS-aware joint user scheduling and power allocation technique. The proposed technique is constituted by two sequential procedures: the clique search-based scheduling (CBS) algorithm for user scheduling followed by optimal power allocation with transmit power budget and minimum achievable rate per user constraints. To accurately evaluate the proposed technique in the XL-MIMO scenario, we propose a generalized non-stationary multi-state channel model based on spherical-wave propagation assuming that users under LoS and NLoS transmission coexist in the same communication cell. Such a model considers that users under different channel states experience different propagation aspects both in the multipath fading model and the path loss rule. Numerical results on the achievable sum-rate, number of scheduled users, and distribution of the scheduled users reveal that the proposed CBS algorithm provides a fair coverage over the whole cell area, achieving remarkable numbers of scheduled users when users under the LoS and NLoS channel states coexist in the communication cell.

[37]  arXiv:2205.06896 (cross-list from cs.LG) [pdf, ps, other]
Title: Robustness of Control Design via Bayesian Learning
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

In the realm of supervised learning, Bayesian learning has shown robust predictive capabilities under input and parameter perturbations. Inspired by these findings, we demonstrate the robustness properties of Bayesian learning in the control search task. We seek to find a linear controller that stabilizes a one-dimensional open-loop unstable stochastic system. We compare two methods to deduce the controller: the first (deterministic) one assumes perfect knowledge of system parameter and state, the second takes into account uncertainties in both and employs Bayesian learning to compute a posterior distribution for the controller.
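
The deterministic baseline described above can be pictured with a small sketch. We assume, purely for illustration, a discrete-time scalar system x[k+1] = a*x[k] + b*u[k] + w[k] with |a| > 1 and a linear controller u[k] = -gain*x[k]; the abstract does not state the system model, so the names `a`, `b`, `gain` and the discrete-time form are our assumptions. With known parameters, the closed loop is stable exactly when |a - b*gain| < 1.

```python
def is_stabilizing(a, b, gain):
    """True if u = -gain * x stabilizes x[k+1] = a*x[k] + b*u[k]."""
    return abs(a - b * gain) < 1.0


def stabilizing_gain_interval(a, b):
    """Open interval of gains with |a - b*gain| < 1 (requires b != 0)."""
    if b == 0:
        raise ValueError("b must be non-zero for the system to be controllable")
    low, high = sorted(((a - 1.0) / b, (a + 1.0) / b))
    return low, high


def simulate(a, b, gain, x0, steps, noise=None):
    """Roll out the closed loop; `noise` is an optional list of disturbances."""
    x = x0
    trajectory = [x]
    for k in range(steps):
        w = noise[k] if noise is not None else 0.0
        x = (a - b * gain) * x + w
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    a, b = 1.2, 1.0  # hypothetical open-loop unstable system
    print(stabilizing_gain_interval(a, b))
    print(simulate(a, b, gain=0.7, x0=1.0, steps=5))
```

When `a` and `b` are uncertain, a single gain may fall outside the stabilizing interval for some realizations, which is the kind of situation where a posterior distribution over controllers becomes relevant.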

[38]  arXiv:2205.06908 (cross-list from cs.RO) [pdf, other]
Title: Neural-Fly Enables Rapid Learning for Agile Flight in Strong Winds
Comments: This is the accepted version of Science Robotics Vol. 7, Issue 66, eabm6597 (2022). Video: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

Executing safe and precise flight maneuvers in dynamic high-speed winds is important for the ongoing commoditization of uninhabited aerial vehicles (UAVs). However, because the relationship between various wind conditions and its effect on aircraft maneuverability is not well understood, it is challenging to design effective robot controllers using traditional control design methods. We present Neural-Fly, a learning-based approach that allows rapid online adaptation by incorporating pretrained representations through deep learning. Neural-Fly builds on two key observations that aerodynamics in different wind conditions share a common representation and that the wind-specific part lies in a low-dimensional space. To that end, Neural-Fly uses a proposed learning algorithm, domain adversarially invariant meta-learning (DAIML), to learn the shared representation, only using 12 minutes of flight data. With the learned representation as a basis, Neural-Fly then uses a composite adaptation law to update a set of linear coefficients for mixing the basis elements. When evaluated under challenging wind conditions generated with the Caltech Real Weather Wind Tunnel, with wind speeds up to 43.6 kilometers/hour (12.1 meters/second), Neural-Fly achieves precise flight control with substantially smaller tracking error than state-of-the-art nonlinear and adaptive controllers. In addition to strong empirical performance, the exponential stability of Neural-Fly results in robustness guarantees. Last, our control design extrapolates to unseen wind conditions, is shown to be effective for outdoor flights with only onboard sensors, and can transfer across drones with minimal performance degradation.

[39]  arXiv:2205.06963 (cross-list from cs.CL) [pdf, other]
Title: Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Consistency regularization has recently been applied to semi-supervised sequence-to-sequence (S2S) automatic speech recognition (ASR). This principle encourages an ASR model to output similar predictions for the same input speech with different perturbations. The existing paradigm of semi-supervised S2S ASR utilizes SpecAugment as data augmentation and requires a static teacher model to produce pseudo transcripts for untranscribed speech. However, this paradigm fails to take full advantage of consistency regularization. First, the masking operations of SpecAugment may damage the linguistic contents of the speech, thus influencing the quality of pseudo labels. Second, S2S ASR requires both input speech and prefix tokens to make the next prediction. The static prefix tokens made by the offline teacher model cannot match dynamic pseudo labels during consistency training. In this work, we propose an improved consistency training paradigm of semi-supervised S2S ASR. We utilize speech chain reconstruction as the weak augmentation to generate high-quality pseudo labels. Moreover, we demonstrate that dynamic pseudo transcripts produced by the student ASR model benefit the consistency training. Experiments on LJSpeech and LibriSpeech corpora show that compared to supervised baselines, our improved paradigm achieves a 12.2% CER improvement in the single-speaker setting and 38.6% in the multi-speaker setting.

[40]  arXiv:2205.06971 (cross-list from cs.IT) [pdf, ps, other]
Title: Design of a Reconfigurable Intelligent Surface-Assisted FM-DCSK-SWIPT Scheme with Non-linear Energy Harvesting Model
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper, we propose a reconfigurable intelligent surface (RIS)-assisted frequency-modulated (FM) differential chaos shift keying (DCSK) scheme with simultaneous wireless information and power transfer (SWIPT), called RIS-FM-DCSK-SWIPT scheme, for low-power, low-cost, and high-reliability wireless communication networks. In particular, the proposed scheme is developed under a non-linear energy-harvesting (EH) model which can accurately characterize the practical situation. The proposed RIS-FM-DCSK-SWIPT scheme has an appealing feature that it does not require channel state information, thus avoiding the complex channel estimation. We derive the theoretical expressions for the energy shortage probability and bit error rate (BER) of the proposed scheme over the multipath Rayleigh fading channel. We investigate the influence of key parameters on the performance of the proposed scheme in two different scenarios, i.e., RIS-assisted access point (RIS-AP) and dual-hop communication (RIS-DH). Finally, we carry out various Monte-Carlo experiments to verify the accuracy of the theoretical derivation, and illustrate the performance advantage of the proposed scheme over the existing DCSK-SWIPT schemes.

[41]  arXiv:2205.07019 (cross-list from cs.CV) [pdf, other]
Title: Evaluating the Generalization Ability of Super-Resolution Networks
Comments: First Generalization Assessment Index for SR networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Performance and generalization ability are two important aspects to evaluate deep learning models. However, research on the generalization ability of Super-Resolution (SR) networks is currently absent. We make the first attempt to propose a Generalization Assessment Index for SR networks, namely SRGA. SRGA exploits the statistical characteristics of internal features of deep networks, not output images, to measure the generalization ability. Specifically, it is a non-parametric and non-learning metric. To better validate our method, we collect a patch-based image evaluation set (PIES) that includes both synthetic and real-world images, covering a wide range of degradations. With SRGA and the PIES dataset, we benchmark existing SR models on the generalization ability. This work could lay the foundation for future research on model generalization in low-level vision.

[42]  arXiv:2205.07079 (cross-list from cs.NI) [pdf]
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

[43]  arXiv:2205.07092 (cross-list from cs.IT) [pdf, other]
Title: Blind Goal-Oriented Massive Access for Future Wireless Networks
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Emerging communication networks are envisioned to support massive wireless connectivity of heterogeneous devices with sporadic traffic and diverse requirements in terms of latency, reliability, and bandwidth. Providing multiple access to an increasing number of uncoordinated users and sharing the limited resources become essential in this context. In this work, we revisit the random access (RA) problem and exploit the continuous angular group sparsity feature of wireless channels to propose a novel RA strategy that provides low latency, high reliability, and massive access with limited bandwidth resources in an all-in-one package. To this end, we first design a reconstruction-free goal-oriented optimization problem, which only preserves the angular information required to identify the active devices. To solve this, we propose an alternating direction method of multipliers (ADMM) and derive closed-form expressions for each ADMM step. Then, we design a clustering algorithm that assigns the users to specific groups from which we can identify active stationary devices by their angles. For mobile devices, we propose an alternating minimization algorithm to recover their data and their channel gains simultaneously, which allows us to identify active mobile users. Simulation results show significant performance gains in terms of active user detection and false alarm probabilities as compared to state-of-the-art RA schemes, even with a limited number of preambles. Moreover, unlike prior work, the performance of the proposed blind goal-oriented massive access does not depend on the number of devices.

[44]  arXiv:2205.07099 (cross-list from cs.CV) [pdf, other]
Title: Differentiable SAR Renderer and SAR Target Reconstruction
Authors: Shilei Fu, Feng Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Forward modeling of wave scattering and radar imaging mechanisms is the key to information extraction from synthetic aperture radar (SAR) images. Like inverse graphics in the optical domain, an inherently integrated forward-inverse approach would be promising for advanced SAR information retrieval and target reconstruction. This paper presents such an attempt at inverse graphics for SAR imagery. A differentiable SAR renderer (DSR) is developed which reformulates the mapping and projection algorithm of the SAR imaging mechanism in the differentiable form of probability maps. First-order gradients of the proposed DSR are then analytically derived, which can be back-propagated from the rendered image/silhouette to the target geometry and scattering attributes. A 3D inverse target reconstruction algorithm from SAR images is devised. Several simulation and reconstruction experiments are conducted, including targets with and without background, using both synthesized data and real measured inverse SAR (ISAR) data from ground radar. Results demonstrate the efficacy of the proposed DSR and its inverse approach.

[45]  arXiv:2205.07100 (cross-list from cs.CL) [pdf, other]
Title: Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Transformer-based models have been achieving state-of-the-art results in several fields of Natural Language Processing. However, their direct application to speech tasks is not trivial. The nature of speech sequences carries problems such as long sequence lengths and redundancy between adjacent tokens. Therefore, we believe that the regular self-attention mechanism might not be well suited to them.
Different approaches have been proposed to overcome these problems, such as the use of efficient attention mechanisms. However, the use of these methods usually comes with a cost, which is a performance reduction caused by information loss. In this study, we present the Multiformer, a Transformer-based model which allows the use of different attention mechanisms on each head. By doing this, the model is able to bias the self-attention towards the extraction of more diverse token interactions, and the information loss is reduced. Finally, we perform an analysis of the head contributions, and we observe that those architectures where the relevance of all heads is uniformly distributed obtain better results. Our results show that mixing attention patterns along the different heads and layers outperforms our baseline by up to 0.7 BLEU.

[46]  arXiv:2205.07108 (cross-list from cs.HC) [pdf, other]
Title: Formalizing PQRST Complex in Accelerometer-based Gait Cycle for Authentication
Subjects: Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)

Accelerometer signals generated through gait present a new frontier of human interface with mobile devices. Gait cycle detection based on these signals has applications in various areas, including authentication, health monitoring, and activity detection. Template-based studies focus on how the entire gait cycle represents walking patterns, but these are compute-intensive. Aggregate feature-based studies extract features in the time domain and frequency domain from the entire gait cycle to reduce the number of features. However, these methods may miss critical structural information needed to appropriately represent the intricacies of walking patterns. To the best of our knowledge, no study has formally proposed a structure to capture variations within gait cycles or phases from accelerometer readings. We propose a new structure named the PQRST Complex, which corresponds to the swing phase in a gait cycle and matches the foot movements during this phase, thus capturing the changes in foot position. In our experiments, based on the nine features derived from this structure, the accelerometer-based gait authentication system outperforms many state-of-the-art gait cycle-based authentication systems. Our work opens up a new paradigm of capturing the structure of gait and opens multiple areas of research and practice using gait analogous to the "QRS complex" structure of ECG signals related to the heart.

[47]  arXiv:2205.07123 (cross-list from cs.CL) [pdf, other]
Title: The VoicePrivacy 2020 Challenge Evaluation Plan
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)

The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges. In this document, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used for system development and evaluation. We also present the attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and report objective evaluation results.

[48]  arXiv:2205.07172 (cross-list from cs.LG) [pdf, ps, other]
Title: Sparsity-Aware Robust Normalized Subband Adaptive Filtering algorithms based on Alternating Optimization
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

This paper proposes a unified sparsity-aware robust normalized subband adaptive filtering (SA-RNSAF) algorithm for identification of sparse systems under impulsive noise. The proposed SA-RNSAF algorithm generalizes different algorithms by defining the robust criterion and sparsity-aware penalty. Furthermore, by alternating optimization of the parameters (AOP) of the algorithm, including the step-size and the sparsity penalty weight, we develop the AOP-SA-RNSAF algorithm, which not only exhibits fast convergence but also obtains low steady-state misadjustment for sparse systems. Simulations in various noise scenarios have verified that the proposed AOP-SA-RNSAF algorithm outperforms existing techniques.

[49]  arXiv:2205.07178 (cross-list from cs.DC) [pdf, other]
Comments: Submitted to Wiopt22. arXiv admin note: substantial text overlap with arXiv:2205.00714
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)

[50]  arXiv:2205.07216 (cross-list from cs.LG) [pdf, other]
Title: FedHAP: Fast Federated Learning for LEO Constellations using Collaborative HAPs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Low Earth Orbit (LEO) satellite constellations have seen a sharp increase in deployment in recent years, due to their distinctive capabilities of providing broadband Internet access and enabling global data acquisition as well as large-scale AI applications. To apply machine learning (ML) in such applications, the traditional way of downloading satellite data such as imagery to a ground station (GS) and then training a model in a centralized manner is not desirable because of the limited bandwidth, intermittent connectivity between satellites and the GS, and privacy concerns on transmitting raw data. Federated Learning (FL) as an emerging communication and computing paradigm provides a potentially supreme solution to this problem. However, we show that existing FL solutions do not fit well in such LEO constellation scenarios because of significant challenges such as excessive convergence delay and unreliable wireless channels. To this end, we propose to introduce high-altitude platforms (HAPs) as distributed parameter servers (PSs) and propose a synchronous FL algorithm, FedHAP, to accomplish model training in an efficient manner via inter-satellite collaboration. To accelerate convergence, we also propose a layered communication scheme between satellites and HAPs that FedHAP leverages. Our simulations demonstrate that FedHAP attains model convergence in far fewer communication rounds than benchmarks, cutting the training time substantially from several days down to a few hours with the same level of resulting accuracy.

[51]  arXiv:2205.07250 (cross-list from cs.LG) [pdf, other]
Title: Reliable Offline Model-based Optimization for Industrial Process Control
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

In the research area of offline model-based optimization, novel and promising methods are frequently developed. However, implementing such methods in real-world industrial systems such as production lines for process control is oftentimes a frustrating process. In this work, we address two important problems to extend the current success of offline model-based optimization to industrial process control problems: 1) how to learn a reliable dynamics model from offline data for industrial processes? 2) how to learn a reliable but not over-conservative control policy from offline data by utilizing existing model-based optimization algorithms? Specifically, we propose a dynamics model based on an ensemble of conditional generative adversarial networks to achieve accurate reward calculation in industrial scenarios. Furthermore, we propose an epistemic-uncertainty-penalized reward evaluation function which can effectively avoid giving over-estimated rewards to out-of-distribution inputs during the learning/searching of the optimal control policy. We provide extensive experiments with the proposed method on two representative cases (a discrete control case and a continuous control case), showing that our method compares favorably to several baselines in offline policy learning for industrial process control.
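
The idea of penalizing rewards by epistemic uncertainty can be illustrated with a minimal sketch. The abstract does not give the exact form of the evaluation function, so the version below is an assumption: it takes the mean of the reward predictions from an ensemble and subtracts a multiple of their standard deviation. The names `penalized_reward` and `penalty_weight` are hypothetical.

```python
from statistics import mean, pstdev


def penalized_reward(ensemble_rewards, penalty_weight=1.0):
    """Mean ensemble reward minus a penalty proportional to ensemble disagreement.

    Assumed form (not taken from the paper): r = mean(r_i) - penalty_weight * std(r_i).
    """
    if not ensemble_rewards:
        raise ValueError("need at least one ensemble prediction")
    disagreement = pstdev(ensemble_rewards)  # 0.0 when all members agree
    return mean(ensemble_rewards) - penalty_weight * disagreement


# Members agree (in-distribution input): no penalty is applied.
in_dist = penalized_reward([1.0, 1.0, 1.0])
# Members disagree (possibly out-of-distribution input): the reward is reduced.
out_dist = penalized_reward([0.0, 1.0, 2.0])
```

Both inputs have the same mean prediction, but the second is scored lower because the ensemble members disagree, which is the behavior that discourages a policy search from exploiting over-estimated rewards.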

[52]  arXiv:2205.07301 (cross-list from cs.GR) [pdf, other]
Title: Conditional Vector Graphics Generation for Music Cover Images
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Generative Adversarial Networks (GAN) have motivated a rapid growth of the domain of computer image synthesis. As almost all the existing image synthesis algorithms consider an image as a pixel matrix, high-resolution image synthesis is complicated. A good alternative can be vector images. However, they belong to a highly sophisticated parametric space, which is a restriction for solving the task of synthesizing vector graphics by GANs. In this paper, we consider a specific application domain that softens this restriction dramatically, allowing the usage of vector image synthesis.
Music cover images should meet the requirements of Internet streaming services and printing standards, which imply high resolution of graphic materials without any additional requirements on the content of such images. Existing music cover image generation services do not analyze tracks themselves; some services consider only genre tags. To generate music covers as vector images that reflect the music and consist of simple geometric objects, we suggest a GAN-based algorithm called CoverGAN. The assessment of resulting images is based on their correspondence to the music compared with AttnGAN and DALL-E text-to-image generation according to title or lyrics. Moreover, the significance of the patterns found by CoverGAN has been evaluated in terms of the correspondence of the generated cover images to the musical tracks. Listeners evaluate the music covers generated by the proposed algorithm as quite satisfactory and corresponding to the tracks. Music cover image generation code and demo are available at https://github.com/IzhanVarsky/CoverGAN.

[53]  arXiv:2205.07319 (cross-list from cs.SD) [pdf]
Title: cMelGAN: An Efficient Conditional Generative Model Based on Mel Spectrograms
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Analysing music in the field of machine learning is a very difficult problem with numerous constraints to consider. The nature of audio data, with its very high dimensionality and widely varying scales of structure, is one of the primary reasons why it is so difficult to model. There are many applications of machine learning in music, like classifying the mood of a piece of music, conditional music generation, or popularity prediction. The goal for this project was to develop a genre-conditional generative model of music based on Mel spectrograms and evaluate its performance by comparing it to existing generative music models that use note-based representations. We initially implemented an autoregressive, RNN-based generative model called MelNet. However, due to its slow speed and low fidelity output, we decided to create a new, fully convolutional architecture that is based on the MelGAN [4] and conditional GAN architectures, called cMelGAN.

[54]  arXiv:2205.07348 (cross-list from cs.CV) [pdf, other]
Title: Novel Multicolumn Kernel Extreme Learning Machine for Food Detection via Optimal Features from CNN
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Automatic food detection is an emerging topic of interest due to its wide array of applications ranging from detecting food images on social media platforms to filtering non-food photos from the users in dietary assessment apps. Recently, during the COVID-19 pandemic, it has facilitated enforcing an eating ban by automatically detecting eating activities from cameras in public places. Therefore, to tackle the challenge of recognizing food images with high accuracy, we propose a hybrid framework for extracting and selecting optimal features from an efficient neural network. Thereafter, a nonlinear classifier is employed to discriminate between linearly inseparable feature vectors with great precision. In line with this idea, our method extracts features from MobileNetV3, selects an optimal subset of attributes by using Shapley Additive exPlanations (SHAP) values, and exploits the kernel extreme learning machine (KELM) due to its nonlinear decision boundary and good generalization ability. However, KELM suffers from the 'curse of dimensionality' problem for large datasets due to the complex computation of the kernel matrix with large numbers of hidden nodes. We solve this problem by proposing a novel multicolumn kernel extreme learning machine (MCKELM), which exploits the k-d tree algorithm to divide the data into N subsets and trains a separate KELM on each subset. Then, the method incorporates the KELM classifiers into parallel structures and, during testing, selects the top k nearest subsets by using the k-d tree search to classify the input instead of using the whole network. To evaluate the proposed framework, a large food/non-food dataset is prepared using nine publicly available datasets. Experimental results show the superiority of our method on an integrated set of measures while solving the 'curse of dimensionality' problem in KELM for large datasets.
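
The partitioning step described above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the MCKELM implementation: `kd_partition` splits the data at the median along cycling axes, as a k-d tree build does, and `nearest_subsets` ranks subsets by distance to their centroids as a stand-in for the k-d tree search mentioned in the abstract. Both function names are hypothetical, and training a KELM on each subset is omitted.

```python
def kd_partition(points, depth_limit, depth=0):
    """Split points into up to 2**depth_limit subsets by median along cycling axes."""
    if depth == depth_limit or len(points) <= 1:
        return [points]
    axis = depth % len(points[0])  # cycle through the feature dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], depth_limit, depth + 1)
            + kd_partition(ordered[mid:], depth_limit, depth + 1))


def nearest_subsets(subsets, query, k):
    """Return indices of the k subsets whose centroids are closest to query.

    Centroid ranking is an assumed simplification of the k-d tree search.
    Subsets are assumed to be non-empty.
    """
    def sq_dist_to_centroid(subset):
        dims = range(len(query))
        centroid = [sum(p[d] for p in subset) / len(subset) for d in dims]
        return sum((centroid[d] - query[d]) ** 2 for d in dims)

    ranked = sorted(range(len(subsets)),
                    key=lambda i: sq_dist_to_centroid(subsets[i]))
    return ranked[:k]


points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5), (5, 6), (6, 6)]
subsets = kd_partition(points, depth_limit=1)       # two subsets of four points
chosen = nearest_subsets(subsets, (5.5, 5.5), k=1)  # index of the closest subset
```

In a full pipeline, one classifier would be trained per entry of `subsets`, and only the classifiers at the indices in `chosen` would be evaluated for a given test input.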

[55]  arXiv:2205.07399 (cross-list from cs.CV) [pdf]
Title: SuperWarp: Supervised Learning and Warping on U-Net for Invariant Subvoxel-Precise Registration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

In recent years, learning-based image registration methods have gradually moved away from direct supervision with target warps to instead use self-supervision, with excellent results in several registration benchmarks. These approaches utilize a loss function that penalizes the intensity differences between the fixed and moving images, along with a suitable regularizer on the deformation. In this paper, we argue that the relative failure of supervised registration approaches can in part be blamed on the use of regular U-Nets, which are jointly tasked with feature extraction, feature matching, and estimation of deformation. We introduce one simple but crucial modification to the U-Net that disentangles feature extraction and matching from deformation prediction, allowing the U-Net to warp the features, across levels, as the deformation field is evolved. With this modification, direct supervision using target warps begins to outperform self-supervision approaches that require segmentations, presenting new directions for registration when images do not have segmentations. We hope that our findings in this preliminary workshop paper will re-ignite research interest in supervised image registration techniques. Our code is publicly available from https://github.com/balbasty/superwarp.

[56]  arXiv:2205.07450 (cross-list from cs.SD) [pdf, other]
Title: PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Speaker embedding has been a fundamental feature for speaker-related tasks such as verification, clustering, and diarization. Traditionally, speaker embeddings are represented as fixed vectors in high-dimensional space. This could lead to biased estimations, especially when handling shorter utterances. In this paper we propose to represent a speaker utterance as a "floating" vector whose state is indeterminate without knowing the context. The state of a speaker representation is jointly determined by itself, other speech from the same speaker, as well as other speakers it is being compared to. The content of the speech also contributes to determining the final state of a speaker representation. We pre-train an indeterminate speaker representation model that estimates the state of an utterance based on the context. The pre-trained model can be fine-tuned for downstream tasks such as speaker verification, speaker clustering, and speaker diarization. Substantial improvements are observed across all downstream tasks.

[57]  arXiv:2205.07554 (cross-list from astro-ph.IM) [pdf, other]
Title: Towards on-sky adaptive optics control using reinforcement learning
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)

The direct imaging of potentially habitable exoplanets is one prime science case for the next generation of high contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems which will control thousands of actuators at a framerate of kilohertz to several kilohertz. Most of the habitable exoplanets are located at small angular separations from their host stars, where the current XAO systems' control laws leave strong residuals. Current AO control strategies like static matrix-based wavefront reconstruction and integrator control suffer from temporal delay error and are sensitive to mis-registration, i.e., to dynamic variations of the control system geometry. We aim to produce control methods that cope with these limitations, provide a significantly improved AO correction and, therefore, reduce the residual flux in the coronagraphic point spread function.
We extend previous work in Reinforcement Learning for AO. The improved method, called PO4AO, learns a dynamics model and optimizes a control neural network, called a policy. We introduce the method and study it through numerical simulations of XAO with Pyramid wavefront sensing for the 8-m and 40-m telescope aperture cases. We further implemented PO4AO and carried out experiments in a laboratory environment using MagAO-X at the Steward laboratory. PO4AO provides the desired performance by improving the coronagraphic contrast in numerical simulations by factors of 3-5 within the control region of DM and Pyramid WFS, in simulation and in the laboratory. The presented method is also quick to train, i.e., on timescales of typically 5-10 seconds, and the inference time is sufficiently small (< ms) to be used in real-time control for XAO with currently available hardware even for extremely large telescopes.

[58]  arXiv:2205.07598 (cross-list from cs.IT) [pdf, ps, other]
Title: Cell-Free MmWave Massive MIMO Systems with Low-Capacity Fronthaul Links and Low-Resolution ADC/DACs
Comments: accepted with minor revisions as a paper to IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper, we consider the uplink channel estimation phase and downlink data transmission phase of cell-free millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems with low-capacity fronthaul links and low-resolution analog-to-digital converters/digital-to-analog converters (ADC/DACs). In cell-free massive MIMO, a control unit dictates the baseband processing at a geographical scale, while the base stations communicate with the control unit through fronthaul links. Unlike most previous works in cell-free massive MIMO with finite-capacity fronthaul links, we consider the general case where the fronthaul capacity and ADC/DAC resolution are not necessarily the same. In particular, the fronthaul compression and ADC/DAC quantization occur independently, where each one is modeled based on the information theoretic argument and additive quantization noise model (AQNM). Then, we address the codebook design problem that aims to minimize the channel estimation error for the independent and identically distributed (i.i.d.) and colored compression noise cases. Also, we propose an alternating optimization (AO) method to tackle the max-min fairness problem. In essence, the AO method alternates between two subproblems that correspond to the power allocation and codebook design problems. The AO method proposed for the zero-forcing (ZF) precoder is guaranteed to converge, whereas the one for the maximum ratio transmission (MRT) precoder has no such guarantee. Finally, the performance of the proposed schemes is evaluated by the simulation results in terms of both energy and spectral efficiency. The numerical results show that the proposed scheme for the ZF precoder yields spectral and energy efficiency 28% and 15% higher, respectively, than those of the best baseline.

[59]  arXiv:2205.07617 (cross-list from cs.DC) [pdf, other]
Title: Analysis of Distributed Ledger Technologies for Industrial Manufacturing
Comments: 19 pages, 4 figures, submitted for publication
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)

In recent years, industrial manufacturing has undergone massive technological changes that embrace digitalization and automation towards the vision of intelligent manufacturing plants. With the aim of maximizing efficiency and profitability in production, an important goal is to enable flexible manufacturing, both for the customer (desiring more individualized products) and for the manufacturer (to adjust to market demands). Manufacturing-as-a-service can support this through manufacturing plants that are used by different tenants who utilize the machines in the plant, which are offered by different providers. To enable such pay-per-use business models, Distributed Ledger Technology (DLT) is a viable option to establish decentralized trust and traceability. Thus, in this paper, we study potential DLT technologies for an efficient and intelligent integration of DLT-based solutions in manufacturing environments. We propose a general framework to adapt DLT in manufacturing, then we introduce the use case of shared manufacturing, which we utilize to study the communication and computation efficiency of selected DLTs in resource-constrained wireless IoT networks.

[60]  arXiv:2205.07646 (cross-list from cs.CL) [pdf, other]
Title: A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Intent detection and slot filling are two main tasks in natural language understanding and play an essential role in task-oriented dialogue systems. The joint learning of both tasks can improve inference accuracy and is popular in recent works. However, most joint models ignore the inference latency and cannot meet the need to deploy dialogue systems at the edge. In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling tasks, guaranteeing both accuracy and latency. Specifically, we introduce a clean and parameter-refined attention module to enhance the information exchange between intent and slot, improving semantic accuracy by more than 2%. FAN can be implemented on different encoders and delivers more accurate models at every speed level. Our experiments on the Jetson Nano platform show that FAN performs inference on fifteen utterances per second with a small accuracy drop, showing its effectiveness and efficiency on edge devices.

[61]  arXiv:2205.07654 (cross-list from cs.NE) [pdf, other]
Title: Hyperdimensional computing encoding for feature selection on the use case of epileptic seizure detection
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Signal Processing (eess.SP)

The healthcare landscape is moving from reactive interventions focused on symptom treatment to more proactive prevention, from one-size-fits-all to personalized medicine, and from centralized to distributed paradigms. Wearable IoT devices and novel algorithms for continuous monitoring are essential components of this transition. Hyperdimensional (HD) computing is an emerging ML paradigm inspired by neuroscience research with various aspects interesting for IoT devices and biomedical applications. Here we explore the not yet addressed topic of optimal encoding of spatio-temporal data, such as electroencephalogram (EEG) signals, and all the information it entails, into HD vectors. Further, we demonstrate how the HD computing framework can be used to perform feature selection by choosing an adequate encoding. To the best of our knowledge, this is the first approach to performing feature selection using HD computing in the literature. As a result, we believe it can support the ML community to further foster the research in multiple directions related to feature and channel selection, as well as model interpretability.

[62]  arXiv:2205.07680 (cross-list from cs.CV) [pdf, other]
Title: VQBB: Image-to-image Translation with Vector Quantized Brownian Bridge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Image-to-image translation is an important and challenging problem in computer vision. Existing approaches like Pixel2Pixel and DualGAN suffer from the instability of GANs and fail to generate diverse outputs because they model the task as a one-to-one mapping. Although diffusion models can generate images with high quality and diversity, current conditional diffusion models still cannot maintain high similarity with the condition image on image-to-image translation tasks due to the Gaussian noise added in the reverse process. To address these issues, a novel Vector Quantized Brownian Bridge (VQBB) diffusion model is proposed in this paper. On one hand, the Brownian Bridge diffusion process can model the transformation between two domains more accurately and flexibly than the existing Markov diffusion methods. As far as the authors know, this is the first work to propose a Brownian Bridge diffusion process for image-to-image translation. On the other hand, the proposed method improves the learning efficiency and translation accuracy by confining the diffusion process to the quantized latent space. Finally, numerical experimental results validate the performance of the proposed method.

[63]  arXiv:2205.07682 (cross-list from cs.SD) [pdf, ps, other]
Title: L3-Net Deep Audio Embeddings to Improve COVID-19 Detection from Smartphone Data
Comments: accepted for IEEE SMARTCOMP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Smartphones and wearable devices, along with Artificial Intelligence, can represent a game-changer in pandemic control, by implementing low-cost and pervasive solutions to recognize the development of new diseases at their early stages and by potentially avoiding the rise of new outbreaks. Some recent works show promise in detecting diagnostic signals of COVID-19 from voice and coughs by using machine learning and hand-crafted acoustic features. In this paper, we investigate the capabilities of the recently proposed deep embedding model L3-Net to automatically extract meaningful features from raw respiratory audio recordings in order to improve the performance of standard machine learning classifiers in discriminating between COVID-19 positive and negative subjects from smartphone data. We evaluated the proposed model on 3 datasets, comparing the obtained results with those of two reference works. Results show that the combination of L3-Net with hand-crafted features outperforms the other works by 28.57% in terms of AUC in a set of subject-independent experiments. This result paves the way to further investigation on different deep audio embeddings, also for the automatic detection of different diseases.

[64]  arXiv:2205.07711 (cross-list from cs.SD) [pdf, other]
Title: Transferability of Adversarial Attacks on Synthetic Speech Detection
Comments: 5 pages, submitted to Interspeech 2022
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)

Synthetic speech detection is one of the most important research problems in audio security. Meanwhile, deep neural networks are vulnerable to adversarial attacks. Therefore, we establish a comprehensive benchmark to evaluate the transferability of adversarial attacks on the synthetic speech detection task. Specifically, we attempt to investigate: 1) The transferability of adversarial attacks between different features. 2) The influence of varying extraction hyperparameters of features on the transferability of adversarial attacks. 3) The effect of clipping or self-padding operation on the transferability of adversarial attacks. By performing these analyses, we summarise the weaknesses of synthetic speech detectors and the transferability behaviours of adversarial attacks, which provide insights for future research. More details can be found at https://gitee.com/djc_QRICK/Attack-Transferability-On-Synthetic-Detection.

[65]  arXiv:2205.07772 (cross-list from cs.RO) [pdf, other]
Title: Moving Target Interception Considering Dynamic Environment
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

The interception of moving targets is a widely studied issue. In this paper, we propose an algorithm for intercepting a moving target with a wheeled mobile robot in a dynamic environment. We first predict the future position of the target through polynomial fitting. The algorithm then generates an interception trajectory with path and speed decoupling. We use Hybrid A* search to plan a path and optimize it via the gradient descent method. To avoid the dynamic obstacles in the environment, we introduce an ST graph for speed planning. The speed curve is represented by piecewise Bézier curves for further optimization. Compared with other interception algorithms, we consider a dynamic environment and plan a safe trajectory which satisfies the kinematic characteristics of the wheeled robot while ensuring the accuracy of interception. Simulation illustrates that the algorithm successfully achieves the interception tasks and has high computational efficiency.
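
As a rough illustration of how a piecewise Bézier curve can be evaluated, the sketch below uses de Casteljau's algorithm on scalar control values. The choice of scalar values over time, the segment layout `(t_start, t_end, control_points)`, and the function names are assumptions made for this example; the abstract does not specify the authors' parameterization or their optimization step.

```python
def bezier_point(control_points, u):
    """Evaluate a Bezier curve at u in [0, 1] with de Casteljau's algorithm."""
    if not control_points:
        raise ValueError("need at least one control point")
    points = list(control_points)
    # Repeatedly interpolate between neighbours until one value remains.
    while len(points) > 1:
        points = [(1.0 - u) * a + u * b for a, b in zip(points, points[1:])]
    return points[0]


def piecewise_bezier(segments, t):
    """Evaluate a piecewise curve; segments are (t_start, t_end, control_points)."""
    for t_start, t_end, control_points in segments:
        if t_start <= t <= t_end:
            u = (t - t_start) / (t_end - t_start)  # map time to local [0, 1]
            return bezier_point(control_points, u)
    raise ValueError("t outside the planned horizon")


# Hypothetical profile: two quadratic segments joined continuously at t = 2.
profile = [
    (0.0, 2.0, [0.0, 1.0, 2.0]),
    (2.0, 4.0, [2.0, 2.0, 1.0]),
]
samples = [piecewise_bezier(profile, t) for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
```

Because each segment starts and ends exactly at its first and last control point, continuity at a junction only requires that adjacent segments share that control value, which is one reason this representation is convenient for later optimization.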

[66]  arXiv:2205.07828 (cross-list from cs.IT) [pdf, ps, other]
Title: Digital Blind Box: Random Symmetric Private Information Retrieval
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Databases (cs.DB); Signal Processing (eess.SP)

We introduce the problem of random symmetric private information retrieval (RSPIR). In canonical PIR, a user downloads a message out of $K$ messages from $N$ non-colluding and replicated databases in such a way that no database can know which message the user has downloaded (user privacy). In SPIR, the privacy is symmetric, in that not only can the databases not know which message the user has downloaded, but the user itself cannot learn anything beyond the particular message it has downloaded (database privacy). In RSPIR, unlike SPIR, the user does not have an input to the databases, i.e., the user does not pick a specific message to download and is instead content with any one of the messages. In RSPIR, the databases need to send symbols to the user in such a way that the user is guaranteed to download a message correctly (random reliability), the databases do not know which message the user has received (user privacy), and the user does not learn anything beyond the one message it has received (database privacy). This is the digital version of a blind box, also known as gachapon, which implements the above setting with physical objects for entertainment. This is also the blind version of $1$-out-of-$K$ oblivious transfer (OT), an important cryptographic primitive. We study the information-theoretic capacity of RSPIR for the case of $N=2$ databases. We determine its exact capacity for the cases of $K = 2, 3, 4$ messages. While we provide a general achievable scheme that is applicable to any number of messages, the capacity for $K\geq 5$ remains open.

### Replacements for Tue, 17 May 22

[67]  arXiv:1909.10913 (replaced) [pdf]
Title: On Constant Distance Spacing Policies for Cooperative Adaptive Cruise Control
Comments: A short version of this study was published in the Vehicular Technology Section in IEEE Access under the title "A numerical study on constant spacing policies for starting platoons at oversaturated intersections". First version (v1) submitted to IEEE Access on 21-Oct-2021, resubmission (v2) on 31-Jan-2022, acceptance (v3) on 09-Mar-2022
Journal-ref: IEEE Access, Volume 10: 2022
Subjects: Systems and Control (eess.SY)
[68]  arXiv:1911.06382 (replaced) [pdf, other]
Title: R-local unlabeled sensing: A novel graph matching approach for multiview unlabeled sensing under local permutations
Journal-ref: A. A. Abbasi, A. Tasissa and S. Aeron, "R-Local Unlabeled Sensing: A Novel Graph Matching Approach for Multiview Unlabeled Sensing Under Local Permutations," in IEEE Open Journal of Signal Processing, vol. 2, pp. 309-317, 2021
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (stat.ML)
[69]  arXiv:1912.12479 (replaced) [pdf, other]
Title: Short-Term Load Forecasting Using AMI Data
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
[70]  arXiv:2003.07688 (replaced) [pdf, other]
Title: End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification
Comments: Published on Monday 10th of May 2021 in Neural Computing and Applications, Springer
Journal-ref: Online, Neural Comput & Applic (2021), pp. 1-11
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[71]  arXiv:2006.02271 (replaced) [pdf, other]
Title: Low-light Image Enhancement Using the Cell Vibration Model
Comments: This paper was accepted by IEEE Transactions on Multimedia (IEEE TMM) on May 12, 2022. The accepted version can be downloaded from arXiv
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[72]  arXiv:2006.14983 (replaced) [pdf, ps, other]
Title: Solution of matching equations of IDA-PBC by Pfaffian differential equations
Subjects: Systems and Control (eess.SY)
[73]  arXiv:2007.00930 (replaced) [pdf, other]
Title: Robust MPC for Linear Systems with Parametric and Additive Uncertainty: A Novel Constraint Tightening Approach
Comments: Minor typo fixes. Shortened and slightly altered version of this draft accepted as a full paper in Automatica
Subjects: Systems and Control (eess.SY)
[74]  arXiv:2011.07567 (replaced) [pdf, ps, other]
Title: SOBMOR: Structured Optimization-Based Model Order Reduction
Comments: 29 pages, 7 figures; change title and modify objective function to be smooth also for MIMO systems
Subjects: Systems and Control (eess.SY); Numerical Analysis (math.NA); Optimization and Control (math.OC)
[75]  arXiv:2102.09404 (replaced) [pdf, ps, other]
Title: Transient Performance of Tube-based Robust Economic Model Predictive Control
Journal-ref: IFAC-PapersOnLine Volume 54, Issue 6, 2021, Pages 28-35
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[76]  arXiv:2103.16608 (replaced) [pdf, other]
Title: The intrinsic communication in power systems: a new perspective to understand stability
Subjects: Systems and Control (eess.SY)
[77]  arXiv:2104.13463 (replaced) [pdf]
Title: A ridesharing simulation platform that considers dynamic supply-demand interactions
Subjects: Multiagent Systems (cs.MA); Systems and Control (eess.SY)
[78]  arXiv:2105.14615 (replaced) [pdf, ps, other]
Title: On the Controllers Based on Time Delay Estimation for Robotic Manipulators
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[79]  arXiv:2108.04349 (replaced) [src]
Title: AASeg: Attention Aware Network for Real Time Semantic Segmentation
Authors: Abhinav Sagar
Comments: This work makes assumptions which were found wrong later by the author
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[80]  arXiv:2109.05341 (replaced) [pdf, other]
Title: Energy-Efficient Backscatter Aided Uplink NOMA Roadside Sensor Communications under Channel Estimation Errors
Comments: 13 pages, 8 figures. Submitted to IEEE (Revised version)
Subjects: Signal Processing (eess.SP)
[81]  arXiv:2109.13842 (replaced) [pdf, other]
Title: Cross-layer Design for Real-Time Grid Operation: Estimation, Optimization and Power Flow
Subjects: Systems and Control (eess.SY)
[82]  arXiv:2110.00954 (replaced) [pdf, other]
Title: Adaptive Real-Time Grid Operation via Online Feedback Optimization with Sensitivity Estimation
Subjects: Systems and Control (eess.SY)
[83]  arXiv:2110.02636 (replaced) [pdf, other]
Title: Learning Sparse Masks for Diffusion-based Image Inpainting
Comments: To appear in A. J. Pinho, P. Georgieva, L. F. Teixeira, J. A. Sánchez (Eds.): Pattern Recognition and Image Analysis. Lecture Notes in Computer Science, Springer, Cham, 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[84]  arXiv:2111.00812 (replaced) [pdf, other]
Title: Topology identification of autonomous quantum dynamical networks
Comments: 12 pages, 2 figures; v2: misprints corrected and presentation of the results improved with respect to v1
Subjects: Quantum Physics (quant-ph); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Systems and Control (eess.SY)
[85]  arXiv:2111.08943 (replaced) [pdf, other]
Title: Edge Computing in IoT: A 6G Perspective
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[86]  arXiv:2111.12046 (replaced) [pdf, ps, other]
Title: Distributed energy control in electric energy systems
Comments: Preprint submitted to Automatica (Revision 1)
Subjects: Systems and Control (eess.SY)
[87]  arXiv:2112.05320 (replaced) [pdf, other]
Title: Open-Access Data and Toolbox for Tracking COVID-19 Impact on Power Systems
Comments: Journal accepted by IEEE Trans on Power Systems, 12 pages, 7 figures, 5 tables. Website: this https URL
Subjects: Systems and Control (eess.SY)
[88]  arXiv:2112.06238 (replaced) [pdf, other]
Title: HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[89]  arXiv:2112.06667 (replaced) [pdf, other]
Title: Long-Term Benefits of Network Boosters for Renewables Integration and Corrective Grid Security
Comments: Preprint submitted to International Journal of Electrical Power & Energy Systems
Subjects: Systems and Control (eess.SY)
[90]  arXiv:2112.07765 (replaced) [pdf, ps, other]
Title: Nonlinear Discrete-time System Identification without Persistence of Excitation: Finite-time Concurrent Learning Methods
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
[91]  arXiv:2112.13503 (replaced) [pdf, ps, other]
Title: Under-Approximate Reachability Analysis for a Class of Linear Uncertain Systems
Subjects: Systems and Control (eess.SY); Numerical Analysis (math.NA); Optimization and Control (math.OC)
[92]  arXiv:2201.05502 (replaced) [pdf]
Title: Fast and accurate waveform modeling of long-haul multi-channel optical fiber transmission using a hybrid model-data driven scheme
Comments: 8 pages, 5 figures, 1 table, 30 references
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[93]  arXiv:2201.06259 (replaced) [pdf, other]
Title: Segmentation of the Carotid Lumen and Vessel Wall using Deep Learning and Location Priors
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[94]  arXiv:2201.09200 (replaced) [pdf, ps, other]
Title: Asymptotics for Outlier Hypothesis Testing
Comments: to appear in IEEE ISIT 2022 and a short version of our IT paper arXiv:2009.03505
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Signal Processing (eess.SP)
[95]  arXiv:2202.02932 (replaced) [pdf, other]
Title: On the Stability of Super-Resolution and a Beurling-Selberg Type Extremal Problem
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Numerical Analysis (math.NA)
[96]  arXiv:2202.03129 (replaced) [pdf, other]
Title: Over-the-Air Ensemble Inference with Model Privacy
Comments: To appear in IEEE International Symposium on Information Theory (ISIT) 2022
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Signal Processing (eess.SP)
[97]  arXiv:2202.13122 (replaced) [pdf, other]
Title: What ODE-Approximation Schemes of Time-Delay Systems Reveal about Lyapunov-Krasovskii Functionals
Comments: 6 pages, 2 figures, "This work has been submitted to IFAC for possible publication."
Subjects: Systems and Control (eess.SY)
[98]  arXiv:2202.13862 (replaced) [pdf, other]
Title: Variable Rate Compression for Raw 3D Point Clouds
Comments: To be published in the 2022 IEEE International Conference on Robotics and Automation (ICRA)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
[99]  arXiv:2204.01702 (replaced) [pdf, other]
Title: Personalized Prediction of Future Lesion Activity and Treatment Effect in Multiple Sclerosis from Baseline MRI
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[100]  arXiv:2204.01795 (replaced) [pdf, other]
Title: Lightweight HDR Camera ISP for Robust Perception in Dynamic Illumination Conditions via Fourier Adversarial Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[101]  arXiv:2204.02883 (replaced) [pdf, ps, other]
Title: A Convex Optimization Approach for Control of Linear Quadratic Systems with Multiplicative Noise via System Level Synthesis
Subjects: Systems and Control (eess.SY)
[102]  arXiv:2204.10114 (replaced) [pdf, other]
Title: Reconfigurable Intelligent Surface for Near Field Communications: Beamforming and Sensing
Subjects: Signal Processing (eess.SP)
[103]  arXiv:2204.12533 (replaced) [pdf, other]
Title: A Gaussian Process Model for Opponent Prediction in Autonomous Racing
Comments: A version of this work was accepted as a contributed paper at the 2022 ICRA 2nd Workshop on Opportunities and Challenges with Autonomous Racing
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[104]  arXiv:2204.12815 (replaced) [pdf, other]
Title: Semi-Autonomous Electric Vehicles in Platooning Mode and Their Effects on Travel Time: A Framework for Simulation Evaluation
Subjects: Systems and Control (eess.SY)
[105]  arXiv:2205.00797 (replaced) [pdf, ps, other]
Title: Balanced Performance Between Energy-Delay and Bit Error Rate in UAV Relay Networks
Subjects: Signal Processing (eess.SP)
[106]  arXiv:2205.02397 (replaced) [pdf, other]
Title: Compressive Ptychography using Deep Image and Generative Priors
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
[107]  arXiv:2205.02682 (replaced) [pdf]
Title: Temporally and Spatially variant-resolution illumination patterns in computational ghost imaging
Subjects: Image and Video Processing (eess.IV); Optics (physics.optics)
[108]  arXiv:2205.02848 (replaced) [pdf, other]
Title: Building Brains: Subvolume Recombination for Data Augmentation in Large Vessel Occlusion Detection
Comments: PrePrint - Accepted at MICCAI 2022
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[109]  arXiv:2205.02913 (replaced) [pdf, other]
Title: Exponentially Stable Adaptive Optimal Control of Uncertain LTI Systems
Subjects: Systems and Control (eess.SY)
[110]  arXiv:2205.03269 (replaced) [pdf, ps, other]
Title: A Rapid Power-iterative Root-MUSIC Estimator for Massive/Ultra-massive MIMO Receiver
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[111]  arXiv:2205.04326 (replaced) [pdf, other]
Title: HierAttn: Effectively Learn Representations from Stage Attention and Branch Attention for Skin Lesions Diagnosis
Comments: The code is available at this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[112]  arXiv:2205.04437 (replaced) [pdf, other]
Title: Activating More Pixels in Image Super-Resolution Transformer
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[113]  arXiv:2205.04590 (replaced) [pdf, other]
Title: A Verification Framework for Certifying Learning-Based Safety-Critical Aviation Systems
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[114]  arXiv:2205.05577 (replaced) [pdf, ps, other]
Title: Channel Estimation in RIS-assisted Downlink Massive MIMO: A Learning-Based Approach
Comments: accepted to appear in IEEE SPAWC'22, Oulu, Finland
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[115]  arXiv:2205.06215 (replaced) [pdf, other]
Title: Fluorescent wavefront shaping using incoherent iterative phase conjugation
Authors: Dror Aizik (1), Ioannis Gkioulekas (2), Anat Levin (1) ((1) Department of Electrical and Computer Engineering, Technion, Haifa, Israel, (2) Robotics Institute, Carnegie Mellon University, PA, USA)
Subjects: Optics (physics.optics); Image and Video Processing (eess.IV)
[116]  arXiv:2205.06221 (replaced) [pdf]
Title: High-Frequency Tunable Grounded & Floating Incremental-Decremental Meminductor Emulator and Application