# Electrical Engineering and Systems Science

## New submissions


### New submissions for Thu, 2 Dec 21

[1]
Title: Zero-Shot Learning of Continuous 3D Refractive Index Maps from Discrete Intensity-Only Measurements
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Intensity diffraction tomography (IDT) refers to a class of optical microscopy techniques for imaging the 3D refractive index (RI) distribution of a sample from a set of 2D intensity-only measurements. The reconstruction of artifact-free RI maps is a fundamental challenge in IDT due to the loss of phase information and the missing cone problem. Neural fields (NF) have recently emerged as a new deep learning (DL) paradigm for learning continuous representations of complex 3D scenes without external training datasets. We present DeCAF as the first NF-based IDT method that can learn a high-quality continuous representation of an RI volume directly from its intensity-only and limited-angle measurements. We show on three different IDT modalities and multiple biological samples that DeCAF can generate high-contrast and artifact-free RI maps.

[2]
Title: Vicinity Effects of Field Free Point on the Relaxation Behavior of MNPs
Subjects: Signal Processing (eess.SP); Medical Physics (physics.med-ph)

In Magnetic Particle Imaging (MPI), the distribution of magnetic nanoparticles (MNPs) is imaged by moving a field free point (FFP) in space. All MNPs in close vicinity of the FFP contribute to the signal induced on the receive coil. The relaxation behavior of these MNPs is subject to a DC field due to the selection field (SF). In this work, we investigate the effects of the DC field on the relaxation behavior of the MNPs, with the goal of understanding the differences between the measured relaxations in Magnetic Particle Spectrometer (MPS) setups vs. MPI scanners.

[3]
Title: Comparison of inverse problem linear and non-linear methods for localization source: a combined TMS-EEG study
Subjects: Signal Processing (eess.SP)

Electroencephalography (EEG) consists of estimating the cortical distribution of electrical activity over time and of locating the zones of primary sensory projection. Moreover, it can record, at millisecond resolution, the variations of the potential and magnetic fields generated by electrical activity in the brain. Localizing the sources of brain activity requires solving an inverse problem. Many different imaging methods are used to solve the inverse problem. The aim of the present study is to provide comparison criteria for choosing the least bad method. Hence, transcranial magnetic stimulation (TMS) and EEG are combined to study the dynamics of the brain at rest following a disturbance. The study focuses on comparing the following methods for EEG following TMS stimulation: sLORETA (standardized Low Resolution Electromagnetic Tomography), MNE (Minimum Norm Estimate), dSPM (dynamic Statistical Parametric Mapping), and wMEM (wavelet-based Maximum Entropy on the Mean), in order to study the impact of TMS on the resting state and to study inter- and intra-zone connectivity. The contribution of the comparison is demonstrated through the stages of the simulations.

[4]
Title: Representation learning through cross-modal conditional teacher-student training for speech emotion recognition
Subjects: Audio and Speech Processing (eess.AS)

Generic pre-trained speech and text representations promise to reduce the need for large labeled datasets on specific speech and language tasks. However, it is not clear how to effectively adapt these representations for speech emotion recognition. Recent public benchmarks show the efficacy of several popular self-supervised speech representations for emotion classification. In this study, we show that the primary difference between the top-performing representations is in predicting valence while the differences in predicting activation and dominance dimensions are less pronounced. However, we show that even the best-performing HuBERT representation underperforms on valence prediction compared to a multimodal model that also incorporates text representation. We address this shortcoming by injecting lexical information into the speech representation using the multimodal model as a teacher. To improve the efficacy of our approach, we propose a novel estimate of the quality of the emotion predictions, to condition teacher-student training. We report new audio-only state-of-the-art concordance correlation coefficient (CCC) values of 0.757, 0.627, 0.671 for activation, valence and dominance predictions, respectively, on the MSP-Podcast corpus, and also state-of-the-art values of 0.667, 0.582, 0.545 on the IEMOCAP corpus.
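
For readers unfamiliar with the metric reported above, the concordance correlation coefficient (CCC) between predictions and reference labels can be sketched as follows. This is the generic textbook definition with population (biased) variances, not the authors' evaluation code; the function name `ccc` and any inputs passed to it are illustrative assumptions.

```python
def ccc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences.

    Generic definition (assumption: population variances, divide by n):
        CCC = 2 * cov / (var_pred + var_gold + (mean_pred - mean_gold)**2)
    """
    n = len(pred)
    if n != len(gold) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are constant and equal; the ratio is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom
```

Unlike the Pearson correlation, the CCC penalizes a constant offset between predictions and labels through the squared mean difference in the denominator.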

[5]
Title: High-Resolution WiFi Imaging with Reconfigurable Intelligent Surfaces
Subjects: Signal Processing (eess.SP)

WiFi-based imaging enables pervasive sensing in a privacy-preserving and cost-effective way. However, most existing methods either require specialized hardware modification or suffer from poor imaging performance due to the fundamental limit of off-the-shelf commodity WiFi devices in spatial resolution. We observe that the recently developed reconfigurable intelligent surface (RIS) could be a promising solution to overcome these challenges. Thus, in this paper, we propose an RIS-aided WiFi imaging framework to achieve high-resolution imaging with off-the-shelf WiFi devices. Specifically, we first design a beamforming method to achieve the first-stage imaging by separating the signals from different spatial locations with the aid of the RIS. Then, we propose an optimization-based super-resolution imaging algorithm by leveraging the low-rank nature of the reconstructed object. During the optimization, we also explicitly take into account the effect of finite phase quantization in the RIS to avoid the resolution degradation due to quantization errors. Simulation results demonstrate that our framework achieves a median root mean square error (RMSE) of 0.03 and a median structural similarity (SSIM) of 0.52. The visual results show that high-resolution imaging results are achieved with simulation signals at 5 GHz that are matched with commercial WiFi 802.11n/ac protocols.

[6]
Title: Using Reconfigurable Intelligent Surfaces for UE Positioning in mmWave MIMO Systems
Subjects: Signal Processing (eess.SP)

A reconfigurable intelligent surface (RIS) consists of a massive number of meta-elements, which results in a reflection path between a base station (BS) and user equipment (UE). In wireless localization, this reflection path aids in positioning accuracy, especially when the line-of-sight (LOS) path is subject to severe blockage and fading. We develop an RIS-aided positioning framework to locate a UE in environments where the LOS path may or may not be available. We first estimate the RIS-aided channel parameters from the received signals at the UE. To reduce algorithmic complexity, we propose a linear combination of the estimated UE positions from the direct and reflection paths, which is shown to be approximately the maximum likelihood estimator under the large-sample regime when the estimates from different paths are independent. We optimize the RIS phase shifts to improve the positioning accuracy, and extend the proposed approach to the case with multiple BSs and UEs. We derive the Cramér-Rao bound (CRB) and demonstrate numerically that our proposed method approaches the CRB.
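
The idea of linearly combining independent position estimates can be illustrated with inverse-variance weighting, a standard way to fuse independent unbiased estimates. The sketch below is not the paper's estimator: it assumes known, per-axis error variances for each path, and the function name `fuse_positions` is hypothetical.

```python
def fuse_positions(est_a, var_a, est_b, var_b):
    """Fuse two independent position estimates axis by axis.

    est_a, est_b: coordinates (e.g. x, y) from two paths, here standing in
                  for the direct and reflected paths.
    var_a, var_b: assumed error variance of each coordinate (must be > 0).
    Returns (fused coordinates, fused variances).
    """
    if not (len(est_a) == len(var_a) == len(est_b) == len(var_b)):
        raise ValueError("all inputs must have the same length")

    fused, fused_var = [], []
    for a, va, b, vb in zip(est_a, var_a, est_b, var_b):
        if va <= 0 or vb <= 0:
            raise ValueError("variances must be positive")
        wa, wb = 1.0 / va, 1.0 / vb  # weight = inverse variance
        fused.append((wa * a + wb * b) / (wa + wb))
        fused_var.append(1.0 / (wa + wb))
    return fused, fused_var
```

Under these assumptions the less noisy path dominates each coordinate, and the fused variance is never larger than either input variance.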

[7]
Title: Experimental Validation of Multi-lane Formation Control for Connected and Automated Vehicles in Multiple Scenarios
Subjects: Systems and Control (eess.SY)

Formation control methods of connected and automated vehicles have been proposed to smoothly switch the structure of vehicular formations in different scenarios. In previous research, simulations are often conducted to verify the performance of formation control methods. This paper presents the experimental results of multi-lane formation control for connected and automated vehicles. The coordinated formation control framework and specific methods utilized for different scenarios are introduced. The details of the experimental platform and vehicle control strategy are provided. Simulations and experiments are conducted in different scenarios, and the results indicate that the formation control method is applicable to multiple traffic scenarios and able to improve formation-structure-switching efficiency compared with benchmark methods.

[8]
Title: Joint Cluster Head Selection and Trajectory Planning in UAV-Aided IoT Networks by Reinforcement Learning with Sequential Model
Comments: This paper has been accepted in IEEE IoT-J
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Employing unmanned aerial vehicles (UAVs) has attracted growing interest and emerged as the state-of-the-art technology for data collection in Internet-of-Things (IoT) networks. In this paper, with the objective of minimizing the total energy consumption of the UAV-IoT system, we formulate the problem of jointly designing the UAV's trajectory and selecting cluster heads in the IoT network as a constrained combinatorial optimization problem, which is classified as NP-hard and challenging to solve. We propose a novel deep reinforcement learning (DRL) with a sequential model strategy that can effectively learn the policy represented by a sequence-to-sequence neural network for the UAV's trajectory design in an unsupervised manner. Through extensive simulations, the obtained results show that the proposed DRL method can find the UAV's trajectory that requires much less energy consumption when compared to other baseline algorithms and achieves close-to-optimal performance. In addition, simulation results show that the model trained by our proposed DRL algorithm has an excellent generalization ability to larger problem sizes without the need to retrain the model.

[9]
Title: An Open Source Software Stack for Tuning the Dynamical Behavior of Complex Power Systems
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

BlockSystems.jl and NetworkDynamics.jl are two novel software packages which facilitate highly efficient transient stability simulations of power networks. Users may specify inputs and power system design in a convenient modular and equation-based manner without compromising on speed or model detail. Because the packages are written in the high-level, high-performance programming language Julia, a rich open-source package ecosystem is available, which provides state-of-the-art solvers and machine learning algorithms. Motivated by the recent interest in the Nordic inertia challenge, we have implemented the Nordic5 test case and tuned its control parameters by making use of the machine learning and automatic differentiation capabilities of our software stack.

[10]
Title: Energy Management of a Multi-Battery System for Renewable-Based High Power EV Charging
Comments: Submitted to the 22nd Power Systems Computation Conference (PSCC 2022)
Subjects: Systems and Control (eess.SY)

Stationary battery systems facilitate renewable-based electric vehicle fast charging with power levels above the connection capacity of distribution grids. This paper proposes heuristic energy management strategies for a novel multi-battery design that directly connects its strings to other DC components without the need for interfacing power converters. Hence, the energy management system has two degrees of control: (i) allocating strings to other DC microgrid components, in this case a photovoltaic system, two electric vehicle fast chargers, and a grid-tie inverter, and (ii) managing the energy exchange with the local distribution grid. For the grid exchange, a basic droop control is compared to an enhanced control that includes forecasts in the decision making. To this end, this paper evaluates results from multiple Monte Carlo simulations capturing the uncertainty of electric vehicle charging instances under varying charging frequencies. Using actual photovoltaic measurements from different months, the numerical analyses show that the enhanced control increases self-sufficiency by reducing grid exchange, and decreases the number of battery cycles. However, the enhanced control operates the battery closer to its state of charge limits, which accelerates calendar ageing.

[11]
Title: An improved bearing fault detection strategy based on artificial bee colony algorithm
Subjects: Signal Processing (eess.SP)

The operating state of a bearing directly affects the performance of rotating machinery, so it is critical to extract features accurately and decisively from the original vibration signal and to recognize faulty parts as early as possible. In this study, the one-dimensional ternary model, which has been proved to be an effective statistical method for feature selection, is introduced, and a shapelet transformation is proposed to calculate its parameter, namely the standard deviation of the transformed shapelets, which is usually selected by trial and error. Moreover, XGBoost is used to recognize the faults from the obtained features, and an improved artificial bee colony (ABC) algorithm, in which the evolution is guided by the importance indices of different search spaces, is proposed to optimize the parameters of XGBoost. Here, the value of the importance index is related to the probability of optimal solutions in a certain space, so the tendency of traditional ABC to fall into local optima can be avoided. The experimental results based on failure vibration signal samples show that the average accuracy of fault signal recognition can reach 97%, which is much higher than that of other extraction strategies, so the extraction ability is improved. With the improved artificial bee colony algorithm used to optimize the parameters of XGBoost, the classification accuracy improves from 97.02% to about 98.60% compared with the traditional classification strategy.

[12]
Title: Simultaneous Controller and Lyapunov Function Design for Constrained Nonlinear Systems
Comments: Initial submission to ACC 2022
Subjects: Systems and Control (eess.SY)

This paper presents a method to stabilize state and input constrained nonlinear systems using an offline optimization on variable triangulations of the set of admissible states. For control-affine systems, by choosing a continuous piecewise affine (CPA) controller structure, the non-convex optimization is formulated as iterative semi-definite programming (SDP), which can be solved efficiently using available software. The method has very general assumptions on the system's dynamics and constraints. Unlike similar existing methods, it avoids finding terminal invariant sets and solving non-convex optimizations, and does not rely on knowing a control Lyapunov function (CLF), as it finds a CPA Lyapunov function explicitly. The method enforces a desired upper bound on the decay rate of the state norm and finds the exact region of attraction. Thus, it can also be viewed as a systematic approach for finding Lipschitz CLFs in state and input constrained control-affine systems. Using the CLF, a minimum norm controller is also formulated by quadratic programming for online application.

[13]
Title: Improving gearshift controllers for electric vehicles with reinforcement learning
Subjects: Systems and Control (eess.SY)

During a multi-speed transmission development process, the final calibration of the gearshift controller parameters is usually performed on a physical test bench. Engineers typically treat the mapping from the controller parameters to the gearshift quality as a black box, and use methods rooted in experimental design -- a purely statistical approach -- to infer the parameter combination that will maximize a chosen gearshift performance indicator. This approach unfortunately requires thousands of gearshift trials, ultimately discouraging the exploration of different control strategies. In this work, we calibrate the feedforward and feedback parameters of a gearshift controller using a model-based reinforcement learning algorithm adapted from PILCO. Experimental results show that the method optimizes the controller parameters with few gearshift trials. This approach can accelerate the exploration of gearshift control strategies, which is especially important for the emerging technology of multi-speed transmissions for electric vehicles.

[14]
Title: A Fault Location Method Using Direct Convolution: Electromagnetic Time Reversal or Not Reversal
Subjects: Signal Processing (eess.SP)

Electromagnetic time reversal (EMTR) is drawing increasing interest in short-circuit fault location. In this letter, we investigate the classic EMTR fault location methods and find that it is not necessary to reverse the obtained signal in time, which is a standard operation in these methods, before injecting it into the network. The effectiveness of the EMTR fault location method results from the specific similarity of the transfer functions in the forward and reverse processes. Therefore, we can inject a source of arbitrary type and length in the reverse process to locate the fault. Based on this observation, we propose a new EMTR fault location method using direct convolution. Unlike the traditional methods, it only needs to pre-calculate the assumed fault transients for a given network, which can be stored in embedded hardware. The faults can be located efficiently via direct convolution of the signal collected from a fault and the pre-stored calculated transients, even using a fraction of the fault signal.

[15]
Title: Predicting lexical skills from oral reading with acoustic measures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)

Literacy assessment is an important activity for education administrators across the globe. Typically achieved in a school setting by testing a child's oral reading, it is intensive in human resources. While automatic speech recognition (ASR) is a potential solution to the problem, it tends to be computationally expensive for hand-held devices, apart from needing language- and accent-specific speech for training. In this work, we propose a system to predict the word-decoding skills of a student based on simple acoustic features derived from the recording. We first identify a meaningful categorization of word-decoding skills by analyzing a manually transcribed data set of children's oral reading recordings. Next, the automatic prediction of the category is attempted with the proposed acoustic features. Pause statistics, syllable rate, and spectral and intensity dynamics are found to be reliable indicators of specific types of oral reading deficits, providing useful feedback by discriminating the different characteristics of beginning readers. This computationally simple and language-agnostic approach is found to provide a performance close to that obtained using a language-dependent ASR that required considerable tuning of its parameters.

[16]
Title: DeepAoANet: Learning Angle of Arrival from Software Defined Radios with Deep Neural Networks
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Robotics (cs.RO)

Direction finding and positioning systems based on RF signals are significantly impacted by multipath propagation, particularly in indoor environments. Existing algorithms (e.g., MUSIC) perform poorly in resolving Angle of Arrival (AoA) in the presence of multipath or when operating in a weak signal regime. We note that digitally sampled RF frontends allow for the easy analysis of signals and their delayed components. Low-cost Software-Defined Radio (SDR) modules enable Channel State Information (CSI) extraction across a wide spectrum, motivating the design of an enhanced AoA solution. We propose a Deep Learning approach to deriving AoA from a single snapshot of the SDR multichannel data. We compare and contrast deep-learning based angle classification and regression models to estimate up to two AoAs accurately. We have implemented the inference engines on different platforms to extract AoAs in real-time, demonstrating the computational tractability of our approach. To demonstrate the utility of our approach, we have collected IQ (In-phase and Quadrature components) samples from a four-element Uniform Linear Array (ULA) in various Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) environments, and published the dataset. Our proposed method demonstrates excellent reliability in determining the number of impinging signals and realizes mean absolute AoA errors of less than $2^{\circ}$.

### Cross-lists for Thu, 2 Dec 21

[17]  arXiv:2112.00007 (cross-list from cs.GR) [pdf, other]
Title: Sound-Guided Semantic Image Manipulation
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

The recent success of the generative model shows that leveraging the multi-modal embedding space can manipulate an image using text information. However, manipulating an image with other sources rather than text, such as sound, is not easy due to the dynamic characteristics of the sources. In particular, sound can convey vivid emotions and dynamic expressions of the real world. Here, we propose a framework that directly encodes sound into the multi-modal (image-text) embedding space and manipulates an image from the space. Our audio encoder is trained to produce a latent representation from an audio input, which is forced to be aligned with image and text representations in the multi-modal embedding space. We use a direct latent optimization method based on aligned embeddings for sound-guided image manipulation. We verify the effectiveness of our sound-guided image manipulation quantitatively and qualitatively. We also show that our method can mix different modalities, i.e., text and audio, which enriches the variety of the image modification. The experiments on zero-shot audio classification and semantic-level image classification show that our proposed model outperforms other text and sound-guided state-of-the-art methods.

[18]  arXiv:2112.00011 (cross-list from cs.CV) [pdf, other]
Title: Predicting Poverty Level from Satellite Imagery using Deep Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Determining the poverty levels of various regions throughout the world is crucial in identifying interventions for poverty reduction initiatives and directing resources fairly. However, reliable data on global economic livelihoods is hard to come by, especially for areas in the developing world, hampering efforts to both deploy services and monitor/evaluate progress. This is largely due to the fact that this data is obtained from traditional door-to-door surveys, which are time-consuming and expensive. Overhead satellite imagery contains characteristics that make it possible to estimate a region's poverty level. In this work, I develop deep learning computer vision methods that can predict a region's poverty level from an overhead satellite image. I experiment with both daytime and nighttime imagery. Furthermore, because data limitations are often the barrier to entry in poverty prediction from satellite imagery, I explore the impact that data quantity and data augmentation have on the representational power and overall accuracy of the networks. Lastly, to evaluate the robustness of the networks, I evaluate them on data from continents that were absent in the development set.

[19]  arXiv:2112.00165 (cross-list from cs.RO) [pdf, other]
Title: Coordinated Multi-Robot Trajectory Tracking over Sampled Communication
Comments: 23 pages (main article: 14 pages; proofs: 9 pages); 22 figures (main article: 12 figures). Submitted to Automatica
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

In this paper, we propose an inverse-kinematics controller for a class of multi-robot systems in the scenario of sampled communication. The goal is to make a group of robots perform trajectory tracking in a coordinated way when the sampling time of communications is non-negligible, disrupting the theoretical convergence guarantees of standard control designs. Given a feasible desired trajectory in the configuration space, the proposed controller receives measurements from the system at sampled time instants and computes velocity references for the robots, which are tracked by a low-level controller. We propose a jointly designed feedback plus feedforward controller with provable stability and error convergence guarantees, and further show that the obtained controller is amenable to decentralized implementation. We test the proposed control strategy via numerical simulations in the scenario of cooperative aerial manipulation of a cable-suspended load using a realistic simulator (Fly-Crane). Finally, we compare our proposed decentralized controller with centralized approaches that adapt the feedback gain online through smart heuristics, and show that it achieves comparable performance.

[20]  arXiv:2112.00190 (cross-list from cs.LG) [pdf]
Title: Is the use of Deep Learning and Artificial Intelligence an appropriate means to locate debris in the ocean without harming aquatic wildlife?
Comments: reference list is added/updated; sorry for causing any inconveniences. 3681 words, 14 pages
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

With the global issue of plastic debris ever expanding, it is about time that the technology industry stepped in. This study aims to assess whether deep learning can successfully distinguish between marine life and man-made debris underwater. The aim is to find out whether we are able to safely clean up our oceans with Artificial Intelligence without disrupting the delicate balance of the aquatic ecosystems. The research explores the use of Convolutional Neural Networks from the perspective of protecting the ecosystem, rather than primarily collecting rubbish. We did this by building a custom deep learning model with an original database of 1,644 underwater images and using binary classification to sort synthesised material from aquatic life. We concluded that although it is possible to safely distinguish between debris and life, further exploration with a larger database and a stronger CNN structure has the potential for much more promising results.

[21]  arXiv:2112.00209 (cross-list from cs.SD) [pdf, ps, other]
Title: Environmental Sound Extraction Using Onomatopoeia
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Onomatopoeia, which is a character sequence that phonetically imitates a sound, is effective in expressing characteristics of sound such as duration, pitch, and timbre. We propose an environmental-sound-extraction method using onomatopoeia to specify the target sound to be extracted. With this method, we estimate a time-frequency mask from an input mixture spectrogram and onomatopoeia using a U-Net architecture, then extract the corresponding target sound by masking the spectrogram. Experimental results indicate that the proposed method can extract only the target sound corresponding to the onomatopoeia and performs better than conventional methods that use sound-event classes to specify the target sound.
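
The final masking step can be illustrated independently of the network. The sketch below only shows element-wise application of a time-frequency mask to a magnitude spectrogram stored as nested lists; in the method described above the mask would be estimated by a U-Net, whereas here it is supplied by the caller. The function name `apply_mask` is a hypothetical helper.

```python
def apply_mask(spectrogram, mask):
    """Multiply each time-frequency bin by its mask value.

    spectrogram: list of rows (frequency bins) of magnitudes over time frames.
    mask:        same shape, values typically in [0, 1] (assumed, not checked).
    """
    if len(spectrogram) != len(mask):
        raise ValueError("spectrogram and mask must have the same shape")

    masked = []
    for spec_row, mask_row in zip(spectrogram, mask):
        if len(spec_row) != len(mask_row):
            raise ValueError("spectrogram and mask must have the same shape")
        # Bins with mask 1 pass through; bins with mask 0 are suppressed.
        masked.append([s * m for s, m in zip(spec_row, mask_row)])
    return masked
```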

[22]  arXiv:2112.00216 (cross-list from cs.CV) [pdf, other]
Title: PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Reconstructing the 3D pose of a person in metric scale from a single view image is a geometrically ill-posed problem. For example, we cannot measure the exact distance of a person to the camera from a single view image without additional scene assumptions (e.g., known height). Existing learning based approaches circumvent this issue by reconstructing the 3D pose up to scale. However, there are many applications such as virtual telepresence, robotics, and augmented reality that require metric scale reconstruction. In this paper, we show that audio signals recorded along with an image provide complementary information to reconstruct the metric 3D pose of the person. The key insight is that as the audio signals traverse across the 3D space, their interactions with the body provide metric information about the body's pose. Based on this insight, we introduce a time-invariant transfer function called pose kernel -- the impulse response of audio signals induced by the body pose. The main properties of the pose kernel are that (1) its envelope highly correlates with 3D pose, (2) the time response corresponds to arrival time, indicating the metric distance to the microphone, and (3) it is invariant to changes in the scene geometry configurations. Therefore, it is readily generalizable to unseen scenes. We design a multi-stage 3D CNN that fuses audio and visual signals and learns to reconstruct 3D pose in a metric scale. We show that our multi-modal method produces accurate metric reconstruction in real world scenes, which is not possible with state-of-the-art lifting approaches including parametric mesh regression and depth regression.

[23]  arXiv:2112.00250 (cross-list from cs.CV) [pdf]
Title: Shallow Network Based on Depthwise Over-Parameterized Convolution for Hyperspectral Image Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Recently, convolutional neural network (CNN) techniques have gained popularity as a tool for hyperspectral image classification (HSIC). To improve the feature extraction efficiency of HSIC under the condition of limited samples, current methods generally use deep models with many layers. However, deep network models are prone to overfitting and vanishing-gradient problems when samples are limited. In addition, the spatial resolution decreases severely with increasing depth, which is very detrimental to spatial edge feature extraction. Therefore, this letter proposes a shallow model for HSIC, which is called depthwise over-parameterized convolutional neural network (DOCNN). To ensure effective feature extraction by the shallow model, the depthwise over-parameterized convolution (DO-Conv) kernel is introduced to extract the discriminative features. The DO-Conv kernel is composed of a standard convolution kernel and a depthwise convolution kernel, which can extract the spatial features of the different channels individually and fuse the spatial features of all channels simultaneously. Moreover, to further reduce the loss of spatial edge features due to the convolution operation, a dense residual connection (DRC) structure is proposed and applied to the feature extraction part of the whole network. Experimental results obtained from three benchmark data sets show that the proposed method outperforms other state-of-the-art methods in terms of classification accuracy and computational efficiency.

[24]  arXiv:2112.00299 (cross-list from cs.IT) [pdf, ps, other]
Title: STAR-RISs: A Correlated T&R Phase-Shift Model and Practical Phase-Shift Configuration Strategies
Comments: 31 pages, 9 figures, submitted to IEEE journals for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Applied Physics (physics.app-ph)

A correlated transmission and reflection (T&R) phase-shift model is proposed for passive lossless simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs). A STAR-RIS-aided two-user downlink communication system is investigated for both orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA). To evaluate the impact of the correlated T&R phase-shift model on the communication performance, three phase-shift configuration strategies are developed, namely the primary-secondary phase-shift configuration (PS-PSC), the diversity preserving phase-shift configuration (DP-PSC), and the T/R-group phase-shift configuration (TR-PSC) strategies. Furthermore, we derive the outage probabilities for the three proposed phase-shift configuration strategies as well as for the random phase-shift configuration and the independent phase-shift model, which constitute performance lower and upper bounds, respectively. Then, the diversity order of each strategy is investigated based on the obtained analytical results. It is shown that the proposed DP-PSC strategy achieves full diversity order simultaneously for users located on both sides of the STAR-RIS. Moreover, power scaling laws are derived for the three proposed strategies and for the random phase-shift configuration. Numerical simulations reveal a performance gain if the users on both sides of the STAR-RIS are served by NOMA instead of OMA. Moreover, it is shown that the proposed DP-PSC strategy yields the same diversity order as achieved by STAR-RISs under the independent phase-shift model and a comparable power scaling law with only a 4 dB reduction in received power.

[25]  arXiv:2112.00330 (cross-list from cs.IT) [pdf, other]
Title: Soft-Output Joint Channel Estimation and Data Detection using Deep Unfolding
Comments: Presented at the 2021 IEEE Information Theory Workshop (ITW)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We propose a novel soft-output joint channel estimation and data detection (JED) algorithm for multiuser (MU) multiple-input multiple-output (MIMO) wireless communication systems. Our algorithm approximately solves a maximum a-posteriori JED optimization problem using deep unfolding and generates soft-output information for the transmitted bits in every iteration. The parameters of the unfolded algorithm are computed by a hyper-network that is trained with a binary cross entropy (BCE) loss. We evaluate the performance of our algorithm in a coded MU-MIMO system with 8 basestation antennas and 4 user equipments and compare it to state-of-the-art algorithms that separate channel estimation from soft-output data detection. Our results demonstrate that our JED algorithm outperforms such data detectors with as few as 10 iterations.

[26]  arXiv:2112.00350 (cross-list from cs.CL) [pdf, other]
Title: Investigation of Training Label Error Impact on RNN-T
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this paper, we propose an approach to quantitatively analyze the impact of different training label errors on RNN-T based ASR models. The results show that deletion errors are more harmful than substitution and insertion label errors in RNN-T training data. We also examined label error impact mitigation approaches on RNN-T and found that, though all the methods mitigate the label-error-caused degradation to some extent, they could not remove the performance gap between the models trained with and without the presence of label errors. Based on the analysis results, we suggest designing data pipelines for RNN-T with higher priority on reducing deletion label errors. We also find that ensuring high-quality training labels remains important, despite the existence of the label error mitigation approaches.

[27]  arXiv:2112.00355 (cross-list from cs.SD) [pdf, other]
Title: Score Transformer: Generating Musical Score from Note-level Representation
Authors: Masahiro Suzuki
Comments: Accepted at ACM Multimedia Asia 2021 (MMAsia '21); Project page: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

In this paper, we explore the tokenized representation of musical scores using the Transformer model to automatically generate musical scores. Thus far, sequence models have yielded fruitful results with note-level (MIDI-equivalent) symbolic representations of music. Although the note-level representations can comprise sufficient information to reproduce music aurally, they cannot contain adequate information to represent music visually in terms of notation. Musical scores contain various musical symbols (e.g., clef, key signature, and notes) and attributes (e.g., stem direction, beam, and tie) that enable us to visually comprehend musical content. However, automated estimation of these elements has yet to be comprehensively addressed. In this paper, we first design score token representation corresponding to the various musical elements. We then train the Transformer model to transcribe note-level representation into appropriate music notation. Evaluations of popular piano scores show that the proposed method significantly outperforms existing methods on all 12 musical aspects that were investigated. We also explore an effective notation-level token representation to work with the model and determine that our proposed representation produces the steadiest results.

[28]  arXiv:2112.00374 (cross-list from cs.CV) [pdf, other]
Title: CLIPstyler: Image Style Transfer with a Single Text Condition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Image and Video Processing (eess.IV)

Existing neural style transfer methods require reference style images to transfer texture information of style images to content images. However, in many practical situations, users may not have reference style images but may still be interested in transferring styles by just imagining them. To deal with such applications, we propose a new framework that enables style transfer without a style image, using only a text description of the desired style. Using the pre-trained text-image embedding model of CLIP, we demonstrate the modulation of the style of content images with only a single text condition. Specifically, we propose a patch-wise text-image matching loss with multiview augmentations for realistic texture transfer. Extensive experimental results confirm successful image style transfer with realistic textures that reflect semantic query texts.

[29]  arXiv:2112.00457 (cross-list from cs.IT) [pdf, other]
Title: Broadband beam steering for misaligned multi-mode OAM communication systems
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Orbital angular momentum (OAM) at radio frequency (RF) has attracted increasing attention as a novel approach to multiplexing a set of orthogonal OAM modes on the same frequency channel to achieve high spectral efficiency (SE). However, the precondition for maintaining the orthogonality among different OAM modes is perfect alignment of the transmit and receive uniform circular arrays (UCAs), which is difficult to satisfy in practical wireless communication scenarios. Therefore, to achieve available multi-mode OAM broadband wireless communication, we first investigate the effect of oblique angles on the transmission performance of the multi-mode OAM broadband system in the non-parallel misalignment case. Then, we compare the UCA-based RF analog and baseband digital transceiver structures and the corresponding beam steering schemes. Mathematical analysis and numerical simulations validate that the SE of the misaligned multi-mode OAM broadband system is quite low, while analog and digital beam steering can both significantly improve the SE of the system. However, digital beam steering can obtain higher SE than analog beam steering, especially when the bandwidth and the number of array elements are large, which validates that a baseband digital transceiver with digital beam steering is more suitable for multi-mode OAM broadband wireless communication systems in practice.

[30]  arXiv:2112.00485 (cross-list from cs.CV) [pdf, other]
Title: Learning Transformer Features for Image Quality Assessment
Authors: Chao Zeng, Sam Kwong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Objective image quality evaluation is a challenging task, which aims to measure the quality of a given image automatically. Depending on the availability of reference images, there are Full-Reference (FR) and No-Reference (NR) IQA tasks. Most deep learning approaches use regression from deep features extracted by Convolutional Neural Networks. For the FR task, another option is conducting a statistical comparison on deep features. For all these methods, non-local information is usually neglected. In addition, the relationship between FR and NR tasks is less explored. Motivated by the recent success of transformers in modeling contextual information, we propose a unified IQA framework that utilizes a CNN backbone and a transformer encoder to extract features. The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme. Evaluation experiments on three standard IQA datasets (LIVE, CSIQ, and TID2013) and on KONIQ-10K show that our proposed model can achieve state-of-the-art FR performance. In addition, comparable NR performance is achieved in extensive experiments, and the results show that the NR performance can be leveraged by the joint training scheme.

[31]  arXiv:2112.00528 (cross-list from physics.med-ph) [pdf]
Title: Objective assessment of corneal transparency in the clinical setting with standard SD-OCT devices
Comments: 15 pages, 7 figures, 1 table
Subjects: Medical Physics (physics.med-ph); Image and Video Processing (eess.IV)

PURPOSE: To develop an automated algorithm allowing extraction of quantitative corneal transparency parameters from clinical spectral-domain OCT images. To establish a representative dataset of normative transparency values from healthy corneas.
METHODS: SD-OCT images of 83 normal corneas (ages 22-50 years) from a standard clinical device (RTVue-XR Avanti, Optovue Inc.) were processed. A pre-processing procedure is applied first, including a derivative approach and a PCA-based correction mask, to eliminate common central artifacts (i.e., apex-centered column saturation artifact and posterior stromal artifact) and enable standardized analysis. The mean intensity stromal-depth profile is then extracted over a 6-mm-wide corneal area and analyzed according to our previously developed method deriving quantitative transparency parameters related to the physics of light propagation in tissues, notably tissular heterogeneity (Birge ratio; $B_r$), followed by the photon mean-free path ($l_s$) in homogeneous tissues (i.e., $B_r \sim 1$).
RESULTS: After confirming stromal homogeneity ($B_r < 10$, IDR: 1.9-5.1), we measured a median $l_s$ of 570 $\mu$m (IDR: 270-2400 $\mu$m). Considering corneal thicknesses, this may be translated into a median fraction of transmitted (coherent) light $T_{coh(stroma)}$ of 51$\%$ (IDR: 22-83$\%$). No statistically significant correlation between transparency and age or thickness was found.
CONCLUSIONS: Our algorithm provides robust and quantitative measurement of corneal transparency from standard clinical SD-OCT images. It yields lower transparency values than previously reported, which may be attributed to our method being exclusively sensitive to spatially coherent light. Excluding images with central artifacts wider than 300 $\mu$m also raises our median $T_{coh(stroma)}$ to 70$\%$ (IDR: 34-87$\%$).

[32]  arXiv:2112.00556 (cross-list from cs.CV) [pdf, other]
Title: Semi-Supervised Surface Anomaly Detection of Composite Wind Turbine Blades From Drone Imagery
Comments: In-proceedings at 2022 17th International Conference on Computer Vision Theory and Applications (VISAPP)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

[33]  arXiv:2112.00560 (cross-list from cs.CV) [pdf, other]
Title: Attribute Artifacts Removal for Geometry-based Point Cloud Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Geometry-based point cloud compression (G-PCC) can achieve remarkable compression efficiency for point clouds. However, it still leads to serious attribute compression artifacts, especially under low bitrate scenarios. In this paper, we propose a Multi-Scale Graph Attention Network (MS-GAT) to remove the artifacts of point cloud attributes compressed by G-PCC. We first construct a graph based on point cloud geometry coordinates and then use the Chebyshev graph convolutions to extract features of point cloud attributes. Considering that one point may be correlated with points both near and far away from it, we propose a multi-scale scheme to capture the short and long range correlations between the current point and its neighboring and distant points. To address the problem that various points may have different degrees of artifacts caused by adaptive quantization, we introduce the quantization step per point as an extra input to the proposed network. We also incorporate a graph attentional layer into the network to pay special attention to the points with more attribute artifacts. To the best of our knowledge, this is the first attribute artifacts removal method for G-PCC. We validate the effectiveness of our method over various point clouds. Experimental results show that our proposed method achieves an average of 9.28% BD-rate reduction. In addition, our approach achieves some performance improvements for the downstream point cloud semantic segmentation task.

[34]  arXiv:2112.00592 (cross-list from cs.IT) [pdf, other]
Title: BeamSync: Over-The-Air Carrier Synchronization in Distributed RadioWeaves
Comments: 6 pages, 6 figures. Accepted in 25th International ITG Workshop on Smart Antennas (WSA 2021)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In a distributed multi-antenna system, multiple geographically separated transmit nodes communicate simultaneously with a receive node. Synchronization of these nodes is essential to achieve good performance at the receiver. RadioWeaves is a new paradigm of cell-free massive MIMO array deployment using distributed multi-antenna panels in indoor environments. In this paper, we study the carrier frequency synchronization problem in distributed RadioWeave panels. We propose a novel over-the-air synchronization protocol, which we call BeamSync, to synchronize all the different multi-antenna transmit panels. We also show that beamforming the synchronization signal in the dominant direction of the channel between the panels is optimal and that the synchronization performance is significantly better than that of traditional beamforming techniques.

[35]  arXiv:2112.00633 (cross-list from cs.NI) [pdf, other]
Title: TEDGE-Caching: Transformer-based Edge Caching Towards 6G Networks
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Signal Processing (eess.SP)

As a consequence of the COVID-19 pandemic, the demand for telecommunication for remote learning/working and telemedicine has significantly increased. Mobile Edge Caching (MEC) in 6G networks has evolved as an efficient solution to meet the phenomenal growth of global mobile data traffic by bringing multimedia content closer to the users. Although massive connectivity enabled by MEC networks will significantly increase the quality of communications, there are several key challenges ahead. The limited storage of edge nodes, the large size of multimedia content, and the time-variant users' preferences make it critical to efficiently and dynamically predict the popularity of content, so that the content most likely to be requested is stored before it is requested. Recent advancements in Deep Neural Networks (DNNs) have drawn much research attention to predicting content popularity in proactive caching schemes. Existing DNN models in this context, however, suffer from long-term dependencies, computational complexity, and unsuitability for parallel computing. To tackle these challenges, we propose an edge caching framework incorporated with the attention-based Vision Transformer (ViT) neural network, referred to as the Transformer-based Edge (TEDGE) caching, which, to the best of our knowledge, is being studied for the first time. Moreover, the TEDGE caching framework requires no data pre-processing or additional contextual information. Simulation results corroborate the effectiveness of the proposed TEDGE caching framework in comparison to its counterparts.

[36]  arXiv:2112.00682 (cross-list from cs.CE) [pdf, ps, other]
Title: Quasi-3D Magneto-Thermal Quench Simulation Scheme for Superconducting Accelerator Magnets
Comments: 5 pages, 8 figures, MT27 conference special issue paper
Subjects: Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)

To tackle the strong multi-scale problem in the quench simulation of superconducting accelerator magnets, this work proposes a hybrid numerical method which uses two-dimensional first-order finite elements in the magnet cross-section and one-dimensional higher-order orthogonal polynomials in the longitudinal direction.

[37]  arXiv:2112.00698 (cross-list from cs.CV) [pdf, ps, other]
Title: CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems
Comments: 5 pages, 3 figures, published in an IEEE Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Due to the advent of modern embedded systems and mobile devices with constrained resources, there is a great demand for incredibly efficient deep neural networks for machine learning purposes. There is also a growing concern among the general public about the privacy and confidentiality of user data when it is processed and stored on an external server, which has further fueled the need for developing such efficient neural networks for real-time inference on local embedded systems. The scope of our work presented in this paper is limited to image classification using a convolutional neural network. A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor, designed to extract information and convert it into meaningful representations for real-time inference of the input data. In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems. We show that this architecture, dubbed CondenseNeXt, is remarkably efficient in comparison to the baseline neural network architecture, CondenseNet: it reduces the trainable parameters and FLOPs required to train the network whilst balancing a trained model size of less than 3.0 MB against accuracy, resulting in unprecedented computational efficiency.

[38]  arXiv:2112.00702 (cross-list from cs.SD) [pdf, other]
Title: Semi-supervised music emotion recognition using noisy student training and harmonic pitch class profiles
Authors: Hao Hao Tan
Comments: MediaEval 2021 submission for Emotion and Themes in Music
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

We present Mirable's submission to the 2021 Emotions and Themes in Music challenge. In this work, we intend to address the question: can we leverage semi-supervised learning techniques on music emotion recognition? With that, we experiment with noisy student training, which has improved model performance in the image classification domain. As the noisy student method requires a strong teacher model, we further delve into factors including (i) input training length and (ii) complementary music representations to further boost the performance of the teacher model. For (i), we find that models trained with short input length perform better in PR-AUC, whereas those trained with long input length perform better in ROC-AUC. For (ii), we find that using harmonic pitch class profiles (HPCP) consistently improves tagging performance, which suggests that harmonic representation is useful for music emotion tagging. Finally, we find that the noisy student method only improves tagging results for the case of long training length. Additionally, we find that ensembling representations trained with different training lengths can improve tagging results significantly, which suggests a possible direction for future work: incorporating multiple temporal resolutions in the network architecture.

### Replacements for Thu, 2 Dec 21

[39]  arXiv:1912.07362 (replaced) [pdf, other]
Title: Dynamic controller that operates over homomorphically encrypted data for infinite time horizon
Subjects: Systems and Control (eess.SY)
[40]  arXiv:1912.10036 (replaced) [pdf, other]
Title: A Family of Deep Learning Architectures for Channel Estimation and Hybrid Beamforming in Multi-Carrier mm-Wave Massive MIMO
Comments: Accepted Paper in IEEE Transactions on Cognitive Communications and Networking. arXiv admin note: text overlap with arXiv:1910.14240
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)
[41]  arXiv:2003.08715 (replaced) [pdf, other]
Title: Structural-constrained Methods for the Identification of Unobservable False Data Injection Attacks in Power Systems
Subjects: Signal Processing (eess.SP)
[42]  arXiv:2003.10094 (replaced) [pdf, other]
Title: Penalized and Decentralized Contextual Bandit Learning for WLAN Channel Allocation with Contention-Driven Feature Extraction
Comments: 12 pages, 6 figures, 3 Tables
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[43]  arXiv:2008.07527 (replaced) [pdf, other]
Title: Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features
Journal-ref: International Journal of Interactive Multimedia & Artificial Intelligence (2021), vol. 7, no 2, p. 78-88
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[44]  arXiv:2009.09899 (replaced) [pdf, other]
Title: Clustering COVID-19 Lung Scans
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[45]  arXiv:2010.09453 (replaced) [pdf, other]
Title: Fast accuracy estimation of deep learning based multi-class musical source separation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[46]  arXiv:2011.12108 (replaced) [pdf, other]
Title: Wide-angle Image Rectification: A Survey
Comments: Accepted by the International Journal of Computer Vision (IJCV). Both the datasets and source code are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[47]  arXiv:2011.15014 (replaced) [pdf, other]
Title: Learning from Human Directional Corrections
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
[48]  arXiv:2012.03646 (replaced) [pdf, other]
Title: A novel dataset for the identification of computer generated melodies in the CSMT challenge
Journal-ref: In Proceedings of the 8th Conference on Sound and Music Technology. CSMT 2020. Lecture Notes in Electrical Engineering, vol 761. Springer
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[49]  arXiv:2101.03468 (replaced) [pdf, other]
Title: HePPCAT: Probabilistic PCA for Data with Heteroscedastic Noise
Comments: This article has been accepted for publication in the IEEE Transactions on Signal Processing. (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See this https URL for more information. 26 pages, 14 figures
Journal-ref: IEEE Transactions on Signal Processing, Vol. 69, pp. 4819-4834, 2021
Subjects: Statistics Theory (math.ST); Signal Processing (eess.SP)
[50]  arXiv:2102.00883 (replaced) [pdf, other]
Title: Stochastic High Fidelity Simulation and Scenarios for Testing of Fixed Wing Autonomous GNSS-Denied Navigation Algorithms
Authors: Eduardo Gallo
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[51]  arXiv:2102.06492 (replaced) [pdf, other]
Title: Customizable Stochastic High Fidelity Model of the Sensors and Camera onboard a Low SWaP Fixed Wing Autonomous Aircraft
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[52]  arXiv:2103.11791 (replaced) [pdf, ps, other]
Title: Machine Learning Empowered Resource Allocation in IRS Aided MISO-NOMA Networks
Authors: X. Gao, Y. Liu, X. Liu, L. Song
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
[53]  arXiv:2105.09163 (replaced) [pdf, other]
Title: High-Performance FPGA-based Accelerator for Bayesian Neural Networks
Comments: Design Automation Conference (DAC) 2021
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[54]  arXiv:2105.13598 (replaced) [pdf, other]
Title: End-to-End Deep Fault Tolerant Control
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
[55]  arXiv:2105.14656 (replaced) [pdf, other]
Title: Human-level COVID-19 Diagnosis from Low-dose CT Scans Using a Two-stage Time-distributed Capsule Network
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[56]  arXiv:2106.06201 (replaced) [pdf, other]
Title: Distributed Urban Freeway Traffic Optimization Considering Congestion Propagation
Subjects: Systems and Control (eess.SY)
[57]  arXiv:2106.07533 (replaced) [pdf, other]
Title: Posterior Temperature Optimization in Variational Inference for Inverse Problems
Comments: Accepted at Bayesian Deep Learning workshop, NeurIPS 2021
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
[58]  arXiv:2107.02375 (replaced) [pdf, other]
Title: SplitAVG: A heterogeneity-aware federated deep learning method for medical imaging
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[59]  arXiv:2107.09153 (replaced) [pdf, other]
Title: User Association in Dense mmWave Networks as Restless Bandits
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
[60]  arXiv:2107.12719 (replaced) [pdf, other]
Title: The CORSMAL benchmark for the prediction of the properties of containers
Comments: 13 pages, 6 tables, 7 figures, Pre-print submitted to IEEE Access
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61]  arXiv:2108.06911 (replaced) [pdf, ps, other]
Title: Optimal Actor-Critic Policy with Optimized Training Datasets
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[62]  arXiv:2109.01120 (replaced) [pdf, other]
Title: Automatic Diagnosis of Schizophrenia in EEG Signals Using CNN-LSTM Models
Journal-ref: Front. Neuroinform. 15:777977 (2021)
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[63]  arXiv:2109.04405 (replaced) [pdf, other]
Title: An Accelerated Proximal Gradient-based Model Predictive Control Algorithm
Authors: Jia Wang, Ying Yang
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[64]  arXiv:2109.09484 (replaced) [pdf, other]
Title: On Circuit-based Hybrid Quantum Neural Networks for Remote Sensing Imagery Classification
Comments: Submitted to the JSTARS special issue on "Quantum resources for Earth Observation" for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Quantum Physics (quant-ph)
[65]  arXiv:2110.06634 (replaced) [pdf, other]
Title: End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[66]  arXiv:2110.07699 (replaced) [pdf, other]
Title: Safe Autonomous Racing via Approximate Reachability on Ego-vision
Comments: 17 pages, 15 figures, 3 tables
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
[67]  arXiv:2110.08721 (replaced) [pdf, other]
Title: CAE-Transformer: Transformer-based Model to Predict Invasiveness of Lung Adenocarcinoma Subsolid Nodules from Non-thin Section 3D CT Scans
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[68]  arXiv:2110.12042 (replaced) [pdf, other]
Title: A Hybrid Approach for Approximating the Ideal Observer for Joint Signal Detection and Estimation Tasks by Use of Supervised Learning and Markov-Chain Monte Carlo Methods
Subjects: Signal Processing (eess.SP)
[69]  arXiv:2111.00273 (replaced) [pdf, other]
Title: Cross-Modality Fusion Transformer for Multispectral Object Detection
Comments: 6 figures, 4 tables, under consideration at Pattern Recognition Letters
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[70]  arXiv:2111.02174 (replaced) [pdf, other]
Title: Unsupervised detection and open-set classification of fast-ramped flexibility activation events
Comments: Submitted to Applied Energy. Revised by the authors
Subjects: Systems and Control (eess.SY); Machine Learning (stat.ML)
[71]  arXiv:2111.02363 (replaced) [pdf, other]
Title: Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[72]  arXiv:2111.08795 (replaced) [pdf, other]
Title: A Projection Operator-based Newton Method for the Trajectory Optimization of Closed Quantum Systems
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)
[73]  arXiv:2111.11305 (replaced) [pdf, other]
Title: Universal Efficient Variable-rate Neural Image Compression
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
[74]  arXiv:2111.12124 (replaced) [pdf, ps, other]
Title: Towards Learning Universal Audio Representations
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75]  arXiv:2111.13670 (replaced) [pdf, other]
Title: Non-Convex Recovery from Phaseless Low-Resolution Blind Deconvolution Measurements using Noisy Masked Patterns
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[76]  arXiv:2111.14448 (replaced) [pdf, other]
Title: AVA-AVD: Audio-visual Speaker Diarization in the Wild
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[77]  arXiv:2111.14831 (replaced) [pdf]
Title: MIST-net: Multi-domain Integrative Swin Transformer network for Sparse-View CT Reconstruction
Comments: 24 pages, 10 figures, 57 references
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[78]  arXiv:2111.14955 (replaced) [pdf, ps, other]
Title: Privacy-Preserving Serverless Edge Learning with Decentralized Small Data
Comments: Submitted for publication in the IEEE Network
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[79]  arXiv:2111.15626 (replaced) [pdf, other]
Title: Variational Autoencoders for Studying the Manifold of Precoding Matrices with High Spectral Efficiency
Authors: Evgeny Bobrov (1 and 2), Alexander Markov (3), Dmitry Vetrov (3) ((1) Moscow Research Center, Huawei Technologies, Russia, (2) M. V. Lomonosov Moscow State University, Russia, (3) National Research University Higher School of Economics, Russia)