Audio and Speech Processing

Authors and titles for recent submissions

Showing entries 1-25 of 48.

Wed, 18 May 2022

[1] arXiv:2205.08138
Title: Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model
Comments: 5 pages, 4 figures and 4 tables. Accepted by EUSIPCO 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2] arXiv:2205.08014
Title: Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data
Comments: 5 pages, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[3] arXiv:2205.08459 (cross-list from cs.SD)
Title: Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. The current version includes 32 pages, 10 figures, and 2 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2205.08455 (cross-list from cs.SD)
Title: Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation
Comments: Submitted to IWAENC 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5] arXiv:2205.08180 (cross-list from cs.CL)
Title: SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2205.08007 (cross-list from cs.MM)
Title: Perceptual Evaluation on Audio-visual Dataset of 360 Content
Comments: 6 pages, 5 figures, International Conference on Multimedia and Expo 2022
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)

Tue, 17 May 2022

[7] arXiv:2205.07390
Title: Learning Representations for New Sound Classes With Continual Self-Supervised Learning
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[8] arXiv:2205.07211
Title: GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[9] arXiv:2205.07180
Title: Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT
Comments: Submitted to Interspeech
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[10] arXiv:2205.07086
Title: Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech
Comments: Accepted to Speaker Odyssey 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[11] arXiv:2205.07083
Title: Pretraining Approaches for Spoken Language Recognition: TalTech Submission to the OLR 2021 Challenge
Comments: Accepted to Speaker Odyssey 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[12] arXiv:2205.06931
Title: Task splitting for DNN-based acoustic echo and noise removal
Comments: Submitted to IWAENC 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13] arXiv:2205.07711 (cross-list from cs.SD)
Title: Transferability of Adversarial Attacks on Synthetic Speech Detection
Comments: 5 pages, submitted to Interspeech 2022
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[14] arXiv:2205.07682 (cross-list from cs.SD)
Title: L3-Net Deep Audio Embeddings to Improve COVID-19 Detection from Smartphone Data
Comments: Accepted for IEEE SMARTCOMP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:2205.07646 (cross-list from cs.CL)
Title: A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices
Comments: 9 pages, 4 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:2205.07450 (cross-list from cs.SD)
Title: PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2205.07319 (cross-list from cs.SD)
Title: cMelGAN: An Efficient Conditional Generative Model Based on Mel Spectrograms
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18] arXiv:2205.07301 (cross-list from cs.GR)
Title: Conditional Vector Graphics Generation for Music Cover Images
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2205.07123 (cross-list from cs.CL)
Title: The VoicePrivacy 2020 Challenge Evaluation Plan
Comments: arXiv admin note: text overlap with arXiv:2203.12468
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[20] arXiv:2205.07100 (cross-list from cs.CL)
Title: Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
Comments: NAACL-SRW 2022
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2205.06963 (cross-list from cs.CL)
Title: Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing
Comments: Submitted to INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mon, 16 May 2022 (showing first 4 of 5 entries)

[22] arXiv:2205.06473
Title: Joint Acoustic Echo Cancellation and Blind Source Extraction based on Independent Vector Extraction
Comments: Submitted to International Workshop on Acoustic Signal Enhancement (IWAENC 2022)
Subjects: Audio and Speech Processing (eess.AS)
[23] arXiv:2205.06445
Title: Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
Comments: arXiv admin note: text overlap with arXiv:2202.10290
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[24] arXiv:2205.06808 (cross-list from eess.SP)
Title: High-Frequency Tunable Resistorless Memcapacitor Emulator and Application
Comments: 40 pages, 25 figures, 6 tables. arXiv admin note: substantial text overlap with arXiv:2205.06221
Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS)
[25] arXiv:2205.06799 (cross-list from cs.SD)
Title: The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes
Comments: 5 pages, part of the ACM Multimedia 2022 Grand Challenge "The ACM Multimedia 2022 Computational Paralinguistics Challenge (ComParE 2022)"
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
