Audio and Speech Processing

Authors and titles for recent submissions

[ total of 46 entries: 1-25 | 26-46 ]

Mon, 20 May 2024

[1]  arXiv:2405.10786 [pdf, other]
Title: Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix
Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS)
[2]  arXiv:2405.10510 (cross-list from eess.SP) [pdf, other]
Title: Implementation of the Feedforward Multichannel Virtual Sensing Active Noise Control (MVANC) by Using MATLAB
Authors: Boxiang Wang
Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS)

Fri, 17 May 2024

[3]  arXiv:2405.10084 [pdf, other]
Title: Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[4]  arXiv:2405.10022 [pdf, other]
Title: Monaural speech enhancement on drone via Adapter based transfer learning
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[5]  arXiv:2405.10018 [pdf, other]
Title: Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge
Comments: Task Description Page: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[6]  arXiv:2405.09940 [pdf, other]
Title: Robust Singing Voice Transcription Serves Synthesis
Comments: ACL 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7]  arXiv:2405.09768 [pdf, other]
Title: Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
Comments: 11 pages, 4 figures. Language Resources and Evaluation Conference (LREC) 2024. demo: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2405.10272 (cross-list from cs.CV) [pdf, other]
Title: Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[9]  arXiv:2405.10211 (cross-list from cs.SD) [pdf, ps, other]
Title: Building a Luganda Text-to-Speech Model From Crowdsourced Data
Comments: Presented at the AfricaNLP workshop at ICLR 2024
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[10]  arXiv:2405.10102 (cross-list from cs.NE) [pdf, other]
Title: A novel Reservoir Architecture for Periodic Time Series Prediction
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11]  arXiv:2405.10025 (cross-list from cs.CL) [pdf, other]
Title: Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Comments: 14 pages, Accepted by ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12]  arXiv:2405.09901 (cross-list from cs.SD) [pdf, other]
Title: Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models
Comments: Proceedings of the International Conference on Learning Representations (ICLR 2024)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13]  arXiv:2405.09814 (cross-list from cs.GR) [pdf, other]
Title: Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis
Comments: SIGGRAPH 2024 (Journal Track); Project page: this https URL
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14]  arXiv:2405.09589 (cross-list from cs.LG) [pdf, other]
Title: Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15]  arXiv:2405.09570 (cross-list from eess.SP) [pdf, other]
Title: FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time
Comments: 8-page main paper and 4-page supplementary material
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 16 May 2024

[16]  arXiv:2405.09142 [pdf, other]
Title: Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
Comments: Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17]  arXiv:2405.09470 (cross-list from cs.SD) [pdf, other]
Title: Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer
Comments: Accepted to SecTL (AsiaCCS Workshop) 2024
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18]  arXiv:2405.09266 (cross-list from cs.CV) [pdf, other]
Title: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
Comments: 11 pages, 6 figures, demo page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19]  arXiv:2405.09241 (cross-list from cs.SD) [pdf, other]
Title: SMUG-Explain: A Framework for Symbolic Music Graph Explanations
Comments: In Proceedings of the Sound and Music Computing Conference 2024 (SMC2024), Porto, Portugal
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:2405.09224 (cross-list from cs.SD) [pdf, other]
Title: Perception-Inspired Graph Convolution for Music Understanding Tasks
Comments: Accepted at the 33rd International Joint Conference on Artificial Intelligence (IJCAI-24)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21]  arXiv:2405.09171 (cross-list from cs.SD) [pdf, other]
Title: Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
Comments: This is accepted to IEEE ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22]  arXiv:2405.09062 (cross-list from cs.SD) [pdf, other]
Title: Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[23]  arXiv:2405.08838 (cross-list from cs.SD) [pdf, other]
Title: PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset
Comments: 13 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Wed, 15 May 2024 (showing first 2 of 10 entries)

[24]  arXiv:2405.08742 [pdf, ps, other]
Title: A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25]  arXiv:2405.08417 [pdf, other]
Title: Simple and Efficient Quantization Techniques for Neural Speech Coding
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)