pith.

archive

Every paper Pith has read.

623 papers in eess.AS · page 9

  1. eess.AS 2025-06-11 reviewed
    Text alone identifies speakers at 2% error in privacy tests

    You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks

    Ünal Ege Gaznepoglu +6

  2. cs.SD 2025-06-08 reviewed
    AI bass model produces polyphony inside single harmonic tones

    Insights on Harmonic Tones from a Generative Music Experiment

    Emmanuel Deruty +1

  3. cs.CL 2025-06-05 reviewed
    Benchmark finds SpeechLLMs weak on speech nuances beyond text

    MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark

    Dingdong Wang +6

  4. eess.AS 2025-06-02 reviewed
    Ensemble method adds confidence intervals to speech boundaries

    Gradient boundaries through confidence intervals for forced alignment estimates using model ensembles

    Matthew C. Kelley

  5. cs.CL 2025-06-01 reviewed
    LLM pipeline builds a sarcastic speech dataset; detection reaches 73.63% F1

    Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection

    Zhu Li +4

  6. eess.AS 2025-05-31 reviewed
    Two-stage transfer learning predicts P.835 scores from 100 labels

    Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024

    Marie Kunešová +2

  7. cs.SD 2025-05-30 reviewed
    Neural codec reaches 2.87 PESQ at 2.67 kbps

    SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization

    Jin Wang +4

  8. cs.CL 2025-05-28 reviewed
    Speech LMs miss meaning shifts from sentence stress

    StressTest: Can YOUR Speech LM Handle the Stress?

    Iddo Yosha +2

  9. cs.SD 2025-05-28 reviewed
    Fixed decoder raises audio steganography quality by over 10 dB

    FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation

    Jialin Yan +6

  10. cs.SD 2025-05-27 reviewed
    Tailored designs succeed on music AVQA where general models struggle

    Music Audio-Visual Question Answering Requires Specialized Multimodal Designs

    Wenhao You +11

  11. cs.SD 2025-05-23 reviewed
    CosyVoice 3 scales speech data to one million hours for stronger zero-shot results

    CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

    Zhihao Du +21

  12. eess.AS 2025-05-21 reviewed
    Taxonomy sorts LALM benchmarks into four objective-based dimensions

    Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey

    Chih-Kai Yang +2

  13. cs.SD 2025-05-20 reviewed
    FMSD-TTS creates Ü-Tsang, Amdo, and Kham speech from few clips

    FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation

    Yutong Liu +9

  14. eess.AS 2025-05-20 reviewed
    Framework edits speech amid overlapping background noise

    SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement

    Kuan-Yu Chen +3

  15. cs.SD 2025-05-19 reviewed
    One model translates among score images, symbolic music, and performance audio

    Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio

    Jongmin Jung +7

  16. cs.SD 2025-05-15 reviewed
    Random linear map turns audio embeddings into dynamic visuals

    LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2

    Jongmin Jung +1

  17. math.CO 2025-05-13 reviewed
    Tonnetz realized as twelve points and twelve lines in the plane

    Configurations, Tessellations and Tone Networks

    Jeffrey R. Boland +1

  18. cs.SD 2025-05-13 reviewed
    Drum grooves edited zero-shot by plain LLMs via spatial text grid

    Not that Groove: Zero-Shot Symbolic Music Editing

    Li Zhang

  19. eess.AS 2025-05-03 reviewed
    Device info at inference lifts scene classification baseline

    Low-Complexity Acoustic Scene Classification with Device Information in the DCASE 2025 Challenge

    Florian Schmid +5

  20. eess.AS 2025-05-01 reviewed
    Anonymized speech preserves clinical ratings but lowers perceived quality

    Perceptual implications of automatic anonymization in pathological speech

    Soroosh Tayebi Arasteh +13

  21. eess.AS 2025-04-25 reviewed
  22. eess.AS 2025-04-24 reviewed
    One speaker creates multiple sound zones with multi-frequency ultrasound

    Generating Localized Audible Zones Using a Single-Channel Parametric Loudspeaker

    Tao Zhuang +4

  23. cs.SD 2025-04-17 reviewed
    Multi-task attention CNN hits 97% accuracy on scarce underwater sounds

    A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition

    Wei Huang +5

  24. eess.AS 2025-04-16 reviewed
    Augmentation lifts deepfake detection accuracy under codecs and loss

    Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios

    Haohan Shi +5

  25. eess.AS 2025-04-11 reviewed
    Reverberation features lift distance accuracy in 3D sound detection

    Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation

    Davide Berghi +1

  26. cs.CL 2025-04-11 reviewed
    Survey groups spoken language models by architecture

    On The Landscape of Spoken Language Models: A Comprehensive Survey

    Siddhant Arora +9

  27. eess.AS 2025-04-01 reviewed
    Cyclic sound patterns engineered to trigger ASMR

    Is ASMR Engineerable? A Signal Processing and User Experience Study

    Zexin Fang +4

  28. cs.CL 2025-03-30 reviewed
    Hybrid model lifts end-turn accuracy at low compute cost

    Speculative End-Turn Detector for Efficient Speech Chatbot Assistant

    Hyunjong Ok +2

  29. cs.AR 2025-03-27 reviewed
    71.2 μW accelerator runs real-time speech recognition

    A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network

    Chih-Chyau Yang +1

  30. cs.CL 2025-03-26 reviewed
    Qwen2.5-Omni matches text performance on speech tasks

    Qwen2.5-Omni Technical Report

    Jin Xu +13

  31. cs.MM 2025-03-13 reviewed
    One model turns text, video or audio prompts into sound

    AudioX: A Unified Framework for Anything-to-Audio Generation

    Zeyue Tian +8

  32. cs.CL 2025-03-07 reviewed
    Benchmark shows industrial S2S models outperform academic ones on tone and emotion

    S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models

    Feng Jiang +8

  33. cs.SD 2025-03-03 reviewed
    Single-stream codec splits speech for LLM voice control

    Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

    Xinsheng Wang +24

  34. cs.CR 2025-02-27 reviewed
    Simple audio tweaks fool all tested deepfake detectors

    DeePen: Penetration Testing for Audio Deepfake Detection

    Nicolas Müller +7

  35. cs.GR 2025-02-25 reviewed
    Text prompts steer 3D dance generation to match music genres

    GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation

    Xinran Liu +5

  36. cs.CL 2025-02-17 reviewed
    130B model unifies speech and text for real-time interaction

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Ailin Huang +144

  37. cs.SD 2025-02-17 reviewed
    Paired throat-acoustic dataset trains models to restore lost speech frequencies

    Throat and acoustic paired speech dataset for deep learning-based speech enhancement

    Yunsik Kim +2

  38. eess.AS 2025-02-09 reviewed
    Silent EMG signals map directly to phonemic text

    Non-invasive electromyographic speech neuroprosthesis: a geometric perspective

    Harshavardhana T. Gowda +1

  39. cs.SD 2025-02-07 reviewed
    Four-axis guidelines automate audio quality scoring

    Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

    Andros Tjandra +12

  40. cs.SD 2025-02-06 reviewed
    Cross-attention audio watermark survives generative edits

    XAttnMark: Learning Robust Audio Watermarking with Cross-Attention

    Yixin Liu +4

  41. eess.AS 2025-02-04 reviewed
    Full recordings classify dementia without trimming speech

    Dementia classification from spontaneous speech using wrapper-based feature selection

    Marko Niemelä +3

  42. eess.AS 2025-01-30 reviewed
    Multilayer unit gives one reflector fine phase and amplitude control

    ML-ARIS: Multilayer Underwater Acoustic Reconfigurable Intelligent Surface with High-Resolution Reflection Control

    Lina Pu +2

  43. cs.SD 2025-01-13 reviewed
    Classical music networks show centuries of simplification

    Decoding Musical Evolution Through Network Science

    Niccolò Di Marco +4

  44. cs.CV 2025-01-03 reviewed
    Staged training adds speech understanding to vision models

    VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

    Chaoyou Fu +15

  45. cs.LG 2024-12-20 reviewed
    Finetuning lets text-to-audio models respect event relations

    RiTTA: Modeling Event Relations in Text-to-Audio Generation

    Yuhang He +4

  46. cs.SD 2024-12-18 reviewed
    ResNet18 leads detection of machine-generated music

    Explainable Detection of Machine Generated Music and Early Systematic Evaluation

    Yupei Li +4

  47. cs.LG 2024-12-17 reviewed
    MoInCL reduces forgetting when MLLMs switch modalities and task types

    Modality-Inconsistent Continual Learning of Multimodal Large Language Models

    Weiguo Pian +5

  48. cs.SD 2024-12-13 reviewed
    CosyVoice 2 hits human parity in streaming speech synthesis

    CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

    Zhihao Du +18

  49. cs.SD 2024-12-09 reviewed
    Diffusion refiner boosts any music source separator

    Improving Music Source Separation with Diffusion and Consistency Refinement

    Tornike Karchkhadze +3

  50. cs.CL 2024-12-03 reviewed
    End-to-end voice model hits SOTA on spoken QA and modeling

    GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

    Aohan Zeng +7