pith.

archive

Every paper Pith has read.

623 papers in eess.AS · page 7

  1. cs.CL 2026-01-31 reviewed
    Single-layer tokenizer separates speaker identity from speech phonetics

    Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling

    Zhijie Huang +3

  2. eess.AS 2026-01-30 reviewed
    CALM halves biased errors in two-speaker ASR

    CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR

    Muhammad Shakeel +4

  3. cs.CL 2026-01-29 reviewed
  4. cs.HC 2026-01-29 reviewed
    Brief spatial sounds convey direction in XR

    Evaluating Spatialized Auditory Cues for Rapid Attention Capture in XR

    Yoonsang Kim +2

  5. eess.AS 2026-01-28 reviewed
    Learnable projector cuts prompt sensitivity in LLM speech recognition

    Reducing Prompt Sensitivity in LLM-based Speech Recognition Through Learnable Projection

    Sergio Burdisso +9

  6. cs.SD 2026-01-28 reviewed
    Longest utterances cut speech pre-training data in half

    A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models

    Ryan Whetten +3

  7. eess.AS 2026-01-27 reviewed
    Detector catches deepfake greetings in 0.5 seconds

    Audio Deepfake Detection at the First Greeting: "Hi!"

    Haohan Shi +4

  8. eess.AS 2026-01-26 reviewed
    Noise rejection lifts heart-sound CAD detection by 4 points

    Noise-Robust Contrastive Learning with an MFCC-Conformer For Coronary Artery Disease Detection

    Milan Marocchi +2

  9. eess.AS 2026-01-26 reviewed
    One model covers speech, expressive, and singing voice conversion

    OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion

    Zhichao Wang +5

  10. cs.SD 2026-01-22 reviewed
    SWIM scales real-time ASR to 20 clients via buffer merging

    Sink or SWIM: Tackling Real-Time ASR at Scale

    Federico Bruzzone +3

  11. eess.AS 2026-01-22 reviewed
    Hybrid algorithm gives fast noise control with low error and high stability

    A Stabilized Hybrid Active Noise Control Algorithm of GFANC and FxNLMS with Online Clustering

    Zhengding Luo +5

  12. cs.SD 2026-01-22 reviewed
    Qwen3-TTS reaches SOTA multilingual TTS with 3-second cloning

    Qwen3-TTS Technical Report

    Hangrui Hu +15

  13. eess.AS 2026-01-21 reviewed
    Fast-ULCNet halves model size and cuts latency 34% for speech enhancement

    Fast-ULCNet: A fast and ultra low complexity network for single-channel speech enhancement

    Nicolás Arrieta Larraza +1

  14. eess.AS 2026-01-21 reviewed
    Mask polarization restores decisive outputs for speech enhancement at test time

    Test-Time Adaptation For Speech Enhancement Via Mask Polarization

    Tobias Raichle +2

  15. eess.AS 2026-01-21 reviewed
    Curvature-guided merge cuts forgetting in ASR continual learning

    Inverse-Hessian Regularization for Continual Learning in ASR

    Steven Vander Eeckt +1

  16. eess.AS 2026-01-18 reviewed
    Audio QA models fail to recognize when questions have no answer

    AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering

    Chun-Yi Kuan +1

  17. cs.SD 2026-01-14 reviewed
    Self-reflection step cuts speech recognition WER by 12.1%

    Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

    Zhen Wan +17

  18. eess.AS 2026-01-09 reviewed
    Hybrid model improves quality and consistency in speaker extraction

    Discriminative-Generative Target Speaker Extraction with Decoder-Only Language Models

    Bang Zeng +3

  19. cs.CL 2026-01-09 reviewed
    RL alignment closes speech-text reasoning gap in LLMs

    Closing the Modality Reasoning Gap for Speech Large Language Models

    Chaoren Wang +6

  20. eess.AS 2026-01-08 reviewed
    Low-frequency loss weighting solves delay learning in modulation-effect models

    Gradient-based Optimisation of Modulation Effects

    Alistair Carson +2

  21. eess.AS 2026-01-07 reviewed
    Encoder tracks speakers and timing together in one pass

    TellWhisper: Tell Whisper Who Speaks When

    Yifan Hu +4

  22. eess.AS 2026-01-07 reviewed
    ReStyle-TTS enables continuous relative style control in zero-shot TTS

    ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis

    Haitao Li +5

  23. cs.LG 2026-01-07 reviewed
    Smart Embedding halves parameters in polyphonic music models

    Mathematical Foundations of Polyphonic Music Generation via Structural Inductive Bias

    Joonwon Seo

  24. eess.AS 2026-01-06 reviewed
    Fine-grained captions train multi-granular speech-text model

    Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

    Yifan Yang +10

  25. cs.SD 2026-01-06 reviewed
    Semantic neighbors fix prompt tuning for audio models

    Generalizable Prompt Tuning for Audio-Language Models via Semantic Expansion

    Jaehyuk Jang +3

  26. eess.AS 2026-01-06 reviewed
    Hybrid Mamba-Attention backbone matches SOTA on audio deepfake detection

    XLSR-MamBo: Scaling the Hybrid Mamba-Attention Backbone for Audio Deepfake Detection

    Kwok-Ho Ng +3

  27. cs.SD 2026-01-05 reviewed
    Per-layer compensation lowers word errors in low-bit ASR models

    Diagnostic-Driven Layer-Wise Compensation for Post-Training Quantization of Encoder-Decoder ASR Models

    Xinyu Wang +8

  28. cs.SD 2025-12-23 reviewed
    Anti-aliasing modules improve neural music and singing audio quality

    Aliasing-Free Neural Audio Synthesis

    Yicheng Gu +5

  29. eess.AS 2025-12-18 reviewed
    Noise modeling gives accurate FDN filters from noisy impulse responses

    Learning Filters in Feedback Delay Networks from Noisy Room Impulse Responses

    Gloria Dal Santo +3

  30. eess.AS 2025-12-15 reviewed
    Reserve retraining blocks poisoning in federated audio models

    REVERB-FL: Server-Side Adversarial and Reserve-Enhanced Federated Learning for Robust Audio Classification

    Sathwika Peechara +1

  31. eess.AS 2025-11-29 reviewed
    Decoders adapt to degradation while encoders stay invariant

    Where Does Speech Enhancement Adapt? Probing Study Under Controlled Degradation

    Yair Amar +2

  32. eess.AS 2025-11-26 reviewed
    Orchestral dataset supplies isolated stems for source separation

    The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval

    Jaime Garcia-Martinez +7

  33. eess.AS 2025-11-25 reviewed
    Music language model fixes vocal pitch without references

    BERT-APC: A Reference-free Framework for Automatic Pitch Correction via Musical Context Inference

    Sungjae Kim +3

  34. cs.CL 2025-11-13 reviewed
    Benchmark shows speech models falter over repeated conversation turns

    MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

    He Zhang +7

  35. eess.AS 2025-11-11 reviewed
    Dynamic int8 cuts Whisper-small size 57% while improving accuracy

    Quantizing Whisper-small: How design choices affect ASR performance

    Arthur Söhler +2

  36. eess.AS 2025-10-29 reviewed
    Older adults match or beat young listeners with simulated hearing loss

    Disentangling peripheral hearing loss from central and cognitive effects on speech intelligibility in older adults

    Toshio Irino +2

  37. cs.SD 2025-10-28 reviewed
    Sound localization maps tool actions onto 3D surgical scenes

    Sound Source Localization for Spatial Mapping of Surgical Actions in Dynamic Scenes

    Jonas Hein +5

  38. cs.SD 2025-10-28 reviewed
    EMG signals map to speech model space for direct audio synthesis

    emg2speech: Synthesizing speech from electromyography using self-supervised speech models

    Harshavardhana T. Gowda +2

  39. cs.CL 2025-10-22 reviewed
    MBR decoding beats beam search on ASR accuracy

    Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition

    Yuu Jinnai

  40. eess.AS 2025-10-22 reviewed
    Replay-inclusive dataset lifts deepfake detector accuracy

    EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection

    Tong Zhang +2

  41. cs.LG 2025-10-21 reviewed
    RFM steering raises music note accuracy from 0.23 to 0.82

    Steering Autoregressive Music Generation with Recursive Feature Machines

    Daniel Zhao +4

  42. cs.AI 2025-10-19 reviewed
    New model handles listening, looking, speaking and acting together

    End-to-end Listen, Look, Speak and Act

    Siyin Wang +6

  43. cs.SD 2025-10-16 reviewed
    LLM judges rate speech quality with explanations across languages

    SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation

    Hui Wang +11

  44. cs.SD 2025-10-13 reviewed
    Interleaved tokens unify speech and gesture synthesis

    Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction

    Téo Guichoux +7

  45. cs.LG 2025-10-13 reviewed
    Coefficient search in latent subspace adapts models with 63x less compute

    Efficient Test-Time Adaptation through Latent Subspace Coefficients Search

    Xinyu Luo +6

  46. cs.SD 2025-10-10 reviewed
    Fine-tuned video-to-audio model separates sounds while keeping generation ability

    MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation

    Akira Takahashi +2

  47. cs.SD 2025-10-10 reviewed
    Progressive diffusion adds timing and intelligibility to text-to-audio generation

    ControlAudio: Tackling Text-Guided, Timing-Indicated and Intelligible Audio Generation via Progressive Diffusion Modeling

    Yuxuan Jiang +5

  48. eess.AS 2025-10-09 reviewed
    Model subtraction fixes pseudo-label errors in speech AI

    Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition

    Yi-Cheng Lin +6

  49. eess.AS 2025-10-09 reviewed
    Tests reveal full-duplex systems struggle with overlaps and corrections

    Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner

    Guan-Ting Lin +6

  50. eess.AS 2025-10-08 reviewed
    VAPO stops AI from reading slides instead of listening

    VAPO: End-to-end Slide-Enhanced Speech Recognition with Omni-modal Large Language Models

    Rui Hu +4