pith.

archive

Every paper Pith has read.

623 papers in eess.AS · page 3

  1. eess.AS 2026-05-02 reviewed
    Unified framework organizes 400 studies on speech AI bias

    Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI

    Yi-Cheng Lin +5

  2. cs.AI 2026-05-01 reviewed
    Clinician-reviewed AI creates personalized stuttering therapy plans

    Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy

    Shakeel Sheikh +6

  3. cs.SD 2026-05-01 reviewed
    Adversarial head erases script leakage from speaker embeddings

    LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

    Venkata Pushpak Teja Menta

  4. cs.SD 2026-05-01 reviewed
    Filtered generative RIRs halve speaker distance errors

    Towards Improving Speaker Distance Estimation through Generative Impulse Response Augmentation

    Anton Ratnarajah +3

  5. cs.CL 2026-05-01 reviewed
    Encoding probe reconstructs LM internals from syntax and speaker cues

    Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe

    Gaofei Shen +4

  6. eess.AS 2026-05-01 reviewed
    Transformer generates ANC filters directly without decomposition

    Transformer-based End-to-End Control Filter Generation for Active Noise Control

    Ziyi Yang +5

  7. cs.SD 2026-05-01 reviewed
    Pretrained video-to-audio model estimates room acoustics

    MMAudioReverbs: Video-Guided Acoustic Modeling for Dereverberation and Room Impulse Response Estimation

    Akira Takahashi +3

  8. cs.SD 2026-05-01 reviewed
    One-step sampling matches multi-step audio quality at 8.5x speed

    Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation

    Kuan-Po Huang +10

  9. cs.SD 2026-04-30 reviewed
    New pretraining creates encoder that spots voice deepfakes more reliably

    Alethia: A Foundational Encoder for Voice Deepfakes

    Yi Zhu +3

  10. eess.AS 2026-04-30 reviewed
    Pretrained embeddings classify elephant calls nearly as well as supervised models

    From Birdsong to Rumbles: Classifying Elephant Calls with Out-of-Species Embeddings

    Christiaan M. Geldenhuys +1

  11. cs.LG 2026-04-30 reviewed
    Multi-band fusion lifts bioacoustics accuracy over baseband

    Beyond the Baseband: Adaptive Multi-Band Encoding for Full-Spectrum Bioacoustics Classification

    Eklavya Sarkar +8

  12. eess.AS 2026-04-30 reviewed
    New benchmark makes AVSR considerably harder than LRS3

    LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition

    Doyeop Kwak +3

  13. eess.AS 2026-04-30 reviewed
    Visual conditioning cuts WER by 16 points in overlapped conversations

    BUT System Description for CHiME-9 MCoRec Challenge

    Dominik Klement +4

  14. eess.AS 2026-04-30 reviewed
    Articulation knowledge improves speech extraction in movie audio

    A Knowledge-Driven Approach to Target Speech Extraction in the Presence of Background Sound Effects for Cinematic Audio Source Separation (CASS)

    Chun-wei Ho +3

  15. cs.SD 2026-04-30 reviewed
    Model predicts severe stuttering events from prior three seconds

    Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device

    Nazar Kozak

  16. eess.AS 2026-04-29 reviewed
    Embedding emotion metrics fail for speech synthesis evaluation

    The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation

    Yun-Shao Tsai +7

  17. eess.AS 2026-04-29 reviewed
    Language branch in discriminator keeps speaker traits intact across languages

    Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification

    Qituan Shangguan +7

  18. eess.AS 2026-04-29 reviewed
    Semantic priors aid speech coding only below 6 kbps

    SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding

    Mingyu Zhao +3

  19. eess.AS 2026-04-29 reviewed
    Diffusion model adds tunable prosody control to voice anonymization

    DiffAnon: Diffusion-based Prosody Control for Voice Anonymization

    Ismail Rasim Ulgen +4

  20. cs.SD 2026-04-29 reviewed
    Recurrence patterns in speech detect depression with AUC 0.689

    Recurrence-Based Nonlinear Vocal Dynamics as Digital Biomarkers for Depression Detection from Conversational Speech

    Himadri S Samanta

  21. eess.AS 2026-04-28 reviewed
    Synthetic data improves cross-lingual science voice cloning

    One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

    Amanuel Gizachew Abebe +1

  22. eess.AS 2026-04-28 reviewed
    Cosine SupCon with delayed queue hits 8.29% ITW EER for deepfake audio

    Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection

    Jaskirat Sudan +3

  23. eess.AS 2026-04-28 reviewed
    Human feedback restores naturalness to audio reasoning models

    Step-Audio-R1.5 Technical Report

    Yuxin Zhang +18

  24. eess.AS 2026-04-28 reviewed
    Fusion of noisy and enhanced speech aids speaker ID in noise

    UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition

    Chong-Xin Gan +6

  25. eess.AS 2026-04-28 reviewed
    Semantic uncertainty beats token-level for audio LLMs

    Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models

    Chun-Yi Kuan +2

  26. cs.SD 2026-04-28 reviewed
    Frozen base TTS matches commercial Indic output via prompt recovery

    Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost

    Venkata Pushpak Teja Menta

  27. eess.AS 2026-04-28 reviewed
    Azimuth-first strips cut DOA search cost for planar mics

    ASAP: An Azimuth-Priority Strip-Based Search Approach to Planar Microphone Array DOA Estimation in 3D

    Ming Huang +8

  28. cs.SD 2026-04-28 reviewed
    Speaker-adaptive network lifts conversation emotion accuracy

    ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations

    Kexue Wang +2

  29. eess.AS 2026-04-28 reviewed
    Rhythmic features distinguish Nyishi from Adi at 85 percent accuracy

    Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh

    Deepshikha Gogoi +2

  30. cs.CL 2026-04-28 reviewed
    Aegyo speech raises first formant to mimic child vocal tracts

    Korean aegyo speech shows systematic F1 increase to signal childlike qualities

    Ji-eun Kim +1

  31. cs.SD 2026-04-27 reviewed
    Models keep 60-72% of audio scores with no sound input

    All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation

    Leonardo Haw-Yang Foo +4

  32. cs.SD 2026-04-27 reviewed
    Segment-level prediction cuts oversegmentation in chord recognition

    An event-based sequence modeling approach to recognizing non-triad chords with oversegmentation minimization

    Leekyung Kim +1

  33. cs.SD 2026-04-27 reviewed
    One-step drifting field matches noisy speech to clean distributions

    Speech Enhancement Based on Drifting Models

    Liang Xu +4

  34. cs.SD 2026-04-27 reviewed
    One-step model drifts noisy speech straight to clean distribution

    Speech Enhancement Based on Drifting Models

    Liang Xu +4

  35. cs.SD 2026-04-27 reviewed
    The paper proposes DriftSE, a generative speech enhancement framework that uses a…

    Speech Enhancement Based on Drifting Models

    Liang Xu +4

  36. cs.CV 2026-04-26 reviewed
    Shared high-level tokens plus separate decoders improve talking audio-video

    Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

    Zhen Ye +10

  37. cs.SD 2026-04-25 reviewed
    Piano transcription pipeline separates neoclassical from historical composers by Zipf fit

    An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire

    Fred Jalbert-Desforges

  38. eess.AS 2026-04-25 reviewed
    Speaker recognition latent spaces form hierarchical semantic clusters

    Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

    Yanze Xu +2

  39. eess.AS 2026-04-25 reviewed
    Neural predictor selects filters ahead for moving noise

    Predictive Directional Selective Fixed-Filter Active Noise Control for Moving Sources via a Convolutional Recurrent Neural Network

    Boxiang Wang +5

  40. eess.AS 2026-04-24 reviewed
    Fine-tuned Whisper keeps speaker IDs consistent across audio chunks

    Prompting Whisper for Joint Speech Transcription and Diarization

    Mariia Zamyrova +1

  41. eess.AS 2026-04-24 reviewed
    Diarization priors let LLMs handle multi-speaker ASR via dialogue queries

    DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

    Li Li +5

  42. cs.SD 2026-04-24 reviewed
    Beat-guided transformer quantizes MIDI rhythms to scores

    Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations

    Maximilian Wachter +2

  43. eess.AS 2026-04-24 reviewed
    Hybrid DNN-search method improves audio effect estimation

    Audio Effect Estimation with DNN-Based Prediction and Search Algorithm

    Youichi Okita +1

  44. eess.AS 2026-04-24 reviewed
    Global timeline and tool reasoning sustain timing accuracy in long audio

    Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding

    Mingchen Shao +8

  45. cs.CL 2026-04-24 reviewed
    TTS-PRISM scores Mandarin speech on 12 perceptual axes

    TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis

    Xi Wang +10

  46. eess.AS 2026-04-24 reviewed
    One text-driven model generates speech, music, and sound effects

    UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions

    Chunyu Qiang +13

  47. eess.AS 2026-04-24 reviewed
    New fusion cuts Apollo speech errors by 1.1 percent

    Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus

    Szu-Jui Chen +1

  48. eess.AS 2026-04-24 reviewed
    New speech model spots pronunciation errors without reference texts

    Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis

    Haopeng Geng +5

  49. cs.SD 2026-04-23 reviewed
    Cello portamento steepness declines as performance tempo increases

    Spectrographic Portamento Gradient Analysis: A Quantitative Method for Historical Cello Recordings with Application to Beethoven's Piano and Cello Sonatas, 1930--2012

    Ignasi Sole

  50. eess.AS 2026-04-23 reviewed
    Optical sensors record full key motion in historical instruments

    PHOTON: Non-Invasive Optical Tracking of Key-Lever Motion in Historical Keyboard Instruments

    Noah Jaffe +1