pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

623 papers in eess.AS · page 1

  1. eess.AS 2026-05-22 reviewed
    EMA and dual scoring produce TTS hardest to detect in WildSpoof

    Natural Yet Challenging to Detect: Robust In-the-Wild TTS through EMA and Dual-Scoring Prompt Selection -- Submission for WildSpoof 2026 TTS Track

    Renhe Sun +4

  2. eess.AS 2026-05-22 reviewed
    Frame-aligned fusion of two encoders cuts error in hearing-aid intelligibility prediction

    Frame-Aligned Fusion of Canary and WavLM for Non-Intrusive Intelligibility Prediction of Hearing-Aid-Processed Speech

    Kazushi Nakazawa

  3. eess.AS 2026-05-22 reviewed
    Acoustic fusion raises intelligibility correlation to 0.806

    Word-Level Modeling with Alignment-Aware Acoustic Fusion for Text-Assisted Intelligibility Prediction in Listeners with Hearing Loss

    Kazushi Nakazawa

  4. eess.AS 2026-05-22 reviewed
    Two-stage training matches full phoneme scoring with few labels

    A study on weakly-supervised training approaches for phoneme-level pronunciation scoring

    Jazmín Vidal +1

  5. eess.AS 2026-05-22 reviewed
    One model tops benchmarks in speech recognition

    StepAudio 2.5 Technical Report

    Bin Lin +100

  6. eess.AS 2026-05-22 reviewed
    Integrated gradients localize sound events at 0.39 IoU

    Evaluating the Temporal Detection Capability of Integrated Gradients Applied on Sound Classifier

    Martynas Dumpis +1

  7. eess.AS 2026-05-22 reviewed
    One model judges speech across many tasks with reasoning

    UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment

    Yuanyuan Wang +6

  8. cs.LG 2026-05-21 reviewed
    Plug-in losses approximate EDL objectives with decaying error

    Plug-in Losses for Evidential Deep Learning: A Simplified Framework for Uncertainty Estimation that Includes the Softmax Classifier

    Berk Hayta +3

  9. cs.AI 2026-05-21 reviewed
    LLM analysis outperforms acoustics for political pathos

    Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models

    Juergen Dietrich

  10. cs.SD 2026-05-21 reviewed
    Audio denoiser infers scene to keep relevant sounds

    Automatic Contextual Audio Denoising

    Diep Luong +3

  11. eess.AS 2026-05-21 reviewed
    Dual-stage phoneme search raises user keyword spotting to 97.85% AUC

    Effective User-defined Keyword Spotting with Dual-stage Matching, Multi-modal Enrollment, and Continual Adaptation

    Zhiqi Ai +5

  12. cs.SD 2026-05-21 reviewed
    Augmentations reduce TTS word error rate from 1.44 to 1.38

    RobustSpeechFlow: Learning Robust Text-to-Speech Trajectories via Augmentation-based Contrastive Flow Matching

    Jinhyeok Yang +5

  13. eess.AS 2026-05-21 reviewed
    Neighbor consistency cuts sound zone variation by over 50 percent

    Neighbor-Consistent Neural Filters for Robust Personal Sound Zones Under Localization Uncertainty

    Hao Jiang +1

  14. eess.AS 2026-05-20 reviewed
    Embeddings cluster speech degradations for better detection

    Speech Quality Embeddings for Improved Detection and Classification of Degradations in Speech Signals

    Michael Kuhlmann +2

  15. eess.AS 2026-05-20 reviewed
    Neural beamformer outperforms LCMV by learning constrained weights

    Linearly Constrained Deep Beamformer for Multi-Speaker Scenarios

    Ilai Zaidel +3

  16. eess.AS 2026-05-20 reviewed
    Survey unifies audio reasoning approaches in foundation models

    A Survey of Audio Reasoning in Multimodal Foundation Models

    Zhihan Guo +10

  17. eess.AS 2026-05-20 reviewed
    Neural net predicts room acoustics from geometry and materials

    From Numbers to Perception, Energy Decay Curves Prediction

    Imran Muhammad +1

  18. cs.SD 2026-05-20 reviewed
    Tropical bird detector trained on 50k-clip dataset hits 99.57% accuracy

    SEABAD: A Tropical Bird Activity Detection Dataset for Passive Acoustic Monitoring

    Muhammad Mun'im Ahmad Zabidi +2

  19. eess.AS 2026-05-20 reviewed
    Public speech data powers TTS models matching closed systems

    Raon-OpenTTS: Open Models and Data for Robust Text-to-Speech

    Semin Kim +10

  20. eess.AS 2026-05-20 reviewed
    Full-duplex model speaks and acts on the same 160 ms clock

    DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

    Haoyang Zhang +15

  21. eess.AS 2026-05-19 reviewed
    Planning step and targeted retrieval stabilize accuracy on longer audio

    PlanRAG-Audio: Planning and Retrieval Augmented Generation for Long-form Audio Understanding

    Masao Someki +19

  22. eess.AS 2026-05-19 reviewed
    Causal estimator improves short-window sound field reconstruction

    Causal Spatio-Temporal Sound Field Reconstruction

    David Sundström +3

  23. cs.SD 2026-05-19 reviewed
    Scaled simulations cut speech recognition errors over 30 percent

    Mega-ASR: Towards In-the-wild² Speech Recognition via Scaling up Real-world Acoustic Simulation

    Zhifei Xie +6

  24. eess.AS 2026-05-19 reviewed
    Cross-talk reduction on close-talk mics yields SOTA far-field separation

    Cross-Talk Speech Reduction, by Separation, for Separation

    Zhong-Qiu Wang +1

  25. eess.AS 2026-05-19 reviewed
    Block-diagonal matrices cut computation for distributed audio separation

    Fast Multichannel NMF with Block-Diagonal Spatial Covariance Matrices for Efficient Blind Source Separation Using Distributed Microphone Arrays

    Hirotaka Nishikori +4

  26. eess.AS 2026-05-18 reviewed
    Geometry conditioning adapts speaker extraction to any mic array

    Flexible Multi-Channel Target Speaker Extraction Using Geometry-Conditioned Spatially Selective Non-linear Filters

    Jiatong Li +2

  27. eess.AS 2026-05-18 reviewed
    Streaming CTC spotting enables real-time keyword biasing in ASR

    Contextual Biasing for Streaming ASR via CTC-based Word Spotting

    Kai-Chen Tsai +3

  28. eess.AS 2026-05-18 reviewed
    Streaming CTC spotting reduces WER and lifts keyword F-score in live ASR

    Contextual Biasing for Streaming ASR via CTC-based Word Spotting

    Kai-Chen Tsai +3

  29. eess.AS 2026-05-18 reviewed
    TNKP cuts misadjustment in fractional subband filters for ANC

    Fractional-Order Subband p-Norm Adaptive Filter via Transformation Nearest Kronecker Product Decomposition for Active Noise Control

    Jianhong Ye +3

  30. cs.MM 2026-05-18 reviewed
    Two-phase sampling matches contradictory audio prompts to video

    CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

    Gyubin Lee +2

  31. eess.AS 2026-05-18 reviewed
    156-hour Urdu corpus supplies 12 paralinguistic labels

    UrduSpeech: A 156-Hour Urdu Speech Corpus with 12-Dimension Paralinguistic Annotations

    Attia Nafees ul Haq +4

  32. cs.CL 2026-05-18 reviewed
    Distillation cuts error rates for Nigerian speech recognition by 29%

    Sometin Beta Pass Notin (SBPN): Improving Multilingual ASR for Nigerian Languages via Knowledge Distillation

    Sewade Ogun

  33. eess.AS 2026-05-17 reviewed
    Per-class unreliability scalars boost audio tagging on weak labels

    Robust Audio Tagging under Class-wise Supervision Unreliability

    Yuanbo Hou +6

  34. eess.AS 2026-05-17 reviewed
    Projection heads align onomatopoeic images with sounds

    Audio-Image Cross-Modal Retrieval with Onomatopoeic Images

    Keisuke Imoto +2

  35. cs.CL 2026-05-17 reviewed
    ASR errors degrade Korean QA the same relative amount across LLMs

    Analyzing Error Propagation in Korean Spoken QA with ASR-LLM Cascades

    Donghyuk Jung +1

  36. eess.AS 2026-05-17 reviewed
    402M model tops music accompaniment benchmarks

    S2Accompanist: A Semantic-Aware and Structure-Guided Diffusion Model for Music Accompaniment Generation

    Huakang Chen +9

  37. eess.AS 2026-05-17 reviewed
    A single control filter optimized over multiple measured paths narrows performance…

    Robust Soft-Constrained Spatially Selective Active Noise Control for Hearables Under Secondary Path Variations

    Tong Xiao +3

  38. eess.AS 2026-05-17 reviewed
    Audio models confuse target speech with multilingual distractors

    Can Large Audio Language Models Ignore Multilingual Distractors? An Evaluation of Their Selective Auditory Attention Capabilities

    Heejoon Koo

  39. cs.SD 2026-05-16 reviewed
    Target-KL regularization sets exact bitrates for audio VAEs

    Taming Audio VAEs via Target-KL Regularization

    Prem Seetharaman +1

  40. eess.AS 2026-05-16 reviewed
    Alignment step fixes semantic drift in continuous speech synthesis

    SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis

    Huimeng Wang +9

  41. eess.AS 2026-05-15 reviewed
    Survey traces audio super-resolution shift to generative models

    A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models

    Ningyuan Yang +6

  42. eess.AS 2026-05-15 reviewed
    MedASR cuts medical dictation errors by 58%

    MedASR: An Open-Source Model for High-Accuracy Medical Dictation

    Ke Wu +4

  43. eess.AS 2026-05-15 reviewed
    Flow model restores speech in real time at 120 times lower compute

    Real-time Speech Restoration using Data Prediction Mean Flows

    Sebastian Braun

  44. eess.AS 2026-05-15 reviewed
    Augmentation and LLM fixes halve errors in oral cancer speech recognition

    Improving Automatic Speech Recognition for Speakers Treated for Oral Cancer using Data Augmentation and LLM Error Correction

    Hidde Folkertsma +6

  45. eess.AS 2026-05-14 reviewed
    Synthetic data nears real baselines for multi-talker speech tasks

    Mind the Gap: Impact of Synthetic Conversational Data on Multi-Talker ASR and Speaker Diarization

    Alexander Polok +5

  46. cs.SD 2026-05-14 reviewed
    SpeakerLLM turns speaker verification into natural-language reasoning

    SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning

    KiHyun Nam +4

  47. eess.AS 2026-05-13 reviewed
    Benchmark standardizes early Parkinson's speech detection

    A Benchmark for Early-stage Parkinson's Disease Detection from Speech

    Terry Yi Zhong +5

    2 Piths
  48. eess.AS 2026-05-13 reviewed
    Framework filters FSD50K to single-source audio clips

    FSD50K-Solo: Automated Curation of Single-Source Sound Events

    Ningyuan Yang +6

  49. eess.AS 2026-05-12 reviewed
    SMC dataset exposes tempo bias in state-of-the-art beat tracking models

    The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking

    Jaehoon Ahn +2

  50. cs.SD 2026-05-12 reviewed
    STRUM turns raw audio into playable rhythm charts at 0.84 F1 for drums

    STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts

    Joshua Opria