pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

623 papers in eess.AS · page 4

  1. cs.LG 2026-04-23 reviewed
    One dilated CNN plus resampling matches AR denoising for periodic signals

    Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach

    Eli Gildish +2

  2. eess.AS 2026-04-23 reviewed
    Tutorial splits top open-source speaker diarization into seven stages

    DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline

    Nikhil Raghav

  3. eess.AS 2026-04-23 reviewed
    New benchmark tests AI on handling speech overlaps and interruptions

    Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge

    Chengyou Wang +8

  4. cs.SD 2026-04-22 reviewed
    Benchmark reveals AI music models perceive notation but miss theory

    ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence

    Menghe Ma +7

  5. eess.AS 2026-04-22 reviewed
    MERT metrics better match human ratings for music source separation

    Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations

    Paul A. Bereuter +1

  6. q-bio.NC 2026-04-22 reviewed
    Decorrelation reduces brain-to-text WER from 26.3% to 21.6%

    MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis

    Yuanhao Chen +1

  7. math.CO 2026-04-21 reviewed
    Diatonic seventh chords form a Fano configuration

    Tonnetz Theory, Classical Harmony, and the Combinatorial Geometry of Abstract Musical Resources

    Jeffrey R. Boland +1

  8. eess.AS 2026-04-21 reviewed
    Hyperbolic fusion spots Indic codec deepfakes

    Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages

    Girish +3

  9. eess.AS 2026-04-21 reviewed
    Cascaded temporal stages yield natural TTS with fewer parameters

    Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation

    Jianbo Ma +1

  10. eess.AS 2026-04-21 reviewed
    Voice range tracks TTS capability while CPPs separate natural from robotic speech

    Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment

    Huanchen Cai +1

  11. cs.AI 2026-04-21 reviewed
    One LLM replaces VAD, ASR and interruption detection for live speech

    UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction

    Yadong Li +3

  12. cs.CL 2026-04-21 reviewed
    Unscripted phone calls form new benchmark for Indian speech recognition

    Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India

    Kaushal Bhogale +13

  13. eess.AS 2026-04-21 reviewed
    Consistency regularization unifies offline and streaming RNNT ASR

    Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization

    Andrei Andrusenko +5

  14. eess.AS 2026-04-21 reviewed
    Photoelectric servo cuts mic self-noise to 11 dBA

    Self-Noise Reduction for Capacitive Sensors via Photoelectric DC Servo: Application to Condenser Microphones

    Hirotaka Obo +3

  15. eess.SP 2026-04-20 reviewed
    Covariance reconstruction enables practical hybrid SMI

    Hybrid SMI Realization via Matrix Completion and Riemannian Manifold Optimization on Narrowband Sub-Array Based Architectures

    Tarun Suman Cousik +5

  16. cs.SD 2026-04-20 reviewed
    Rule-based alignment cuts rule violations in lyric-to-melody generation

    Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints

    Hao Meng +4

  17. eess.AS 2026-04-20 reviewed
    Kernel plasticity lifts Hebbian audio learning to 76.3% accuracy

    Incremental learning for audio classification with Hebbian Deep Neural Networks

    Riccardo Casciotti +3

  18. eess.AS 2026-04-20 reviewed
    2.3B LLM-based ASR outperforms larger models

    NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR

    Yuan Xie +11

  19. eess.AS 2026-04-20 reviewed
    Benchmark shows TTS systems lag on complex instructions

    MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech

    Huakang Chen +14

  20. eess.AS 2026-04-19 reviewed
    Non-verbal cues supervise speech emotion recognition across languages

    Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition

    Girish +2

  21. eess.AS 2026-04-19 reviewed
    Hyperbolic model detects codec deepfakes in diseased voices

    HCFD: A Benchmark for Audio Deepfake Detection in Healthcare

    Mohd Mujtaba Akhtar +2

  22. cs.CL 2026-04-19 reviewed
    Translation system keeps laughter and tears in speech

    MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation

    Szu-Chi Chen +4

  23. eess.AS 2026-04-19 reviewed
    Audio models show bigger bias from gender than from accents

    VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech

    Yi-Cheng Lin +3

  24. eess.AS 2026-04-18 reviewed
    Anonymized speech trains AI models nearly as well as raw recordings

    Anonymization, Not Elimination: Utility-Preserved Speech Anonymization

    Yunchong Xiao +6

  25. eess.AS 2026-04-18 reviewed
    Room acoustics recast as state-space model of boundary integral equation

    A state-space representation of the boundary integral equation for room acoustic modelling

    Randall Ali +4

  26. cs.SD 2026-04-17 reviewed
    Pairwise audio comparisons lift deepfake detection up to 2x on wild data

    ICLAD: In-Context Learning with Comparison-Guidance for Audio Deepfake Detection

    Benjamin Chou +2

  27. eess.AS 2026-04-17 reviewed
    Neural-only detection risks falling short against future fake speech

    Neural Encoding Detection is Not All You Need for Synthetic Speech Detection

    Luca Cuccovillo +3

  28. cs.SD 2026-04-17 reviewed
    Compact network spots AI music via codec artifacts

    ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics

    Heewon Oh

  29. cs.SD 2026-04-17 reviewed
    Benchmark reveals speech AI limits on complex tool calls

    Audio2Tool: Speak, Call, Act -- A Dataset for Benchmarking Speech Tool Use

    Ramit Pahwa +6

  30. cs.CL 2026-04-17 reviewed
    Qwen3.5-Omni claims SOTA on 215 audio-visual tasks

    Qwen3.5-Omni Technical Report

    Qwen Team

  31. cs.SD 2026-04-16 reviewed
    Manual protocol measures bar-level tempo in historical chamber music

    A Manual Bar-by-Bar Tempo Measurement Protocol for Polyphonic Chamber Music Recordings: Design, Validation, and Application to Beethoven's Piano and Cello Sonatas

    Ignasi Sole

  32. cs.SD 2026-04-16 reviewed
    LSTM with MFCC features reaches 99% accuracy on speech emotions

    Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning Model

    Adelekun Oluwademilade +9

  33. cs.SD 2026-04-16 reviewed
    RL fine-tuning cuts speech WER to 3.2% at 200bps

    ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning

    Junyi Wang +6

  34. cs.SD 2026-04-16 reviewed
    Acoustic features cut recall from 66% to 47% in volatility forecasts

    The Acoustic Camouflage Phenomenon: Re-evaluating Speech Features for Financial Risk Prediction

    Dhruvin Dungrani +1

  35. eess.AS 2026-04-16 reviewed
    SongBench rates AI songs on seven expert dimensions

    SongBench: A Fine-Grained Multi-Aspect Benchmark for Song Quality Assessment

    Dapeng Wu +7

  36. eess.AS 2026-04-16 reviewed
    UniPASE tops challenge by restoring clean phonetic content first

    UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations

    Xiaobin Rong +4

  37. cs.SD 2026-04-16 reviewed
    SLMs spot norms in text yet ignore them when spoken

    VoxSafeBench: Not Just What Is Said, but Who, How, and Where

    Yuxiang Wang +11

  38. eess.AS 2026-04-15 reviewed
    Speaker overlap boosts speech depression detection accuracy

    Who is Speaking or Who is Depressed? A Controlled Study of Speaker Leakage in Speech-Based Depression Detection

    Hsiang-Chen Yeh +5

  39. eess.AS 2026-04-15 reviewed
    Speaker ID errors drop 93% with enhanced open-set tuning

    SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion

    Zhiyong Chen +4

  40. eess.AS 2026-04-15 reviewed
    LLM meta-evaluator beats speech quality predictors with few labels

    Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models

    Ryandhimas E. Zezario +5

  41. eess.AS 2026-04-15 reviewed
    RBF SVM detects deepfake audio at 93 percent accuracy

    Classical Machine Learning Baselines for Deepfake Audio Detection on the Fake-or-Real Dataset

    Faheem Ahmad +2

  42. eess.AS 2026-04-14 reviewed
    Adapted speech LLMs predict word timestamps and lift ASR accuracy

    In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions

    Xulin Fan +5

  43. eess.AS 2026-04-14 reviewed
    Prosody pretraining halves error on emotional speech deepfakes

    ProSDD: Learning Prosodic Representations for Speech Deepfake Detection against Expressive and Emotional Attacks

    Aurosweta Mahapatra +4

  44. cs.CL 2026-04-14 reviewed
    Async retrieval gives full-duplex speech models non-duplex factuality

    MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

    Chung-Ming Chien +5

  45. cs.CL 2026-04-14 reviewed
    Async retrieval matches non-duplex factuality in full-duplex speech models

    MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

    Chung-Ming Chien +5

  46. eess.AS 2026-04-14 reviewed
    Waveguides make real-time physical sound modeling practical

    Four Decades of Digital Waveguides

    Pablo Tablas de Paula +3

  47. eess.AS 2026-04-14 reviewed
    Audio model gains step-by-step reasoning from 545k curated samples

    Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models

    Longhao Li +7

  48. eess.AS 2026-04-14 reviewed
    One-step codec latent conversion enables streaming zero-shot VC

    X-VC: Zero-shot Streaming Voice Conversion in Codec Space

    Qixi Zheng +9

  49. eess.AS 2026-04-14 reviewed
    Circular mic array lets UAVs detect victims by sound

    Sky-Ear: An Unmanned Aerial Vehicle-Enabled Victim Sound Detection and Localization System

    Yi Hong +4

  50. eess.AS 2026-04-14 reviewed
    Delayed secondary speaker corrects both timbre and space

    Room compensation for loudspeaker reproduction using a supporting source

    James Brooks-Park +3