pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

623 papers in eess.AS · page 6

  1. cs.SD 2026-04-03 reviewed
    Dual-branch graphs disentangle features for emotion recognition

    Disentangled Dual-Branch Graph Learning for Conversational Emotion Recognition

    Chengling Guo +5

  2. cs.SD 2026-04-02 reviewed
    FastTurn detects turns faster by mixing early semantics with sound

    FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection

    Chengyou Wang +10

  3. cs.SD 2026-04-02 reviewed
    RAVN is a navigation system for robots that uses audio signals to estimate how reliable…

    Reliability-Aware Geometric Fusion for Robust Audio-Visual Navigation

    Teng Liu +1

  4. cs.SD 2026-04-02 reviewed
    Spatial descriptors cut steps in audio-visual navigation

    Spatial-Aware Conditioned Fusion for Audio-Visual Navigation

    Shaohang Wu +1

  5. cs.SD 2026-04-02 reviewed
    Spatial fusion lifts audio-visual navigation on unheard sounds

    Audio Spatially-Guided Fusion for Audio-Visual Navigation

    Xinyu Zhou +1

  6. eess.AS 2026-04-02 reviewed
    PhiNet matches black-box speaker verification with phonetic explanations

    PhiNet: Speaker Verification with Phonetic Interpretability

    Yi Ma +3

  7. eess.AS 2026-04-02 reviewed
    Speech depression detector generalizes across languages and matches EEG markers

    Validating Computational Markers of Depressive Behavior: Cross-Linguistic Speech-Based Depression Detection with Neurophysiological Validation

    Fuxiang Tao +6

  8. eess.AS 2026-04-01 reviewed
    Diffusion U-Net matches vocal separation baselines

    Diff-VS: Efficient Audio-Aware Diffusion U-Net for Vocals Separation

    Yun-Ning (Amy) Hung +3

  9. cs.CL 2026-04-01 reviewed
    Zero-shot TTS reaches 600 languages with direct text-to-acoustic mapping

    OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

    Han Zhu +9

  10. eess.AS 2026-03-31 reviewed
    Small models master Arabic speech through compressed distillation

    HARNESS: Lightweight Distilled Arabic Speech Foundation Models

    Vrunda N. Sukhadia +1

  11. eess.AS 2026-03-31 reviewed
    Asymmetric decoder refines speech separation with TF correlations

    Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation

    Ui-Hyeop Shin +1

  12. cs.SD 2026-03-31 reviewed
    0.28 F1 jump in Arabic mispronunciation detection

    IQRA 2026: Interspeech Challenge on Automatic Pronunciation Assessment for Modern Standard Arabic (MSA)

    Yassine El Kheir +7

  13. eess.AS 2026-03-30 reviewed
    Hierarchical model predicts human ratings of AI-dubbed video at PCC > 0.75

    Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?

    Ashwini Dasare +3

  14. cs.CL 2026-03-30 reviewed
    KoALa-Bench tests LALMs on Korean speech understanding and faithfulness

    KoALa-Bench: Evaluating Large Audio Language Models on Korean Speech Understanding and Faithfulness

    Jinyoung Kim +6

  15. eess.AS 2026-03-26 reviewed
    Fairness model quantifies each demographic's contribution to SER bias

    Explainable Speech Emotion Recognition: Weighted Attribute Fairness to Model Demographic Contributions to Social Bias

    Tomisin Ogunnubi +2

  16. eess.AS 2026-03-25 reviewed
    Diffusion model changes song lyrics while preserving melody

    YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance

    Chunbo Hao +8

  17. eess.AS 2026-03-24 reviewed
    Lightning V2 achieves 4x lower TTS cost on Tenstorrent vs L40S

    Rewriting TTS Inference Economics: Lightning V2 on Tenstorrent Achieves 4x Lower Cost Than NVIDIA L40S

    Ranjith M. S. +2

  18. eess.AS 2026-03-24 reviewed
    Continuous models needed to cut uncertainty in emotion AI

    Modelling Emotions is an Elusive Pursuit in Affective Computing

    Anders Rolighed Larsen +4

  19. cs.CL 2026-03-23 reviewed
    TiCo cuts spoken response duration error by 2.7 times

    TiCo: Time-Controllable Spoken Dialogue Model

    Kai-Wei Chang +4

  20. cs.SD 2026-03-20 reviewed
  21. eess.AS 2026-03-18 reviewed
    Dialogue models reason internally while listening

    The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

    Donghang Wu +6

  22. eess.AS 2026-03-18 reviewed
    Dialogue models gain silent thinking via recursive latent updates

    The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

    Donghang Wu +6

  23. cs.CL 2026-03-17 reviewed
    Neural models score TTS quality better than human raters

    Neural networks for Text-to-Speech evaluation

    Ilya Trofimenko +5

  24. eess.AS 2026-03-16 reviewed
    AI model mixes live music with zero latency

    AILive Mixer: A Deep Learning based Zero Latency Automatic Music Mixer for Live Music Performances

    Devansh Zurale +3

  25. eess.AS 2026-03-16 reviewed
    Pseudo-labels and contrastive pretraining reach 0.761 SRCC on unseen dysarthric speech

    Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

    Jaesung Bae +4

  26. eess.AS 2026-03-16 reviewed
    Tight integration beats shallow fusion for LLMs in speech recognition

    LLMs and Speech: Integration vs. Combination

    Robin Schmitt +4

  27. eess.AS 2026-03-16 reviewed
    Reward model judges spoken dialogues on prosody and natural phrasing

    SDiaReward: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness

    Jingyu Lu +9

  28. eess.AS 2026-03-11 reviewed
    Harf-Speech matches expert Arabic speech scores at 0.79 correlation

    Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment

    Asif Azad +8

  29. eess.AS 2026-03-10 reviewed
    Non-iterative dMWF matches centralized Wiener filter

    Distributed Multichannel Wiener Filtering for Wireless Acoustic Sensor Networks

    Paul Didier +6

  30. eess.AS 2026-03-10 reviewed
    Text-to-audio model generates room impulse responses

    Adapting a Text-to-Audio Model for Room Impulse Response Generation

    Kirak Kim +1

  31. cs.SD 2026-03-05 reviewed
    Spoof detectors guide hierarchical decoding for cleaner speech synthesis

    Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection

    Junchuan Zhao +2

  32. eess.AS 2026-03-03 reviewed
    SSL speech models put pitch and gender in first principal dimension

    Interpreting Speaker Characteristics in the Dimensions of Self-Supervised Speech Features

    Kyle Janse van Rensburg +2

  33. cs.SD 2026-03-02 reviewed
    Spoof detectors vary sharply across 66 languages

    When Spoof Detectors Travel: Evaluation Across 66 Languages in the Low-Resource Language Spoofing Corpus

    Kirill Borodin +4

  34. cs.SD 2026-03-02 reviewed
    Cross-ASR disagreement flags risky medical transcript segments

    From Black Box to Glass Box: Cross-Model ASR Disagreement to Prioritize Review in Ambient AI Scribe Documentation

    Abdolamir Karbalaie +2

  35. eess.AS 2026-03-02 reviewed
    Large speech models outperform others at detecting audio deepfakes

    A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection

    Hashim Ali +3

  36. eess.SP 2026-02-27 reviewed
    LoRa enables secure 1.5 km peer-to-peer voice links

    Modeling and Link Budget Feasibility Analysis of Secure LoRa-Based Peer-to-Peer Communication for Short-Range Tactical Networks

    Ayush Kumar Agrawal +3

  37. eess.AS 2026-02-24 reviewed
    LMU and entropy fusion lift infant cry classification across domains

    LMU-Based Sequential Learning and Posterior Ensemble Fusion for Cross-Domain Infant Cry Classification

    Niloofar Jazaeri +3

  38. cs.SD 2026-02-24 reviewed
    MIDI plus structure labels keep long songs coherent

    MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline

    Fang-Duo Tsai +6

  39. eess.AS 2026-02-21 reviewed
    Speech models perform phonological vector arithmetic

    [b]=[d]-[t]+[p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic

    Kwanghee Choi +4

  40. eess.AS 2026-02-18 reviewed
    Single mic estimates sound speed during playback

    Online Single-Channel Audio-Based Sound Speed Estimation for Robust Multi-Channel Audio Control

    Andreas Jonas Fuglsig +2

  41. eess.AS 2026-02-18 reviewed
    Acoustic maps from beamforming detect voice replays

    Multi-Channel Replay Speech Detection using Acoustic Maps

    Michael Neri +1

  42. eess.AS 2026-02-18 reviewed
    Machine identity knowledge conceals ASD weaknesses

    How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?

    Kevin Wilkinghoff +2

  43. cs.CL 2026-02-16 reviewed
    LLM passes cut diarization errors in French clinical speech

    Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker Diarization

    Ambre Marie (LaTIM) +3

  44. eess.AS 2026-02-16 reviewed
    Noise augmentation tops data strategies for Parkinson's speech enhancement

    Data Augmentation for Pathological Speech Enhancement

    Mingchi Hou +2

  45. cs.SD 2026-02-14 reviewed
    Branch analysis uncovers flawed specialization in anti-spoofing

    Interpreting Multi-Branch Anti-Spoofing Architectures: Correlating Internal Strategy with Empirical Performance

    Ivan Viakhirev +3

  46. eess.AS 2026-02-11 reviewed
    Speech enhancement pruning masks predict VAD and pitch at 93 percent accuracy

    From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks

    Riccardo Miccini +4

  47. cs.CL 2026-02-09 reviewed
    EEG-to-text model stops hallucinating by grounding every token in brain signals

    Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding

    Yuchen Wang +4

  48. eess.AS 2026-02-03 reviewed
    Wavelet scattering features boost speech deepfake detection

    WST-X Series: Wavelet Scattering Transform for Interpretable Speech Deepfake Detection

    Xi Xuan +4

  49. eess.AS 2026-02-02 reviewed
    Transformer reconstructs room impulse responses from sparse mics

    RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses

    Shaoheng Xu +4

  50. eess.SP 2026-02-01 reviewed
    Audio foundation models integrate core tasks into signal processing classes

    Generative AI in Signal Processing Education: An Audio Foundation Model Based Approach

    Muhammad Salman Khan +3