archive

Every paper Pith has read.

623 papers in eess.AS · page 6

cs.SD 2026-04-03 reviewed

Dual-branch graphs disentangle features for emotion recognition
Disentangled Dual-Branch Graph Learning for Conversational Emotion Recognition

Chengling Guo +5
cs.SD 2026-04-02 reviewed

FastTurn detects turns faster by mixing early semantics with sound
FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection

Chengyou Wang +10
cs.SD 2026-04-02 reviewed

RAVN is a navigation system for robots that uses audio signals to estimate how reliable…
Reliability-Aware Geometric Fusion for Robust Audio-Visual Navigation

Teng Liu +1
cs.SD 2026-04-02 reviewed

Spatial descriptors cut steps in audio-visual navigation
Spatial-Aware Conditioned Fusion for Audio-Visual Navigation

Shaohang Wu +1
cs.SD 2026-04-02 reviewed

Spatial fusion lifts audio-visual navigation on unheard sounds
Audio Spatially-Guided Fusion for Audio-Visual Navigation

Xinyu Zhou +1
eess.AS 2026-04-02 reviewed

PhiNet matches black-box speaker verification with phonetic explanations
PhiNet: Speaker Verification with Phonetic Interpretability

Yi Ma +3
eess.AS 2026-04-02 reviewed

Speech depression detector generalizes across languages and matches EEG markers
Validating Computational Markers of Depressive Behavior: Cross-Linguistic Speech-Based Depression Detection with Neurophysiological Validation

Fuxiang Tao +6
eess.AS 2026-04-01 reviewed

Diffusion U-Net matches vocal separation baselines
Diff-VS: Efficient Audio-Aware Diffusion U-Net for Vocals Separation

Yun-Ning (Amy) Hung +3
cs.CL 2026-04-01 reviewed

Zero-shot TTS reaches 600 languages with direct text-to-acoustic mapping
OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

Han Zhu +9
eess.AS 2026-03-31 reviewed

Small models master Arabic speech through compressed distillation
HARNESS: Lightweight Distilled Arabic Speech Foundation Models

Vrunda N. Sukhadia +1
eess.AS 2026-03-31 reviewed

Asymmetric decoder refines speech separation with TF correlations
Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation

Ui-Hyeop Shin +1
cs.SD 2026-03-31 reviewed

0.28 F1 jump in Arabic mispronunciation detection
IQRA 2026: Interspeech Challenge on Automatic Pronunciation Assessment for Modern Standard Arabic (MSA)

Yassine El Kheir +7
eess.AS 2026-03-30 reviewed

Hierarchical model predicts human ratings of AI-dubbed video at PCC > 0.75
Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?

Ashwini Dasare +3
cs.CL 2026-03-30 reviewed

KoALa-Bench tests LALMs on Korean speech understanding and faithfulness
KoALa-Bench: Evaluating Large Audio Language Models on Korean Speech Understanding and Faithfulness

Jinyoung Kim +6
eess.AS 2026-03-26 reviewed

Fairness model quantifies each demographic's contribution to SER bias
Explainable Speech Emotion Recognition: Weighted Attribute Fairness to Model Demographic Contributions to Social Bias

Tomisin Ogunnubi +2
eess.AS 2026-03-25 reviewed

Diffusion model changes song lyrics while preserving melody
YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance

Chunbo Hao +8
eess.AS 2026-03-24 reviewed

Lightning V2 achieves 4x lower TTS cost on Tenstorrent vs L40S
Rewriting TTS Inference Economics: Lightning V2 on Tenstorrent Achieves 4x Lower Cost Than NVIDIA L40S

Ranjith M. S. +2
eess.AS 2026-03-24 reviewed

Continuous models needed to cut uncertainty in emotion AI
Modelling Emotions is an Elusive Pursuit in Affective Computing

Anders Rolighed Larsen +4
cs.CL 2026-03-23 reviewed

TiCo cuts spoken response duration error by 2.7 times
TiCo: Time-Controllable Spoken Dialogue Model

Kai-Wei Chang +4
cs.SD 2026-03-20 reviewed

Hierarchical labels turn text into a wide-band control channel for long speech synthesis
Borderless Long Speech Synthesis

Xingchen Song +14
eess.AS 2026-03-18 reviewed

Dialogue models reason internally while listening
The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

Donghang Wu +6
eess.AS 2026-03-18 reviewed

Dialogue models gain silent thinking via recursive latent updates
The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

Donghang Wu +6
cs.CL 2026-03-17 reviewed

Neural models score TTS quality better than human raters
Neural networks for Text-to-Speech evaluation

Ilya Trofimenko +5
eess.AS 2026-03-16 reviewed

AI model mixes live music with zero latency
AILive Mixer: A Deep Learning based Zero Latency Automatic Music Mixer for Live Music Performances

Devansh Zurale +3
eess.AS 2026-03-16 reviewed

Pseudo-labels and contrastive pretraining reach 0.761 SRCC on unseen dysarthric speech
Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

Jaesung Bae +4
eess.AS 2026-03-16 reviewed

Tight integration beats shallow fusion for LLMs in speech recognition
LLMs and Speech: Integration vs. Combination

Robin Schmitt +4
eess.AS 2026-03-16 reviewed

Reward model judges spoken dialogues on prosody and natural phrasing
SDiaReward: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness

Jingyu Lu +9
eess.AS 2026-03-11 reviewed

Harf-Speech matches expert Arabic speech scores at 0.79 correlation
Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment

Asif Azad +8
eess.AS 2026-03-10 reviewed

Non-iterative dMWF matches centralized Wiener filter
Distributed Multichannel Wiener Filtering for Wireless Acoustic Sensor Networks

Paul Didier +6
eess.AS 2026-03-10 reviewed

Text-to-audio model generates room impulse responses
Adapting a Text-to-Audio Model for Room Impulse Response Generation

Kirak Kim +1
cs.SD 2026-03-05 reviewed

Spoof detectors guide hierarchical decoding for cleaner speech synthesis
Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection

Junchuan Zhao +2
eess.AS 2026-03-03 reviewed

SSL speech models put pitch and gender in first principal dimension
Interpreting Speaker Characteristics in the Dimensions of Self-Supervised Speech Features

Kyle Janse van Rensburg +2
cs.SD 2026-03-02 reviewed

Spoof detectors vary sharply across 66 languages
When Spoof Detectors Travel: Evaluation Across 66 Languages in the Low-Resource Language Spoofing Corpus

Kirill Borodin +4
cs.SD 2026-03-02 reviewed

Cross-ASR disagreement flags risky medical transcript segments
From Black Box to Glass Box: Cross-Model ASR Disagreement to Prioritize Review in Ambient AI Scribe Documentation

Abdolamir Karbalaie +2
eess.AS 2026-03-02 reviewed

Large speech models outperform others at detecting audio deepfakes
A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection

Hashim Ali +3
eess.SP 2026-02-27 reviewed

LoRa enables secure 1.5 km peer-to-peer voice links
Modeling and Link Budget Feasibility Analysis of Secure LoRa-Based Peer-to-Peer Communication for Short-Range Tactical Networks

Ayush Kumar Agrawal +3
eess.AS 2026-02-24 reviewed

LMU and entropy fusion lift infant cry classification across domains
LMU-Based Sequential Learning and Posterior Ensemble Fusion for Cross-Domain Infant Cry Classification

Niloofar Jazaeri +3
cs.SD 2026-02-24 reviewed

MIDI plus structure labels keep long songs coherent
MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline

Fang-Duo Tsai +6
eess.AS 2026-02-21 reviewed

Speech models perform phonological vector arithmetic
[b]=[d]-[t]+[p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic

Kwanghee Choi +4
eess.AS 2026-02-18 reviewed

Single mic estimates sound speed during playback
Online Single-Channel Audio-Based Sound Speed Estimation for Robust Multi-Channel Audio Control

Andreas Jonas Fuglsig +2
eess.AS 2026-02-18 reviewed

Acoustic maps from beamforming detect voice replays
Multi-Channel Replay Speech Detection using Acoustic Maps

Michael Neri +1
eess.AS 2026-02-18 reviewed

Machine identity knowledge conceals ASD weaknesses
How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?

Kevin Wilkinghoff +2
cs.CL 2026-02-16 reviewed

LLM passes cut diarization errors in French clinical speech
Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker Diarization

Ambre Marie (LaTIM) +3
eess.AS 2026-02-16 reviewed

Noise augmentation tops data strategies for Parkinson's speech enhancement
Data Augmentation for Pathological Speech Enhancement

Mingchi Hou +2
cs.SD 2026-02-14 reviewed

Branch analysis uncovers flawed specialization in anti-spoofing
Interpreting Multi-Branch Anti-Spoofing Architectures: Correlating Internal Strategy with Empirical Performance

Ivan Viakhirev +3
eess.AS 2026-02-11 reviewed

Speech enhancement pruning masks predict VAD and pitch at 93 percent accuracy
From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks

Riccardo Miccini +4
cs.CL 2026-02-09 reviewed

EEG-to-text model stops hallucinating by grounding every token in brain signals
Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding

Yuchen Wang +4
eess.AS 2026-02-03 reviewed

Wavelet scattering features boost speech deepfake detection
WST-X Series: Wavelet Scattering Transform for Interpretable Speech Deepfake Detection

Xi Xuan +4
eess.AS 2026-02-02 reviewed

Transformer reconstructs room impulse responses from sparse mics
RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses

Shaoheng Xu +4
eess.SP 2026-02-01 reviewed

Audio foundation models integrate core tasks into signal processing classes
Generative AI in Signal Processing Education: An Audio Foundation Model Based Approach

Muhammad Salman Khan +3