pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

623 papers in eess.AS · page 5

  1. eess.AS 2026-04-14 reviewed
    Speech synthesis hits 49 ms first-byte latency via block-wise decoding

    An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding

    Tianhui Su +4

  2. eess.AS 2026-04-14 reviewed
    Common word cues cut rare bias word errors by 16% in speech LLMs

    Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction

    Sashi Novitasari +3

  3. eess.AS 2026-04-14 reviewed
    VoxEffects dataset supplies exact effect chains for speech audio

    VoxEffects: A Speech-Oriented Audio Effects Dataset and Benchmark

    Zhe Zhang +2

  4. eess.AS 2026-04-14 reviewed
    Mamba predicts clean tokens to boost cochlear implant speech in noise

    TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants

    Hsin-Tien Chiang +1

  5. eess.AS 2026-04-13 reviewed
    Pre-quantization fusion adds video to audio tokens without reconstruction loss

    Why Your Tokenizer Fails in Information Fusion: A Timing-Aware Pre-Quantization Fusion for Video-Enhanced Audio Tokenization

    Xiangyu Zhang +5

  6. eess.AS 2026-04-13 reviewed
    Watermark survives normal edits but breaks on deepfakes

    StreamMark: A Deep Learning-Based Semi-Fragile Audio Watermarking for Proactive Deepfake Detection

    Zhentao Liu +1

  7. eess.AS 2026-04-13 reviewed
    Audio AI models lose track of emotions in long talks

    HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models

    Shuiyuan Wang +7

  8. eess.AS 2026-04-13 reviewed
    LLM with cluster tags beats sequential diarization plus ASR

    Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS

    Hagai Aronowitz +4

  9. eess.AS 2026-04-13 reviewed
    Joint teacher-student updates cut speech WER by 4.6%

    Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update

    Rehan Ahmad +3

  10. eess.AS 2026-04-13 reviewed
    Neural estimator preserves direction in multichannel speech enhancement

    Direction-Preserving MIMO Speech Enhancement Using a Neural Covariance Estimator

    Thomas Deppisch

  11. eess.SP 2026-04-13 reviewed
    Deep learning active noise control preserves speech while cutting non-stationary noise

    Speech-preserving active noise control: a deep learning approach in reverberant environments

    Shuning Dai

  12. cs.SD 2026-04-13 reviewed
    AF-Next outperforms similar open audio models on 20 benchmarks

    Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

    Sreyan Ghosh +17

  13. cs.SD 2026-04-12 reviewed
    Synthetic labels keep music-flavor structure intact

    Multimodal Dataset Normalization and Perceptual Validation for Music-Taste Correspondences

    Matteo Spanio +2

  14. cs.CL 2026-04-11 reviewed
    Binary projection halves repetition in full-duplex speech models

    ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models

    Chi-Yuan Hsiao +5

  15. cs.LG 2026-04-10 reviewed
    Time-aware networks fix read bias in live speech translation

    Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation

    Joseph Liu +3

  16. eess.AS 2026-04-10 reviewed
    Self-control speech tasks sense student emotions

    Toward using Speech to Sense Student Emotion in Remote Learning Environments

    Sargam Vyas +5

  17. eess.AS 2026-04-10 reviewed
    Utterance filters pick reliable child ASR outputs at 97% precision

    Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech

    Gus Lathouwers +3

  18. eess.AS 2026-04-10 reviewed
    Diverse broadcast audio pretraining boosts SSL models

    Data Selection Effects on Self-Supervised Learning of Audio Representations for French Audiovisual Broadcasts

    Valentin Pelloin +3

  19. eess.AS 2026-04-10 reviewed
    Language model separates music stems via discrete tokens

    Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models

    Pengbo Lyu +6

  20. cs.SD 2026-04-10 reviewed
    Model turns mixed dialogue audio into separate speaker tracks

    DialogueSidon: Recovering Full-Duplex Dialogue Tracks from In-the-Wild Dialogue Audio

    Wataru Nakata +4

  21. eess.AS 2026-04-10 reviewed
    Phoneme sequences outperform projectors in low-resource LLM ASR

    Phonemes vs. Projectors: An Investigation of Speech-Language Interfaces for LLM-based ASR

    Ziwei Li +4

  22. eess.AS 2026-04-10 reviewed
    Confidence weighting cuts medical ASR errors for Telugu and Kannada

    Enhancing ASR Performance in the Medical Domain for Dravidian Languages

    Sri Charan Devarakonda +5

  23. eess.AS 2026-04-10 reviewed
    Phonetic sync aligns dubbed audio to original lips

    PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing

    Changi Hong +6

  24. cs.SD 2026-04-09 reviewed
    ASR models output wrong scripts in 21% of multilingual cases

    Script collapse in multilingual ASR: A reference-free metric and 100-pair benchmark

    Hanif Rahman

  25. eess.AS 2026-04-09 reviewed
    Audio prompts plus online RL lift conversational TTS quality

    Enhancing Conversational TTS with Cascaded Prompting and ICL-Based Online Reinforcement Learning

    Zhicheng Ouyang +6

  26. cs.SD 2026-04-09 reviewed
    Front-end choice dominates deepfake audio detector performance

    DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audio Detection

    Yassine El Kheir +8

  27. eess.AS 2026-04-09 reviewed
    Ring mixing halves residual noise in unsupervised speech separation

    Ring Mixing with Auxiliary Signal-to-Consistency-Error Ratio Loss for Unsupervised Denoising in Speech Separation

    Matthew Maciejewski +1

  28. cs.SD 2026-04-09 reviewed
    Interaction history lifts device speech detection F1 to 0.95

    Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI

    David Joohun Kim +3

  29. eess.AS 2026-04-09 reviewed
    TASU2 controls WER in CTC simulation for speech LLM adaptation

    TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs

    Jing Peng +7

  30. eess.AS 2026-04-09 reviewed
    Gaze cues select target speaker in multi-talker enhancement

    Tracking Listener Attention: Gaze-Guided Audio-Visual Speech Enhancement Framework

    Hsiang-Cheng Yang +5

  31. eess.AS 2026-04-09 reviewed
    Entropy metrics guide efficient LLM speech recognition

    Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs

    Yuan Xie +6

  32. cs.SD 2026-04-08 reviewed
    Emotion recognition crosses languages with five source labels

    Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition

    Ya Zhao +2

  33. eess.AS 2026-04-08 reviewed
    EvoTSE updates enrollment to cut confusion in speaker extraction

    EvoTSE: Evolving Enrollment for Target Speaker Extraction

    Zikai Liu +6

  34. eess.AS 2026-04-08 reviewed
    Attention module sharpens speech for cochlear implant users

    DAT-CFTNet: Speech Enhancement for Cochlear Implant Recipients using Attention-based Dual-Path Recurrent Neural Network

    Nursadul Mamun +1

  35. eess.AS 2026-04-08 reviewed
    Hierarchical loss lifts subtle fault detection in manufacturing

    Deep Hierarchical Knowledge Loss for Fault Intensity Diagnosis

    Yu Sha +10

  36. eess.AS 2026-04-08 reviewed
    One model learns both audio and speech traits via long-patch prediction

    ULTRAS -- Unified Learning of Transformer Representations for Audio and Speech Signals

    Ameenudeen P E +2

  37. cs.CV 2026-04-07 reviewed
    Residual CNN and BiGRU cut music score recognition error to 0.45%

    A High-Accuracy Optical Music Recognition Method Based on Bottleneck Residual Convolutions

    Junwen Ma +3

  38. eess.AS 2026-04-07 reviewed
    Voice dataset launches AI challenge for early ALS detection

    SAND: The Challenge on Speech Analysis for Neurodegenerative Disease Assessment

    Giovanna Sannino +12

  39. eess.AS 2026-04-07 reviewed
    Challenge dataset lets AI detect ALS from voice recordings

    SAND: The Challenge on Speech Analysis for Neurodegenerative Disease Assessment

    Giovanna Sannino +12

  40. eess.AS 2026-04-07 reviewed
    Model turns low-order reflections into full room impulse responses

    Multimodal Deep Learning Method for Real-Time Spatial Room Impulse Response Computing

    Zhiyu Li +3

  41. eess.AS 2026-04-07 reviewed
    Open-ear glasses cancel noise using only frame mics

    Active noise cancellation on open-ear smart glasses

    Kuang Yuan +7

  42. eess.AS 2026-04-06 reviewed
    Diarization models drop on child and older adult speech

    Exploring Speech Foundation Models for Speaker Diarization Across Lifespan

    Anfeng Xu +2

  43. eess.AS 2026-04-06 reviewed
    Joint training on all ages fixes diarization drops on child and older voices

    Exploring Speech Foundation Models for Speaker Diarization Across Lifespan

    Anfeng Xu +2

  44. eess.AS 2026-04-06 reviewed
    New benchmark tests voice agents on real disfluent speech and tool chains

    Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency

    Guan-Ting Lin +3

  45. cs.SD 2026-04-06 reviewed
    High-res audio plus subband experts beat 16 kHz detectors for singing fakes

    Joint Fullband-Subband Modeling for High-Resolution SingFake Detection

    Xuanjun Chen +5

  46. cs.SD 2026-04-06 reviewed
    Binaural attention lifts audio navigation success on unheard sounds

    Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction

    Jia Li +1

  47. cs.AR 2026-04-06 reviewed
    Bit partitioning lets one PE run FP8 or dual FP4 with 60% less area

    DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration

    Shubham Kumar +3

  48. eess.AS 2026-04-04 reviewed
    Zero-shot keyword spotting reaches 90% accuracy with 0.007% false alarms

    MALEFA: Multi-grAnularity Learning and Effective False Alarm Suppression for Zero-shot Keyword Spotting

    Lo-Ya Li +4

  49. eess.AS 2026-04-03 reviewed
    No enrollment needed: mixture yields usable speaker embeddings

    Unmixing the Crowd: Learning Mixture-to-Set Speaker Embeddings for Enrollment-Free Target Speech Extraction

    FNU Sidharth +3

  50. eess.AS 2026-04-03 reviewed
    Iterative reasoning lifts speaker attribution accuracy in group talks

    Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR

    Zhennan Lin +7