Hung-yi Lee

Identifiers

name variant Hung-Yi Lee 0.60 · backfill

Papers (82)

Steering Where to Listen: Instruction-Based Activation Steering Redirects Temporal Attention in Large Audio-Language Models cs.SD · 2026 · author #2 as printed: Hung-Yi Lee
Mitigating Proxy-to-Wild Domain Gap in Deepfake Speech cs.SD · 2026 · author #6
Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods cs.SD · 2026 · author #4
Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts cs.CL · 2026 · author #4
Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models eess.AS · 2026 · author #4
Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI eess.AS · 2026 · author #6
ReMedi: Reasoner for Medical Clinical Prediction cs.CL · 2026 · author #4
Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation cs.SD · 2026 · author #9
The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation eess.AS · 2026 · author #8
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models eess.AS · 2026 · author #3
All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation cs.SD · 2026 · author #5
LLM-Codec: Neural Audio Codec Meets Language Model Objectives cs.SD · 2026 · author #3
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation cs.CL · 2026 · author #5
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech eess.AS · 2026 · author #4
NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations cs.SD · 2026 · author #11
CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning cs.CL · 2026 · author #6
ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models cs.CL · 2026 · author #6
Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency eess.AS · 2026 · author #4
Joint Fullband-Subband Modeling for High-Resolution SingFake Detection cs.SD · 2026 · author #5
TiCo: Time-Controllable Spoken Dialogue Model cs.CL · 2026 · author #4
TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling cs.SD · 2026 · author #7
AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering eess.AS · 2026 · author #2
On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation cs.CL · 2026 · author #7
Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models cs.CL · 2025 · author #3
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition eess.AS · 2025 · author #7
Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner eess.AS · 2025 · author #7
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models cs.SD · 2025 · author #3
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models eess.AS · 2025 · author #9
Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems eess.AS · 2025 · author #5
Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models eess.AS · 2025 · author #7
An Exploration of Mamba for Speech Self-Supervised Models cs.CL · 2025 · author #8
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey eess.AS · 2025 · author #3
On The Landscape of Spoken Language Models: A Comprehensive Survey cs.CL · 2025 · author #8
Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization cs.CL · 2025 · author #4
CodecFake+: Codec-Based Resynthesized Data as a Proxy for Detecting CodecFake Speech cs.SD · 2025 · author #11
Cross-Lingual Transfer Learning for Question Answering cs.CL · 2019 · author #2
Mitigating the Impact of Speech Recognition Errors on Spoken Question Answering by Adversarial Domain Adaptation cs.CL · 2019 · author #3
Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering cs.SD · 2019 · author #3
End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning cs.CL · 2019 · author #4
From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings cs.CL · 2019 · author #3
Adversarial Learning of Label Dependency: A Novel Framework for Multi-class Classification cs.LG · 2018 · author #2
Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection cs.CL · 2018 · author #3
Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation cs.CL · 2018 · author #3
Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model cs.CL · 2018 · author #2
Almost-unsupervised Speech Recognition with Close-to-zero Resource Based on Phonetic Structures Learned from Very Small Unpaired Speech and Text Data cs.CL · 2018 · author #4
Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks cs.CL · 2018 · author #2
Proximal Policy Optimization and its Dynamic Version for Sequence Generation cs.CL · 2018 · author #4
Improving Conditional Sequence Generative Adversarial Networks by Stepwise Evaluation cs.CL · 2018 · author #2
Towards Audio to Scene Image Synthesis using Generative Adversarial Network cs.CL · 2018 · author #3
Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences cs.SD · 2018 · author #4
ODSQA: Open-domain Spoken Question Answering Dataset cs.CL · 2018 · author #4
Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection cs.CL · 2018 · author #2
Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval cs.CL · 2018 · author #4
Noise Adaptive Speech Enhancement using Domain Adversarial Training cs.SD · 2018 · author #3
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations eess.AS · 2018 · author #3
Scalable Sentiment for Sequence-to-sequence Chatbot Response with Performance Analysis cs.CL · 2018 · author #5
Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension cs.CL · 2018 · author #4
Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator cs.CL · 2018 · author #4
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings cs.CL · 2018 · author #3
Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only cs.CL · 2018 · author #4
Supervised and Unsupervised Transfer Learning for Question Answering cs.CL · 2017 · author #2
Personalized word representations Carrying Personalized Semantics Learned from Social Network Posts cs.CL · 2017 · author #3
Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model cs.CL · 2017 · author #4
Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification cs.CL · 2017 · author #4
Query-based Attention CNN for Text Similarity Map cs.AI · 2017 · author #3
Query-by-example Spoken Term Detection using Attention-based Multi-hop Networks cs.CL · 2017 · author #2
Learning Chinese Word Representations From Glyphs Of Characters cs.CL · 2017 · author #2
Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data cs.CL · 2017 · author #3
Personalized Acoustic Modeling by Weakly Supervised Multi-Task Deep Learning using Acoustic Tokens Discovered from Unlabeled Data cs.SD · 2017 · author #3
Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries cs.SD · 2017 · author #3
Abstractive Headline Generation for Spoken Content by Attentive Recurrent Neural Networks with ASR Error Modeling cs.CL · 2016 · author #2
Attention-based Memory Selection Recurrent Network for Language Modeling cs.CL · 2016 · author #3
Interactive Spoken Content Retrieval by Deep Reinforcement Learning cs.CL · 2016 · author #4
Hierarchical Attention Model for Improved Machine Comprehension of Spoken Content cs.CL · 2016 · author #3
Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine cs.CL · 2016 · author #3
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection cs.CL · 2016 · author #2
Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder cs.SD · 2016 · author #4
An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection cs.CL · 2016 · author #5
Towards Structured Deep Neural Network for Automatic Speech Recognition cs.CL · 2015 · author #2
A Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN) for Unsupervised Discovery of Linguistic Units and Generation of High Quality Features cs.CL · 2015 · author #7
Personalizing Universal Recurrent Neural Network Language Model with User Characteristic Features by Social Network Crowdsourcing cs.CL · 2015 · author #2
Towards Structured Deep Neural Network for Automatic Speech Recognition cs.LG · 2015 · author #2

Mentions

2606.11400 #2 · arxiv_oai · confidence 0.70 Hung-Yi Lee
2501.08238 #11 · arxiv_oai · confidence 0.70 Hung-yi Lee
2606.07494 #6 · arxiv_oai · confidence 0.70 Hung-yi Lee
1511.02506 #2 · backfill · confidence 0.70 Hung-yi Lee
1506.02327 #7 · backfill · confidence 0.70 Hung-yi Lee
1506.01192 #2 · backfill · confidence 0.70 Hung-yi Lee
1506.01163 #2 · backfill · confidence 0.70 Hung-yi Lee
2601.06329 #7 · arxiv_oai · confidence 0.70 Hung-yi Lee
2605.21538 #4 · arxiv_oai · confidence 0.70 Hung-yi Lee

Frequent Coauthors

Lin-shan Lee 24 shared papers
Sung-Feng Huang 8 shared papers
Yi-Cheng Lin 8 shared papers
Guan-Ting Lin 6 shared papers
Chia-Hao Shen 5 shared papers
Kai-Wei Chang 5 shared papers
Yi-Chen Chen 5 shared papers
Cheng-Tao Chung 4 shared papers
Haibin Wu 4 shared papers
Jyh-Shing Roger Jang 4 shared papers
Kuan-Yu Chen 4 shared papers
Xuanjun Chen 4 shared papers
Yu Tsao 4 shared papers
Cheng-chieh Yeh 3 shared papers
Chia-Hsuan Lee 3 shared papers
Chun-Yi Kuan 3 shared papers
Huang-Cheng Chou 3 shared papers
James Glass 3 shared papers
Ju-chieh Chou 3 shared papers
Ke-Han Lu 3 shared papers