Hung-yi Lee
Identifiers
- name variant Hung-Yi Lee 0.60 · backfill
Papers (82)
- Steering Where to Listen: Instruction-Based Activation Steering Redirects Temporal Attention in Large Audio-Language Models cs.SD · 2026 · author #2 as printed: Hung-Yi Lee
- Mitigating Proxy-to-Wild Domain Gap in Deepfake Speech cs.SD · 2026 · author #6
- Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods cs.SD · 2026 · author #4
- Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts cs.CL · 2026 · author #4
- Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models eess.AS · 2026 · author #4
- Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI eess.AS · 2026 · author #6
- ReMedi: Reasoner for Medical Clinical Prediction cs.CL · 2026 · author #4
- Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation cs.SD · 2026 · author #9
- The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation eess.AS · 2026 · author #8
- Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models eess.AS · 2026 · author #3
- All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation cs.SD · 2026 · author #5
- LLM-Codec: Neural Audio Codec Meets Language Model Objectives cs.SD · 2026 · author #3
- MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation cs.CL · 2026 · author #5
- VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech eess.AS · 2026 · author #4
- NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations cs.SD · 2026 · author #11
- CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning cs.CL · 2026 · author #6
- ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models cs.CL · 2026 · author #6
- Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency eess.AS · 2026 · author #4
- Joint Fullband-Subband Modeling for High-Resolution SingFake Detection cs.SD · 2026 · author #5
- TiCo: Time-Controllable Spoken Dialogue Model cs.CL · 2026 · author #4
- TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling cs.SD · 2026 · author #7
- AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering eess.AS · 2026 · author #2
- On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation cs.CL · 2026 · author #7
- Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models cs.CL · 2025 · author #3
- Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition eess.AS · 2025 · author #7
- Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner eess.AS · 2025 · author #7
- When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models cs.SD · 2025 · author #3
- Game-Time: Evaluating Temporal Dynamics in Spoken Language Models eess.AS · 2025 · author #9
- Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems eess.AS · 2025 · author #5
- Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models eess.AS · 2025 · author #7
- An Exploration of Mamba for Speech Self-Supervised Models cs.CL · 2025 · author #8
- Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey eess.AS · 2025 · author #3
- On The Landscape of Spoken Language Models: A Comprehensive Survey cs.CL · 2025 · author #8
- Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization cs.CL · 2025 · author #4
- CodecFake+: Codec-Based Resynthesized Data as a Proxy for Detecting CodecFake Speech cs.SD · 2025 · author #11
- Cross-Lingual Transfer Learning for Question Answering cs.CL · 2019 · author #2
- Mitigating the Impact of Speech Recognition Errors on Spoken Question Answering by Adversarial Domain Adaptation cs.CL · 2019 · author #3
- Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering cs.SD · 2019 · author #3
- End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning cs.CL · 2019 · author #4
- From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings cs.CL · 2019 · author #3
- Adversarial Learning of Label Dependency: A Novel Framework for Multi-class Classification cs.LG · 2018 · author #2
- Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection cs.CL · 2018 · author #3
- Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation cs.CL · 2018 · author #3
- Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model cs.CL · 2018 · author #2
- Almost-unsupervised Speech Recognition with Close-to-zero Resource Based on Phonetic Structures Learned from Very Small Unpaired Speech and Text Data cs.CL · 2018 · author #4
- Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks cs.CL · 2018 · author #2
- Proximal Policy Optimization and its Dynamic Version for Sequence Generation cs.CL · 2018 · author #4
- Improving Conditional Sequence Generative Adversarial Networks by Stepwise Evaluation cs.CL · 2018 · author #2
- Towards Audio to Scene Image Synthesis using Generative Adversarial Network cs.CL · 2018 · author #3
- Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences cs.SD · 2018 · author #4
- ODSQA: Open-domain Spoken Question Answering Dataset cs.CL · 2018 · author #4
- Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection cs.CL · 2018 · author #2
- Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval cs.CL · 2018 · author #4
- Noise Adaptive Speech Enhancement using Domain Adversarial Training cs.SD · 2018 · author #3
- Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations eess.AS · 2018 · author #3
- Scalable Sentiment for Sequence-to-sequence Chatbot Response with Performance Analysis cs.CL · 2018 · author #5
- Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension cs.CL · 2018 · author #4
- Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator cs.CL · 2018 · author #4
- Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings cs.CL · 2018 · author #3
- Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only cs.CL · 2018 · author #4
- Supervised and Unsupervised Transfer Learning for Question Answering cs.CL · 2017 · author #2
- Personalized word representations Carrying Personalized Semantics Learned from Social Network Posts cs.CL · 2017 · author #3
- Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model cs.CL · 2017 · author #4
- Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification cs.CL · 2017 · author #4
- Query-based Attention CNN for Text Similarity Map cs.AI · 2017 · author #3
- Query-by-example Spoken Term Detection using Attention-based Multi-hop Networks cs.CL · 2017 · author #2
- Learning Chinese Word Representations From Glyphs Of Characters cs.CL · 2017 · author #2
- Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data cs.CL · 2017 · author #3
- Personalized Acoustic Modeling by Weakly Supervised Multi-Task Deep Learning using Acoustic Tokens Discovered from Unlabeled Data cs.SD · 2017 · author #3
- Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries cs.SD · 2017 · author #3
- Abstractive Headline Generation for Spoken Content by Attentive Recurrent Neural Networks with ASR Error Modeling cs.CL · 2016 · author #2
- Attention-based Memory Selection Recurrent Network for Language Modeling cs.CL · 2016 · author #3
- Interactive Spoken Content Retrieval by Deep Reinforcement Learning cs.CL · 2016 · author #4
- Hierarchical Attention Model for Improved Machine Comprehension of Spoken Content cs.CL · 2016 · author #3
- Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine cs.CL · 2016 · author #3
- Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection cs.CL · 2016 · author #2
- Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder cs.SD · 2016 · author #4
- An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection cs.CL · 2016 · author #5
- Towards Structured Deep Neural Network for Automatic Speech Recognition cs.CL · 2015 · author #2
- A Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN) for Unsupervised Discovery of Linguistic Units and Generation of High Quality Features cs.CL · 2015 · author #7
- Personalizing Universal Recurrent Neural Network Language Model with User Characteristic Features by Social Network Crowdsourcing cs.CL · 2015 · author #2
- Towards Structured Deep Neural Network for Automatic Speech Recognition cs.LG · 2015 · author #2
Mentions
- 2606.11400 #2 · arxiv_oai · confidence 0.70 Hung-Yi Lee
- 2501.08238 #11 · arxiv_oai · confidence 0.70 Hung-yi Lee
- 2606.07494 #6 · arxiv_oai · confidence 0.70 Hung-yi Lee
- 1511.02506 #2 · backfill · confidence 0.70 Hung-yi Lee
- 1506.02327 #7 · backfill · confidence 0.70 Hung-yi Lee
- 1506.01192 #2 · backfill · confidence 0.70 Hung-yi Lee
- 1506.01163 #2 · backfill · confidence 0.70 Hung-yi Lee
- 2601.06329 #7 · arxiv_oai · confidence 0.70 Hung-yi Lee
- 2605.21538 #4 · arxiv_oai · confidence 0.70 Hung-yi Lee
Frequent Coauthors
- Lin-shan Lee 24 shared papers
- Sung-Feng Huang 8 shared papers
- Yi-Cheng Lin 8 shared papers
- Guan-Ting Lin 6 shared papers
- Chia-Hao Shen 5 shared papers
- Kai-Wei Chang 5 shared papers
- Yi-Chen Chen 5 shared papers
- Cheng-Tao Chung 4 shared papers
- Haibin Wu 4 shared papers
- Jyh-Shing Roger Jang 4 shared papers
- Kuan-Yu Chen 4 shared papers
- Xuanjun Chen 4 shared papers
- Yu Tsao 4 shared papers
- Cheng-chieh Yeh 3 shared papers
- Chia-Hsuan Lee 3 shared papers
- Chun-Yi Kuan 3 shared papers
- Huang-Cheng Chou 3 shared papers
- James Glass 3 shared papers
- Ju-chieh Chou 3 shared papers
- Ke-Han Lu 3 shared papers