Hung-yi Lee

Identifiers

  • name variant Hung-Yi Lee 0.60 · backfill

Papers (82)

  1. Steering Where to Listen: Instruction-Based Activation Steering Redirects Temporal Attention in Large Audio-Language Models cs.SD · 2026 · author #2 as printed: Hung-Yi Lee
  2. Mitigating Proxy-to-Wild Domain Gap in Deepfake Speech cs.SD · 2026 · author #6
  3. Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods cs.SD · 2026 · author #4
  4. Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts cs.CL · 2026 · author #4
  5. Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models eess.AS · 2026 · author #4
  6. Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI eess.AS · 2026 · author #6
  7. ReMedi: Reasoner for Medical Clinical Prediction cs.CL · 2026 · author #4
  8. Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation cs.SD · 2026 · author #9
  9. The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation eess.AS · 2026 · author #8
  10. Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models eess.AS · 2026 · author #3
  11. All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation cs.SD · 2026 · author #5
  12. LLM-Codec: Neural Audio Codec Meets Language Model Objectives cs.SD · 2026 · author #3
  13. MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation cs.CL · 2026 · author #5
  14. VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech eess.AS · 2026 · author #4
  15. NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations cs.SD · 2026 · author #11
  16. CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning cs.CL · 2026 · author #6
  17. ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models cs.CL · 2026 · author #6
  18. Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency eess.AS · 2026 · author #4
  19. Joint Fullband-Subband Modeling for High-Resolution SingFake Detection cs.SD · 2026 · author #5
  20. TiCo: Time-Controllable Spoken Dialogue Model cs.CL · 2026 · author #4
  21. TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling cs.SD · 2026 · author #7
  22. AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering eess.AS · 2026 · author #2
  23. On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation cs.CL · 2026 · author #7
  24. Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models cs.CL · 2025 · author #3
  25. Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition eess.AS · 2025 · author #7
  26. Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner eess.AS · 2025 · author #7
  27. When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models cs.SD · 2025 · author #3
  28. Game-Time: Evaluating Temporal Dynamics in Spoken Language Models eess.AS · 2025 · author #9
  29. Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems eess.AS · 2025 · author #5
  30. Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models eess.AS · 2025 · author #7
  31. An Exploration of Mamba for Speech Self-Supervised Models cs.CL · 2025 · author #8
  32. Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey eess.AS · 2025 · author #3
  33. On The Landscape of Spoken Language Models: A Comprehensive Survey cs.CL · 2025 · author #8
  34. Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization cs.CL · 2025 · author #4
  35. CodecFake+: Codec-Based Resynthesized Data as a Proxy for Detecting CodecFake Speech cs.SD · 2025 · author #11
  36. Cross-Lingual Transfer Learning for Question Answering cs.CL · 2019 · author #2
  37. Mitigating the Impact of Speech Recognition Errors on Spoken Question Answering by Adversarial Domain Adaptation cs.CL · 2019 · author #3
  38. Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering cs.SD · 2019 · author #3
  39. End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning cs.CL · 2019 · author #4
  40. From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings cs.CL · 2019 · author #3
  41. Adversarial Learning of Label Dependency: A Novel Framework for Multi-class Classification cs.LG · 2018 · author #2
  42. Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection cs.CL · 2018 · author #3
  43. Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation cs.CL · 2018 · author #3
  44. Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model cs.CL · 2018 · author #2
  45. Almost-unsupervised Speech Recognition with Close-to-zero Resource Based on Phonetic Structures Learned from Very Small Unpaired Speech and Text Data cs.CL · 2018 · author #4
  46. Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks cs.CL · 2018 · author #2
  47. Proximal Policy Optimization and its Dynamic Version for Sequence Generation cs.CL · 2018 · author #4
  48. Improving Conditional Sequence Generative Adversarial Networks by Stepwise Evaluation cs.CL · 2018 · author #2
  49. Towards Audio to Scene Image Synthesis using Generative Adversarial Network cs.CL · 2018 · author #3
  50. Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences cs.SD · 2018 · author #4
  51. ODSQA: Open-domain Spoken Question Answering Dataset cs.CL · 2018 · author #4
  52. Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection cs.CL · 2018 · author #2
  53. Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval cs.CL · 2018 · author #4
  54. Noise Adaptive Speech Enhancement using Domain Adversarial Training cs.SD · 2018 · author #3
  55. Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations eess.AS · 2018 · author #3
  56. Scalable Sentiment for Sequence-to-sequence Chatbot Response with Performance Analysis cs.CL · 2018 · author #5
  57. Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension cs.CL · 2018 · author #4
  58. Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator cs.CL · 2018 · author #4
  59. Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings cs.CL · 2018 · author #3
  60. Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only cs.CL · 2018 · author #4
  61. Supervised and Unsupervised Transfer Learning for Question Answering cs.CL · 2017 · author #2
  62. Personalized word representations Carrying Personalized Semantics Learned from Social Network Posts cs.CL · 2017 · author #3
  63. Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model cs.CL · 2017 · author #4
  64. Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification cs.CL · 2017 · author #4
  65. Query-based Attention CNN for Text Similarity Map cs.AI · 2017 · author #3
  66. Query-by-example Spoken Term Detection using Attention-based Multi-hop Networks cs.CL · 2017 · author #2
  67. Learning Chinese Word Representations From Glyphs Of Characters cs.CL · 2017 · author #2
  68. Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data cs.CL · 2017 · author #3
  69. Personalized Acoustic Modeling by Weakly Supervised Multi-Task Deep Learning using Acoustic Tokens Discovered from Unlabeled Data cs.SD · 2017 · author #3
  70. Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries cs.SD · 2017 · author #3
  71. Abstractive Headline Generation for Spoken Content by Attentive Recurrent Neural Networks with ASR Error Modeling cs.CL · 2016 · author #2
  72. Attention-based Memory Selection Recurrent Network for Language Modeling cs.CL · 2016 · author #3
  73. Interactive Spoken Content Retrieval by Deep Reinforcement Learning cs.CL · 2016 · author #4
  74. Hierarchical Attention Model for Improved Machine Comprehension of Spoken Content cs.CL · 2016 · author #3
  75. Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine cs.CL · 2016 · author #3
  76. Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection cs.CL · 2016 · author #2
  77. Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder cs.SD · 2016 · author #4
  78. An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection cs.CL · 2016 · author #5
  79. Towards Structured Deep Neural Network for Automatic Speech Recognition cs.CL · 2015 · author #2
  80. A Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN) for Unsupervised Discovery of Linguistic Units and Generation of High Quality Features cs.CL · 2015 · author #7
  81. Personalizing Universal Recurrent Neural Network Language Model with User Characteristic Features by Social Network Crowdsouring cs.CL · 2015 · author #2
  82. Towards Structured Deep Neural Network for Automatic Speech Recognition cs.LG · 2015 · author #2

Mentions

  • 2606.11400 #2 · arxiv_oai · confidence 0.70 Hung-Yi Lee
  • 2501.08238 #11 · arxiv_oai · confidence 0.70 Hung-yi Lee
  • 2606.07494 #6 · arxiv_oai · confidence 0.70 Hung-yi Lee
  • 1511.02506 #2 · backfill · confidence 0.70 Hung-yi Lee
  • 1506.02327 #7 · backfill · confidence 0.70 Hung-yi Lee
  • 1506.01192 #2 · backfill · confidence 0.70 Hung-yi Lee
  • 1506.01163 #2 · backfill · confidence 0.70 Hung-yi Lee
  • 2601.06329 #7 · arxiv_oai · confidence 0.70 Hung-yi Lee
  • 2605.21538 #4 · arxiv_oai · confidence 0.70 Hung-yi Lee

Frequent Coauthors