Haizhou Li

Identifiers

  • name variant Haizhou Li 0.60 · backfill

Papers (29)

  1. What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs cs.CV · 2026 · author #7
  2. Bridging What the Model Thinks and How It Speaks: Self-Aware Speech Language Models for Expressive Speech Generation cs.CL · 2026 · author #10
  3. Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan cs.SD · 2026 · author #8
  4. AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis eess.AS · 2026 · author #5
  5. PhiNet: Speaker Verification with Phonetic Interpretability eess.AS · 2026 · author #4
  6. Neural Architecture Search of Time-to-First-Spike-Coded Spiking Neural Networks for Efficient Eye-based Emotion Recognition cs.NE · 2025 · author #6
  7. RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems cs.CL · 2025 · author #4
  8. S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models cs.CL · 2025 · author #9
  9. VoiceBench: Benchmarking LLM-Based Voice Assistants cs.CL · 2024 · author #6
  10. Acoustic Modeling for Automatic Lyrics-to-Audio Alignment eess.AS · 2019 · author #3
  11. Code-Switching Detection Using ASR-Generated Language Posteriors cs.CL · 2019 · author #4
  12. Large-Scale Speaker Diarization of Radio Broadcast Archives cs.CL · 2019 · author #6
  13. Multi-Graph Decoding for Code-Switching ASR cs.CL · 2019 · author #5
  14. VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019 cs.CL · 2019 · author #5
  15. I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences eess.AS · 2019 · author #12
  16. Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet eess.AS · 2019 · author #4
  17. Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss eess.AS · 2019 · author #4
  18. Deep Spiking Neural Network with Spike Count based Learning Rule cs.NE · 2019 · author #6
  19. Target Speaker Extraction for Overlapped Multi-Talker Speaker Verification eess.AS · 2019 · author #4
  20. On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition cs.CL · 2018 · author #6
  21. Error Reduction Network for DBLSTM-based Voice Conversion eess.AS · 2018 · author #4
  22. Generative x-vectors for text-independent speaker verification eess.AS · 2018 · author #5
  23. Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search cs.CL · 2018 · author #6
  24. A Multi-State Diagnosis and Prognosis Framework with Feature Learning for Tool Condition Monitoring eess.SP · 2018 · author #5
  25. A Cost-Sensitive Deep Belief Network for Imbalanced Classification cs.LG · 2018 · author #3
  26. Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework cs.SD · 2017 · author #7
  27. Noise Robust Speech Recognition Using Multi-Channel Based Channel Selection And Channel Weighting cs.SD · 2016 · author #5
  28. Spoofing detection under noisy conditions: a preliminary investigation and an initial database cs.LG · 2016 · author #5
  29. Fantastic 4 system for NIST 2015 Language Recognition Evaluation cs.CL · 2016 · author #16

Mentions

  • 2510.13910 #4 · arxiv_oai · confidence 0.70 Haizhou Li
  • 2410.17196 #6 · arxiv_oai · confidence 0.70 Haizhou Li

Frequent Coauthors