Yuexian Zou

Identifiers

  • name variant Yuexian Zou 0.60 · backfill

Papers (104)

  1. Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors cs.CV · 2026 · author #5
  2. Image Conductor: Precision Control for Interactive Video Synthesis cs.CV · 2024 · author #7
  3. Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning cs.CL · 2024 · author #5
  4. VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding cs.CV · 2024 · author #10
  5. VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework cs.CV · 2024 · author #10
  6. WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs cs.CV · 2024 · author #8
  7. Retrieval is Accurate Generation cs.CL · 2024 · author #6
  8. Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning cs.CV · 2024 · author #6
  9. AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-step Cross-attention for Robust Speaker Diarization in the Wild cs.MM · 2023 · author #4
  10. ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding cs.CL · 2023 · author #6
  11. UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework cs.CV · 2023 · author #9
  12. Video Referring Expression Comprehension via Transformer with Content-conditioned Query cs.CV · 2023 · author #6
  13. NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement cs.SD · 2023 · author #5
  14. MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning cs.CV · 2023 · author #6
  15. G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory cs.CV · 2023 · author #6
  16. Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels cs.CV · 2023 · author #7
  17. Customizing General-Purpose Foundation Models for Medical Report Generation cs.CV · 2023 · author #3
  18. HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec cs.SD · 2023 · author #6
  19. WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research eess.AS · 2023 · author #8
  20. TLAG: An Informative Trigger and Label-Aware Knowledge Guided Model for Dialogue-based Relation Extraction cs.CL · 2023 · author #5
  21. Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation cs.CV · 2023 · author #6
  22. PoseRAC: Pose Saliency Transformer for Repetitive Action Counting cs.CV · 2023 · author #3
  23. Improve Retrieval-based Dialogue System via Syntax-Informed Attention cs.AI · 2023 · author #5
  24. ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation cs.CL · 2023 · author #3
  25. Improving Text-Audio Retrieval by Text-aware Attention Pooling and Prior Matrix Revised Loss cs.SD · 2023 · author #3
  26. Improving Weakly Supervised Sound Event Detection with Causal Intervention cs.SD · 2023 · author #5
  27. FTM: A Frame-level Timeline Modeling Method for Temporal Graph Representation Learning cs.LG · 2023 · author #4
  28. FiTs: Fine-grained Two-stage Training for Knowledge-aware Question Answering cs.CL · 2023 · author #5
  29. SSVMR: Saliency-based Self-training for Video-Music Retrieval cs.MM · 2023 · author #5
  30. Exploiting Auxiliary Caption for Video Grounding cs.CV · 2023 · author #6
  31. Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation cs.SD · 2022 · author #3
  32. M3ST: Mix at Three Levels for Speech Translation cs.CL · 2022 · author #6
  33. Aligning Source Visual and Target Language Domains for Unpaired Video Captioning cs.CV · 2022 · author #5
  34. A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding cs.CL · 2022 · author #5
  35. NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS cs.SD · 2022 · author #6
  36. DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention cs.CV · 2022 · author #7
  37. Prophet Attention: Predicting Attention with Future Attention for Image Captioning cs.CV · 2022 · author #5
  38. Video Referring Expression Comprehension via Transformer with Content-aware Query cs.CV · 2022 · author #4
  39. Correspondence Matters for Video Referring Expression Comprehension cs.CV · 2022 · author #4
  40. LocVTP: Video-Text Pre-training for Temporal Localization cs.CV · 2022 · author #6
  41. Diffsound: Discrete Diffusion Model for Text-to-sound Generation cs.SD · 2022 · author #6
  42. Competence-based Multimodal Curriculum Learning for Medical Report Generation cs.CL · 2022 · author #3
  43. LAE: Language-Aware Encoder for Monolingual and Multilingual ASR cs.CL · 2022 · author #5
  44. Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention eess.AS · 2022 · author #3
  45. End-to-end Spoken Conversational Question Answering: Task, Dataset and Model cs.CL · 2022 · author #6
  46. Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction eess.AS · 2022 · author #5
  47. RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection cs.SD · 2022 · author #4
  48. A Mixed supervised Learning Framework for Target Sound Detection cs.SD · 2022 · author #3
  49. Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches eess.AS · 2022 · author #5
  50. Improving Target Sound Extraction with Timestamp Information cs.SD · 2022 · author #5
  51. Learning Decoupling Features Through Orthogonality Regularization cs.SD · 2022 · author #6
  52. SpatioTemporal Focus for Skeleton-based Action Recognition cs.CV · 2022 · author #3
  53. Integrating Lattice-Free MMI into End-to-End Speech Recognition cs.CL · 2022 · author #4
  54. Unsupervised Pre-training for Temporal Action Localization Tasks cs.CV · 2022 · author #6
  55. Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model cs.CL · 2022 · author #4
  56. Detect what you want: Target Sound Detection cs.SD · 2021 · author #3
  57. Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI cs.AI · 2021 · author #7
  58. CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter cs.CV · 2021 · author #3
  59. Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information cs.SD · 2021 · author #4
  60. A Mutual learning framework for Few-shot Sound Event Detection cs.SD · 2021 · author #3
  61. Towards Joint Intent Detection and Slot Filling via Higher-order Attention cs.CL · 2021 · author #5
  62. On Pursuit of Designing Multi-modal Transformer for Video Grounding cs.CV · 2021 · author #5
  63. Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering cs.CL · 2021 · author #3
  64. HAN: Higher-order Attention Network for Spoken Language Understanding cs.CL · 2021 · author #3
  65. Fully Non-Homogeneous Atmospheric Scattering Modeling with Convolutional Neural Networks for Single Image Dehazing cs.CV · 2021 · author #3
  66. Joint Multiple Intent Detection and Slot Filling via Self-distillation cs.CL · 2021 · author #3
  67. Deep Motion Prior for Weakly-Supervised Temporal Action Localization cs.CV · 2021 · author #5
  68. Text Anchor Based Metric Learning for Small-footprint Keyword Spotting cs.SD · 2021 · author #4
  69. O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning cs.CL · 2021 · author #6
  70. Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model cs.CL · 2021 · author #7
  71. Long-Short Temporal Modeling for Efficient Action Recognition cs.CV · 2021 · author #2
  72. SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection cs.CV · 2021 · author #3
  73. All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection cs.CV · 2021 · author #4
  74. Exploring Semantic Relationships for Unpaired Image Captioning cs.CV · 2021 · author #4
  75. Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation cs.CV · 2021 · author #5
  76. Self-supervised Dialogue Learning for Spoken Conversational Question Answering cs.CL · 2021 · author #3
  77. Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification cs.SD · 2021 · author #3
  78. Rethinking Skip Connection with Layer Normalization in Transformers and ResNets cs.LG · 2021 · author #5
  79. RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection cs.CV · 2021 · author #2
  80. Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain cs.SD · 2021 · author #3
  81. Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency cs.CL · 2021 · author #4
  82. SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification eess.AS · 2021 · author #2
  83. CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning cs.CV · 2021 · author #5
  84. A Global-local Attention Framework for Weakly Labelled Audio Tagging eess.AS · 2021 · author #2
  85. FWB-Net: Front White Balance Network for Color Shift Correction in Single Image Dehazing via Atmospheric Light Estimation cs.CV · 2021 · author #3
  86. Adaptive Bi-directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension cs.CL · 2020 · author #5
  87. Knowledge Distillation for Improved Accuracy in Spoken Question Answering cs.CL · 2020 · author #3
  88. Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering cs.CL · 2020 · author #3
  89. Towards Data Distillation for End-to-end Spoken Conversational Question Answering cs.CL · 2020 · author #5
  90. PIN: A Novel Parallel Interactive Network for Spoken Language Understanding cs.CL · 2020 · author #4
  91. PAN: Towards Fast Action Recognition via Learning Persistence of Appearance cs.CV · 2020 · author #2
  92. A Graph-based Interactive Reasoning for Human-Object Interaction Detection cs.CV · 2020 · author #2
  93. Acoustic Scene Classification with Spectrogram Processing Strategies cs.SD · 2020 · author #2
  94. All you need is a second look: Towards Tighter Arbitrary shape text detection cs.CV · 2020 · author #2
  95. Multi-modal Multi-channel Target Speech Separation eess.AS · 2020 · author #5
  96. GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency cs.CV · 2020 · author #2 as printed: YueXian Zou
  97. Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning eess.AS · 2020 · author #7
  98. Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation cs.SD · 2020 · author #2
  99. Environmental Sound Classification with Parallel Temporal-spectral Attention cs.SD · 2019 · author #2
  100. Non-Autoregressive Coarse-to-Fine Video Captioning cs.CV · 2019 · author #2
  101. C-RPNs: Promoting Object Detection in real world via a Cascade Structure of Region Proposal Networks cs.CV · 2019 · author #2 as printed: YueXian Zou
  102. End-to-End Multi-Channel Speech Separation cs.SD · 2019 · author #8
  103. KCRC-LCD: Discriminative Kernel Collaborative Representation with Locality Constrained Dictionary for Visual Categorization cs.CV · 2014 · author #6
  104. Comparison of Spearman's rho and Kendall's tau in Normal and Contaminated Normal Models cs.IT · 2010 · author #4

Mentions

  • 2303.17395 #8 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2406.15339 #7 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2303.06458 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2405.20852 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2301.05997 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2403.09530 #10 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2402.17532 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2403.09027 #10 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2312.05730 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2403.07944 #8 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2401.17186 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2307.14277 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2311.11375 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2311.10125 #9 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2310.16402 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2309.01212 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2308.13218 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2303.15932 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2307.01969 #7 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2306.05642 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2305.02765 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2207.09983 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2210.10914 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2206.14579 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2303.05681 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2303.17119 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2303.08450 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2302.11814 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2302.11799 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2303.06605 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2303.05678 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2302.09328 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2212.08348 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2212.03657 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2211.12148 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2211.04023 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2211.02448 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2210.16431 #7 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2210.02953 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2108.05607 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2203.15614 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2111.15162 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2207.10400 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2207.10362 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2204.02088 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2112.10153 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2206.02093 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2110.04474 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2205.01280 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2204.14272 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2204.07375 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2201.01995 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2109.06085 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2204.02143 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2204.01355 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2204.00821 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2203.16772 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2203.16767 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2203.13609 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2108.02359 #6 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2112.02498 #7 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2110.06100 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2109.08890 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2109.03381 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2104.12359 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2108.11916 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2108.11292 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2108.08042 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2106.10658 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2108.05516 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2103.16392 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2107.01571 #7 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2106.15787 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2106.15258 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2106.06963 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2106.12720 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2106.02182 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2010.11066 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2103.16858 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2105.10340 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2105.07205 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2105.00812 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2104.15015 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2010.11067 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 1911.12018 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2102.01931 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2012.10877 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2101.08465 #3 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2003.07032 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2010.08923 #5 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2009.13431 #4 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2008.03462 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2007.06925 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2007.03781 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 1912.06808 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2004.12436 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2003.03927 #7 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 2003.05242 #2 · arxiv_oai · confidence 0.70 YueXian Zou
  • 2001.00391 #2 · arxiv_oai · confidence 0.70 Yuexian Zou
  • 1908.06665 #2 · arxiv_oai · confidence 0.70 YueXian Zou

Frequent Coauthors