Dahua Lin
Identifiers
- name variant Dahua Lin 0.60 · backfill
Papers (70)
- SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture cs.CV · 2026 · author #58
- WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation cs.CL · 2026 · author #15
- ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism cs.DC · 2026 · author #7
- OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis cs.AI · 2026 · author #14
- Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs cs.AI · 2026 · author #12
- MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale cs.CV · 2026 · author #42
- Visual-ERM: Reward Modeling for Visual Equivalence cs.CV · 2026 · author #9
- Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction cs.RO · 2026 · author #6
- EAG-PT: Emission-Aware Gaussians and Path Tracing for Diffuse Indoor Scene Reconstruction and Editing cs.GR · 2026 · author #7
- MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing cs.CV · 2025 · author #59
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency cs.CV · 2025 · author #68
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models cs.CV · 2025 · author #48
- Visual-RFT: Visual Reinforcement Fine-Tuning cs.CV · 2025 · author #7
- Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling cs.CV · 2024 · author #39
- PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction cs.CV · 2024 · author #11
- MinerU: An Open-Source Solution for Precise Document Content Extraction cs.CV · 2024 · author #17
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output cs.CV · 2024 · author #26
- How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites cs.CV · 2024 · author #32
- Are We on the Right Way for Evaluating Large Vision-Language Models? cs.CV · 2024 · author #10
- InternLM2 Technical Report cs.CL · 2024 · author #100
- InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model cs.CV · 2024 · author #22
- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions cs.CV · 2023 · author #8
- InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition cs.CV · 2023 · author #20
- MMBench: Is Your Multi-modal Model an All-around Player? cs.CV · 2023 · author #12
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning cs.CV · 2023 · author #8
- MMDetection: Open MMLab Detection Toolbox and Benchmark cs.CV · 2019 · author #25
- POPQORN: Quantifying Robustness of Recurrent Neural Networks cs.LG · 2019 · author #6
- Learning to Cluster Faces on an Affinity Graph cs.CV · 2019 · author #6
- Libra R-CNN: Towards Balanced Learning for Object Detection cs.CV · 2019 · author #6
- Self-Supervised Learning via Conditional Motion Propagation cs.CV · 2019 · author #4
- WIDER Face and Pedestrian Challenge 2018: Methods and Results cs.CV · 2019 · author #2
- Hybrid Task Cascade for Instance Segmentation cs.CV · 2019 · author #12
- Region Proposal by Guided Anchoring cs.CV · 2019 · author #5
- Monocular 3D Pose Recovery via Nonconvex Sparsity with Theoretical Analysis cs.CV · 2018 · author #2
- A Neural Compositional Paradigm for Image Captioning cs.CV · 2018 · author #3
- Improving On-policy Learning with Statistical Reward Accumulation cs.LG · 2018 · author #3
- Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition cs.CV · 2018 · author #4
- Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation cs.CV · 2018 · author #5
- Generative Adversarial Frontal View to Bird View Synthesis cs.CV · 2018 · author #5
- Pose Guided Human Video Generation cs.CV · 2018 · author #6
- Person Search in Videos with One Portrait Through Visual and Temporal Links cs.CV · 2018 · author #3
- Move Forward and Tell: A Progressive Generator of Video Descriptions cs.CV · 2018 · author #3
- Rethinking the Form of Latent States in Image Captioning cs.CV · 2018 · author #3
- Probabilistic Ensemble of Collaborative Filters cs.IR · 2018 · author #2
- From Trailers to Storylines: An Efficient Way to Learn from Movies cs.CV · 2018 · author #5
- Unifying Identification and Context Learning for Person Recognition cs.CV · 2018 · author #3
- Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination cs.CV · 2018 · author #4
- Optimizing Video Object Detection via a Scale-Time Lattice cs.CV · 2018 · author #7
- Low-Latency Video Semantic Segmentation cs.CV · 2018 · author #3
- Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition cs.CV · 2018 · author #3
- Accelerated Training for Massive Classification via Dynamic Class Selection cs.CV · 2018 · author #4
- Peephole: Predicting Network Performance Before Training cs.LG · 2017 · author #3
- Learning Sparse Visual Representations with Leaky Capped Norm Regularizers cs.LG · 2017 · author #2
- Be Your Own Prada: Fashion Synthesis with Structural Coherence cs.CV · 2017 · author #4
- Contrastive Learning for Image Captioning cs.CV · 2017 · author #2
- Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data stat.ML · 2017 · author #2
- Integrating Specialized Classifiers Based on Continuous Time Markov Chain cs.LG · 2017 · author #2
- Discover and Learn New Objects from Documentaries cs.CV · 2017 · author #4
- Temporal Segment Networks for Action Recognition in Videos cs.CV · 2017 · author #5
- Temporal Action Detection with Structured Segment Networks cs.CV · 2017 · author #6
- Detecting Visual Relationships with Deep Relational Networks cs.CV · 2017 · author #3
- Towards Diverse and Natural Image Descriptions via a Conditional GAN cs.CV · 2017 · author #4
- UntrimmedNets for Weakly Supervised Action Recognition and Detection cs.CV · 2017 · author #3
- A Pursuit of Temporal Accuracy in General Activity Detection cs.CV · 2017 · author #4
- PolyNet: A Pursuit of Structural Diversity in Very Deep Networks cs.CV · 2016 · author #4
- Deep Markov Random Field for Image Modeling cs.CV · 2016 · author #2
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition cs.CV · 2016 · author #5
- CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016 cs.CV · 2016 · author #7
- Adjustable Bounded Rectifiers: Towards Deep Binary Representations cs.LG · 2015 · author #2
- Generating Multi-Sentence Lingual Descriptions of Indoor Scenes cs.CV · 2015 · author #1
Mentions
- 2309.15112 #20 · arxiv_oai · confidence 0.70 Dahua Lin
- 2509.22186 #59 · arxiv_oai · confidence 0.70 Dahua Lin
- 2407.03320 #26 · arxiv_oai · confidence 0.70 Dahua Lin
- 2401.16420 #22 · arxiv_oai · confidence 0.70 Dahua Lin
- 2409.18839 #17 · arxiv_oai · confidence 0.70 Dahua Lin
Frequent Coauthors
- Kai Chen 18 shared papers
- Jiaqi Wang 17 shared papers
- Yu Qiao 16 shared papers
- Conghui He 15 shared papers
- Chen Change Loy 12 shared papers
- Yuanjun Xiong 11 shared papers
- Xiaoyi Dong 10 shared papers
- Haodong Duan 9 shared papers
- Limin Wang 9 shared papers
- Wei Li 9 shared papers
- Xingcheng Zhang 9 shared papers
- Yuhang Zang 9 shared papers
- Bin Wang 8 shared papers
- Bo Dai 8 shared papers
- Xiaoou Tang 8 shared papers
- Jianping Shi 7 shared papers
- Linke Ouyang 7 shared papers
- Pan Zhang 7 shared papers
- Chao Xu 6 shared papers
- Hang Yan 6 shared papers