pith. machine review for the scientific record.

Dahua Lin

Identifiers

  • name variant Dahua Lin 0.60 · backfill

Papers (70)

  1. SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture cs.CV · 2026 · author #58
  2. WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation cs.CL · 2026 · author #15
  3. ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism cs.DC · 2026 · author #7
  4. OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis cs.AI · 2026 · author #14
  5. Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs cs.AI · 2026 · author #12
  6. MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale cs.CV · 2026 · author #42
  7. Visual-ERM: Reward Modeling for Visual Equivalence cs.CV · 2026 · author #9
  8. Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction cs.RO · 2026 · author #6
  9. EAG-PT: Emission-Aware Gaussians and Path Tracing for Diffuse Indoor Scene Reconstruction and Editing cs.GR · 2026 · author #7
  10. MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing cs.CV · 2025 · author #59
  11. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency cs.CV · 2025 · author #68
  12. InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models cs.CV · 2025 · author #48
  13. Visual-RFT: Visual Reinforcement Fine-Tuning cs.CV · 2025 · author #7
  14. Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling cs.CV · 2024 · author #39
  15. PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction cs.CV · 2024 · author #11
  16. MinerU: An Open-Source Solution for Precise Document Content Extraction cs.CV · 2024 · author #17
  17. InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output cs.CV · 2024 · author #26
  18. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites cs.CV · 2024 · author #32
  19. Are We on the Right Way for Evaluating Large Vision-Language Models? cs.CV · 2024 · author #10
  20. InternLM2 Technical Report cs.CL · 2024 · author #100
  21. InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model cs.CV · 2024 · author #22
  22. ShareGPT4V: Improving Large Multi-Modal Models with Better Captions cs.CV · 2023 · author #8
  23. InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition cs.CV · 2023 · author #20
  24. MMBench: Is Your Multi-modal Model an All-around Player? cs.CV · 2023 · author #12
  25. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning cs.CV · 2023 · author #8
  26. MMDetection: Open MMLab Detection Toolbox and Benchmark cs.CV · 2019 · author #25
  27. POPQORN: Quantifying Robustness of Recurrent Neural Networks cs.LG · 2019 · author #6
  28. Learning to Cluster Faces on an Affinity Graph cs.CV · 2019 · author #6
  29. Libra R-CNN: Towards Balanced Learning for Object Detection cs.CV · 2019 · author #6
  30. Self-Supervised Learning via Conditional Motion Propagation cs.CV · 2019 · author #4
  31. WIDER Face and Pedestrian Challenge 2018: Methods and Results cs.CV · 2019 · author #2
  32. Hybrid Task Cascade for Instance Segmentation cs.CV · 2019 · author #12
  33. Region Proposal by Guided Anchoring cs.CV · 2019 · author #5
  34. Monocular 3D Pose Recovery via Nonconvex Sparsity with Theoretical Analysis cs.CV · 2018 · author #2
  35. A Neural Compositional Paradigm for Image Captioning cs.CV · 2018 · author #3
  36. Improving On-policy Learning with Statistical Reward Accumulation cs.LG · 2018 · author #3
  37. Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition cs.CV · 2018 · author #4
  38. Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation cs.CV · 2018 · author #5
  39. Generative Adversarial Frontal View to Bird View Synthesis cs.CV · 2018 · author #5
  40. Pose Guided Human Video Generation cs.CV · 2018 · author #6
  41. Person Search in Videos with One Portrait Through Visual and Temporal Links cs.CV · 2018 · author #3
  42. Move Forward and Tell: A Progressive Generator of Video Descriptions cs.CV · 2018 · author #3
  43. Rethinking the Form of Latent States in Image Captioning cs.CV · 2018 · author #3
  44. Probabilistic Ensemble of Collaborative Filters cs.IR · 2018 · author #2
  45. From Trailers to Storylines: An Efficient Way to Learn from Movies cs.CV · 2018 · author #5
  46. Unifying Identification and Context Learning for Person Recognition cs.CV · 2018 · author #3
  47. Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination cs.CV · 2018 · author #4
  48. Optimizing Video Object Detection via a Scale-Time Lattice cs.CV · 2018 · author #7
  49. Low-Latency Video Semantic Segmentation cs.CV · 2018 · author #3
  50. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition cs.CV · 2018 · author #3
  51. Accelerated Training for Massive Classification via Dynamic Class Selection cs.CV · 2018 · author #4
  52. Peephole: Predicting Network Performance Before Training cs.LG · 2017 · author #3
  53. Learning Sparse Visual Representations with Leaky Capped Norm Regularizers cs.LG · 2017 · author #2
  54. Be Your Own Prada: Fashion Synthesis with Structural Coherence cs.CV · 2017 · author #4
  55. Contrastive Learning for Image Captioning cs.CV · 2017 · author #2
  56. Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data stat.ML · 2017 · author #2
  57. Integrating Specialized Classifiers Based on Continuous Time Markov Chain cs.LG · 2017 · author #2
  58. Discover and Learn New Objects from Documentaries cs.CV · 2017 · author #4
  59. Temporal Segment Networks for Action Recognition in Videos cs.CV · 2017 · author #5
  60. Temporal Action Detection with Structured Segment Networks cs.CV · 2017 · author #6
  61. Detecting Visual Relationships with Deep Relational Networks cs.CV · 2017 · author #3
  62. Towards Diverse and Natural Image Descriptions via a Conditional GAN cs.CV · 2017 · author #4
  63. UntrimmedNets for Weakly Supervised Action Recognition and Detection cs.CV · 2017 · author #3
  64. A Pursuit of Temporal Accuracy in General Activity Detection cs.CV · 2017 · author #4
  65. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks cs.CV · 2016 · author #4
  66. Deep Markov Random Field for Image Modeling cs.CV · 2016 · author #2
  67. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition cs.CV · 2016 · author #5
  68. CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016 cs.CV · 2016 · author #7
  69. Adjustable Bounded Rectifiers: Towards Deep Binary Representations cs.LG · 2015 · author #2
  70. Generating Multi-Sentence Lingual Descriptions of Indoor Scenes cs.CV · 2015 · author #1

Mentions

  • 2309.15112 #20 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2509.22186 #59 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2407.03320 #26 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2401.16420 #22 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2409.18839 #17 · arxiv_oai · confidence 0.70 Dahua Lin

Frequent Coauthors