Dahua Lin

Identifiers

  • name variant Dahua Lin 0.60 · backfill

Papers (91)

  1. Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent cs.CL · 2026 · author #18
  2. MemoryWAM: Efficient World Action Modeling with Persistent Memory cs.RO · 2026 · author #9
  3. Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games cs.CV · 2026 · author #5
  4. SpecGen: Accelerating Agentic Kernel Optimization with Speculative Generation cs.DC · 2026 · author #7
  5. PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory cs.CV · 2026 · author #5
  6. CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning cs.CV · 2026 · author #13
  7. AdaGRPO: A Capability-Aware Adaptive Enhancement for Flow-based GRPO cs.CV · 2026 · author #11
  8. ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning cs.AI · 2026 · author #8
  9. AMix-2: Establishing Protein as a Native Modality in Large Language Models q-bio.BM · 2026 · author #20
  10. SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation cs.CV · 2026 · author #7
  11. From Pixels to Words -- Towards Native One-Vision Models at Scale cs.CV · 2026 · author #20
  12. ETCHR: Editing To Clarify and Harness Reasoning cs.CV · 2026 · author #6
  13. NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding cs.DC · 2026 · author #12
  14. Beyond Mode Collapse: Distribution Matching for Diverse Reasoning cs.AI · 2026 · author #10
  15. What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents cs.AI · 2026 · author #8
  16. SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture cs.CV · 2026 · author #58
  17. WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation cs.CL · 2026 · author #15
  18. ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism cs.DC · 2026 · author #7
  19. OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis cs.AI · 2026 · author #14
  20. Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs cs.AI · 2026 · author #12
  21. MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale cs.CV · 2026 · author #42
  22. Demystifying Video Reasoning cs.CV · 2026 · author #12
  23. Visual-ERM: Reward Modeling for Visual Equivalence cs.CV · 2026 · author #9
  24. Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction cs.RO · 2026 · author #6
  25. EAG-PT: Emission-Aware Gaussians and Path Tracing for Diffuse Indoor Scene Reconstruction and Editing cs.GR · 2026 · author #7
  26. End-to-End Training for Autoregressive Video Diffusion via Self-Resampling cs.CV · 2025 · author #8
  27. MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing cs.CV · 2025 · author #59
  28. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency cs.CV · 2025 · author #68
  29. InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling cs.CL · 2025 · author #15
  30. MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence cs.CV · 2025 · author #11
  31. InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models cs.CV · 2025 · author #48
  32. Visual-RFT: Visual Reinforcement Fine-Tuning cs.CV · 2025 · author #7
  33. Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation cs.RO · 2024 · author #5
  34. Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling cs.CV · 2024 · author #39
  35. PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction cs.CV · 2024 · author #11
  36. MinerU: An Open-Source Solution for Precise Document Content Extraction cs.CV · 2024 · author #17
  37. InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output cs.CV · 2024 · author #26
  38. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites cs.CV · 2024 · author #32
  39. Are We on the Right Way for Evaluating Large Vision-Language Models? cs.CV · 2024 · author #10
  40. InternLM2 Technical Report cs.CL · 2024 · author #100
  41. RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition cs.CV · 2024 · author #8
  42. InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model cs.CV · 2024 · author #22
  43. ShareGPT4V: Improving Large Multi-Modal Models with Better Captions cs.CV · 2023 · author #8
  44. InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition cs.CV · 2023 · author #20
  45. MMBench: Is Your Multi-modal Model an All-around Player? cs.CV · 2023 · author #12
  46. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning cs.CV · 2023 · author #8
  47. MMDetection: Open MMLab Detection Toolbox and Benchmark cs.CV · 2019 · author #25
  48. POPQORN: Quantifying Robustness of Recurrent Neural Networks cs.LG · 2019 · author #6
  49. Learning to Cluster Faces on an Affinity Graph cs.CV · 2019 · author #6
  50. Libra R-CNN: Towards Balanced Learning for Object Detection cs.CV · 2019 · author #6
  51. Self-Supervised Learning via Conditional Motion Propagation cs.CV · 2019 · author #4
  52. WIDER Face and Pedestrian Challenge 2018: Methods and Results cs.CV · 2019 · author #2
  53. Hybrid Task Cascade for Instance Segmentation cs.CV · 2019 · author #12
  54. Region Proposal by Guided Anchoring cs.CV · 2019 · author #5
  55. Monocular 3D Pose Recovery via Nonconvex Sparsity with Theoretical Analysis cs.CV · 2018 · author #2
  56. A Neural Compositional Paradigm for Image Captioning cs.CV · 2018 · author #3
  57. Improving On-policy Learning with Statistical Reward Accumulation cs.LG · 2018 · author #3
  58. Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition cs.CV · 2018 · author #4
  59. Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation cs.CV · 2018 · author #5
  60. Generative Adversarial Frontal View to Bird View Synthesis cs.CV · 2018 · author #5
  61. Pose Guided Human Video Generation cs.CV · 2018 · author #6
  62. Person Search in Videos with One Portrait Through Visual and Temporal Links cs.CV · 2018 · author #3
  63. Move Forward and Tell: A Progressive Generator of Video Descriptions cs.CV · 2018 · author #3
  64. Rethinking the Form of Latent States in Image Captioning cs.CV · 2018 · author #3
  65. Probabilistic Ensemble of Collaborative Filters cs.IR · 2018 · author #2
  66. From Trailers to Storylines: An Efficient Way to Learn from Movies cs.CV · 2018 · author #5
  67. Unifying Identification and Context Learning for Person Recognition cs.CV · 2018 · author #3
  68. Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination cs.CV · 2018 · author #4
  69. Optimizing Video Object Detection via a Scale-Time Lattice cs.CV · 2018 · author #7
  70. Low-Latency Video Semantic Segmentation cs.CV · 2018 · author #3
  71. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition cs.CV · 2018 · author #3
  72. Accelerated Training for Massive Classification via Dynamic Class Selection cs.CV · 2018 · author #4
  73. Peephole: Predicting Network Performance Before Training cs.LG · 2017 · author #3
  74. Learning Sparse Visual Representations with Leaky Capped Norm Regularizers cs.LG · 2017 · author #2
  75. Be Your Own Prada: Fashion Synthesis with Structural Coherence cs.CV · 2017 · author #4
  76. Contrastive Learning for Image Captioning cs.CV · 2017 · author #2
  77. Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data stat.ML · 2017 · author #2
  78. Integrating Specialized Classifiers Based on Continuous Time Markov Chain cs.LG · 2017 · author #2
  79. Discover and Learn New Objects from Documentaries cs.CV · 2017 · author #4
  80. Temporal Segment Networks for Action Recognition in Videos cs.CV · 2017 · author #5
  81. Temporal Action Detection with Structured Segment Networks cs.CV · 2017 · author #6
  82. Detecting Visual Relationships with Deep Relational Networks cs.CV · 2017 · author #3
  83. Towards Diverse and Natural Image Descriptions via a Conditional GAN cs.CV · 2017 · author #4
  84. UntrimmedNets for Weakly Supervised Action Recognition and Detection cs.CV · 2017 · author #3
  85. A Pursuit of Temporal Accuracy in General Activity Detection cs.CV · 2017 · author #4
  86. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks cs.CV · 2016 · author #4
  87. Deep Markov Random Field for Image Modeling cs.CV · 2016 · author #2
  88. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition cs.CV · 2016 · author #5
  89. CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016 cs.CV · 2016 · author #7
  90. Adjustable Bounded Rectifiers: Towards Deep Binary Representations cs.LG · 2015 · author #2
  91. Generating Multi-Sentence Lingual Descriptions of Indoor Scenes cs.CV · 2015 · author #1

Mentions

  • 2512.15702 #8 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2606.30616 #18 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2606.20562 #9 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2606.19338 #5 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2606.17518 #7 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2606.16449 #5 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2606.09393 #13 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2606.06828 #11 · arxiv_oai · confidence 0.70 Dahua Lin
  • 1511.06201 #2 · backfill · confidence 0.70 Dahua Lin
  • 1711.02857 #2 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2606.03503 #8 · arxiv_oai · confidence 0.70 Dahua Lin
  • 1503.00064 #1 · backfill · confidence 0.70 Dahua Lin
  • 2605.30963 #20 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.30116 #7 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.28820 #20 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2603.16870 #12 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2505.23764 #11 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.23897 #6 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2412.15109 #5 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.21100 #12 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2508.08636 #15 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.19461 #10 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.19447 #8 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2403.13805 #8 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2309.15112 #20 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2509.22186 #59 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2407.03320 #26 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2401.16420 #22 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2409.18839 #17 · arxiv_oai · confidence 0.70 Dahua Lin

Frequent Coauthors