Yu Qiao

Identifiers

  • name variant Yu Qiao 0.60 · backfill

Papers (105)

  1. Faithful, Enriched, and Precise: Benchmarking Natural-Science Illustration Generation by T2I models cs.CV · 2026 · author #10
  2. Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction cs.CV · 2026 · author #7
  3. CauTion: Knowing When to Trust LLMs for Ensemble Causal Discovery cs.LG · 2026 · author #5
  4. Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO cs.LG · 2026 · author #10
  5. PARE: Pruning and Adaptive Routing for Efficient Video Generation cs.CV · 2026 · author #4
  6. Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling cs.AI · 2026 · author #25
  7. MARBLE: Multi-Aspect Reward Balance for Diffusion RL cs.CV · 2026 · author #4
  8. Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning cs.CL · 2026 · author #7
  9. StableI2I: Spotting Unintended Changes in Image-to-Image Transition cs.CV · 2026 · author #7
  10. FedDAP: Domain-Aware Prototype Learning for Federated Learning under Domain Shift cs.CV · 2026 · author #3
  11. Domain-Aware Hybrid Quantum Learning via Correlation-Guided Circuit Design for Crime Pattern Analytics cs.LG · 2026 · author #4
  12. MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale cs.CV · 2026 · author #40
  13. SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond cs.LG · 2026 · author #16
  14. Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models cs.CV · 2026 · author #14
  15. RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling cs.CV · 2025 · author #9
  16. InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy cs.RO · 2025 · author #10
  17. Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark cs.CV · 2025 · author #9
  18. MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing cs.CV · 2025 · author #57
  19. GenExam: A Multidisciplinary Text-to-Image Exam cs.CV · 2025 · author #5
  20. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency cs.CV · 2025 · author #73
  21. A Survey on Foundation Models for Personalized Federated Intelligence cs.AI · 2025 · author #1
  22. InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models cs.CV · 2025 · author #49
  23. VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning cs.CV · 2025 · author #8
  24. VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness cs.CV · 2025 · author #11
  25. MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning cs.CV · 2025 · author #13
  26. AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems cs.RO · 2025 · author #26
  27. InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling cs.CV · 2025 · author #14
  28. VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling cs.CV · 2024 · author #11
  29. Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling cs.CV · 2024 · author #40
  30. Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization cs.CL · 2024 · author #10
  31. OS-ATLAS: A Foundation Action Model for Generalist GUI Agents cs.CL · 2024 · author #11
  32. Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation cs.CV · 2024 · author #9
  33. MinerU: An Open-Source Solution for Precise Document Content Extraction cs.CV · 2024 · author #16
  34. InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output cs.CV · 2024 · author #25
  35. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites cs.CV · 2024 · author #33
  36. Are We on the Right Way for Evaluating Large Vision-Language Models? cs.CV · 2024 · author #9
  37. InternLM2 Technical Report cs.CL · 2024 · author #99
  38. InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model cs.CV · 2024 · author #21
  39. Latte: Latent Diffusion Transformer for Video Generation cs.CV · 2024 · author #8
  40. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks cs.CV · 2023 · author #14
  41. MVBench: A Comprehensive Multi-modal Video Understanding Benchmark cs.CV · 2023 · author #12
  42. SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models cs.CV · 2023 · author #16
  43. InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition cs.CV · 2023 · author #19
  44. InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation cs.CV · 2023 · author #16
  45. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning cs.CV · 2023 · author #6
  46. Faster Segment Anything: Towards Lightweight SAM for Mobile Applications cs.CV · 2023 · author #3
  47. Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory cs.AI · 2023 · author #11
  48. VideoChat: Chat-Centric Video Understanding cs.CV · 2023 · author #9
  49. LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model cs.CV · 2023 · author #12
  50. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention cs.CV · 2023 · author #10
  51. InternVideo: General Video Foundation Models via Generative and Discriminative Learning cs.CV · 2022 · author #17
  52. Product Image Recognition with Guidance Learning and Noisy Supervision cs.CV · 2019 · author #6
  53. Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression cs.CV · 2019 · author #6
  54. Suppressing Model Overfitting for Image Super-Resolution Networks cs.CV · 2019 · author #3
  55. P2SGrad: Refined Gradients for Optimizing Deep Face Models cs.CV · 2019 · author #5
  56. AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations cs.CV · 2019 · author #3
  57. Modulating Image Restoration with Continual Levels via Adaptive Feature Modification Layers cs.CV · 2019 · author #3
  58. Gluing action groupoids: Fredholm conditions and layer potentials math.OA · 2018 · author #3
  59. Super-Identity Convolutional Neural Network for Face Hallucination cs.CV · 2018 · author #5
  60. PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report cs.CV · 2018 · author #42
  61. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks cs.CV · 2018 · author #8
  62. Fredholm Groupoids and Layer Potentials on Conical Domains math.OA · 2018 · author #2
  63. Prostate Segmentation using 2D Bridged U-net cs.CV · 2018 · author #4
  64. Knowledge-based Fully Convolutional Network and Its Application in Segmentation of Lung CT Images cs.CV · 2018 · author #2
  65. Boosting up Scene Text Detectors with Guided CNN cs.CV · 2018 · author #6
  66. SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters cs.CV · 2018 · author #5
  67. An end-to-end TextSpotter with Explicit Alignment and Attention cs.CV · 2018 · author #5
  68. LSTD: A Low-Shot Transfer Detector for Object Detection cs.CV · 2018 · author #4
  69. Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering cs.CV · 2018 · author #5
  70. FOTS: Fast Oriented Text Spotting with a Unified Network cs.CV · 2018 · author #5
  71. Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward cs.CV · 2017 · author #2
  72. Deep Embedding Convolutional Neural Network for Synthesizing CT Image from T1-Weighted MR Image cs.CV · 2017 · author #5
  73. Single Shot Text Detector with Regional Attention cs.CV · 2017 · author #5
  74. Temporal Segment Networks for Action Recognition in Videos cs.CV · 2017 · author #4
  75. Fredholm conditions on non-compact manifolds: theory and examples math.OA · 2017 · author #3
  76. Analysis of the Mean Field Free Energy Functional of Electrolyte Solution with Non-zero Boundary Conditions and the Generalized PB/PNP Equations with Inhomogeneous Dielectric Permittivity cond-mat.soft · 2017 · author #2
  77. Range Loss for Deep Face Recognition with Long-tail cs.CV · 2016 · author #5
  78. Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs cs.CV · 2016 · author #5
  79. Detecting Text in Natural Image with Connectionist Text Proposal Network cs.CV · 2016 · author #5
  80. Transferring Object-Scene Convolutional Neural Networks for Event Recognition in Still Images cs.CV · 2016 · author #3
  81. Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition cs.CV · 2016 · author #5
  82. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition cs.CV · 2016 · author #4
  83. CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016 cs.CV · 2016 · author #8
  84. DeepWriter: A Multi-Stream Deep CNN for Text-independent Writer Identification cs.CV · 2016 · author #2
  85. Real-time Action Recognition with Enhanced Motion Vector CNNs cs.CV · 2016 · author #4
  86. Actionness Estimation Using Hybrid Fully Convolutional Networks cs.CV · 2016 · author #2
  87. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks cs.CV · 2016 · author #4
  88. Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network cs.CV · 2016 · author #3
  89. Locally-Supervised Deep Hybrid Model for Scene Recognition cs.CV · 2016 · author #4
  90. Improvements in continuum modeling for biomolecular systems physics.bio-ph · 2015 · author #1
  91. Better Exploiting OS-CNNs for Better Event Recognition in Images cs.CV · 2015 · author #4
  92. Text-Attentional Convolutional Neural Networks for Scene Text Detection cs.CV · 2015 · author #3
  93. Local Multi-Grouped Binary Descriptor with Ring-based Pooling Configuration and Optimization cs.CV · 2015 · author #3
  94. A local approximation of fundamental measure theory incorporated into three dimensional Poisson-Nernst-Planck equations to account for hard sphere repulsion among ions physics.chem-ph · 2015 · author #1
  95. Places205-VGGNet Models for Scene Recognition cs.CV · 2015 · author #4
  96. Local Color Contrastive Descriptor for Image Classification cs.CV · 2015 · author #3
  97. Towards Good Practices for Very Deep Two-Stream ConvNets cs.CV · 2015 · author #4
  98. Reading Scene Text in Deep Convolutional Sequences cs.CV · 2015 · author #3
  99. Boosting Optical Character Recognition: A Super-Resolution Approach cs.CV · 2015 · author #5
  100. Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors cs.CV · 2015 · author #2
  101. Object-Scene Convolutional Neural Networks for Event Recognition in Images cs.CV · 2015 · author #4
  102. Bag of Visual Words and Fusion Methods for Action Recognition: Comprehensive Study and Good Practice cs.CV · 2014 · author #4
  103. A Study on Unsupervised Dictionary Learning and Feature Encoding for Action Classification cs.CV · 2013 · author #3
  104. Uniform shift estimates for transmission problems and optimal rates of convergence for the parametric Finite Element Method math.NA · 2012 · author #3
  105. Layer potentials C*-algebras of domains with conical points math.OA · 2011 · author #2

Mentions

  • 1509.06557 #3 · backfill · confidence 0.70 Yu Qiao
  • 2606.05949 #10 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2606.05769 #7 · arxiv_oai · confidence 0.70 Yu Qiao
  • 1508.06427 #1 · backfill · confidence 0.70 Yu Qiao
  • 1508.01667 #4 · backfill · confidence 0.70 Yu Qiao
  • 1508.00307 #3 · backfill · confidence 0.70 Yu Qiao
  • 1507.02159 #4 · backfill · confidence 0.70 Yu Qiao
  • 1506.04395 #3 · backfill · confidence 0.70 Yu Qiao
  • 1212.6287 #3 · arxiv_oai · confidence 0.70 Yu Qiao
  • 1506.02211 #5 · backfill · confidence 0.70 Yu Qiao
  • 1505.04868 #2 · backfill · confidence 0.70 Yu Qiao
  • 1505.00296 #4 · backfill · confidence 0.70 Yu Qiao
  • 2606.03602 #5 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2605.30789 #10 · arxiv_oai · confidence 0.70 Yu Qiao
  • 1405.4506 #4 · backfill · confidence 0.70 Yu Qiao
  • 1309.0309 #3 · backfill · confidence 0.70 Yu Qiao
  • 2605.27336 #4 · arxiv_oai · confidence 0.70 Yu Qiao
  • 1212.6287 #3 · backfill · confidence 0.70 Yu Qiao
  • 1111.5754 #2 · backfill · confidence 0.70 Yu Qiao
  • 2505.06907 #1 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2410.05363 #9 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2501.00574 #11 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2306.14289 #3 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2311.17005 #12 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2309.15112 #19 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2509.22186 #57 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2407.03320 #25 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2401.16420 #21 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2311.07575 #16 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2501.12386 #14 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2212.03191 #17 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2411.10442 #10 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2409.18839 #16 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2504.06958 #8 · arxiv_oai · confidence 0.70 Yu Qiao
  • 2305.17144 #11 · arxiv_oai · confidence 0.70 Yu Qiao

Frequent Coauthors