Shuai Bai

Identifiers

  • name variant Shuai Bai 0.60 · backfill

Papers (24)

  1. Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification · cs.CV · 2026 · author #10
  2. Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System · cs.RO · 2026 · author #30
  3. Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments · cs.RO · 2026 · author #15
  4. FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies · cs.RO · 2026 · author #13
  5. CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents · cs.AI · 2026 · author #10
  6. MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing · cs.AI · 2026 · author #8
  7. Qwen-Image-2.0 Technical Report · cs.CV · 2026 · author #56
  8. Qwen3-VL-Seg: Unlocking Open-World Referring Segmentation with Vision-Language Grounding · cs.CV · 2026 · author #6
  9. CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing · cs.CL · 2026 · author #12
  10. Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking · cs.CL · 2026 · author #6
  11. VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models · cs.CV · 2026 · author #8
  12. Qwen3-VL Technical Report · cs.CV · 2025 · author #1
  13. Soft Adaptive Policy Optimization · cs.LG · 2025 · author #8
  14. Unify Robot Actions in Camera Frame · cs.RO · 2025 · author #10
  15. Revisiting Multimodal Positional Encoding in Vision-Language Models · cs.CV · 2025 · author #7
  16. Qwen3-Omni Technical Report · cs.CL · 2025 · author #24
  17. Qwen-Image Technical Report · cs.CV · 2025 · author #8
  18. Qwen2.5-Omni Technical Report · cs.CL · 2025 · author #6
  19. Qwen2.5-VL Technical Report · cs.CV · 2025 · author #1
  20. Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution · cs.CV · 2024 · author #2
  21. Qwen2 Technical Report · cs.CL · 2024 · author #40
  22. Qwen Technical Report · cs.CL · 2023 · author #2
  23. Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · cs.CV · 2023 · author #2
  24. Multi-hierarchical Independent Correlation Filters for Visual Tracking · cs.CV · 2018 · author #1

Mentions

  • 2606.18112 #30 · arxiv_oai · confidence 0.70 · Shuai Bai
  • 2606.18249 #10 · arxiv_oai · confidence 0.70 · Shuai Bai
  • 2601.03309 #8 · arxiv_oai · confidence 0.70 · Shuai Bai
  • 2605.30280 #15 · arxiv_oai · confidence 0.70 · Shuai Bai
  • 2605.27284 #13 · arxiv_oai · confidence 0.70 · Shuai Bai
  • 2605.25624 #10 · arxiv_oai · confidence 0.70 · Shuai Bai
  • 2605.22100 #8 · arxiv_oai · confidence 0.70 · Shuai Bai
  • 2407.10671 #40 · backfill · confidence 0.70 · Shuai Bai
  • 2511.21631 #1 · backfill · confidence 0.70 · Shuai Bai

Frequent Coauthors