pith.

Mu Cai

Identifiers

  • name variant Mu Cai 0.60 · backfill

Papers (25)

  1. MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents cs.CV · 2026 · author #4
  2. MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models cs.CV · 2026 · author #2
  3. Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities cs.CL · 2025 · author #1232
  4. Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models cs.CV · 2025 · author #6
  5. Magma: A Foundation Model for Multimodal AI Agents cs.CV · 2025 · author #8
  6. Humanity's Last Exam cs.LG · 2025 · author #837
  7. TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models cs.CV · 2024 · author #1
  8. Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos cs.CV · 2024 · author #2
  9. Removing Distributional Discrepancies in Captions Improves Image-Text Alignment cs.CV · 2024 · author #3
  10. Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner cs.CV · 2024 · author #4
  11. Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds cs.CV · 2024 · author #1
  12. CHARTOM: A Visual Theory-of-Mind Benchmark for LLMs on Misleading Charts cs.AI · 2024 · author #5
  13. VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation cs.CV · 2024 · author #2
  14. LLaRA: Supercharging Robot Learning Data for Vision-Language Policy cs.RO · 2024 · author #9
  15. Yo'LLaVA: Your Personalized Language and Vision Assistant cs.CV · 2024 · author #4
  16. Matryoshka Multimodal Models cs.CV · 2024 · author #1
  17. CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples cs.CV · 2024 · author #2
  18. ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts cs.CV · 2023 · author #1
  19. A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance cs.CV · 2023 · author #4
  20. Investigating the Catastrophic Forgetting in Multimodal Large Language Models cs.CL · 2023 · author #4
  21. Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding cs.CV · 2023 · author #1
  22. Out-of-distribution Detection via Frequency-regularized Generative Models cs.LG · 2022 · author #1
  23. Masked Discrimination for Self-Supervised Learning on Point Clouds cs.CV · 2022 · author #2
  24. VOS: Learning What You Don't Know by Virtual Outlier Synthesis cs.LG · 2022 · author #3
  25. Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving cs.CV · 2020 · author #1

Mentions

  • 2408.14419 #5 · arxiv_oai · confidence 0.70 Mu Cai
  • 2505.20021 #6 · arxiv_oai · confidence 0.70 Mu Cai
  • 2502.13130 #8 · arxiv_oai · confidence 0.70 Mu Cai
  • 2406.20095 #9 · arxiv_oai · confidence 0.70 Mu Cai
  • 2406.09400 #4 · arxiv_oai · confidence 0.70 Mu Cai
  • 2410.10818 #1 · arxiv_oai · confidence 0.70 Mu Cai
  • 2410.02763 #2 · arxiv_oai · confidence 0.70 Mu Cai
  • 2409.12963 #4 · arxiv_oai · confidence 0.70 Mu Cai
  • 2410.00905 #3 · arxiv_oai · confidence 0.70 Mu Cai
  • 2409.06827 #1 · arxiv_oai · confidence 0.70 Mu Cai
  • 2407.10972 #2 · arxiv_oai · confidence 0.70 Mu Cai
  • 2405.17430 #1 · arxiv_oai · confidence 0.70 Mu Cai
  • 2306.06094 #1 · arxiv_oai · confidence 0.70 Mu Cai
  • 2402.13254 #2 · arxiv_oai · confidence 0.70 Mu Cai
  • 2312.00784 #1 · arxiv_oai · confidence 0.70 Mu Cai
  • 2309.10313 #4 · arxiv_oai · confidence 0.70 Mu Cai
  • 2309.12530 #4 · arxiv_oai · confidence 0.70 Mu Cai
  • 2208.09083 #1 · arxiv_oai · confidence 0.70 Mu Cai
  • 2203.11183 #2 · arxiv_oai · confidence 0.70 Mu Cai
  • 2202.01197 #3 · arxiv_oai · confidence 0.70 Mu Cai
  • 2011.13611 #1 · arxiv_oai · confidence 0.70 Mu Cai
  • 2605.18652 #4 · arxiv_oai · confidence 0.70 Mu Cai

Frequent Coauthors