A closed-loop self-evolving training system for spatial reasoning in MLLMs that iteratively generates QA pairs matched to the model's current capabilities via confidence feedback, achieving gains with an order of magnitude less data.
Vision-zero: Scalable vlm self-improvement via strategic gamified self-play
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 7verdicts
UNVERDICTED 7roles
background 1polarities
background 1representative citing papers
CV-Arena is a new 12K-pair benchmark for instruction-guided real-image editing with 16 task types, CogRetriever curation, and Active Elo mixed human-AI evaluation that finds gaps in 21 models and presents CV-Agent.
M2Note stores failed VLM trajectories as subject-guidance notes in an external notebook and retrieves them via multimodal RAG to avoid past errors during inference.
EvoVid proposes a temporal-centric self-evolution framework for Video-LLMs that uses temporal-aware Questioner and temporal-grounded Solver rewards to improve performance directly from unannotated videos.
Video-Zero is an annotation-free Questioner-Solver co-evolution framework that centers self-evolution on temporally localized evidence to improve video VLMs.
RISE proposes a self-evolving VLM framework with three designs to address challenges in question generation and solver adaptation, reporting consistent gains on seven benchmarks across two backbones.
Position paper identifies structural challenges in applying generic agentic AI to Earth Observation and outlines design principles for EO-native agents focused on geospatial state and validity.
citing papers explorer
-
Ouroboros-Spatial: Closing the Data-Model Loop for Spatial Reasoning
A closed-loop self-evolving training system for spatial reasoning in MLLMs that iteratively generates QA pairs matched to the model's current capabilities via confidence feedback, achieving gains with an order of magnitude less data.
-
CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences
CV-Arena is a new 12K-pair benchmark for instruction-guided real-image editing with 16 task types, CogRetriever curation, and Active Elo mixed human-AI evaluation that finds gaps in 21 models and presents CV-Agent.
-
M2Note: Continual Evolution of Vision Language Models via Mistake Notebook Learning
M2Note stores failed VLM trajectories as subject-guidance notes in an external notebook and retrieves them via multimodal RAG to avoid past errors during inference.
-
EvoVid: Temporal-Centric Self-Evolution for Video Large Language Models
EvoVid proposes a temporal-centric self-evolution framework for Video-LLMs that uses temporal-aware Questioner and temporal-grounded Solver rewards to improve performance directly from unannotated videos.
-
Video-Zero: Self-Evolution Video Understanding
Video-Zero is an annotation-free Questioner-Solver co-evolution framework that centers self-evolution on temporally localized evidence to improve video VLMs.
-
RISE: Reliable Improvement in Self-Evolving Vision-Language Models
RISE proposes a self-evolving VLM framework with three designs to address challenges in question generation and solver adaptation, reporting consistent gains on seven benchmarks across two backbones.
-
Agentic AI for Remote Sensing: Technical Challenges and Research Directions
Position paper identifies structural challenges in applying generic agentic AI to Earth Observation and outlines design principles for EO-native agents focused on geospatial state and validity.