VISOR is a unified agentic VRAG framework with Evidence Space structuring, visual action evaluation/correction, and dynamic sliding-window trajectories trained via GRPO-based RL that achieves SOTA performance on long-horizon visual reasoning benchmarks.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
AGREE boosts visual document retrieval by adding local relevance signals from MLLM attention maps to global document labels during retriever training.
citing papers explorer
-
VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning
VISOR is a unified agentic VRAG framework with Evidence Space structuring, visual action evaluation/correction, and dynamic sliding-window trajectories trained via GRPO-based RL that achieves SOTA performance on long-horizon visual reasoning benchmarks.
-
Attention Grounded Enhancement for Visual Document Retrieval
AGREE boosts visual document retrieval by adding local relevance signals from MLLM attention maps to global document labels during retriever training.