PRP introduces proactive routing via Draft Rating Learning and Joint Rating Learning to route queries early between draft and target models for efficient multimodal reasoning.
G1: Bootstrapping perception and reasoning abilities of vision-language model via reinforcement learning
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Odysseus adapts PPO with a turn-level critic and leverages pretrained VLM action priors to train agents achieving at least 3x average game progress over frontier models in long-horizon Super Mario Land.
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
RSICCLLM introduces a post-training framework with RSICI dataset, difference-aware supervised fine-tuning, and dual-negative preference optimization that claims to outperform much larger models on remote sensing image change captioning.
The paper organizes research on generalist game AI into Dataset, Model, Harness, and Benchmark pillars and charts a five-level progression from single-game mastery to agents that create and live inside game multiverses.
Gym-V supplies 179 visual environments showing that observation scaffolding like captions and rules matters more for training success than the choice of RL algorithm.
UAV-VL-R1 combines SFT and multi-stage GRPO reinforcement learning on a new 50,019-sample HRVQA-VL dataset to deliver substantially higher zero-shot accuracy on UAV visual reasoning tasks than both its 2B baseline and a 72B-scale model.
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.
citing papers explorer
-
Before Thinking, Learn to Decide: Proactive Routing for Efficient Visual Reasoning
PRP introduces proactive routing via Draft Rating Learning and Joint Rating Learning to route queries early between draft and target models for efficient multimodal reasoning.
-
A Survey of Reinforcement Learning for Large Reasoning Models
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.