RL-VLM-F: Reinforcement learn- ing from vision language foundation model feedback,

· 2024 · arXiv 2402.03681

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

Formalizes Reasoning Portability (RP) and proposes RDB-CL to modulate per-sample KL regularization in RLVR for MLLM continual learning, achieving +12.0% Last accuracy over vanilla RLVR baseline by preserving reusable reasoning on high-RP samples.

Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations

cs.RO · 2026-05-21 · unverdicted · novelty 6.0

A framework learns invariant symbolic reward functions from few demonstrations that generalize zero-shot to variations in robotic manipulation tasks.

AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

AtlasVA organizes VLM agent memory into spatial heatmaps, visual exemplars, and symbolic skills, evolving atlases from trajectories to act as potential-based shaping rewards in teacher-free reinforcement learning.

Reflection-Based Task Adaptation for Self-Improving VLA

cs.RO · 2025-10-14 · unverdicted · novelty 5.0

Reflective Self-Adaptation combines failure-reflective reinforcement learning with success-guided imitation learning to enable faster and more reliable task adaptation for pre-trained Vision-Language-Action models.

UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning

cs.CV · 2025-08-15 · unverdicted · novelty 5.0

UAV-VL-R1 combines SFT and multi-stage GRPO reinforcement learning on a new 50,019-sample HRVQA-VL dataset to deliver substantially higher zero-shot accuracy on UAV visual reasoning tasks than both its 2B baseline and a 72B-scale model.

Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent

cs.AI · 2026-02-23 · unverdicted · novelty 2.0

A survey provides a task-based formalization of meta-learning and meta-RL while chronicling algorithms that lead to DeepMind's Adaptive Agent.

citing papers explorer

Showing 6 of 6 citing papers.

Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era cs.LG · 2026-05-17 · unverdicted · none · ref 52
Formalizes Reasoning Portability (RP) and proposes RDB-CL to modulate per-sample KL regularization in RLVR for MLLM continual learning, achieving +12.0% Last accuracy over vanilla RLVR baseline by preserving reusable reasoning on high-RP samples.
Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations cs.RO · 2026-05-21 · unverdicted · none · ref 25
A framework learns invariant symbolic reward functions from few demonstrations that generalize zero-shot to variations in robotic manipulation tasks.
AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents cs.CV · 2026-05-18 · unverdicted · none · ref 43
AtlasVA organizes VLM agent memory into spatial heatmaps, visual exemplars, and symbolic skills, evolving atlases from trajectories to act as potential-based shaping rewards in teacher-free reinforcement learning.
Reflection-Based Task Adaptation for Self-Improving VLA cs.RO · 2025-10-14 · unverdicted · none · ref 39
Reflective Self-Adaptation combines failure-reflective reinforcement learning with success-guided imitation learning to enable faster and more reliable task adaptation for pre-trained Vision-Language-Action models.
UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning cs.CV · 2025-08-15 · unverdicted · none · ref 5
UAV-VL-R1 combines SFT and multi-stage GRPO reinforcement learning on a new 50,019-sample HRVQA-VL dataset to deliver substantially higher zero-shot accuracy on UAV visual reasoning tasks than both its 2B baseline and a 72B-scale model.
Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent cs.AI · 2026-02-23 · unverdicted · none · ref 182
A survey provides a task-based formalization of meta-learning and meta-RL while chronicling algorithms that lead to DeepMind's Adaptive Agent.

RL-VLM-F: Reinforcement learn- ing from vision language foundation model feedback,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer