Evolve-vla: Test-time training from environment feedback for vision-language-action models. arXiv preprint arXiv:2512.14666

· 2025 · arXiv 2512.14666

9 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.

Trust Your Instincts: Confidence-Driven Test-Time RL for Vision-Language-Action Models

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

T²VLA is a test-time reinforcement learning framework for VLAs that uses internal confidence to define intrinsic rewards via similarity to high-confidence expert demonstrations and a dual-expert bootstrapping mechanism.

Supervise What Survives: Geometry-Guided VLA Adaptation from Synthetic Robot Videos

cs.RO · 2026-06-23 · unverdicted · novelty 6.0

GRA extracts 2D waypoints from synthetic videos to supervise VLA vision while restricting action training to real data, outperforming pseudo-action baselines on real-robot tasks.

Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models

cs.RO · 2026-05-21 · unverdicted · novelty 6.0

Agentic-VLA enables efficient online adaptation of VLA models, delivering +12.3% on long-horizon tasks, +28.5% in 1-shot learning, and 2.4x faster convergence on LIBERO through three new components.

Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation

cs.RO · 2026-05-08 · unverdicted · novelty 6.0

Anchor-Centric Adaptation escapes the diversity trap by prioritizing repeated demonstrations at core anchors over broad coverage, yielding higher success rates under fixed data budgets in robotic manipulation.

PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation

cs.RO · 2026-01-11 · unverdicted · novelty 6.0

PALM improves long-horizon robotic manipulation success by distilling affordance representations for object interaction and predicting within-subtask progress in a VLA model.

Is the Future Compatible? Diagnosing Dynamic Consistency in World Action Models

cs.RO · 2026-05-08 · unverdicted · novelty 5.0

Action-state consistency in World Action Models distinguishes successful from failed imagined futures and supports value-free selection of better rollouts via consensus among predictions.

Test-Time Training for Visual Foresight Vision-Language-Action Models

cs.CV · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

T³VF applies test-time training on natural future-prediction supervision pairs with adaptive filtering to mitigate OOD shifts in VF-VLA models at modest extra inference cost.

FAR: Failure-Aware Retry for Test-Time Recovery and Continual Policy Improvement

cs.RO · 2026-07-01 · unverdicted · novelty 4.0

FAR combines failure-contrastive preference adaptation with action perturbations for test-time recovery and continual policy improvement, reporting 17.6% and 11.7% success gains over diffusion policies in simulation and real-world manipulation tasks.

citing papers explorer


DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies · cs.RO · 2026-05-12 · unverdicted · none · ref 19
Trust Your Instincts: Confidence-Driven Test-Time RL for Vision-Language-Action Models · cs.RO · 2026-06-29 · unverdicted · none · ref 1
Supervise What Survives: Geometry-Guided VLA Adaptation from Synthetic Robot Videos · cs.RO · 2026-06-23 · unverdicted · none · ref 14
Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models · cs.RO · 2026-05-21 · unverdicted · none · ref 1
Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation · cs.RO · 2026-05-08 · unverdicted · none · ref 39
PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation · cs.RO · 2026-01-11 · unverdicted · none · ref 3
Is the Future Compatible? Diagnosing Dynamic Consistency in World Action Models · cs.RO · 2026-05-08 · unverdicted · none · ref 2
Test-Time Training for Visual Foresight Vision-Language-Action Models · cs.CV · 2026-05-06 · unverdicted · none · ref 1 · 2 links
FAR: Failure-Aware Retry for Test-Time Recovery and Continual Policy Improvement · cs.RO · 2026-07-01 · unverdicted · none · ref 41
