Eva-VLA: Evaluating vision-language-action models’ robustness under real-world physical variations

URLhttps://arxiv · 2025 · arXiv 2509.18953

11 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

STRONG-VLA: Decoupled Robustness Learning for Vision-Language-Action Models under Multimodal Perturbations

cs.RO · 2026-04-11 · unverdicted · novelty 7.0

STRONG-VLA uses decoupled two-stage training to improve VLA model robustness, yielding up to 16% higher task success rates under seen and unseen perturbations on the LIBERO benchmark.

Thermally Activated Dual-Modal Adversarial Clothing against AI Surveillance Systems

cs.AI · 2025-11-13 · unverdicted · novelty 7.0

Thermally activated clothing with thermochromic dyes and heaters creates dynamic adversarial patterns that evade AI surveillance in visible and infrared modalities while appearing ordinary when inactive.

Sequential Planning via Anchored Robotic Keypoints

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

SPARK reaches 43.7% success on six LIBERO-PRO cells using LLM-generated typed behavior trees plus multi-prompt perception and recovery, more than doubling the success of CaP-Agent0 and VLA baselines.

FATE-VLA: Failure-aware test generation for vision-language-action models

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

FATE-VLA reframes VLA evaluation as active failure discovery and reports uncovering up to 29.7% more failures across four models while revealing diverse failure modes.

RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

RoboStressBench decomposes visual stress into four physically grounded dimensions to benchmark VLM robustness in embodied scenes and proposes a stress-aware solver.

Capability and Robustness Cannot Both Be Free: An Information-Theoretic Bound for Vision-Language-Action Models

cs.CR · 2026-05-25 · unverdicted · novelty 6.0

Any VLA policy satisfies I(A*; Aπ) + [I(Aπ; Ãπ) − I(Aπ; δ)] ≤ H(A*) + I(X; X̃) by two applications of the Data Processing Inequality.

Position: Vision-Language-Action Models Cannot Be Verified to Perform Physical Reasoning

cs.RO · 2026-06-28 · conditional · novelty 5.0

VLA benchmark success rates cannot distinguish semantic generalization from physical reasoning due to an identifiability gap in current evaluation protocols.

Uncovering Vulnerability of Vision-Language-Action Models under Joint-Level Physical Faults

cs.RO · 2026-06-09 · unverdicted · novelty 5.0

VLA models exhibit joint-dependent success degradation under realistic physical faults, which J-PARC mitigates via latent regime inference and residual action correction.

VLAMotor: Test-Guided Enhancement of Vision-Language-Action Models via Agent-Based Data Synthesis

cs.RO · 2026-05-16 · unverdicted · novelty 5.0

VLAMotor exposes VLA failures via distance-aware uncertainty testing and synthesizes agent-planned repair data to fine-tune models, reporting 49.25% success rate gains in simulation and 57.5% on hardware.

Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models

cs.RO · 2026-05-08 · unverdicted · novelty 5.0 · 2 refs

AFIL trains dual action generators on success and failure rollouts from a pretrained VLA to steer diffusion policies away from failure modes during inference.

Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs

cs.RO · 2026-05-20 · unverdicted · novelty 4.0 · 2 refs

Changes in Chain-of-Causation explanations under sensor perturbations correlate with 5.3× higher trajectory deviation in a driving VLA, and enabling such explanations yields 11.8% better accuracy.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Position: Vision-Language-Action Models Cannot Be Verified to Perform Physical Reasoning

cs.RO · 2026-06-28 · conditional · none · ref 50

VLA benchmark success rates cannot distinguish semantic generalization from physical reasoning due to an identifiability gap in current evaluation protocols.
