Uncovering Vulnerability of Vision-Language-Action Models under Joint-Level Physical Faults

Junha Chun; Minsoo Jo; Taeju Kwon; Taesup Kim; Youngjoon Jeong

arXiv: 2606.10501 · v1 · pith:2KKX7OXF · submitted 2026-06-09 · 💻 cs.RO


Pith reviewed 2026-06-27 12:48 UTC · model grok-4.3

classification 💻 cs.RO

keywords: vision-language-action models · joint-level physical faults · robot embodiment mismatch · closed-loop execution · residual calibration · policy robustness · actuator degradation · friction faults


The pith

Vision-language-action models lose task success under joint-level physical faults even when motions remain feasible.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how physical changes at individual robot joints disrupt vision-language-action policies that map images and language to actions. It establishes that these faults produce joint-dependent drops in success rates and that the degradation stems from closed-loop execution mismatch rather than physical impossibility alone. The authors introduce a lightweight residual calibration method that infers the current fault regime from joint dynamics and applies corrective adjustments while leaving the original policy frozen. This matters for anyone planning to run such models on real hardware where wear, friction, and actuator issues arise over time. The work isolates the embodiment-side vulnerability as a distinct robustness problem separate from perceptual or semantic variations.

Core claim

VLA models are vulnerable when predicted actions are executed through a perturbed robot body. Our analysis reveals joint-dependent effects, with heterogeneous degradation in task success across affected joints. We also show that performance drops cannot be attributed solely to physical infeasibility, since feasible faults such as increased joint friction can still substantially reduce success rates and induce closed-loop execution mismatch. Motivated by these findings, we propose Joint-level Physical-fault Aware Residual Calibrator (J-PARC), a lightweight residual calibration framework built on top of a frozen VLA policy. J-PARC infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator on this regime, enabling adaptive action correction across faulty joints.

What carries the argument

Joint-level Physical-fault Aware Residual Calibrator (J-PARC), which infers a latent fault regime from recent joint dynamics and applies regime-conditioned residual corrections to the actions of a frozen VLA policy.
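The data flow of this mechanism can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's architecture: it assumes the regime encoder sees a short window of commanded and realized joint positions, that actions are joint-space commands, and that both the encoder and the shared calibrator are small MLPs. The names `encode_regime` and `residual_calibrate`, the window length, the layer sizes, and the residual bound are all assumptions, and the weights are random, so only the wiring is meaningful.

```python
import numpy as np

N_JOINTS = 7   # Franka Panda arm, as in the LIBERO figure (assumption for the sketch)
WINDOW = 8     # hypothetical length of the joint-history window
LATENT = 4     # hypothetical size of the latent fault-regime embedding

rng = np.random.default_rng(0)


def mlp_params(sizes):
    """Random weights for a small MLP; placeholders for trained parameters."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Apply tanh hidden layers and a linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


# Regime encoder: history of (commanded, realized) joint positions -> z.
encoder = mlp_params([WINDOW * N_JOINTS * 2, 32, LATENT])
# Shared calibrator: (base action, joint state, z) -> residual for every joint.
calibrator = mlp_params([N_JOINTS + N_JOINTS + LATENT, 32, N_JOINTS])


def encode_regime(cmd_hist, real_hist):
    """cmd_hist, real_hist: arrays of shape (WINDOW, N_JOINTS)."""
    x = np.concatenate([cmd_hist, real_hist], axis=1).reshape(-1)
    return mlp(encoder, x)


def residual_calibrate(base_action, q, cmd_hist, real_hist, scale=0.1):
    """Return the frozen policy's action plus a bounded, regime-conditioned residual."""
    z = encode_regime(cmd_hist, real_hist)
    delta = mlp(calibrator, np.concatenate([base_action, q, z]))
    return base_action + scale * np.tanh(delta)  # |residual| < scale per joint


# Usage with placeholder inputs; base_action stands in for the frozen VLA output.
base_action = np.zeros(N_JOINTS)
history = np.zeros((WINDOW, N_JOINTS))
corrected = residual_calibrate(base_action, np.zeros(N_JOINTS), history, history)
```

Bounding the residual with `tanh` keeps the frozen policy's action dominant. Nothing in this untrained sketch forces the residual to vanish in fault-free conditions; a trained calibrator would have to learn or explicitly enforce that behaviour.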

If this is right

Task success degrades heterogeneously depending on which joint experiences the fault.
Feasible physical changes such as increased friction still induce substantial closed-loop mismatch and lower success rates.
A frozen VLA policy can be augmented with a lightweight, fault-regime-aware residual calibrator to recover robustness.
The calibrator leaves performance unchanged in fault-free conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real deployments may require continuous monitoring of joint dynamics to trigger the right calibration regime.
The same residual-calibration pattern could be tested on other action-generation methods such as diffusion policies or reinforcement-learning controllers.
Combining multiple simultaneous joint faults would test whether the latent-regime inference remains effective under compound degradation.

Load-bearing premise

The listed joint-level faults produce a closed-loop execution mismatch that is representative of real robots, and the experimental setup isolates this mismatch from other factors.

What would settle it

A physical-robot experiment in which the same VLA policy shows no drop in task success under the described joint faults, or in which J-PARC yields no measurable improvement over the base policy under those faults.

Figures

Figures reproduced from arXiv: 2606.10501 by Junha Chun, Minsoo Jo, Taeju Kwon, Taesup Kim, Youngjoon Jeong.

**Figure 1.** VLA models under joint-level physical faults. A VLA policy successfully completes the LIBERO [18] task in the fault-free setting, but fails when different Franka Panda joints are locked. This illustrates how embodiment-side faults can change the robot’s realized motion without changing the policy output, motivating physical-fault aware action calibration.

… joint friction [15, 16, 17]. We treat range limits …

**Figure 2.** Heterogeneous and feasible joint-level fault effects. Joint-level faults cause heterogeneous performance degradation across joints, and increased friction can reduce success rates even when the task remains physically feasible.

2 VLA Models are Vulnerable to Joint-Level Physical Faults
We first investigate the vulnerability of VLA policies to joint-level physical faults during closed-loop robot execution. …

**Figure 3.** Fault-accumulation recovery. Success drops as faults persist before release.

To examine whether VLA policies can recover from states induced by persistent joint faults, we evaluate a fault-accumulation setting. The robot first executes under a locked-joint fault for a specified number of steps, allowing the fault-induced deviation to accumulate in the robot state. Then, starting from this accumulated state…
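The fault-accumulation protocol can be outlined as a single evaluation loop. This is a hypothetical sketch: `policy`, `step`, and `is_success` are placeholder callables rather than the paper's interfaces, the locked joint is modeled by holding it at the value it had when the episode began, and the fault is released after `k` steps within a fixed step budget.

```python
from typing import Callable, List, Sequence

State = List[float]


def fault_accumulation_episode(
    policy: Callable[[State], State],
    step: Callable[[State, State], State],
    is_success: Callable[[State], bool],
    q0: Sequence[float],
    locked_joint: int,
    k: int,
    horizon: int,
) -> bool:
    """Lock `locked_joint` for the first k steps, then release it.

    Returns True if `is_success` holds at any step within `horizon` steps.
    """
    q = list(q0)
    held = q[locked_joint]
    for t in range(horizon):
        action = policy(q)
        q = list(step(q, action))
        if t < k:
            q[locked_joint] = held  # locked-joint fault: the joint stays put
        if is_success(q):
            return True
    return False


# Toy usage: two joints, a proportional "policy", additive dynamics.
goal = [1.0, 1.0]


def policy(q):
    return [0.2 * (g - x) for g, x in zip(goal, q)]


def step(q, a):
    return [x + u for x, u in zip(q, a)]


def reached(q):
    return all(abs(g - x) < 0.05 for g, x in zip(goal, q))


for k in (0, 5, 10):
    print(k, fault_accumulation_episode(policy, step, reached, [0.0, 0.0], 0, k, 20))
```

In this toy, the only cost of a longer lock is lost time, so the episode fails once the remaining budget is too short to close the accumulated gap. The paper's question, whether a VLA policy can recover from the out-of-distribution state itself, is not captured by a proportional controller.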

**Figure 4.** UMAP visualization of robot state transition distributions under different fault conditions.

| Joint | π0.5 | CIK | Δ |
|---|---|---|---|
| j0 | 57.4 | 54.7 | -2.6 |
| j1 | 0.0 | 0.0 | +0.0 |
| j2 | 37.5 | 33.3 | -4.2 |
| j3 | 6.3 | 5.4 | -0.9 |
| j4 | 85.2 | 83.5 | -1.7 |
| j5 | 86.9 | 82.2 | -4.8 |
| j6 | 66.7 | 68.8 | +2.1 |
| Mean | 48.6 | 46.8 | -1.7 |
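As a quick consistency check on the table, the Mean row is the unweighted average over the seven joints, and Δ is CIK minus π0.5. Two per-joint Δ entries (j0 and j5) differ by 0.1 from the difference of the rounded entries, presumably because they were computed before rounding. The snippet below only re-derives the Mean row from the listed values.

```python
# Per-joint values copied from the table above.
pi05 = {"j0": 57.4, "j1": 0.0, "j2": 37.5, "j3": 6.3, "j4": 85.2, "j5": 86.9, "j6": 66.7}
cik = {"j0": 54.7, "j1": 0.0, "j2": 33.3, "j3": 5.4, "j4": 83.5, "j5": 82.2, "j6": 68.8}

# Unweighted mean over joints, then the difference of the means.
mean_pi05 = sum(pi05.values()) / len(pi05)
mean_cik = sum(cik.values()) / len(cik)
print(round(mean_pi05, 1), round(mean_cik, 1), round(mean_cik - mean_pi05, 1))
# → 48.6 46.8 -1.7
```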

**Figure 5.** t-SNE visualization of latent representations learned by the joint-fault regime encoder. We visualize encoder embeddings under locked-joint faults and increased-friction faults. The embeddings form joint-dependent clusters under both fault types.

To evaluate whether the joint-fault regime encoder captures meaningful execution contexts, we visualize its latent representations under different joint-level f…

**Figure 6.** Real-world evaluation on the Trossen WidowX AI bowl pick-and-place task. Trajectory overlays are drawn on the final observation frame. Under joint-level faults, the base policy without J-PARC often drifts away from the fault-free execution path, while J-PARC redirects the end-effector trajectory toward the successful placement behavior.

Action-Space Robustness and Physical Execution Mismatch. Beyond visual…
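One way to put a number on the drift visible in these overlays is the mean distance from each point of a faulted end-effector trajectory to the nearest sample of the fault-free reference path. This metric is an assumption offered for illustration, not the one the paper reports; the function name and array shapes are hypothetical.

```python
import numpy as np


def mean_path_deviation(traj: np.ndarray, reference: np.ndarray) -> float:
    """Mean nearest-neighbour distance from traj points to a reference path.

    traj: (T, 3) end-effector positions recorded under a fault.
    reference: (R, 3) end-effector positions from the fault-free run.
    The reference samples are treated as a point set (no interpolation).
    """
    diffs = traj[:, None, :] - reference[None, :, :]   # (T, R, 3)
    dists = np.linalg.norm(diffs, axis=-1)              # (T, R)
    return float(dists.min(axis=1).mean())


# Usage: a path offset by 5 cm in z deviates by 0.05 m on average.
ref = np.stack([np.linspace(0.0, 1.0, 11), np.zeros(11), np.zeros(11)], axis=1)
shifted = ref + np.array([0.0, 0.0, 0.05])
deviation = mean_path_deviation(shifted, ref)
```

Because it uses nearest points rather than time-aligned points, the metric ignores timing differences and measures only spatial drift from the reference path.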

Original abstract

Deploying Vision-Language-Action (VLA) models in real robotic systems requires robustness not only to semantic and perceptual variations, but also to embodiment-side faults that change how actions are physically realized. Real robots can experience joint-level changes caused by actuator degradation, hardware faults, safety limits, collision damage, or wear-induced friction. These faults are critical because they alter the action-to-motion interface of a policy, disrupting the learned closed-loop relationship between commanded actions, realized motion, and subsequent observations. In this work, we study realistic joint-level physical faults and show that VLA models are vulnerable when predicted actions are executed through a perturbed robot body. Our analysis reveals joint-dependent effects, with heterogeneous degradation in task success across affected joints. We also show that performance drops cannot be attributed solely to physical infeasibility, since feasible faults such as increased joint friction can still substantially reduce success rates and induce closed-loop execution mismatch. Motivated by these findings, we propose Joint-level Physical-fault Aware Residual Calibrator (J-PARC), a lightweight residual calibration framework built on top of a frozen VLA policy. J-PARC infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator on this regime, enabling adaptive action correction across faulty joints. Experiments show that J-PARC improves robustness under joint-level faults while preserving fault-free environment performance.
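The fault types listed above act at the interface between a commanded joint target and the realized joint motion. The sketch below is a hypothetical per-step model, not the paper's fault definitions. A locked joint ignores its command, and a range limit clips the realized position to a reduced interval. Increased friction is approximated crudely in kinematic terms: a deadband (Coulomb-like) followed by a fractional loss (viscous-like) on the commanded increment. The class `JointFault`, the function `realize`, and all parameter names are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass
class JointFault:
    joint: int
    kind: str                # "locked", "range_limit", or "friction" (illustrative set)
    low: float = -np.inf     # range-limit bounds in radians
    high: float = np.inf
    coulomb: float = 0.0     # per-step deadband on the commanded increment (radians)
    viscous: float = 0.0     # fractional loss of the remaining increment, in [0, 1)


def realize(q: np.ndarray, q_cmd: np.ndarray, fault: JointFault | None) -> np.ndarray:
    """Map a commanded joint target to realized joint positions for one step."""
    q_next = np.array(q_cmd, dtype=float)  # nominal body: the command is realized exactly
    if fault is None:
        return q_next
    j = fault.joint
    if fault.kind == "locked":
        q_next[j] = q[j]
    elif fault.kind == "range_limit":
        q_next[j] = np.clip(q_next[j], fault.low, fault.high)
    elif fault.kind == "friction":
        dq = q_cmd[j] - q[j]
        mag = max(abs(dq) - fault.coulomb, 0.0) * (1.0 - fault.viscous)
        q_next[j] = q[j] + np.sign(dq) * mag
    else:
        raise ValueError(f"unknown fault kind: {fault.kind}")
    return q_next


# Usage: the same command realized through a nominal body and a frictional one.
q = np.zeros(3)
cmd = np.array([0.1, 0.1, 0.1])
nominal = realize(q, cmd, None)
with_friction = realize(q, cmd, JointFault(joint=2, kind="friction", coulomb=0.02, viscous=0.5))
```

The policy output is identical in both calls; only the realized motion differs. This is the action-to-motion change the abstract describes.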

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

VLA models degrade under joint faults like friction in ways not reducible to infeasibility alone, and J-PARC adds a lightweight residual fix on a frozen policy, but the isolation of policy-specific closed-loop mismatch still needs explicit verification.


The paper's main contribution is showing that joint-level physical faults, such as increased friction, cause heterogeneous drops in VLA task success that go beyond simple physical impossibility. It also introduces J-PARC, a residual calibrator that infers a latent fault regime from recent dynamics and conditions corrections on it without retraining the base model.

What the work does well is target a deployment-relevant gap. Real robots experience actuator wear, safety limits, and friction changes, and the claim that these disrupt the learned action-observation loop is worth checking. Keeping the original VLA frozen while adding a small adaptive layer is a reasonable engineering choice that should preserve fault-free performance.

The soft spot is in the experimental isolation. The central assertion, that performance loss under feasible faults reflects closed-loop mismatch rather than generic execution failure, requires showing that the injected perturbations keep motions kinematically and dynamically feasible, and that an oracle controller under the same faults would not suffer the same drop. The abstract states the result but gives no fault-model equations, feasibility metrics, or such baselines. The stress-test concern about confounding factors, such as altered observation distributions, therefore still applies until those controls appear in the full text.
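The oracle control described here can be illustrated on a one-joint toy. The setup is an assumption of this sketch, not the paper's: friction is modeled as a deadband of `stiction` radians per step, a nominal proportional controller runs unchanged, and an "oracle" knows the deadband and adds it to every command. Both controllers face the identical fault. If the oracle reaches the goal while the nominal controller stalls, the goal is feasible and the loss comes from the controller's mismatch with the faulted body.

```python
def run(goal, stiction, compensate, gain=0.2, steps=200, tol=1e-3):
    """Return True if one joint ends within tol of goal under a deadband fault.

    compensate=True is the oracle: it adds the known deadband to each command,
    so the realized increment equals the nominal controller's intended one.
    """
    q = 0.0
    for _ in range(steps):
        cmd = gain * (goal - q)
        if compensate and cmd != 0.0:
            cmd += stiction if cmd > 0 else -stiction
        mag = max(abs(cmd) - stiction, 0.0)   # deadband friction fault
        q += mag if cmd > 0 else -mag
    return abs(goal - q) < tol


nominal_ok = run(1.0, stiction=0.02, compensate=False)
oracle_ok = run(1.0, stiction=0.02, compensate=True)
```

With `gain=0.2` and `stiction=0.02`, the nominal controller stops moving once `gain * error` no longer exceeds the deadband, so it settles about 0.02 / 0.2 = 0.1 rad short of the goal. The oracle converges, which shows that the fault alone does not make the goal unreachable.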

This paper is for robotics groups working on VLA deployment and hardware robustness. Readers already thinking about embodiment mismatch will find the framing and the calibrator idea useful. It is coherent enough on its own terms to deserve referee time, even if the current evidence level is preliminary and the methods section will need expansion.

Referee Report

2 major / 2 minor

Summary. The paper claims that Vision-Language-Action (VLA) models are vulnerable to joint-level physical faults (actuator degradation, hardware faults, safety limits, collision damage, wear-induced friction) that alter the action-to-motion interface, causing heterogeneous degradation in task success rates across affected joints. It asserts that these drops cannot be attributed solely to physical infeasibility, as feasible faults like increased friction still induce closed-loop execution mismatch. Motivated by this, the authors propose J-PARC, a lightweight residual calibration framework on a frozen VLA policy that infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator for adaptive action correction, with experiments showing improved robustness under faults while preserving fault-free performance.

Significance. If the experimental isolation of policy-specific mismatch holds, the work is significant for real-world VLA deployment in robotics, as it shifts focus from perceptual/semantic robustness to embodiment faults that disrupt the learned closed-loop relationship. The joint-dependent heterogeneity analysis and the practical J-PARC mitigation (which avoids retraining the base policy) could guide more reliable robotic systems. The emphasis on feasible faults provides a realistic lens on policy sensitivity beyond obvious kinematic violations.

major comments (2)

[Abstract and Experiments] The central claim that 'performance drops cannot be attributed solely to physical infeasibility' and that 'feasible faults such as increased joint friction can still substantially reduce success rates' is load-bearing for distinguishing closed-loop mismatch from generic execution failure. However, no fault-model equations (e.g., definitions of friction coefficients or actuator degradation), kinematic/dynamic feasibility metrics, or baselines (e.g., an oracle controller under identical perturbations) are supplied. Without them, one cannot verify that the observed heterogeneous degradation in task success is policy-specific rather than confounded by altered observation distributions or velocity limits.
[Proposed Method (J-PARC)] The framework, which 'infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator on this regime', is presented as the mitigation, but the description lacks equations for regime inference, the conditioning mechanism, the loss functions, and training details on how the calibrator is learned. Without these, it is impossible to assess whether J-PARC actually addresses the identified mismatch or merely fits the experimental perturbations.
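For the feasibility metrics requested in the first major comment, one common approach is sketched here as an assumption, since the paper's fault model is not given on this page. The idea is to check whether the torque a reference joint trajectory needs stays within the actuator limit once a friction term is added: τ(t) = I·q̈(t) + F_v·q̇(t) + F_c·sgn(q̇(t)), with margin τ_max − max_t |τ(t)|. A non-negative margin means the trajectory remains dynamically feasible under the fault. The single-joint model with constant inertia, no gravity, and no coupling is a simplification of this sketch.

```python
import numpy as np


def friction_feasibility_margin(qd, qdd, inertia, tau_max, f_coulomb, f_viscous):
    """Torque margin for one joint along a sampled trajectory.

    Simplified single-joint model (assumption): constant effective inertia,
    no gravity or coupling terms, friction = Coulomb + viscous.
    Returns tau_max - max |tau|; a value >= 0 means feasible under the fault.
    """
    qd = np.asarray(qd, dtype=float)
    qdd = np.asarray(qdd, dtype=float)
    tau = inertia * qdd + f_viscous * qd + f_coulomb * np.sign(qd)
    return float(tau_max - np.max(np.abs(tau)))


# Usage: same constant-velocity trajectory, moderate vs. severe viscous friction.
moderate = friction_feasibility_margin([0.5, 0.5], [0.0, 0.0], 1.0, 10.0, 2.0, 4.0)
severe = friction_feasibility_margin([0.5, 0.5], [0.0, 0.0], 1.0, 10.0, 2.0, 20.0)
```

Here the moderate fault leaves a margin of 6 N·m, while the severe one needs 12 N·m against a 10 N·m limit. Only faults of the first kind would count as "feasible" in the sense the central claim requires.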

minor comments (2)

[Abstract] The abstract supplies no quantitative details on datasets, tasks, number of trials, or success-rate tables, which reduces clarity even for a high-level overview.
[Introduction and Method] Notation for 'joint dynamics' and 'latent joint-fault regime' is introduced without prior definition or reference to standard robotics terminology (e.g., joint velocity/position histories), which could be clarified for readers outside the immediate subfield.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important areas for strengthening the presentation of our claims and the J-PARC method. We address each major comment below.


Referee: [Abstract and Experiments] The central claim that 'performance drops cannot be attributed solely to physical infeasibility' and that 'feasible faults such as increased joint friction can still substantially reduce success rates' is load-bearing for distinguishing closed-loop mismatch from generic execution failure. However, no fault-model equations (e.g., definitions of friction coefficients or actuator degradation), kinematic/dynamic feasibility metrics, or baselines (e.g., an oracle controller under identical perturbations) are supplied. Without them, one cannot verify that the observed heterogeneous degradation in task success is policy-specific rather than confounded by altered observation distributions or velocity limits.

Authors: We agree that the central claim requires explicit supporting details to isolate policy-specific closed-loop mismatch. In the revised manuscript, we will add the fault-model equations (including definitions of friction coefficients and actuator degradation), kinematic/dynamic feasibility metrics confirming the faults remain feasible, and an oracle controller baseline under identical perturbations. These additions will enable verification that the heterogeneous degradation is attributable to the VLA policy rather than confounders such as altered observations or velocity limits. revision: yes
Referee: [Proposed Method (J-PARC)] The framework, which 'infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator on this regime', is presented as the mitigation, but the description lacks equations for regime inference, the conditioning mechanism, the loss functions, and training details on how the calibrator is learned. Without these, it is impossible to assess whether J-PARC actually addresses the identified mismatch or merely fits the experimental perturbations.

Authors: We acknowledge that the J-PARC description is currently high-level and omits the requested mathematical details. In the revised manuscript, we will provide the equations for latent joint-fault regime inference from recent joint dynamics, the conditioning mechanism for the shared residual calibrator, the loss functions for training the calibrator, and the full training procedure with hyperparameters. These will allow readers to evaluate how J-PARC specifically mitigates the identified mismatch. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on experiments, not self-referential derivations


The paper is an empirical study of VLA robustness under joint faults, with claims about heterogeneous degradation and non-infeasibility causes supported by experimental results rather than any derivation chain. No equations, fitted parameters renamed as predictions, self-citations used as uniqueness theorems, or ansatzes smuggled via prior work are present in the provided text. The proposed J-PARC framework is described at a high level without reducing to its own inputs by construction. This matches the default expectation for non-circular papers; the reader's assessment of score 1.0 is consistent with absence of any load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review yields minimal identifiable free parameters or invented entities beyond the proposed framework itself; domain assumptions about fault realism are stated but unverified.

axioms (1)

Domain assumption: Joint-level physical faults alter the action-to-motion interface and disrupt the learned closed-loop relationship between commanded actions, realized motion, and observations.
Invoked in the abstract as the core reason faults are critical for VLA policies.

invented entities (1)

J-PARC (Joint-level Physical-fault Aware Residual Calibrator): no independent evidence
purpose: Lightweight residual calibration framework that infers latent joint-fault regime and conditions a shared residual calibrator.
Introduced in the abstract as the proposed solution; no independent evidence of its effectiveness is supplied beyond the abstract claim.

pith-pipeline@v0.9.1-grok · 5784 in / 1313 out tokens · 25994 ms · 2026-06-27T12:48:32.616817+00:00 · methodology


Reference graph

Works this paper leans on

45 extracted references · 12 canonical work pages

[1] K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. R. Equi, C. Finn, N. Fusai, M. Y. Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. Vuong, H. Walke... 2025.

[2] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, L. X. Shi, J. Tanner, Q. Vuong, A. Walling, H. Wang, and U. Zhilinsky. π0: A vision-language-action flow model for general robot control, 2026. URL https://arxiv.org...

[3] Q. Bu, Y. Yang, J. Cai, S. Gao, G. Ren, M. Yao, P. Luo, and H. Li. UniVLA: Learning to act anywhere with task-centric latent actions. arXiv preprint arXiv:2505.06111, 2025.

[4] M. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn. OpenVLA: An open-source vision-language-action model. arXiv preprint arXiv:2406.09246, 2024.

[5] M. J. Kim, C. Finn, and P. Liang. Fine-tuning vision-language-action models: Optimizing speed and success, 2025. URL https://arxiv.org/abs/2502.19645.

[6] J. Guan, T. Ding, L. Cao, L. Pan, C. Wang, and X. Zheng. Probing the robustness of vision-language pretrained models: A multimodal adversarial attack approach, 2024. URL https://arxiv.org/abs/2408.13461.

[7] X. Lu, J. Chen, S. Xiao, Z. Jin, R. Zhou, X. Ji, and W. Xu. Exploring the robustness of vision-language-action models against sensor attacks. In Proceedings of the 2025 Workshop on Large AI Systems and Models with Privacy and Security Analysis, LAMPS ’25, pages 11–18, New York, NY, USA, 2025. Association for Computing Machinery. ISBN 9798400718960. doi:10.1145/3733800.3763262.

[8] S. Fei, S. Wang, J. Shi, Z. Dai, J. Cai, P. Qian, L. Ji, X. He, S. Zhang, Z. Fei, J. Fu, J. Gong, and X. Qiu. LIBERO-Plus: In-depth robustness analysis of vision-language-action models. arXiv preprint arXiv:2510.13626, 2025.

[9] T.-H. Pham, G. Aikins, T. Truong, and K.-D. Nguyen. Adaptive compensation for robotic joint failures using partially observable reinforcement learning, 2024. URL https://arxiv.org/abs/2409.14435.

[10] T. Hou, J. Tu, X. Gao, Z. Dong, P. Zhai, and L. Zhang. Multi-task learning of active fault-tolerant controller for leg failures in quadruped robots, 2024. URL https://arxiv.org/abs/2402.08996.

[11] G. G. Briscoe-Martinez, Y. Gautam, R. Shetty, A. Pasricha, M. M. Nicotra, and A. Roncone. Moving On, Even When You’re Broken: Fail-Active Trajectory Generation via Diffusion Policies Conditioned on Embodiment and Task. arXiv e-prints, art. arXiv:2602.02895, Feb. doi:10.48550/arXiv.2602.02895.

[12] I. Eski, S. Erkaya, S. Savas, and S. Yildirim. Fault detection on robot manipulators using artificial neural networks. Robotics and Computer-Integrated Manufacturing, 27(1):115–123. ISSN 0736-5845. doi:10.1016/j.rcim.2010.06.017. URL https://www.sciencedirect.com/science/article/pii/S0736584510000682.

[13] M. Goel, A. Maciejewski, and V. Balakrishnan. Analyzing unidentified locked-joint failures in kinematically redundant manipulators. J. Field Robotics, 22:15–29, 01 2005. doi:10.1002/rob.20046.

[14] R. Tinós and M. H. Terra. Free-swinging and locked joint fault detection and isolation in cooperative manipulators. In The European Symposium on Artificial Neural Networks, 2002. URL https://api.semanticscholar.org/CorpusID:13917572.

[15] X. Liu, H. Li, J. Wang, and G. Cai. Dynamics analysis of flexible space robot with joint friction. Aerospace Science and Technology, 47:164–176, 2015. ISSN 1270-9638. doi:10.1016/j.ast.2015.09.030. URL https://www.sciencedirect.com/science/article/pii/S1270963815002977.

[16] L. Hao, R. Pagani, M. Beschi, and G. Legnani. Dynamic and friction parameters of an industrial robot: Identification, comparison and repetitiveness analysis. Robotics, 10(1), 2021. ISSN 2218-6581. doi:10.3390/robotics10010049. URL https://www.mdpi.com/2218-6581/10/1/49.

[17] A. Bittencourt. Modeling and Diagnosis of Friction and Wear in Industrial Robots. 09 2014. ISBN 9789175192512. doi:10.3384/diss.diva-109335.

[18] B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone. LIBERO: Benchmarking knowledge transfer for lifelong robot learning. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc.

[19] S. Levine, A. Kumar, G. Tucker, and J. Fu. Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020. URL https://arxiv.org/abs/2005.01643.

[20] J. Guo, Z. Wu, C. Tu, Y. Ma, X. Kong, Z. Liu, J. Ji, S. Zhang, Y. Chen, K. Chen, Q. Dou, Y. Yang, X. Liu, H. Zhao, W. Lv, and S. Li. On robustness of vision-language-action model against multi-modal perturbations. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=cS6xizdYD5.

[21] B. Dariush, Y. Zhu, A. Arumbakkam, and K. Fujimura. Constrained closed loop inverse kinematics. In 2010 IEEE International Conference on Robotics and Automation, pages 2499–2506, 2010. doi:10.1109/ROBOT.2010.5509456.

[22] C. Welman. Inverse Kinematics and Geometric Constraints for Articulated Figure Manipulation [microform]. Canadian theses on microfiche. Thesis (M.Sc.)–Simon Fraser University, 1993. ISBN 9780315912564. URL https://books.google.co.kr/books?id=PqbDAAAACAAJ.

[23] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, T. Jackson, S. Jesmonth, N. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, K.-H. Lee, S. Levine, Y. Lu, U. Malla, D. Manjunath, I. Mordatch, O. Nachum, C. Parada, J. Peralta, E. Perez, K. Pertsch, J.... 2022.

[24] O. X.-E. Collaboration, A. O’Neill, A. Rehman, A. Gupta, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, A. Tung, A. Bewley, A. Herzog, A. Irpan, A. Khazatsky, A. Rai, A. Gupta, A. Wang, A. Kolobov, A. Singh, A. Garg, A. Kembhavi, A. Xie, A. Brohan, A. Raffin, A. Sharma, A. Yavary, A. Jain, A. Balakrishna, A. Wahid... 2023.

[25] Octo Model Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, C. Xu, J. Luo, T. Kreiman, Y. Tan, L. Y. Chen, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine. Octo: An open-source generalist robot policy. In Proceedings of Robotics: Science and Systems, Delft, Netherlands, 2024.

[26] B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid, Q. Vuong, V. Vanhoucke, H. Tran, R. Soricut, A. Singh, J. Singh, P. Sermanet, P. R. Sanketi, G. Salazar, M. S. Ryoo, K. Reymann, K. Rao, K. Pertsch, I. Mordatch, H. Michalewski, Y. Lu, S. Levine, L. Lee, T.-W. E. Lee, I. Leal, Y. Kuang, D. Kalashnikov, R. Julia... 2023.

[27] Y. Ma, Z. Song, Y. Zhuang, J. Hao, and I. King. A survey on vision-language-action models for embodied AI. CoRR, abs/2405.14093, 2024.

[28] NVIDIA, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. J. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, J. Jang, Z. Jiang, J. Kautz, K. Kundalia, L. Lao, Z. Li, Z. Lin, K. Lin, G. Liu, E. Llontop, L. Magne, A. Mandlekar, A. Narayan, S. Nasiriany, S. Reed, Y. L. Tan, G. Wang, Z. Wang, J. Wang, Q. Wang, J. Xiang, Y. Xie, Y. Xu, Z. Xu, S. Ye, Z. ... 2025.

[29] K. Pertsch, K. Stachowicz, B. Ichter, D. Driess, S. Nair, Q. Vuong, O. Mees, C. Finn, and S. Levine. FAST: Efficient action tokenization for vision-language-action models, 2025. URL https://arxiv.org/abs/2501.09747.

[30] M. Shukor, D. Aubakirova, F. Capuano, P. Kooijmans, S. Palma, A. Zouitine, M. Aractingi, C. Pascal, M. Russi, A. Marafioti, S. Alibert, M. Cord, T. Wolf, and R. Cadene. SmolVLA: A vision-language-action model for affordable and efficient robotics, 2025. URL https://arxiv.org/abs/2506.01844.

[31] Y. Zhong, F. Bai, S. Cai, X. Huang, Z. Chen, X. Zhang, Y. Wang, S. Guo, T. Guan, K. N. Lui, Z. Qi, Y. Liang, Y. Chen, and Y. Yang. A survey on vision-language-action models: An action tokenization perspective, 2025. URL https://arxiv.org/abs/2507.01925.

[32] Z. Wang, Z. Zhou, J. Song, Y. Huang, Z. Shu, and L. Ma. VLATest: Testing and evaluating vision-language-action models for robotic manipulation. Proc. ACM Softw. Eng., 2(FSE), June. doi:10.1145/3729343. URL https://doi.org/10.1145/3729343.

[33] H. Zhang, S. Zhang, J. Jin, Q. Zeng, R. Li, and D. Wang. RobustVLA: Robustness-aware reinforcement post-training for vision-language-action models, 2025. URL https://arxiv.org/abs/2511.01331.

[34] T. Wang, C. Han, J. Liang, W. Yang, D. Liu, L. X. Zhang, Q. Wang, J. Luo, and R. Tang. Exploring the adversarial vulnerabilities of vision-language-action models in robotics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6948–6958, October 2025.

[35] H. Liu, S. Ruan, J. Long, J. Wu, J. Hou, H. Tang, T. Jiang, W. Zhou, and W. Yao. Eva-VLA: Evaluating vision-language-action models’ robustness under real-world physical variations. URL https://arxiv.org/abs/2509.18953.

[36] Y. Yan, Y. Xie, Y. Zhang, L. Lyu, H. Wang, and Y. Jin. When alignment fails: Multimodal adversarial attacks on vision-language-action models, 2025. URL https://arxiv.org/abs/2511.16203.

[37] X. Zhou, G. Tie, G. Zhang, H. Wang, P. Zhou, and L. Sun. BadVLA: Towards backdoor attacks on vision-language-action models via objective-decoupled optimization. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=rEhVHla9zp.

[38] R. Jamisola, A. Maciejewski, and R. Roberts. Failure-tolerant path planning for kinematically redundant manipulators anticipating locked-joint failures. IEEE Transactions on Robotics, 22(4):603–612, 2006. doi:10.1109/TRO.2006.878959.

[39] C. Lewis and A. Maciejewski. Fault tolerant operation of kinematically redundant manipulators for locked joint failures. IEEE Transactions on Robotics and Automation, 13(4):622–629, 1997. doi:10.1109/70.611335.

[40] M. Visinsky, J. Cavallaro, and I. Walker. Robotic fault detection and fault tolerance: A survey. Reliability Engineering & System Safety, 46(2):139–158, 1994. ISSN 0951-8320. doi:10.1016/0951-8320(94)90132-5. URL https://www.sciencedirect.com/science/article/pii/0951832094901325.

[1] [1]

Black, N

K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. R. Equi, C. Finn, N. Fusai, M. Y . Galliker, D. Ghosh, L. Groom, K. Hausman, b. ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. Vuong, H. Walke...

2025

[2] [2]

Black, N

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, L. X. Shi, J. Tanner, Q. Vuong, A. Walling, H. Wang, and U. Zhilinsky.π0: A vision-language- action flow model for general robot control, 2026. URL https://arxiv.org...

2026

[3] [3]

Q. Bu, Y . Yang, J. Cai, S. Gao, G. Ren, M. Yao, P. Luo, and H. Li. Univla: Learning to act anywhere with task-centric latent actions.arXiv preprint arXiv:2505.06111, 2025

Pith/arXiv arXiv 2025

[4] M. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn. OpenVLA: An open-source vision-language-action model. arXiv preprint arXiv:2406.09246, 2024.

[5] M. J. Kim, C. Finn, and P. Liang. Fine-tuning vision-language-action models: Optimizing speed and success, 2025. URL https://arxiv.org/abs/2502.19645.

[6] J. Guan, T. Ding, L. Cao, L. Pan, C. Wang, and X. Zheng. Probing the robustness of vision-language pretrained models: A multimodal adversarial attack approach, 2024. URL https://arxiv.org/abs/2408.13461.

[7] X. Lu, J. Chen, S. Xiao, Z. Jin, R. Zhou, X. Ji, and W. Xu. Exploring the robustness of vision-language-action models against sensor attacks. In Proceedings of the 2025 Workshop on Large AI Systems and Models with Privacy and Security Analysis, LAMPS ’25, pages 11–18, New York, NY, USA, 2025. Association for Computing Machinery. ISBN 9798400718960. doi:10.1145/3733800.3763262.

[8] S. Fei, S. Wang, J. Shi, Z. Dai, J. Cai, P. Qian, L. Ji, X. He, S. Zhang, Z. Fei, J. Fu, J. Gong, and X. Qiu. LIBERO-Plus: In-depth robustness analysis of vision-language-action models. arXiv preprint arXiv:2510.13626, 2025.

[9] T.-H. Pham, G. Aikins, T. Truong, and K.-D. Nguyen. Adaptive compensation for robotic joint failures using partially observable reinforcement learning, 2024. URL https://arxiv.org/abs/2409.14435.

[10] T. Hou, J. Tu, X. Gao, Z. Dong, P. Zhai, and L. Zhang. Multi-task learning of active fault-tolerant controller for leg failures in quadruped robots, 2024. URL https://arxiv.org/abs/2402.08996.

[11] G. G. Briscoe-Martinez, Y. Gautam, R. Shetty, A. Pasricha, M. M. Nicotra, and A. Roncone. Moving On, Even When You’re Broken: Fail-Active Trajectory Generation via Diffusion Policies Conditioned on Embodiment and Task. arXiv e-prints, art. arXiv:2602.02895, Feb. 2026. doi:10.48550/arXiv.2602.02895.

[13] I. Eski, S. Erkaya, S. Savas, and S. Yildirim. Fault detection on robot manipulators using artificial neural networks. Robotics and Computer-Integrated Manufacturing, 27(1):115–123. ISSN 0736-5845. doi:10.1016/j.rcim.2010.06.017. URL https://www.sciencedirect.com/science/article/pii/S0736584510000682.

[15] M. Goel, A. Maciejewski, and V. Balakrishnan. Analyzing unidentified locked-joint failures in kinematically redundant manipulators. J. Field Robotics, 22:15–29, 01 2005. doi:10.1002/rob.20046.

[16] R. Tinós and M. H. Terra. Free-swinging and locked joint fault detection and isolation in cooperative manipulators. In The European Symposium on Artificial Neural Networks, 2002. URL https://api.semanticscholar.org/CorpusID:13917572.

[17] X. Liu, H. Li, J. Wang, and G. Cai. Dynamics analysis of flexible space robot with joint friction. Aerospace Science and Technology, 47:164–176, 2015. ISSN 1270-9638. doi:10.1016/j.ast.2015.09.030. URL https://www.sciencedirect.com/science/article/pii/S1270963815002977.

[18] L. Hao, R. Pagani, M. Beschi, and G. Legnani. Dynamic and friction parameters of an industrial robot: Identification, comparison and repetitiveness analysis. Robotics, 10(1), 2021. ISSN 2218-6581. doi:10.3390/robotics10010049. URL https://www.mdpi.com/2218-6581/10/1/49.

[20] A. Bittencourt. Modeling and Diagnosis of Friction and Wear in Industrial Robots. 09 2014. ISBN 9789175192512. doi:10.3384/diss.diva-109335.

[21] B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone. LIBERO: Benchmarking knowledge transfer for lifelong robot learning. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc.

[22] S. Levine, A. Kumar, G. Tucker, and J. Fu. Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020. URL https://arxiv.org/abs/2005.01643.

[23] J. Guo, Z. Wu, C. Tu, Y. Ma, X. Kong, Z. Liu, J. Ji, S. Zhang, Y. Chen, K. Chen, Q. Dou, Y. Yang, X. Liu, H. Zhao, W. Lv, and S. Li. On robustness of vision-language-action model against multi-modal perturbations. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=cS6xizdYD5.

[24] B. Dariush, Y. Zhu, A. Arumbakkam, and K. Fujimura. Constrained closed loop inverse kinematics. In 2010 IEEE International Conference on Robotics and Automation, pages 2499–2506, 2010. doi:10.1109/ROBOT.2010.5509456.

[25] C. Welman. Inverse Kinematics and Geometric Constraints for Articulated Figure Manipulation [microform]. Canadian theses on microfiche. Thesis (M.Sc.)–Simon Fraser University, 1993. ISBN 9780315912564. URL https://books.google.co.kr/books?id=PqbDAAAACAAJ.

[26] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, T. Jackson, S. Jesmonth, N. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, K.-H. Lee, S. Levine, Y. Lu, U. Malla, D. Manjunath, I. Mordatch, O. Nachum, C. Parada, J. Peralta, E. Perez, K. Pertsch, J.... 2022.

[27] Open X-Embodiment Collaboration, A. O’Neill, A. Rehman, A. Gupta, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, A. Tung, A. Bewley, A. Herzog, A. Irpan, A. Khazatsky, A. Rai, A. Gupta, A. Wang, A. Kolobov, A. Singh, A. Garg, A. Kembhavi, A. Xie, A. Brohan, A. Raffin, A. Sharma, A. Yavary, A. Jain, A. Balakrishna, A. Wahid... 2023.

[28] Octo Model Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, C. Xu, J. Luo, T. Kreiman, Y. Tan, L. Y. Chen, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine. Octo: An open-source generalist robot policy. In Proceedings of Robotics: Science and Systems, Delft, Netherlands, 2024.

[29] B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid, Q. Vuong, V. Vanhoucke, H. Tran, R. Soricut, A. Singh, J. Singh, P. Sermanet, P. R. Sanketi, G. Salazar, M. S. Ryoo, K. Reymann, K. Rao, K. Pertsch, I. Mordatch, H. Michalewski, Y. Lu, S. Levine, L. Lee, T.-W. E. Lee, I. Leal, Y. Kuang, D. Kalashnikov, R. Julia... 2023.

[30] Y. Ma, Z. Song, Y. Zhuang, J. Hao, and I. King. A survey on vision-language-action models for embodied AI. CoRR, abs/2405.14093, 2024.

[31] NVIDIA, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. J. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, J. Jang, Z. Jiang, J. Kautz, K. Kundalia, L. Lao, Z. Li, Z. Lin, K. Lin, G. Liu, E. Llontop, L. Magne, A. Mandlekar, A. Narayan, S. Nasiriany, S. Reed, Y. L. Tan, G. Wang, Z. Wang, J. Wang, Q. Wang, J. Xiang, Y. Xie, Y. Xu, Z. Xu, S. Ye, Z. ... 2025.

[32] K. Pertsch, K. Stachowicz, B. Ichter, D. Driess, S. Nair, Q. Vuong, O. Mees, C. Finn, and S. Levine. FAST: Efficient action tokenization for vision-language-action models, 2025. URL https://arxiv.org/abs/2501.09747.

[33] M. Shukor, D. Aubakirova, F. Capuano, P. Kooijmans, S. Palma, A. Zouitine, M. Aractingi, C. Pascal, M. Russi, A. Marafioti, S. Alibert, M. Cord, T. Wolf, and R. Cadene. SmolVLA: A vision-language-action model for affordable and efficient robotics, 2025. URL https://arxiv.org/abs/2506.01844.

[34] Y. Zhong, F. Bai, S. Cai, X. Huang, Z. Chen, X. Zhang, Y. Wang, S. Guo, T. Guan, K. N. Lui, Z. Qi, Y. Liang, Y. Chen, and Y. Yang. A survey on vision-language-action models: An action tokenization perspective, 2025. URL https://arxiv.org/abs/2507.01925.

[35] Z. Wang, Z. Zhou, J. Song, Y. Huang, Z. Shu, and L. Ma. VLATest: Testing and evaluating vision-language-action models for robotic manipulation. Proc. ACM Softw. Eng., 2(FSE), June. doi:10.1145/3729343. URL https://doi.org/10.1145/3729343.

[37] H. Zhang, S. Zhang, J. Jin, Q. Zeng, R. Li, and D. Wang. RobustVLA: Robustness-aware reinforcement post-training for vision-language-action models, 2025. URL https://arxiv.org/abs/2511.01331.

[38] T. Wang, C. Han, J. Liang, W. Yang, D. Liu, L. X. Zhang, Q. Wang, J. Luo, and R. Tang. Exploring the adversarial vulnerabilities of vision-language-action models in robotics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6948–6958, October 2025.

[39] H. Liu, S. Ruan, J. Long, J. Wu, J. Hou, H. Tang, T. Jiang, W. Zhou, and W. Yao. Eva-VLA: Evaluating vision-language-action models’ robustness under real-world physical variations, 2025. URL https://arxiv.org/abs/2509.18953.

[41] Y. Yan, Y. Xie, Y. Zhang, L. Lyu, H. Wang, and Y. Jin. When alignment fails: Multimodal adversarial attacks on vision-language-action models, 2025. URL https://arxiv.org/abs/2511.16203.

[42] X. Zhou, G. Tie, G. Zhang, H. Wang, P. Zhou, and L. Sun. BadVLA: Towards backdoor attacks on vision-language-action models via objective-decoupled optimization. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=rEhVHla9zp.

[43] R. Jamisola, A. Maciejewski, and R. Roberts. Failure-tolerant path planning for kinematically redundant manipulators anticipating locked-joint failures. IEEE Transactions on Robotics, 22(4):603–612, 2006. doi:10.1109/TRO.2006.878959.

[44] C. Lewis and A. Maciejewski. Fault tolerant operation of kinematically redundant manipulators for locked joint failures. IEEE Transactions on Robotics and Automation, 13(4):622–629, 1997. doi:10.1109/70.611335.

[45] M. Visinsky, J. Cavallaro, and I. Walker. Robotic fault detection and fault tolerance: A survey. Reliability Engineering & System Safety, 46(2):139–158, 1994. ISSN 0951-8320. doi:10.1016/0951-8320(94)90132-5. URL https://www.sciencedirect.com/science/article/pii/0951832094901325.

A Implementation and Evaluation Details

A.1 Implementati...