Uncovering Vulnerability of Vision-Language-Action Models under Joint-Level Physical Faults
Pith reviewed 2026-06-27 12:48 UTC · model grok-4.3
The pith
Vision-language-action models lose task success under joint-level physical faults even when motions remain feasible.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VLA models are vulnerable when predicted actions are executed through a perturbed robot body. Our analysis reveals joint-dependent effects, with heterogeneous degradation in task success across affected joints. We also show that performance drops cannot be attributed solely to physical infeasibility, since feasible faults such as increased joint friction can still substantially reduce success rates and induce closed-loop execution mismatch. Motivated by these findings, we propose Joint-level Physical-fault Aware Residual Calibrator (J-PARC), a lightweight residual calibration framework built on top of a frozen VLA policy. J-PARC infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator on this regime, enabling adaptive action correction across faulty joints.
What carries the argument
Joint-level Physical-fault Aware Residual Calibrator (J-PARC), which infers a latent fault regime from recent joint dynamics and applies regime-conditioned residual corrections to the actions of a frozen VLA policy.
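The two-stage structure described here (infer a regime, then correct the frozen policy's action) could be sketched as follows. This is a minimal, untrained illustration and not the authors' implementation: the summary gives no equations for J-PARC, so the window length, latent size, action dimension, the single linear-plus-tanh layers, the residual scale, and every class and function name below are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_JOINTS = 7     # assumed number of joints
ACTION_DIM = 7   # assumed action dimension of the base policy
HISTORY = 8      # assumed window of recent timesteps
LATENT_DIM = 4   # assumed size of the fault-regime latent


def frozen_vla_policy(observation):
    """Hypothetical stand-in for a frozen VLA policy; it is never updated."""
    return np.tanh(observation[:ACTION_DIM])


class ResidualCalibrator:
    """Randomly initialised placeholder weights; a real system would train them."""

    def __init__(self):
        in_dim = HISTORY * N_JOINTS * 2  # commanded and realized joint deltas
        self.w_enc = rng.normal(scale=0.1, size=(in_dim, LATENT_DIM))
        self.w_res = rng.normal(scale=0.1, size=(LATENT_DIM + ACTION_DIM, ACTION_DIM))

    def infer_regime(self, commanded, realized):
        # Encode the recent command/motion history into a latent regime vector.
        features = np.concatenate([commanded.ravel(), realized.ravel()])
        return np.tanh(features @ self.w_enc)

    def residual(self, regime, base_action):
        # One shared mapping, conditioned on the regime and the base action.
        return np.tanh(np.concatenate([regime, base_action]) @ self.w_res)


def calibrated_action(policy, calibrator, observation, commanded, realized, scale=0.1):
    base = policy(observation)                    # frozen policy output
    regime = calibrator.infer_regime(commanded, realized)
    return base + scale * calibrator.residual(regime, base)  # bounded correction


observation = rng.normal(size=16)
commanded = rng.normal(size=(HISTORY, N_JOINTS))
realized = 0.5 * commanded  # toy history: each joint realizes half of its command
action = calibrated_action(
    frozen_vla_policy, ResidualCalibrator(), observation, commanded, realized
)
```

Bounding the residual with `tanh` and a small `scale` is one plausible way to keep the corrected action close to the base policy's output; whether J-PARC does this is not stated in the summary.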
If this is right
- Task success degrades heterogeneously depending on which joint experiences the fault.
- Feasible physical changes such as increased friction still induce substantial closed-loop mismatch and lower success rates.
- A frozen VLA policy can be augmented with a lightweight, fault-regime-aware residual calibrator to recover robustness.
- The calibrator leaves performance unchanged in fault-free conditions.
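A toy single-joint simulation can illustrate how a feasible fault may still cause execution mismatch. The sketch below assumes a viscous friction term and a PD position controller, with made-up gains, inertia, and step sizes; it is not the paper's fault model, which this summary does not specify. The function name `realized_delta` is hypothetical.

```python
def realized_delta(commanded_delta, viscous_friction, steps=20, dt=0.005,
                   kp=40.0, kd=2.0, inertia=0.1):
    """Joint displacement reached within one control interval (steps * dt seconds).

    Assumed dynamics: inertia * accel = PD torque - viscous_friction * velocity.
    """
    q, v = 0.0, 0.0
    for _ in range(steps):
        torque = kp * (commanded_delta - q) - kd * v
        accel = (torque - viscous_friction * v) / inertia
        v += accel * dt   # semi-implicit Euler: update velocity first
        q += v * dt       # then position with the new velocity
    return q


commanded = 0.1  # radians requested by the policy for this control interval
for friction in (0.0, 2.0, 8.0):
    print(friction, realized_delta(commanded, friction))

# With enough time the joint still reaches the target, so the motion is feasible.
print(realized_delta(commanded, 8.0, steps=2000))
```

In this toy model, higher friction shrinks the displacement achieved within a fixed control interval even though the target stays reachable, so the next observation differs from what a policy trained on the nominal body would expect.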
Where Pith is reading between the lines
- Real deployments may require continuous monitoring of joint dynamics to trigger the right calibration regime.
- The same residual-calibration pattern could be tested on other action-generation methods such as diffusion policies or reinforcement-learning controllers.
- Combining multiple simultaneous joint faults would test whether the latent-regime inference remains effective under compound degradation.
Load-bearing premise
The listed joint-level faults produce a closed-loop execution mismatch that is representative of real robots and that the experimental setup isolates this mismatch from other factors.
What would settle it
A physical-robot experiment in which the same VLA policy shows no drop in task success under the described joint faults, or in which J-PARC yields no measurable improvement over the base policy under those faults.
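One way to make "no drop" or "no measurable improvement" concrete is to compare success counts between two conditions with a two-proportion z-test. The sketch below uses only the Python standard library; the function name and the trial counts are hypothetical and are not results from the paper, whose evaluation protocol is not given in this summary.

```python
import math


def two_proportion_z(successes_a, trials_a, successes_b, trials_b):
    """Return (z, two-sided p-value) for the difference in success rates."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if se == 0.0:
        return 0.0, 1.0  # all trials succeeded or all failed: no evidence of a difference
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Hypothetical counts: fault-free vs. faulty execution of the same policy.
z, p = two_proportion_z(45, 50, 20, 50)
print(f"z = {z:.2f}, p = {p:.2g}")
```

The normal approximation is unreliable for small trial counts or success rates near 0 or 1, where an exact test would be the safer choice.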
Original abstract
Deploying Vision-Language-Action (VLA) models in real robotic systems requires robustness not only to semantic and perceptual variations, but also to embodiment-side faults that change how actions are physically realized. Real robots can experience joint-level changes caused by actuator degradation, hardware faults, safety limits, collision damage, or wear-induced friction. These faults are critical because they alter the action-to-motion interface of a policy, disrupting the learned closed-loop relationship between commanded actions, realized motion, and subsequent observations. In this work, we study realistic joint-level physical faults and show that VLA models are vulnerable when predicted actions are executed through a perturbed robot body. Our analysis reveals joint-dependent effects, with heterogeneous degradation in task success across affected joints. We also show that performance drops cannot be attributed solely to physical infeasibility, since feasible faults such as increased joint friction can still substantially reduce success rates and induce closed-loop execution mismatch. Motivated by these findings, we propose Joint-level Physical-fault Aware Residual Calibrator (J-PARC), a lightweight residual calibration framework built on top of a frozen VLA policy. J-PARC infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator on this regime, enabling adaptive action correction across faulty joints. Experiments show that J-PARC improves robustness under joint-level faults while preserving fault-free environment performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that Vision-Language-Action (VLA) models are vulnerable to joint-level physical faults (actuator degradation, hardware faults, safety limits, collision damage, wear-induced friction) that alter the action-to-motion interface, causing heterogeneous degradation in task success rates across affected joints. It asserts that these drops cannot be attributed solely to physical infeasibility, as feasible faults like increased friction still induce closed-loop execution mismatch. Motivated by this, the authors propose J-PARC, a lightweight residual calibration framework on a frozen VLA policy that infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator for adaptive action correction, with experiments showing improved robustness under faults while preserving fault-free performance.
Significance. If the experimental isolation of policy-specific mismatch holds, the work is significant for real-world VLA deployment in robotics, as it shifts focus from perceptual/semantic robustness to embodiment faults that disrupt the learned closed-loop relationship. The joint-dependent heterogeneity analysis and the practical J-PARC mitigation (which avoids retraining the base policy) could guide more reliable robotic systems. The emphasis on feasible faults provides a realistic lens on policy sensitivity beyond obvious kinematic violations.
major comments (2)
- [Abstract and Experiments] Abstract and Experiments section: The central claim that 'performance drops cannot be attributed solely to physical infeasibility' and that 'feasible faults such as increased joint friction can still substantially reduce success rates' is load-bearing for distinguishing closed-loop mismatch from generic execution failure. However, no fault-model equations (e.g., definitions of friction coefficients or actuator degradation), kinematic/dynamic feasibility metrics, or baselines (e.g., oracle controller under identical perturbations) are supplied. This prevents verification that the observed heterogeneous success degradation is policy-specific rather than confounded by altered observation distributions or velocity limits.
- [Proposed Method (J-PARC)] J-PARC description (proposed method section): The framework 'infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator on this regime' is presented as the mitigation, but lacks any equations for regime inference, the conditioning mechanism, loss functions, or training details on how the calibrator is learned. Without these, it is impossible to assess whether J-PARC actually addresses the identified mismatch or merely fits to the experimental perturbations.
minor comments (2)
- [Abstract] The abstract supplies no quantitative details on datasets, tasks, number of trials, or success-rate tables, which reduces clarity even for a high-level overview.
- [Introduction and Method] Notation for 'joint dynamics' and 'latent joint-fault regime' is introduced without prior definition or reference to standard robotics terminology (e.g., joint velocity/position histories), which could be clarified for readers outside the immediate subfield.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important areas for strengthening the presentation of our claims and the J-PARC method. We address each major comment below.
- Referee: [Abstract and Experiments] Abstract and Experiments section: The central claim that 'performance drops cannot be attributed solely to physical infeasibility' and that 'feasible faults such as increased joint friction can still substantially reduce success rates' is load-bearing for distinguishing closed-loop mismatch from generic execution failure. However, no fault-model equations (e.g., definitions of friction coefficients or actuator degradation), kinematic/dynamic feasibility metrics, or baselines (e.g., oracle controller under identical perturbations) are supplied. This prevents verification that the observed heterogeneous success degradation is policy-specific rather than confounded by altered observation distributions or velocity limits.
  Authors: We agree that the central claim requires explicit supporting details to isolate policy-specific closed-loop mismatch. In the revised manuscript, we will add the fault-model equations (including definitions of friction coefficients and actuator degradation), kinematic/dynamic feasibility metrics confirming the faults remain feasible, and an oracle controller baseline under identical perturbations. These additions will enable verification that the heterogeneous degradation is attributable to the VLA policy rather than confounders such as altered observations or velocity limits.
  Revision: yes
- Referee: [Proposed Method (J-PARC)] J-PARC description (proposed method section): The framework 'infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator on this regime' is presented as the mitigation, but lacks any equations for regime inference, the conditioning mechanism, loss functions, or training details on how the calibrator is learned. Without these, it is impossible to assess whether J-PARC actually addresses the identified mismatch or merely fits to the experimental perturbations.
  Authors: We acknowledge that the J-PARC description is currently high-level and omits the requested mathematical details. In the revised manuscript, we will provide the equations for latent joint-fault regime inference from recent joint dynamics, the conditioning mechanism for the shared residual calibrator, the loss functions for training the calibrator, and the full training procedure with hyperparameters. These will allow readers to evaluate how J-PARC specifically mitigates the identified mismatch.
  Revision: yes
Circularity Check
No circularity: empirical claims rest on experiments, not self-referential derivations
full rationale
The paper is an empirical study of VLA robustness under joint faults, with claims about heterogeneous degradation and non-infeasibility causes supported by experimental results rather than any derivation chain. No equations, fitted parameters renamed as predictions, self-citations used as uniqueness theorems, or ansatzes smuggled via prior work are present in the provided text. The proposed J-PARC framework is described at a high level without reducing to its own inputs by construction. This matches the default expectation for non-circular papers; the reader's assessment of score 1.0 is consistent with absence of any load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Joint-level physical faults alter the action-to-motion interface and disrupt the learned closed-loop relationship between commanded actions, realized motion, and observations.
invented entities (1)
- J-PARC (Joint-level Physical-fault Aware Residual Calibrator): no independent evidence