Health-Conditioned Vision-Language-Action Models for Malfunction-Aware Robot Control

Hüseyin Arslan; Özgür Erkent

arxiv: 2605.16056 · v1 · pith:NQJHXNCRnew · submitted 2026-05-15 · 💻 cs.RO

Pith reviewed 2026-05-20 18:13 UTC · model grok-4.3

keywords vision-language-action models · malfunction-aware control · health-conditioned VLA · robot joint degradation · LIBERO environment · VLA-Adapter · health projector · teleoperated episodes

The pith

A lightweight health projector module added to vision-language-action models lets robots adapt to degraded joints and finish tasks where standard models fail.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to make vision-language-action models aware of a robot's physical condition by feeding them a health vector that describes joint angles and torque limits. The authors add a small Health Projector module to the existing VLA-Adapter and train it on 128 teleoperated episodes of simulated joint malfunctions collected in the LIBERO environment. This matters because everyday robots suffer gradual wear that causes current systems to stop working on assigned tasks. If the approach holds, robots could keep operating through partial hardware failures instead of needing immediate fixes or full retraining. The reported outcome is that the modified model succeeds across different degradation setups on spatial tasks while the unmodified pretrained version cannot.

Core claim

By injecting a Health Projector module into the VLA-Adapter architecture and training it on a dataset of 128 teleoperated malfunction episodes collected in the LIBERO environment, the health-conditioned model can successfully complete spatial tasks using degraded joints, whereas the unmodified Libero-Spatial-Pro model cannot.

What carries the argument

The Health Projector module accepts a health vector of joint operating angles and torque capabilities and conditions the model's action predictions to account for physical degradation.
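
A minimal sketch of such a projector may help make the mechanism concrete. It follows the two-layer GELU MLP form quoted from the paper elsewhere on this page, $f_h = W_2\,\mathrm{GELU}(W_1 h + b_1) + b_2$, with $f_h = 0$ at the start of training. Everything else is an assumption made for illustration: the class and function names, the dimensions, the 14-entry health vector layout, zero-initializing the output layer specifically, and additive fusion with the action latent.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class HealthProjector:
    """Two-layer MLP computing f_h = W2 @ GELU(W1 @ h + b1) + b2."""

    def __init__(self, health_dim: int, hidden_dim: int, latent_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # First layer: small random weights (the scale 0.02 is an assumption).
        self.w1 = [[rng.gauss(0.0, 0.02) for _ in range(health_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        # Output layer zero-initialized (assumed choice) so f_h = 0 before training.
        self.w2 = [[0.0] * hidden_dim for _ in range(latent_dim)]
        self.b2 = [0.0] * latent_dim

    def __call__(self, h):
        hidden = [gelu(sum(w * x for w, x in zip(row, h)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sum(w * x for w, x in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]


def fuse(action_latent, f_h):
    """Additive fusion (assumed): with f_h = 0 the pretrained latent is unchanged."""
    return [a + f for a, f in zip(action_latent, f_h)]


# Hypothetical layout: 7 joints x (angle fraction, torque fraction) = 14 entries.
# Joint 6 is degraded in this made-up example.
h = [1.0, 1.0] * 5 + [0.5, 0.6] + [1.0, 1.0]
projector = HealthProjector(health_dim=len(h), hidden_dim=32, latent_dim=8)
latent = [0.1 * i for i in range(8)]
fused = fuse(latent, projector(h))
```

Under these assumptions the untrained projector leaves the pretrained model's behavior untouched, so training starts from the baseline policy and only departs from it as the output layer learns non-zero weights.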

If this is right

The model can adjust its behavior to varied configurations of degraded joints without retraining the entire pretrained VLA-Adapter.
Task success becomes possible even when joint angles or torque outputs are reduced below nominal levels.
Only a small module addition is required rather than a full redesign of the vision-language-action pipeline.
The trained adaptation generalizes across different degradation patterns encountered in the collected simulation data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same conditioning approach could be tested with online estimation of the health vector from onboard sensors instead of an external input.
The method might extend to additional failure modes such as gripper weakness or sensor drift beyond joint degradation.
Direct transfer experiments from the simulation-trained model to physical robots would clarify how well the learned adaptations hold in real hardware.
Combining this health conditioning with predictive maintenance alerts could reduce downtime in deployed robot systems.

Load-bearing premise

The health vector supplied to the projector accurately captures the robot's current joint operation angles and torque capabilities, and the 128 teleoperated malfunction episodes collected in simulation are representative enough for the model to generalize across varied degradation patterns.

What would settle it

Running the health-conditioned model on joint degradation patterns that differ from those in the 128 training episodes and finding that it fails at the same rate as the unmodified baseline, or observing that inaccurate health vectors cause the model to produce ineffective actions, would falsify the central claim.
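
One way to phrase this test numerically is to compare success rates on held-out degradation patterns with a confidence interval, since rollout counts are typically small. The sketch below uses the Wilson score interval and a deliberately conservative rule: the claim is treated as surviving only if the conditioned model's lower bound exceeds the baseline's upper bound. The function names, the decision rule, and the counts in the usage lines are illustrative assumptions, not values or procedures from the paper.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial success rate (z = 1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def claim_survives(cond_successes: int, base_successes: int, trials: int) -> bool:
    """True if the conditioned model is clearly better than the baseline.

    Assumed rule: the intervals must not overlap, with the conditioned
    model's interval lying entirely above the baseline's.
    """
    cond_low, _ = wilson_interval(cond_successes, trials)
    _, base_high = wilson_interval(base_successes, trials)
    return cond_low > base_high


# Placeholder counts for illustration only; these are not results from the paper.
clear_gap = claim_survives(cond_successes=9, base_successes=1, trials=10)
no_gap = claim_survives(cond_successes=5, base_successes=4, trials=10)
```

With only ten rollouts per model, a 5-versus-4 split is indistinguishable under this rule, which is why the number of evaluation episodes per configuration matters as much as the headline rate.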

Figures

Figures reproduced from arXiv: 2605.16056 by Hüseyin Arslan, Özgür Erkent.

Abstract

Research on Vision-Language-Action (VLA) models has been increasing rapidly in recent years. Although some work focuses on detecting, preventing, and recovering from task failures, it usually does not address adapting to a robot's physical failures. In real-life scenarios, most robots face physical degradation in various forms, such as joint degradation, actuator failure, or a weak gripper. We introduce a malfunction-aware (health-conditioned) VLA that takes as input a health vector describing the operation angle and torque capability of the robot's joints, and adapts its predictions to complete tasks with the degraded joints. To achieve this, we inject a Health Projector module into the VLA-Adapter architecture and train it on malfunctioning-robot data we collected in the LIBERO environment [1]. We collect 128 teleoperated episodes on Libero-Spatial tasks. Our results show that, with a very lightweight addition, the model can learn to operate successfully with different configurations of degraded joints, which the default pretrained VLA-Adapter Libero-Spatial-Pro model cannot. The code and dataset will be available soon at https://github.com/h-arslan/health-aware-vla

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes malfunction-aware Vision-Language-Action (VLA) models by augmenting VLA-Adapter with a lightweight Health Projector module. This module conditions the model on a health vector that encodes joint operation angles and torque capabilities, enabling adaptation to physical degradations such as joint failures. The approach is demonstrated by collecting and training on 128 teleoperated malfunction episodes in the LIBERO simulation environment for Libero-Spatial tasks, with the claim that the resulting model succeeds on degraded configurations where the default pretrained VLA-Adapter fails.

Significance. If the empirical results hold and generalize, the work addresses a practical gap in deploying VLA models on physical robots subject to hardware degradation. The lightweight projector design avoids full retraining and could support efficient adaptation; releasing the code and dataset would further strengthen reproducibility and enable follow-on research in robust robot control.

major comments (3)

[Data Collection] Data Collection section: the central generalization claim depends on the 128 teleoperated episodes spanning a representative range of degradation patterns (different joints, angle limits, torque reductions). The manuscript must report the exact distribution of these patterns and any held-out test configurations to establish that success is not limited to the collected set.
[Results] Results section: the abstract states successful operation but supplies no quantitative success rates, baseline comparisons (e.g., against fine-tuned VLA-Adapter or oracle health signals), ablation studies on the projector, or error analysis. These metrics are load-bearing for the claim that the health-conditioned model outperforms the default pretrained model.
[Methods / Health Projector] Health vector definition and inference: the paper must clarify whether the health vector is an oracle signal supplied during training and evaluation or estimated from real sensor data at deployment time; if the former, the adaptation claim does not yet extend to realistic malfunction detection.
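
The held-out requirement in the Data Collection comment can be illustrated by splitting at the level of degradation configurations rather than individual episodes, so that no configuration seen in training appears at test time. The sketch below is a hedged illustration: the episode fields (`joint`, `angle_scale`, `torque_scale`), the holdout fraction, and the synthetic 8 × 16 episode layout are all hypothetical and do not describe the authors' dataset.

```python
import random
from collections import defaultdict


def split_by_configuration(episodes, holdout_fraction=0.25, seed=0):
    """Split episodes so that train and test share no degradation configuration."""
    groups = defaultdict(list)
    for ep in episodes:
        # Hypothetical configuration key; the real dataset schema is not specified.
        key = (ep["joint"], ep["angle_scale"], ep["torque_scale"])
        groups[key].append(ep)

    configs = sorted(groups)                  # deterministic order before shuffling
    random.Random(seed).shuffle(configs)
    n_holdout = max(1, round(len(configs) * holdout_fraction))

    held_out = set(configs[:n_holdout])
    train = [ep for key in configs[n_holdout:] for ep in groups[key]]
    test = [ep for key in configs[:n_holdout] for ep in groups[key]]
    return train, test, held_out


# Synthetic placeholder data: 8 made-up configurations x 16 episodes = 128 episodes.
episodes = [
    {"joint": joint, "angle_scale": a, "torque_scale": t, "episode": i}
    for joint in range(4)
    for (a, t) in [(0.5, 0.5), (0.75, 1.0)]
    for i in range(16)
]
train, test, held_out = split_by_configuration(episodes)
```

Splitting by episode instead would place the same joint and limit combination on both sides of the split, and success on the test set would then say little about generalization to unseen degradations.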

minor comments (2)

[Abstract] Abstract: the citation to LIBERO [1] should be expanded to a full reference; the phrase 'different configurations of degraded joints' would benefit from a brief parenthetical example of the degradation types tested.
[Architecture] Notation: ensure consistent use of 'health vector' versus 'Health Projector' throughout; a diagram showing the exact insertion point of the projector into the VLA-Adapter would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving clarity, rigor, and completeness, particularly regarding data details, quantitative evaluation, and methodological clarifications. We address each major comment below and will incorporate revisions to strengthen the paper.

Referee: [Data Collection] Data Collection section: the central generalization claim depends on the 128 teleoperated episodes spanning a representative range of degradation patterns (different joints, angle limits, torque reductions). The manuscript must report the exact distribution of these patterns and any held-out test configurations to establish that success is not limited to the collected set.

Authors: We agree that reporting the distribution of degradation patterns is necessary to support the generalization claims. In the revised manuscript, we will expand the Data Collection section with a table detailing the breakdown of the 128 episodes by joint type, angle limit reductions, and torque capability decreases. We will also explicitly describe the held-out test configurations and how they differ from the training degradations to demonstrate that performance is not limited to the collected set. revision: yes
Referee: [Results] Results section: the abstract states successful operation but supplies no quantitative success rates, baseline comparisons (e.g., against fine-tuned VLA-Adapter or oracle health signals), ablation studies on the projector, or error analysis. These metrics are load-bearing for the claim that the health-conditioned model outperforms the default pretrained model.

Authors: We acknowledge that the current version lacks the quantitative metrics needed to fully substantiate the performance claims. We will revise the Results section to include success rates for the health-conditioned model compared to the baseline pretrained VLA-Adapter, additional baselines such as fine-tuned VLA-Adapter and oracle health signals, ablation studies on the Health Projector module, and an error analysis of failure modes. These additions will provide a more complete and rigorous evaluation of the approach. revision: yes
Referee: [Methods / Health Projector] Health vector definition and inference: the paper must clarify whether the health vector is an oracle signal supplied during training and evaluation or estimated from real sensor data at deployment time; if the former, the adaptation claim does not yet extend to realistic malfunction detection.

Authors: We appreciate this clarification request. In the present work, the health vector is supplied as an oracle signal during both training and evaluation in the LIBERO simulation. We will update the Methods section to state this explicitly. We will also add a limitations paragraph and future work discussion on estimating the health vector from onboard sensor data to extend the approach toward realistic deployment scenarios. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical training on new malfunction data

The paper describes collecting 128 teleoperated malfunction episodes in simulation, injecting a lightweight Health Projector module into the existing VLA-Adapter architecture, and training on this data to adapt to joint degradations. The central claim is validated by direct empirical comparison against the baseline pretrained Libero-Spatial-Pro model on the same task suite. No equations, parameters, or uniqueness claims are defined in terms of the target result; the health vector is an explicit input encoding joint angles and torques, and performance is measured on held-out or varied degradation configurations rather than by construction. The derivation chain consists of standard data collection plus supervised adaptation and does not reduce to self-definition, fitted-input renaming, or load-bearing self-citation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim depends on the new Health Projector module and the assumption that the collected simulation episodes suffice for learning adaptation; no free parameters are explicitly fitted beyond standard model training.

free parameters (1)

Health vector parameterization
Exact scaling or discretization of joint angle and torque values inside the health vector is not specified.
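
Because the paper's parameterization is not specified, the following is only one plausible choice, shown to make the open question concrete: each joint contributes two entries, the usable angle range and the torque limit, each expressed as a fraction of its nominal value and clamped to [0, 1]. The `JointHealth` type, its field names, the normalization, and the numeric limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointHealth:
    """Current and nominal limits for one joint (hypothetical schema)."""
    angle_min: float           # rad, current usable lower limit
    angle_max: float           # rad, current usable upper limit
    torque_max: float          # N*m, current torque capability
    nominal_angle_min: float   # rad
    nominal_angle_max: float   # rad
    nominal_torque_max: float  # N*m


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def health_vector(joints):
    """Flatten joints into [angle_fraction, torque_fraction, ...], each in [0, 1]."""
    vec = []
    for j in joints:
        nominal_range = j.nominal_angle_max - j.nominal_angle_min
        angle_fraction = (j.angle_max - j.angle_min) / nominal_range
        torque_fraction = j.torque_max / j.nominal_torque_max
        vec.extend([clamp01(angle_fraction), clamp01(torque_fraction)])
    return vec


# Illustrative numbers only; they are not real robot limits.
healthy = JointHealth(-2.0, 2.0, 80.0, -2.0, 2.0, 80.0)
degraded = JointHealth(-1.0, 1.0, 40.0, -2.0, 2.0, 80.0)
h = health_vector([healthy, degraded])
```

A fractional encoding like this discards where the remaining angle range sits within the nominal range; an alternative that keeps the normalized lower and upper limits separately would preserve that information at the cost of a longer vector.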

axioms (1)

domain assumption: Existing VLA-Adapter models remain functional after insertion of an additional projector module.
Invoked when the authors state they inject the Health Projector into the VLA-Adapter architecture.

invented entities (1)

Health Projector module (no independent evidence)
purpose: Processes the health vector and conditions the VLA output for malfunction-aware control.
New architectural component introduced in this work.

pith-pipeline@v0.9.0 · 5742 in / 1376 out tokens · 59812 ms · 2026-05-20T18:13:00.972366+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: We propose a new policy that is health-conditioned: $\hat{a}_{t:t+C} = \pi_\theta(o_t, l, h)$ where the health vector $h$ is projected into the model’s latent space via an MLP and fused with the action prediction head

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: Health Projector: two-layer MLP … $f_h = W_2\,\mathrm{GELU}(W_1 h + b_1) + b_2$ … zero-initialized so that $f_h = 0$ at the start of training

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 2 internal anchors

[1] B. Liu et al., “Libero: Benchmarking knowledge transfer for lifelong robot learning,” in Advances in Neural Information Processing Systems (NeurIPS), 2024

[2] K. Black et al., “$\pi_0$: A vision-language-action flow model for general robot control,” arXiv preprint arXiv:2410.24164, 2024

[3] NVIDIA, “Project gr00t: Foundation models for humanoid robots,” Online, 2024, available: https://developer.nvidia.com/isaac

[4] Y. Chen et al., “Vla-adapter: Efficient adaptation of vision-language-action models for autonomous manipulation,” arXiv preprint arXiv:2501.09789, 2025

[5] A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret, “Robots that can adapt like animals,” Nature, vol. 521, no. 7553, pp. 503–507, 2015

[6] N. Hogan, “Impedance control: An approach to manipulation,” ASME Journal of Dynamic Systems, Measurement, and Control, vol. 107, no. 1, pp. 1–24, 1985

[7] J. Duan et al., “Aha: A vision-language-model for detecting and reasoning over failures in robotic manipulation,” arXiv preprint arXiv:2410.00371, 2024

[8] SAFE Authors, “Safe: Multitask failure detection for vision-language-action models,” arXiv preprint, 2025, available: https://vla-safe.github.io/

[9] C. Grisset et al., “I-failsense: Towards general robotic failure detection with vision-language models,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2026

[10] A. Brohan et al., “Rt-2: Vision-language-action models transfer web knowledge to robotic control,” in Proceedings of the Conference on Robot Learning (CoRL), 2023

[11] M. J. Kim et al., “Openvla: An open-source vision-language-action model,” in Proceedings of the Conference on Robot Learning (CoRL), 2024

[12] Octo Model Team et al., “Octo: An open-source generalist robot policy,” in Proceedings of Robotics: Science and Systems (RSS), 2024

[13] A. Yang et al., “Qwen2.5: A party of foundation models,” arXiv preprint arXiv:2412.15115, 2024

[14] X. Zhai et al., “Sigmoid loss for language image pre-training,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

[15] A. A. Maciejewski, “Fault tolerant properties of kinematically redundant manipulators,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1990, pp. 638–642

[16] Y. Zhang and J. Jiang, “Bibliographical review on reconfigurable fault-tolerant control systems,” Annual Reviews in Control, vol. 32, no. 2, pp. 229–252, 2008

[17] J. Tobin et al., “Domain randomization for transferring deep neural networks from simulation to the real world,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 23–30

[18] B. Ai et al., “Towards embodiment scaling laws in robot locomotion,” in Proceedings of the Conference on Robot Learning (CoRL), 2025
