pith. sign in

arxiv: 2605.17144 · v1 · pith:AWVOQ37Anew · submitted 2026-05-16 · 💻 cs.RO · cs.AI· cs.LG

Contrastive Conceptor Activation Steering (COAST): Unlocking Vision-Language-Action Models through Hidden States

Pith reviewed 2026-05-20 14:26 UTC · model grok-4.3

classification 💻 cs.RO cs.AIcs.LG
keywords vision-language-action modelsactivation steeringconceptorrobotic policieslatent subspacestraining-free adaptationsuccess and failure rollouts
0
0 comments X

The pith

Steering VLA hidden states toward success subspaces from a few examples raises task success rates by over 20 percent in simulation and 40 percent on real robots.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Contrastive Conceptor Activation Steering (COAST) as a way to improve brittle Vision-Language-Action models by locating task-critical directions in their latent space. It fits linear conceptor operators on small sets of successful and failed rollouts to define success subspaces, then projects the model's activations into those subspaces at inference time. This training-free adjustment produces large gains across flow-matching, autoregressive, and diffusion policies. A sympathetic reader would care because the results indicate that current VLAs already encode much of the needed knowledge yet fail to use it effectively during action generation.

Core claim

COAST builds conceptors as linear operators that soft-project hidden states into the principal components of success distributions identified contrastively from success and failure trajectories. Applying these operators at inference time steers the residual stream of three architecturally distinct VLA policies, producing absolute mean success-rate gains exceeding 20 percent in simulation and 40 percent on physical robots. The resulting subspace geometry shows that failure modes share substantial structure across tasks while success representations stay largely task-specific, which sometimes permits a fitted conceptor to improve performance on new tasks without refitting.

What carries the argument

A contrastive conceptor, a linear operator that identifies and projects onto success-critical principal components derived from paired success and failure rollout examples.

If this is right

  • The same steering procedure works across flow-matching, autoregressive, and diffusion VLA architectures without any retraining.
  • Failure-mode subspaces can be reused across tasks that share similar error patterns.
  • Much of the task-relevant knowledge already exists inside the VLA latent representations and can be accessed by residual-stream adjustment.
  • The action-decoding bottleneck can be relieved by directing activations toward success distributions identified from data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Many VLA failures may stem from misalignment between encoded knowledge and the decoding pathway rather than missing capability.
  • Similar contrastive subspace methods could be tested on non-robotic vision-language models to check whether latent steering generalizes beyond action generation.
  • Online refitting of conceptors from recent rollouts might allow the same framework to adapt to distribution shift during deployment.

Load-bearing premise

Subspaces extracted from a small number of success and failure rollouts contain generalizable task-critical structure that transfers to new instances of the same task and sometimes to related tasks without refitting.

What would settle it

Measuring zero or negative change in task success rate when a conceptor fitted on one set of rollouts is applied to a new but related task whose failure modes differ would falsify the transfer claim.

Figures

Figures reproduced from arXiv: 2605.17144 by Brandon Yang, Lyle Ungar, Miranda Muqing Miao, Subin Kim.

Figure 1
Figure 1. Figure 1: Overview of COAST. COAST steers a frozen robot policy at inference time by multi￾plicatively gating its residual stream with a contrastive conceptor fit from rollout activations. Of￾fline, success and failure rollouts yield layer-ℓ activations, from which closed-form conceptors C + and C − are fit and composed via Boolean subspace algebra into Csteer = C + ∧ ¬C −. Inference, the residual h at layer ℓ is ga… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of Environment Setup. We evaluate on three simulation benchmarks of in￾creasing difficulty, MetaWorld ML45 (Yu et al., 2021), LIBERO-10 (Liu et al., 2023), and a select RoboCasa (Nasiriany et al., 2024) subset, plus three real-robot tasks on the DROID platform. • RoboCasa (Nasiriany et al., 2024) covers 7 atomic-seen tasks. Every episode re-samples the kitchen layout, object instances, placements,… view at source ↗
Figure 3
Figure 3. Figure 3: The contrastive subspace is low-rank and predictive. (A) Eigenvalue spectra of success (C+, solid) and failure (C−, dashed) conceptors on MetaWorld ML45 decay rapidly. (B) Per-task steering gain ∆SR against overlap sim(C+, C−). Spearman ρ = 0.59, p = 0.002. Outcome-relevant computation is low-rank, but not rank-one. Although the residual stream of the π0.5 model operates in a 1024- dimensional space, the e… view at source ↗
Figure 4
Figure 4. Figure 4: Conceptor steering pulls activations toward the success region. (A) Per-step activa￾tions projected onto the top two eigenvectors of Csteer, the subspace maximally separating baseline success from failure. Solid ellipses: baseline 2σ regions; dashed: steered; large markers: centroids. (B) Projection onto v1(Csteer) over normalized trajectory time (means ±1σ). (a) π0.5 LIBERO (b) π0.5 RoboCasa Target Self T… view at source ↗
Figure 5
Figure 5. Figure 5: Tasks share failure geometry but not success geometry, enabling cross-task steer￾ing. Left: Joint PCA of layer-11 activations on π0.5 LIBERO, 2σ ellipses per task. Failure acti￾vations (top) spread into task-specific regions whose overlap varies; success activations (bottom) cluster uniformly across tasks. Right: Each dot is one source→target pair on LIBERO or Robo￾Casa. Failure-subspace containment tr(C f… view at source ↗
Figure 6
Figure 6. Figure 6: Steering Performance by Layer. (a) Normalized SR using the top-3 parameter combi￾nations per method at each layer. The solid line represents the mean of the top-3 configurations, while the shaded band shows the spread (min-max). COAST was evaluated over 12 combinations (α, β, strategy), CAA over 3 (α), and SAE over 2 (α). (b) Absolute peak Success Rate achieved at each layer. COAST consistently outperforms… view at source ↗
Figure 7
Figure 7. Figure 7: Three-stage hyperparameter selection, validated on LIBERO-10 with π0.5. (A) Quota q(C) across layers (boxes) and mean steered success (red) both peak at L=11, so quota alone identifies the correct layer. (B) Mean success versus overlap at L=11. Success rises as overlap decreases, then saturates. (C) Overlap decreases monotonically with aperture α, so selecting the overlap band determines a narrow α range w… view at source ↗
Figure 8
Figure 8. Figure 8: Successful Real World Rollouts of π0.5-COAST. We evaluate on three different tasks: Open Drawer, Close Microwave, and Put Duck in Cabinet over 15 independent trials per task. This is an example trajectory of COAST in each of the tasks. A.13 Filtered Behavioral Cloning Baseline (SFT) As a parametric counterpart to our inference-time activation-steering intervention, we run a Filtered Behavioral Cloning (fil… view at source ↗
read the original abstract

Vision-Language-Action (VLA) models leverage powerful perceptual priors from web-scale Vision-Language Model (VLM) pre-training, yet they remain surprisingly brittle in practice, frequently failing at simple robotic tasks. To mitigate this, we propose Contrastive Conceptor Activation Steering (COAST). COAST builds on the notion of a "conceptor", a linear operator that soft-projects data into the principal components of a target distribution. COAST uses conceptors to identify success-critical subspaces for a target robotic task from a few examples of success and failure rollouts. At inference time, it steers VLA latents into these identified success subspaces to improve task outcomes. Across three architecturally distinct neural policies (flow-matching VLA, autoregressive VLA, and Diffusion Policy), COAST improves absolute mean simulation and real-robot task success rate by over 20 and 40% respectively. The activation subspace geometry reveals that failure modes share substantial structure across tasks while success representations remain largely task-specific. When tasks share similar failure modes, this structure enables previously fitted conceptors to improve performance on new tasks without refitting. Ultimately, our results suggest that current VLAs retain substantial task-relevant knowledge in their latent representations, and that the action expert's decoding bottleneck could be mitigated by steering its residual stream toward task-relevant subspaces. COAST provides a lightweight, training-free path to unlocking these latent capabilities by steering the model towards its own "success" distributions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Contrastive Conceptor Activation Steering (COAST) for Vision-Language-Action (VLA) models. COAST derives linear conceptors from a small set of success and failure rollouts to identify task-critical subspaces in the model's hidden states. At inference, these conceptors steer activations toward success subspaces. The approach is tested on three architecturally distinct policies (flow-matching VLA, autoregressive VLA, Diffusion Policy) in simulation and on real robots, with reported absolute mean success-rate gains exceeding 20% (simulation) and 40% (real). The work also examines subspace geometry, observing that failure modes share structure across tasks while success representations are more task-specific, enabling some zero-shot transfer of fitted conceptors.

Significance. If the results are robust, the contribution is significant: it offers a training-free, lightweight method to unlock latent task-relevant knowledge already present in VLA residual streams, addressing the action-decoding bottleneck without retraining. Credit is due for the multi-architecture evaluation, inclusion of real-robot experiments, and the geometric analysis of shared failure subspaces. The use of held-out rollouts for conceptor fitting avoids direct circularity in the performance claims.

major comments (2)
  1. [Section 4] Section 4 (Experimental Results): the central claim of >20% simulation and >40% real-robot absolute gains across three model families is presented without specifying the number of trials per task, the statistical tests applied, exact baseline implementations, or data-exclusion criteria. These omissions make it impossible to judge whether the reported improvements are statistically reliable or reproducible.
  2. [§3.2] §3.2 (Conceptor Construction): the method fits conceptors on a small number of success/failure trajectories and assumes the resulting subspaces capture generalizable, task-critical directions. No sensitivity analysis to the number of examples, no ablation on spurious correlations in high-dimensional VLA streams, and no explicit test of transfer under distribution shift are provided, leaving the weakest assumption unexamined.
minor comments (2)
  1. [§3] The contrastive conceptor definition would be easier to follow if an explicit equation for the contrastive operator (success minus failure) were added alongside the standard conceptor formula.
  2. [Figure 3] Figure captions for the subspace-geometry visualizations should explicitly label success versus failure components and indicate whether the plots are averaged across tasks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We have revised the manuscript to address the concerns about experimental reporting and methodological validation, adding the requested details and analyses while preserving the core contributions.

read point-by-point responses
  1. Referee: [Section 4] Section 4 (Experimental Results): the central claim of >20% simulation and >40% real-robot absolute gains across three model families is presented without specifying the number of trials per task, the statistical tests applied, exact baseline implementations, or data-exclusion criteria. These omissions make it impossible to judge whether the reported improvements are statistically reliable or reproducible.

    Authors: We agree these details are essential for assessing reliability and reproducibility. The original manuscript reported mean success rates but did not explicitly state trial counts, tests, or exclusion rules. In the revision we have added: 50 trials per task in simulation and 25 trials per real-robot task; paired t-tests with reported p-values < 0.01 confirming statistical significance of the gains; exact baseline reproductions matching the original policy papers (same checkpoints, hyperparameters, and evaluation protocols); and data-exclusion criteria limited to hardware-induced failures (<5% of trials). These clarifications appear in Section 4, Table 1, and a new supplementary table. revision: yes

  2. Referee: [§3.2] §3.2 (Conceptor Construction): the method fits conceptors on a small number of success/failure trajectories and assumes the resulting subspaces capture generalizable, task-critical directions. No sensitivity analysis to the number of examples, no ablation on spurious correlations in high-dimensional VLA streams, and no explicit test of transfer under distribution shift are provided, leaving the weakest assumption unexamined.

    Authors: We acknowledge that the original submission lacked these robustness checks. We have now added: (i) a sensitivity study varying the number of success/failure examples from 5 to 50, showing that mean gains stabilize after approximately 10 examples; (ii) an ablation that randomizes success/failure labels and demonstrates that the resulting conceptors yield near-zero or negative gains, indicating the subspaces are not driven by spurious high-dimensional correlations; (iii) explicit distribution-shift experiments on held-out tasks with altered lighting, object positions, and minor dynamics changes, confirming that conceptors transfer effectively when failure subspaces overlap. These results are reported in the revised §3.2 and new Appendix C. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical gains measured on held-out rollouts

full rationale

The paper defines conceptors from a small set of success/failure rollouts and evaluates steering performance on separate test rollouts across simulation and real-robot settings. This separation means reported success-rate improvements are not forced by the fitting procedure itself. No load-bearing step reduces to a self-definition, fitted-input-as-prediction, or self-citation chain; the central claim is an empirical intervention whose validity rests on generalization to new instances rather than algebraic equivalence to the input data.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on the prior definition of conceptors as linear projectors onto principal components of a target distribution and on the empirical claim that success/failure rollouts suffice to identify useful subspaces.

free parameters (1)
  • Number of success and failure rollout examples
    Used to estimate the target distribution for each conceptor; the exact count is described only as 'a few'.
axioms (1)
  • domain assumption Conceptor as a linear operator that soft-projects data into the principal components of a target distribution
    Invoked to justify identifying success-critical subspaces from limited rollout data.

pith-pipeline@v0.9.0 · 5805 in / 1281 out tokens · 47013 ms · 2026-05-20T14:26:09.164872+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    2025 , eprint=

    Latent Activation Editing: Inference-Time Refinement of Learned Policies for Safer Multirobot Navigation , author=. 2025 , eprint=

  2. [2]

    Reinforcement learning for flow- matching policies.arXiv preprint arXiv:2507.15073, 2025

    Reinforcement Learning for Flow-Matching Policies , author=. arXiv preprint arXiv:2507.15073 , year=

  3. [3]

    2026 , eprint=

    Conceptors for Semantic Steering , author=. 2026 , eprint=

  4. [4]

    arXiv preprint arXiv:2603.10052 , year=

    OmniGuide: Universal Guidance Fields for Enhancing Generalist Robot Policies , author=. arXiv preprint arXiv:2603.10052 , year=

  5. [5]

    2025 , eprint=

    Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering , author=. 2025 , eprint=

  6. [6]

    2025 , eprint=

    Learning Affordances at Inference-Time for Vision-Language-Action Models , author=. 2025 , eprint=

  7. [7]

    2026 , eprint=

    Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification , author=. 2026 , eprint=

  8. [8]

    arXiv preprint arXiv:2512.02834 , year=

    Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach , author=. arXiv preprint arXiv:2512.02834 , year=

  9. [9]

    2026 , eprint=

    Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control , author=. 2026 , eprint=

  10. [10]

    2025 , eprint=

    Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations , author=. 2025 , eprint=

  11. [11]

    2026 , eprint=

    Correctness-Optimized Residual Activation Lens (CORAL): Transferrable and Calibration-Aware Inference-Time Steering , author=. 2026 , eprint=

  12. [12]

    Steering Llama 2 via Contrastive Activation Addition , url =

    Rimsky, Nina and Gabrieli, Nick and Schulz, Julian and Tong, Meg and Hubinger, Evan and Turner, Alexander. Steering Llama 2 via Contrastive Activation Addition. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.828

  13. [13]

    2025 , eprint=

    Representation Engineering: A Top-Down Approach to AI Transparency , author=. 2025 , eprint=

  14. [14]

    2026 , eprint=

    VLS: Steering Pretrained Robot Policies via Vision-Language Models , author=. 2026 , eprint=

  15. [15]

    2025 , eprint=

    Contrastive Representation Regularization for Vision-Language-Action Models , author=. 2025 , eprint=

  16. [16]

    Anwar and Manzoor A

    Momin Ahmad Khan and Novak Boskov and Fatima M. Anwar and Manzoor A. Khan , booktitle=. Controlling Vision. 2025 , url=

  17. [17]

    2024 , eprint=

    OpenVLA: An Open-Source Vision-Language-Action Model , author=. 2024 , eprint=

  18. [18]

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models : Open X-Embodiment Collaboration0 , year=

    O'Neill, Abby and Rehman, Abdul and Maddukuri, Abhiram and Gupta, et al , booktitle=. Open X-Embodiment: Robotic Learning Datasets and RT-X Models : Open X-Embodiment Collaboration0 , year=

  19. [19]

    Black, Kevin and Brown, Noah and Darpinian, James and Dhabalia, Karan and Driess, Danny and Esmail, Adnan and Equi, Michael Robert and Finn, Chelsea and Fusai, Niccolo and Galliker, Manuel Y. and Ghosh, Dibya and Groom, Lachy and Hausman, Karol and ichter, brian and Jakubczak, Szymon and Jones, Tim and Ke, Liyiming and LeBlanc, Devin and Levine, Sergey an...

  20. [20]

    2025 , eprint=

    FAST: Efficient Action Tokenization for Vision-Language-Action Models , author=. 2025 , eprint=

  21. [21]

    2023 , url=

    Bo Liu and Yifeng Zhu and Chongkai Gao and Yihao Feng and qiang liu and Yuke Zhu and Peter Stone , booktitle=. 2023 , url=

  22. [22]

    2024 , eprint=

    RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots , author=. 2024 , eprint=

  23. [23]

    Alexander Khazatsky and Karl Pertsch and Suraj Nair and Ashwin Balakrishna and Sudeep Dasari and Siddharth Karamcheti and Soroush Nasiriany and Mohan Kumar Srirama and Lawrence Yunliang Chen and Kirsty Ellis and Peter David Fagan and Joey Hejna and Masha Itkina and Marion Lepert and Yecheng Jason Ma and Patrick Tree Miller and Jimmy Wu and Suneel Belkhale...

  24. [24]

    2024 , eprint=

    Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model , author=. 2024 , eprint=

  25. [25]

    The International Journal of Robotics Research , volume=

    Diffusion policy: Visuomotor policy learning via action diffusion , author=. The International Journal of Robotics Research , volume=. 2025 , publisher=

  26. [26]

    2025 , eprint=

    Steering Your Diffusion Policy with Latent Space Reinforcement Learning , author=. 2025 , eprint=

  27. [27]

    MINT: Foundation Model Interventions , year=

    Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering , author=. MINT: Foundation Model Interventions , year=

  28. [28]

    The Twelfth International Conference on Learning Representations , year=

    Sparse Autoencoders Find Highly Interpretable Features in Language Models , author=. The Twelfth International Conference on Learning Representations , year=

  29. [29]

    2026 , eprint=

    Observing and Controlling Features in Vision-Language-Action Models , author=. 2026 , eprint=

  30. [30]

    2025 , eprint=

    Mechanistic interpretability for steering vision-language-action models , author=. 2025 , eprint=

  31. [31]

    Thirty-seventh Conference on Neural Information Processing Systems , year=

    Inference-Time Intervention: Eliciting Truthful Answers from a Language Model , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

  32. [32]

    2024 , eprint=

    Steering Language Models With Activation Engineering , author=. 2024 , eprint=

  33. [33]

    2024 , eprint=

    Scaling and evaluating sparse autoencoders , author=. 2024 , eprint=

  34. [34]

    2014 , eprint=

    Controlling Recurrent Neural Networks by Conceptors , author=. 2014 , eprint=

  35. [35]

    2026 , eprint=

    _0 : A Vision-Language-Action Flow Model for General Robot Control , author=. 2026 , eprint=

  36. [36]

    2025 , eprint=

    GR00T N1: An Open Foundation Model for Generalist Humanoid Robots , author=. 2025 , eprint=

  37. [37]

    2024 , eprint=

    PaliGemma: A versatile 3B VLM for transfer , author=. 2024 , eprint=

  38. [38]

    Sigmoid Loss for Language Image Pre-Training , year=

    Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas , booktitle=. Sigmoid Loss for Language Image Pre-Training , year=

  39. [39]

    The Eleventh International Conference on Learning Representations , year=

    Flow Matching for Generative Modeling , author=. The Eleventh International Conference on Learning Representations , year=

  40. [40]

    2021 , eprint=

    Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning , author=. 2021 , eprint=

  41. [41]

    , author=

    Lora: Low-rank adaptation of large language models. , author=. Iclr , volume=

  42. [42]

    Extracting Latent Steering Vectors from Pretrained Language Models

    Subramani, Nishant and Suresh, Nivedita and Peters, Matthew. Extracting Latent Steering Vectors from Pretrained Language Models. Findings of the Association for Computational Linguistics: ACL 2022. 2022. doi:10.18653/v1/2022.findings-acl.48