CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models

Zhi Liu

arxiv: 2605.21854 · v1 · pith:SA6ONG4Wnew · submitted 2026-05-21 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-22 08:04 UTC · model grok-4.3

keywords: Vision-Language-Action models · Direct Preference Optimization · parameter-efficient fine-tuning · flow-matching · inference optimization · LIBERO benchmark

The pith

A surrogate log-probability estimator lets Direct Preference Optimization work on continuous-action Vision-Language-Action models, where DoRA adapters deliver larger gains than LoRA.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to run preference alignment on both autoregressive and flow-matching Vision-Language-Action models by replacing the usual probability calculation with a fast surrogate based on flow-matching. This change makes DPO feasible on continuous-action backbones that previously could not use it without expensive ODE solves. Experiments then compare two low-rank adaptation methods and find that DoRA produces higher success rates than LoRA when both are applied to the same OpenVLA supervised fine-tuning baseline. The same work maps out where inference time is spent, showing that the denoising loop accounts for most latency and that simple caching strategies cannot exceed a modest speed-up without lowering task performance. A separate pre-training step for a multi-view temporal projection head is also released as a reusable initialization.

Core claim

CrossVLA introduces a surrogate flow-matching log-probability estimator that lets Direct Preference Optimization run on continuous-action Vision-Language-Action backbones without full probability-flow ODE integration. Using this estimator, a head-to-head comparison finds that DoRA as the parameter-efficient adapter improves success rate over the OpenVLA supervised fine-tuning baseline by a mean of 10.4 percentage points across the four LIBERO suites, with gains of 20.0 points on Object, 11.0 on Long-horizon, 8.0 on Goal, and 2.7 on Spatial tasks and zero seed variance on the Object suite. Inference profiling shows the denoise loop consumes 78.6 percent of sample_actions latency, while prefix-K/V caching caps at a 21 percent acceleration ceiling.

What carries the argument

The surrogate flow-matching log-probability estimator, which approximates the log-probability required by DPO on continuous-action models without running the full ODE integration.

If this is right

DPO becomes applicable to continuous-action VLAs such as pi-0.5 without prohibitive integration cost.
DoRA is the stronger choice over LoRA when performing parameter-efficient preference alignment on VLA models.
Inference optimizations should target the denoising loop rather than caching, since caching strategies top out at 21 percent acceleration and often reduce success rate.
A multi-view temporal projection head pretrained on 6000 LIBERO frames can serve as a high-recall initialization for downstream task retrieval.
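The recall@1 figure quoted for the projection head can be made concrete with a small sketch. The function below is an illustrative reading of "k-NN recall@1 for same-task retrieval" (nearest neighbour by cosine similarity, excluding the query itself); the toy clusters, the function name, and the array shapes are assumptions for demonstration and do not come from the released code. The last line only divides the two reported numbers to show the random baseline they imply.

```python
import numpy as np

def recall_at_1(embeddings: np.ndarray, task_ids: np.ndarray) -> float:
    """Fraction of frames whose nearest neighbour (excluding itself) has the same task id."""
    # L2-normalise so that the dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # a frame may not retrieve itself
    nearest = sims.argmax(axis=1)
    return float(np.mean(task_ids[nearest] == task_ids))

# Toy stand-in data (hypothetical): 3 well-separated clusters playing the role of 3 tasks.
rng = np.random.default_rng(0)
centers = np.eye(3) * 10.0
task_ids = np.repeat(np.arange(3), 20)
embeddings = centers[task_ids] + rng.normal(scale=0.1, size=(60, 3))
toy_recall = recall_at_1(embeddings, task_ids)

# Random-retrieval baseline implied by "99.5% recall@1, 36x over random".
implied_random_recall = 0.995 / 36
```

The implied baseline of roughly 2.8% is plain arithmetic on the reported ratio, not a number stated in the paper.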

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same surrogate technique could be tested on other continuous-control policies outside the VLA setting to check whether DPO scales across action representations.
The latency breakdown suggests that future work on VLA speed should focus on reducing denoising steps or accelerating the denoiser itself.
The released projection head may transfer to new robot embodiments if the multi-view and temporal features prove robust to camera and timing changes.
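The point about targeting the denoising loop can be illustrated with Amdahl's law. The sketch below uses only the reported 78.6% latency share; the two scenarios (halving denoise cost, making everything else free) are hypothetical and are not measurements from the paper.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: speed up `fraction` of the runtime by a factor `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

DENOISE_FRACTION = 0.786  # reported share of sample_actions latency

# Hypothetical scenario 1: the denoise loop becomes 2x faster (e.g. half as many steps).
halved_denoise = overall_speedup(DENOISE_FRACTION, 2.0)

# Hypothetical scenario 2: everything outside the denoise loop costs nothing at all.
free_rest = overall_speedup(1.0 - DENOISE_FRACTION, float("inf"))
latency_saved = 1.0 - 1.0 / free_rest
```

Under these assumptions, halving the denoise cost gives about a 1.65x overall speed-up, while removing all non-denoise work saves at most 21.4% of latency. That bound sits close to the reported 21% caching ceiling, though the page does not say the ceiling was derived this way.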

Load-bearing premise

The surrogate flow-matching log-probability estimator accurately approximates the true log-probability needed for DPO on continuous-action backbones without requiring full probability-flow ODE integration.

What would settle it

Compare success rates on the LIBERO suites when the same DPO training runs use the surrogate estimator versus exact log-probabilities obtained from full probability-flow ODE integration; a large gap would falsify the approximation claim.
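How large a gap counts as "large" depends on the trial count. As a rough guide, the sketch below computes a Wilson score interval for a single run of 38 successes in 50 trials (the Object-suite count reported per seed). The choice of the Wilson interval and the 95% level are assumptions made here for illustration; the paper does not describe a statistical test.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 is roughly 95%)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

# One seed on the Object suite: 38 successes out of 50 trials.
low, high = wilson_interval(38, 50)
```

With 50 trials the interval spans roughly 63% to 86%, so a surrogate-versus-exact difference of a few points on one seed would be hard to distinguish from noise.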

Figures

Figures reproduced from arXiv: 2605.21854 by Zhi Liu.

**Figure 2.** Left: π0.5 sample_actions latency breakdown. Right: cache strategy benchmark on LIBERO Spatial × 50 trials. Strategy 1: Chunk-level Cache Naive: cache the whole 10-step action chunk; on the next env step, if visual signature is similar (cosine ≥ 0.95) to the cached observation, reuse cached chunk’s i-th action. The cache mechanism worked (82% reuse), yet the run was both +30% slower and −20% in success rat…
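The chunk-level strategy described in the Figure 2 caption can be sketched as follows. This is a minimal reconstruction from the caption alone: the class name, the `policy` callable, the 16-dimensional signature, and the 7-dimensional action are hypothetical, and the released implementation may differ.

```python
import numpy as np

class ChunkCache:
    """Reuse a cached action chunk while the visual signature stays similar."""

    def __init__(self, policy, threshold=0.95):
        self.policy = policy        # hypothetical callable: signature -> (chunk_len, action_dim)
        self.threshold = threshold  # cosine threshold from the caption
        self.signature = None
        self.chunk = None
        self.index = 0
        self.hits = 0

    def _can_reuse(self, signature):
        # No cache yet, or the cached chunk is exhausted: must recompute.
        if self.chunk is None or self.index >= len(self.chunk):
            return False
        cos = float(signature @ self.signature) / (
            np.linalg.norm(signature) * np.linalg.norm(self.signature))
        return cos >= self.threshold

    def act(self, signature):
        if self._can_reuse(signature):
            self.hits += 1
        else:
            self.signature = signature
            self.chunk = self.policy(signature)
            self.index = 0
        action = self.chunk[self.index]  # the i-th action of the cached chunk
        self.index += 1
        return action

# Toy usage: a static scene and a dummy policy that counts its own calls.
calls = {"n": 0}
def toy_policy(signature):
    calls["n"] += 1
    return np.tile(np.arange(10.0)[:, None], (1, 7))  # 10 steps x 7-dim action (assumed)

cache = ChunkCache(toy_policy)
sig = np.ones(16)
actions = [cache.act(sig) for _ in range(20)]
```

In this static toy scene the policy runs only twice in 20 steps. The sketch does not reproduce the slowdown or success-rate drop the caption reports; it only shows the reuse rule.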

**Figure 3.** Multi-view + temporal contrastive training on real LIBERO RLDS with frozen SigLIP

**Figure 4.** Qualitative k-NN retrieval examples: 6 query frames

Original abstract

Vision-Language-Action (VLA) models have rapidly converged on a small set of architectural patterns: discrete-token autoregression (e.g. OpenVLA) and continuous-action flow-matching (e.g. pi-0.5). Yet preference alignment via Direct Preference Optimisation (DPO) -- the de-facto post-training step in language models -- has been studied almost exclusively on autoregressive VLAs. We present CrossVLA, an empirical study of cross-paradigm VLA post-training. Three contributions: (i) a surrogate flow-matching log-probability estimator that lets DPO operate on continuous-action backbones without probability-flow ODE integration; (ii) a head-to-head comparison of LoRA and DoRA as the parameter-efficient layer for VLA DPO, finding DoRA improves over OpenVLA SFT by a mean +10.4 pp across LIBERO 4-suite (600 trials, 3 seeds) -- per-suite +20.0 Object, +11.0 Long-horizon, +8.0 Goal, +2.7 Spatial -- with zero seed variance on Object (38/50 on each of 3 seeds); (iii) an inference-time anatomy showing the denoise loop dominates 78.6% of sample_actions latency and prefix-K/V caching a la VLA-Cache caps at a 21% acceleration ceiling -- both chunk-level and token-level cache strategies degrade success rate to 0-80% in our benchmarks. We further pretrain a multi-view + temporal projection head on 6000 LIBERO frames, achieving 99.5% k-NN recall@1 for same-task retrieval (36x over random), available as a downstream initialisation. All code, ckpts, training logs, and reproduction scripts are open at https://github.com/lz-googlefycy/vla-lab.
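The headline numbers in the abstract can be cross-checked with simple arithmetic. The snippet below uses only figures quoted above; the even split of trials across suites and seeds is an assumption, not something the abstract states.

```python
# Per-suite gains over OpenVLA SFT, in percentage points, as quoted in the abstract.
gains_pp = {"Object": 20.0, "Long-horizon": 11.0, "Goal": 8.0, "Spatial": 2.7}
mean_gain = sum(gains_pp.values()) / len(gains_pp)  # 41.7 / 4

# 600 trials over 4 suites and 3 seeds; assumes a balanced design.
suites, seeds, total_trials = 4, 3, 600
trials_per_suite_per_seed = total_trials // (suites * seeds)

# Object suite: 38 successes out of 50 on each seed.
object_rate = 38 / 50
```

The mean works out to 10.425, which rounds to the reported +10.4 pp. A balanced design gives 50 trials per suite per seed, matching the 38/50 figure for Object (76% success).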

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CrossVLA gives a usable surrogate for DPO on continuous flow-matching VLAs plus a clean LoRA-DoRA comparison, but the approximation still needs direct checks against exact ODE integration.

The paper's main contribution is a surrogate estimator for log probabilities under flow matching. This lets DPO run on continuous-action VLAs without full probability-flow ODE solves at every step. They pair it with a head-to-head test of LoRA versus DoRA for parameter-efficient post-training and some timing measurements on the denoising loop during inference. The code, checkpoints, and logs are all released, which is the right move for this kind of work.
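For readers unfamiliar with the two adapters being compared, the sketch below contrasts how LoRA and DoRA build the adapted weight. It follows the column-wise formulation of the DoRA paper (Liu et al., ICML 2024) on a toy matrix; the shapes, initial values, and variable names are illustrative assumptions and say nothing about how CrossVLA configures its adapters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2

W0 = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))               # zero-initialised, so the update starts at 0

def lora_weight(W0, B, A):
    """LoRA: add a low-rank update to the frozen weight."""
    return W0 + B @ A

def dora_weight(W0, B, A, m):
    """DoRA: low-rank update sets the direction, a separate vector m sets column magnitudes."""
    V = W0 + B @ A
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)
    return m * V / col_norm

# Magnitude vector initialised to the column norms of W0, so DoRA also starts at W0.
m = np.linalg.norm(W0, axis=0, keepdims=True)

# After B moves away from zero, DoRA's column norms are still pinned to m.
B_trained = rng.normal(size=(d_out, rank))
dora_norms = np.linalg.norm(dora_weight(W0, B_trained, A, m), axis=0, keepdims=True)
```

Both adapters reproduce `W0` at initialisation. They differ in that DoRA learns magnitude (`m`) and direction (`B`, `A`) separately, whereas LoRA's update changes both at once.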

Referee Report

2 major / 2 minor

Summary. The paper presents CrossVLA, an empirical study of cross-paradigm post-training for Vision-Language-Action models. It introduces a surrogate flow-matching log-probability estimator to enable Direct Preference Optimization (DPO) on continuous-action flow-matching backbones (e.g., pi-0.5) without full probability-flow ODE integration. The central result is a head-to-head comparison showing that DoRA as the parameter-efficient layer for VLA DPO improves over OpenVLA SFT by a mean +10.4 pp across the LIBERO 4-suite (600 trials, 3 seeds), with per-suite gains of +20.0 pp (Object), +11.0 pp (Long-horizon), +8.0 pp (Goal), and +2.7 pp (Spatial), including zero seed variance on the Object suite (38/50 success on each seed). Additional contributions include an inference-time analysis (denoise loop at 78.6% of latency, prefix caching capped at 21% acceleration) and pretraining a multi-view + temporal projection head on 6000 LIBERO frames (99.5% k-NN recall@1). All code and checkpoints are released openly.

Significance. If the surrogate estimator proves accurate, the work would meaningfully extend preference alignment to continuous-action VLAs and provide actionable guidance on DoRA versus LoRA for post-training. The open release of code, checkpoints, training logs, and reproduction scripts is a clear strength that supports reproducibility and downstream use. The reported gains with zero seed variance on one suite are noteworthy for robotic control tasks, though their interpretation hinges on the validity of the DPO objective under the approximation.

major comments (2)

[Methods, surrogate flow-matching log-probability estimator] The surrogate flow-matching log-probability estimator (contribution (i) and associated methods description) is load-bearing for the DPO results on continuous-action backbones, yet the manuscript provides no validation against exact probability-flow ODE integration, no bias analysis, and no correlation study with ground-truth log-probabilities. Without these, the reported +10.4 pp mean gain and per-suite improvements could reflect optimization of a distorted objective rather than genuine preference alignment.
[§4, DPO implementation details] In §4 (Results on LIBERO), the DPO loss formulation and the exact hyperparameter settings used with the surrogate are not specified in sufficient detail to allow independent verification of the DoRA versus LoRA comparison. This omission makes it difficult to isolate whether the gains arise from the parameter-efficient method or from interactions with the unvalidated estimator.

minor comments (2)

[Abstract] The abstract states 600 trials across three seeds with zero variance on the Object suite; clarify whether the per-suite trial counts are balanced and whether success rates are computed identically across all four LIBERO suites.
[Inference-time anatomy] The inference-time analysis reports a 21% acceleration ceiling for prefix-K/V caching; include the precise latency breakdown table or figure reference to support the 78.6% denoise-loop dominance claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We appreciate the positive remarks on the open release of code and checkpoints as well as the reported empirical gains. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Referee: [Methods, surrogate flow-matching log-probability estimator] The surrogate flow-matching log-probability estimator (contribution (i) and associated methods description) is load-bearing for the DPO results on continuous-action backbones, yet the manuscript provides no validation against exact probability-flow ODE integration, no bias analysis, and no correlation study with ground-truth log-probabilities. Without these, the reported +10.4 pp mean gain and per-suite improvements could reflect optimization of a distorted objective rather than genuine preference alignment.

Authors: We agree that the absence of a direct validation study for the surrogate estimator is a limitation. The manuscript presents the estimator as a practical approximation that enables DPO without repeated ODE solves and demonstrates downstream task improvements, but it does not report correlation coefficients, bias measurements, or side-by-side comparisons against full probability-flow integration. In the revised manuscript we will add an appendix subsection that evaluates the surrogate on a held-out set of trajectories, reporting Pearson correlation with exact log-probabilities, mean absolute error, and a brief discussion of any observed bias. This addition will allow readers to assess the fidelity of the approximation independently of the final task metrics. revision: yes
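The fidelity metrics named in this response (Pearson correlation, mean absolute error, bias) are straightforward to compute once both sets of log-probabilities exist. The sketch below shows one way to do it; the function name and the four synthetic values are placeholders, not results from the paper.

```python
import numpy as np

def fidelity_report(surrogate, exact):
    """Compare surrogate log-probabilities against exact ones on the same trajectories."""
    surrogate = np.asarray(surrogate, dtype=float)
    exact = np.asarray(exact, dtype=float)
    return {
        "pearson": float(np.corrcoef(surrogate, exact)[0, 1]),
        "mae": float(np.mean(np.abs(surrogate - exact))),
        "bias": float(np.mean(surrogate - exact)),  # signed mean error
    }

# Synthetic placeholder data: the surrogate equals the exact value plus a constant offset.
exact = np.array([-3.0, -2.0, -1.5, -0.5])
surrogate = exact + 1.0
report = fidelity_report(surrogate, exact)
```

In this toy case the correlation is perfect while MAE and bias are both 1.0, which shows why the three numbers are worth reporting together: ranking fidelity and absolute error can diverge.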
Referee: [§4, DPO implementation details] In §4 (Results on LIBERO), the DPO loss formulation and the exact hyperparameter settings used with the surrogate are not specified in sufficient detail to allow independent verification of the DoRA versus LoRA comparison. This omission makes it difficult to isolate whether the gains arise from the parameter-efficient method or from interactions with the unvalidated estimator.

Authors: We concur that additional implementation details are required for reproducibility. While the manuscript outlines the overall DPO procedure and the use of the surrogate estimator, it does not provide the precise adapted loss equation or the full set of hyperparameters. In the revision we will expand the relevant section (and add a table if space permits) to state the exact DPO objective, the value of the beta coefficient, the number of preference pairs, batch size, learning-rate schedule, number of training steps, and all other settings applied to both the DoRA and LoRA runs. These clarifications will make it possible to replicate the comparison and to separate the contribution of the parameter-efficient adapter from any effects of the estimator. revision: yes
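Until the adapted objective is published, the standard DPO loss of Rafailov et al. is the natural reference point. The sketch below implements that standard form for a single preference pair, with the log-probabilities passed in as plain numbers (for a flow-matching backbone they would presumably come from the surrogate). The value `beta=0.1` and the example inputs are hypothetical; the paper's actual loss and hyperparameters are not given on this page.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# No preference signal yet: policy equals reference, so the loss is log(2).
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Policy already favours the chosen action more than the reference does: loss drops.
aligned = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss starts at log 2 when the policy matches the reference and falls as the policy raises the chosen action's log-probability relative to the rejected one.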

Circularity Check

0 steps flagged

No circularity: empirical VLA alignment study relies on external benchmarks and open reproduction

The manuscript is an empirical comparison of post-training methods (DPO on LoRA/DoRA for discrete vs. continuous VLA backbones) evaluated on the external LIBERO benchmark suite with 600 trials and 3 seeds. The surrogate flow-matching log-probability estimator is presented as an engineering approximation whose validity is assessed by downstream task performance rather than by any closed-form derivation. No equations, uniqueness theorems, or self-citations are invoked to force the reported gains (+10.4 pp mean) or the zero seed variance on the Object suite; all results are obtained by running open code against held-out tasks. The work is therefore self-contained against external benchmarks and does not reduce any central claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on standard machine-learning assumptions about preference optimization and flow-matching dynamics plus the validity of the newly introduced surrogate estimator; no additional free parameters or invented physical entities are described.

axioms (1)

domain assumption: A surrogate estimator can stand in for the true log-probability of a flow-matching model sufficiently well to support stable DPO updates.
This premise is required for the first contribution to function and is not derived from first principles in the abstract.

invented entities (1)

surrogate flow-matching log-probability estimator (no independent evidence)
purpose: Enable Direct Preference Optimization on continuous-action VLA backbones without probability-flow ODE integration.
Newly proposed component whose accuracy is asserted to be adequate for the reported experiments.

pith-pipeline@v0.9.0 · 5876 in / 1562 out tokens · 43971 ms · 2026-05-22T08:04:38.850127+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

We adopt a surrogate based on the conditional flow-matching loss itself... $\log \tilde{p}_\theta(x_1 \mid \mathrm{obs}) = -\frac{1}{T_{\mathrm{eval}}} \sum \lVert v_\theta(x_t, t, \mathrm{obs}) - v_{\mathrm{target}} \rVert^2$
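The quoted formula can be read as a negative average flow-matching error over T_eval evaluations. The sketch below is one possible instantiation, not the released implementation: the linear interpolation path, the uniform sampling of t, the Gaussian noise endpoint, and all function names are assumptions, since the quoted passage specifies only the averaged squared error.

```python
import numpy as np

def surrogate_log_prob(velocity_fn, x1, obs, t_eval=8, seed=0):
    """Negative mean squared velocity error over t_eval sampled (t, noise) pairs."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(t_eval):
        t = rng.uniform()                   # assumed: t ~ U(0, 1)
        x0 = rng.normal(size=x1.shape)      # assumed: Gaussian noise endpoint
        xt = (1.0 - t) * x0 + t * x1        # assumed: linear path from noise to action
        v_target = x1 - x0                  # velocity of that linear path
        err = velocity_fn(xt, t, obs) - v_target
        total += float(np.sum(err ** 2))
    return -total / t_eval

x1 = np.array([0.5, -0.2, 0.1])  # toy action

def oracle_velocity(xt, t, obs):
    # Toy model that "knows" x1; algebraically equal to x1 - x0 on the linear path.
    return (x1 - xt) / (1.0 - t)

def zero_velocity(xt, t, obs):
    return np.zeros_like(xt)

good = surrogate_log_prob(oracle_velocity, x1, obs=None)
bad = surrogate_log_prob(zero_velocity, x1, obs=None)
```

A model that predicts the target velocity exactly scores (numerically) zero, the maximum of this surrogate, while a poor model scores strictly lower. Such a score is not a normalised log-density, which is why the referee asks for validation against exact ODE-based log-probabilities.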

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
