RoVLA: Multi-Consistency Constraints for Robust Vision-Language-Action Models
Pith reviewed 2026-05-20 04:44 UTC · model grok-4.3
The pith
Enforcing consistency under instruction rewrites, trajectory steps, and observation disturbances lets vision-language-action models generalize better to task and visual shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RoVLA incorporates three complementary consistency constraints into end-to-end vision-language-action policy training. Instructional Consistency requires the model to output identical actions for semantically equivalent instruction rewrites. Evolutionary Consistency requires coherent action predictions across successive steps of a trajectory. Observational Consistency requires unchanged predictions before and after targeted visual and proprioceptive perturbations. By minimizing violations of these invariances, the training process reduces dependence on superficial correlations present in the data distribution and yields policies that remain effective under task and observation shifts.
What carries the argument
Multi-consistency constraints (Instructional Consistency, Evolutionary Consistency, and Observational Consistency) that penalize changes in action predictions under semantically equivalent, temporally progressive, and sensor-perturbed inputs.
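As an illustration only, the three invariances can be written as pairwise penalties on action predictions. This sketch is not the authors' implementation. The `Policy` interface, the mean-squared-error distance, and the overlapping-chunk reading of Evolutionary Consistency are all assumptions. The abstract describes EC as preserving action intent "throughout the generation process", which may refer to something other than trajectory steps. The paper's appendix also states that IC adds no explicit loss and is realized through instruction rewriting, so `instructional_penalty` only spells out the invariance that IC targets.

```python
from typing import Callable, List, Sequence

Action = List[float]
# Hypothetical policy interface: (observation, instruction) -> action vector.
Policy = Callable[[Sequence[float], str], Action]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equal-length action vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("action vectors must be non-empty and equal in length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def instructional_penalty(policy: Policy, obs, instruction: str, rewrite: str) -> float:
    """Same observation, semantically equivalent wording: actions should match."""
    return mse(policy(obs, instruction), policy(obs, rewrite))


def evolutionary_penalty(chunk_t: List[Action], chunk_next: List[Action], stride: int = 1) -> float:
    """Assumed reading: action chunks predicted `stride` steps apart overlap in time,
    and the overlapping actions should agree."""
    overlap = list(zip(chunk_t[stride:], chunk_next))
    if not overlap:
        return 0.0
    return sum(mse(a, b) for a, b in overlap) / len(overlap)


def observational_penalty(policy: Policy, obs, perturbed_obs, instruction: str) -> float:
    """Same instruction, clean vs. perturbed observation: actions should match."""
    return mse(policy(obs, instruction), policy(perturbed_obs, instruction))


def toy_policy(obs: Sequence[float], instruction: str) -> Action:
    # Toy stand-in that ignores the wording, so it is trivially instruction-consistent.
    return [2.0 * o for o in obs]


if __name__ == "__main__":
    obs = [0.1, 0.2]
    print(instructional_penalty(toy_policy, obs, "pick up the red cup", "grab the red cup"))  # 0.0
    print(observational_penalty(toy_policy, obs, [0.1, 0.3], "pick up the red cup"))  # about 0.02
    print(evolutionary_penalty([[0.0], [1.0], [2.0]], [[1.0], [2.0], [3.0]]))  # 0.0
```

Each term is zero exactly when the prediction is unchanged under the corresponding transformation. This is the sense in which the constraints penalize changes in action predictions.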
If this is right
- Policies trained with the three consistency terms outperform standard baselines on LIBERO-Plus and RoboTwin 2.0 benchmarks.
- The same policies maintain higher success rates when task descriptions or visual conditions differ from training.
- Real-world manipulation experiments show improved reliability under the same shifts.
- The model relies less on spurious correlations and more on stable semantic-state-action relationships.
- No additional large-scale pretraining or post-hoc adaptation is required to obtain the robustness gains.
Where Pith is reading between the lines
- The same consistency approach could be transferred to other embodied sequence tasks such as navigation or multi-step assembly.
- Combining the constraints with existing large-scale vision-language pretraining might produce even stronger zero-shot behavior.
- Explicit invariance modeling offers a data-efficient route to robustness that does not require ever-larger training corpora.
- One could measure whether the constraints also reduce sensitivity to changes in robot morphology or gripper type.
Load-bearing premise
The chosen transformations are assumed to represent the distribution shifts that matter in real deployment without creating new failure modes or over-constraining the policy.
What would settle it
A controlled test in which a RoVLA-trained model is evaluated on paraphrased instructions and perturbed observations that were never used as consistency examples during training; if performance drops to the level of ordinary baselines, the claimed robustness benefit does not hold.
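The settling test can be phrased as a small bookkeeping check. This is a hypothetical sketch. The shift names, the `rovla`/`baseline` keys, and the episode outcomes in the usage example are placeholders, not data from the paper. The check rejects any evaluation shift that was also used as a consistency example and reports the success-rate gap for each held-out shift. Gaps near zero would indicate that the robustness benefit does not transfer.

```python
from typing import Dict, Iterable, List


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of successful episodes."""
    if not outcomes:
        raise ValueError("need at least one episode")
    return sum(outcomes) / len(outcomes)


def held_out_gaps(
    train_transforms: Iterable[str],
    eval_outcomes: Dict[str, Dict[str, List[bool]]],
) -> Dict[str, float]:
    """Return RoVLA-minus-baseline success rate for each held-out shift.

    eval_outcomes maps a shift name to {"rovla": [...], "baseline": [...]}.
    """
    seen = set(train_transforms)
    gaps: Dict[str, float] = {}
    for shift, per_model in eval_outcomes.items():
        if shift in seen:
            # A shift used as a consistency example is not a held-out test.
            raise ValueError(f"shift {shift!r} was seen during training")
        gaps[shift] = success_rate(per_model["rovla"]) - success_rate(per_model["baseline"])
    return gaps


if __name__ == "__main__":
    # Placeholder outcomes for illustration only, not results from the paper.
    gaps = held_out_gaps(
        ["paraphrase_set_a", "color_jitter"],
        {
            "paraphrase_set_b": {"rovla": [True, True, False, True], "baseline": [True, False, False, True]},
            "camera_shift": {"rovla": [True, False], "baseline": [True, False]},
        },
    )
    print(gaps)  # {'paraphrase_set_b': 0.25, 'camera_shift': 0.0}
```

In practice, the gaps would also need confidence intervals over enough episodes before concluding that performance has "dropped to the level of ordinary baselines".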
Original abstract
Vision-Language-Action (VLA) models have shown strong performance on embodied manipulation, yet they remain brittle under visual observation changes, paraphrased language instructions, and compounded perturbations. This limitation suggests that existing methods still rely heavily on shallow correlations in the training distribution, rather than learning stable couplings among task semantics, environment states, and action generation. Although recent efforts improve robustness through larger-scale training, post-training adaptation, or enhanced predictive modeling, they rarely enforce invariance-oriented consistency within the end-to-end policy itself. To address this issue, we propose RoVLA, a robust vision-language-action framework with multi-consistency constraints. RoVLA enforces consistency under three complementary transformations: instruction semantics, trajectory evolution, and observation perturbation. Specifically, Instructional Consistency (IC) promotes stable grounding under semantically equivalent instruction rewrites, Evolutionary Consistency (EC) preserves coherent action intent throughout the generation process, and Observational Consistency (OC) improves robustness to visual and proprioceptive perturbations by enforcing consistent predictions before and after targeted disturbances. By explicitly modeling these invariances during training, RoVLA reduces reliance on superficial correlations and improves robustness and generalization. Experiments on LIBERO-Plus, RoboTwin 2.0, and real-world manipulation tasks show that RoVLA consistently outperforms strong baseline methods and exhibits superior robustness under diverse task and observation shifts. These results demonstrate the effectiveness of multi-consistency learning for robust embodied control. Code will be available at https://github.com/HCPLab-SYSU/RoVLA.
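To make "before and after targeted disturbances" concrete, the sketch below builds a clean/perturbed observation pair of the kind an OC-style objective would compare. The perturbations used here are a brightness shift, a square occlusion, and Gaussian noise on joint readings. They are random stand-ins, and they and all parameter values are assumptions. The abstract does not say how RoVLA chooses or targets its disturbances.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # toy grayscale image, intensities in [0, 1]
Joints = List[float]       # toy proprioceptive reading, e.g. joint angles


def jitter_brightness(img: Image, delta: float) -> Image:
    """Shift every pixel by delta and clip back into [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]


def occlude(img: Image, top: int, left: int, size: int, fill: float = 0.0) -> Image:
    """Return a copy with a size x size square (clipped at the border) set to fill."""
    out = [row[:] for row in img]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = fill
    return out


def noisy_proprio(joints: Joints, sigma: float, rng: random.Random) -> Joints:
    """Add independent Gaussian noise to each proprioceptive reading."""
    return [q + rng.gauss(0.0, sigma) for q in joints]


def make_oc_pair(img: Image, joints: Joints, rng: random.Random,
                 max_delta: float = 0.2, patch: int = 2, sigma: float = 0.01
                 ) -> Tuple[Tuple[Image, Joints], Tuple[Image, Joints]]:
    """Return ((clean image, clean joints), (perturbed image, perturbed joints))."""
    delta = rng.uniform(-max_delta, max_delta)
    top = rng.randrange(max(1, len(img) - patch + 1))
    left = rng.randrange(max(1, len(img[0]) - patch + 1))
    perturbed_img = occlude(jitter_brightness(img, delta), top, left, patch)
    return (img, joints), (perturbed_img, noisy_proprio(joints, sigma, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[0.5] * 4 for _ in range(4)]
    clean, perturbed = make_oc_pair(image, [0.0, 1.0], rng)
    print(perturbed[0])
    print(perturbed[1])
```

An OC-style objective would then compare the policy's actions on `clean` and `perturbed`. Because the perturbations here are random, a seeded `random.Random` keeps the pairs reproducible.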
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces RoVLA, a vision-language-action model that applies multi-consistency constraints during training: Instructional Consistency (IC) under semantically equivalent instruction rewrites, Evolutionary Consistency (EC) across trajectory steps to preserve action intent, and Observational Consistency (OC) under targeted visual and proprioceptive perturbations. The central claim is that explicitly enforcing these invariances reduces reliance on superficial correlations in the training distribution, yielding improved robustness and generalization. Experiments on LIBERO-Plus, RoboTwin 2.0, and real-world manipulation tasks are reported to show consistent outperformance over strong baselines under task and observation shifts.
Significance. If the experimental results hold after isolating the contribution of the consistency terms, the work could meaningfully advance robust embodied control by providing an end-to-end mechanism for learning stable couplings among semantics, states, and actions. The complementary nature of the three consistency types and the planned code release are positive features that support reproducibility and further investigation.
major comments (1)
- [Experiments] Experiments section: the manuscript must include a control experiment that trains a baseline on the identical set of augmented data (semantic rewrites, trajectory steps, and disturbances) using only the standard supervised loss, without the IC/EC/OC consistency terms. Without this ablation, it remains unclear whether the reported robustness gains under the LIBERO-Plus and RoboTwin 2.0 shifts arise from the proposed multi-consistency mechanism or simply from the stronger supervision signal provided by the transformed pairs. The ablation would directly address the concern that the consistency losses may be redundant with the augmentations themselves.
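The requested control can be stated as two training losses over the same augmented pairs. In this hypothetical sketch, the function names, the single-input `policy`, and the mean-squared-error distance are illustrative assumptions. The control arm treats each transformed input as one more supervised sample. The treatment arm adds a pairwise consistency term. Setting `lam = 0` recovers the control exactly, so any difference between the arms is attributable to the consistency term rather than to the data.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical single-input policy: the observation and instruction are assumed
# to be already folded into one input vector.
Policy = Callable[[Vector], Vector]
# Hypothetical augmented pair: (original input, transformed input, shared target action).
Pair = Tuple[Vector, Vector, Vector]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and equal in length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def control_loss(policy: Policy, pairs: List[Pair]) -> float:
    """Control arm: every transformed input is just one more supervised sample."""
    losses = []
    for x, x_aug, target in pairs:
        losses.append(mse(policy(x), target))
        losses.append(mse(policy(x_aug), target))
    return sum(losses) / len(losses)


def consistency_arm_loss(policy: Policy, pairs: List[Pair], lam: float) -> float:
    """Treatment arm: identical data and supervised term, plus a pairwise consistency term."""
    consistency = sum(mse(policy(x), policy(x_aug)) for x, x_aug, _ in pairs) / len(pairs)
    return control_loss(policy, pairs) + lam * consistency


if __name__ == "__main__":
    identity: Policy = lambda v: list(v)  # toy policy for illustration
    pairs = [([1.0], [1.5], [1.0])]
    print(control_loss(identity, pairs))               # 0.125
    print(consistency_arm_loss(identity, pairs, 1.0))  # 0.375
    print(consistency_arm_loss(identity, pairs, 0.0))  # 0.125, same as control
```

In the control arm, both copies share one target, so supervision alone already pulls their predictions together. The ablation tests whether coupling the predictions directly adds anything beyond that.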
minor comments (2)
- [Abstract] Abstract: quantitative metrics, baseline names, ablation summaries, and statistical tests are absent, making it difficult to assess the magnitude and reliability of the claimed outperformance.
- [Method] The description of the three transformations should clarify whether they are applied only at training time or also at test time, and how the consistency losses are balanced with the primary task loss (e.g., via coefficients or scheduling).
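One common way to balance auxiliary terms is a weighted sum whose weights ramp up linearly over a warmup period. It is shown here only as an assumed example of what the Method section might specify. The ledger lists the consistency loss coefficients as free parameters, and the appendix states that IC adds no explicit loss. The sketch therefore weights only hypothetical `ec` and `oc` terms, and the warmup schedule and all values are illustrative.

```python
from typing import Dict


def warmup_weight(base: float, step: int, warmup_steps: int) -> float:
    """Linearly ramp a coefficient from 0 to `base` over `warmup_steps` steps."""
    if warmup_steps <= 0:
        return base
    return base * min(1.0, step / warmup_steps)


def total_loss(task_loss: float, consistency_losses: Dict[str, float],
               base_weights: Dict[str, float], step: int, warmup_steps: int) -> float:
    """Task loss plus scheduled, weighted consistency terms."""
    total = task_loss
    for name, value in consistency_losses.items():
        # KeyError on a missing weight is deliberate: no silently disabled terms.
        total += warmup_weight(base_weights[name], step, warmup_steps) * value
    return total


if __name__ == "__main__":
    weights = {"ec": 0.5, "oc": 0.25}  # hypothetical coefficients
    terms = {"ec": 2.0, "oc": 4.0}     # placeholder loss values
    for step in (0, 50, 100):
        # prints "0 1.0", then "50 2.0", then "100 3.0"
        print(step, total_loss(1.0, terms, weights, step, warmup_steps=100))
```

The code indexes `base_weights[name]` instead of using a default, so a missing coefficient fails loudly rather than silently disabling a term.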
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and detailed review of our manuscript. We have carefully considered the major comment on the Experiments section and agree that the requested control experiment will strengthen the paper by better isolating the contribution of the multi-consistency constraints.
Point-by-point responses
- Referee: [Experiments] Experiments section: the manuscript must include a control experiment that trains a baseline on the identical set of augmented data (semantic rewrites, trajectory steps, and disturbances) using only the standard supervised loss, without the IC/EC/OC consistency terms. Without this ablation, it remains unclear whether the reported robustness gains under the LIBERO-Plus and RoboTwin 2.0 shifts arise from the proposed multi-consistency mechanism or simply from the stronger supervision signal provided by the transformed pairs. The ablation would directly address the concern that the consistency losses may be redundant with the augmentations themselves.
Authors: We agree that this control experiment is essential to rule out the possibility that robustness gains arise merely from the augmented data rather than the consistency losses themselves. In the revised manuscript, we will add results from training the baseline model on the identical augmented dataset (semantic rewrites, trajectory steps, and disturbances) but using only the standard supervised loss without the IC, EC, or OC terms. These results will be reported on LIBERO-Plus and RoboTwin 2.0 under the same task and observation shifts, with direct comparisons to the full RoVLA model to demonstrate the specific benefit of the multi-consistency mechanism.
Revision: yes.
Circularity Check
No circularity: consistency losses and evaluation metrics remain independent
Full rationale
The paper defines EC and OC as auxiliary consistency objectives applied to transformed inputs (trajectory steps and perturbed observations) during training. Its appendix states that IC introduces no additional explicit loss and is instead realized by expanding the training data with semantically equivalent instruction rewrites. Neither the objectives nor the data expansion is mathematically equivalent to the reported success metrics, which are measured on held-out tasks and distribution shifts in LIBERO-Plus, RoboTwin 2.0, and real-world settings. No equations reduce the robustness claims to the training objectives by construction, no self-citations serve as load-bearing uniqueness theorems, and no fitted parameters are relabeled as predictions. The derivation from multi-consistency training to empirical gains is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Consistency loss coefficients
axioms (1)
- Domain assumption: enforcing prediction invariance under the three defined transformations improves robustness to real-world distribution shifts.
A. More Implementation Details of Instructional Consistency
We provide more implementation details for the instruction rewriting process employed by Instructional Consistency (IC). IC does not introduce an additional explicit loss. Instead, it expands each sin...