$\pi_0$-EqM: Equilibrium Matching for Closed-Loop Vision-Language-Action Control

Congsheng Xu; Huanming Liu; Jianmin Ji; Yao Mu

arxiv: 2605.23128 · v1 · pith:RD5NM7CUnew · submitted 2026-05-22 · 💻 cs.RO

π₀-EqM: Equilibrium Matching for Closed-Loop Vision-Language-Action Control

Huanming Liu , Congsheng Xu , Jianmin Ji , Yao Mu This is my paper

Pith reviewed 2026-05-25 04:41 UTC · model grok-4.3

classification 💻 cs.RO

keywords vision-language-actionequilibrium matchingflow matchingrobotic manipulationclosed-loop controlpolicy designaction decoder

0 comments

The pith

Replacing the flow-matching expert in a VLA model with an Equilibrium Matching decoder raises average task success from 40.4% to 50.2% on RoboTwin under a fixed 300-step budget.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper replaces the flow-matching expert inside the π₀ VLA model with an Equilibrium Matching decoder while leaving the upstream vision-language stack untouched. Under a matched 300-step inference budget, this produces higher success rates on robotic manipulation benchmarks. Average success across 19 RoboTwin tasks rises from 40.4% to 50.2%, with competitive results retained on LIBERO and the largest gain on LIBERO-10. Threshold scans also uncover a task-dependent non-monotonic link between residual and success that the authors call the stationarity-executability gap. The work treats inference depth as an explicit element of policy design and points toward an energy-based view of VLA models.

Core claim

By substituting an Equilibrium Matching decoder for the original flow-matching expert in π₀, the resulting π₀-EqM policy achieves higher success rates on robotic manipulation benchmarks without altering the upstream vision-language-action stack. Under a matched 300-step budget, it improves average success on RoboTwin from 40.4% to 50.2% across 19 tasks and reaches 87.0% on LIBERO-10. The approach reveals a non-monotonic relation between residual and success that depends on the task.

What carries the argument

The Equilibrium Matching (EqM) decoder, which replaces the flow-matching expert in the VLA stack and performs iterative denoising under a fixed step budget.

If this is right

Inference depth in iterative VLA control becomes part of policy design rather than a fixed hyperparameter.
VLA models admit an energy-based perspective that can guide composable action generation across tasks.
Task-dependent stationarity-executability gaps can inform decoder choice or step allocation per task.
Closed-loop control can exploit temporal reuse across cycles when the decoder supports state-dependent compute.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The decoder swap may extend to other flow-based VLA models that currently rely on fixed-horizon sampling.
Observed residuals could be monitored at runtime to adapt the number of denoising steps on the fly.
An energy-based formulation might allow skill composition by combining multiple trained decoders without retraining the vision-language backbone.
The stationarity-executability gap could be measured on new robot embodiments to predict which decoder type will perform best before deployment.

Load-bearing premise

The Equilibrium Matching decoder integrates directly into the existing π₀ VLA stack without upstream changes and the fixed 300-step budget constitutes a fair comparison against the original flow-matching expert.

What would settle it

Running the original π₀ and π₀-EqM on the same 19 RoboTwin tasks while varying the inference step budget from 100 to 500 steps would show whether the reported success gains hold only at the matched 300-step point or persist across budgets.

Figures

Figures reproduced from arXiv: 2605.23128 by Congsheng Xu, Huanming Liu, Jianmin Ji, Yao Mu.

**Figure 1.** Figure 1: Overview of π0-EqM. We replace only the action decoder in π0 and cast action generation as iterative equilibrium solving, enabling adaptive stopping and warm starts. An executable intermediate action may appear before full numerical convergence. explicit diffusion-time semantics, making stopping and reuse deployment decisions rather than sampler-specific interventions. Under a matched 300-step inference b… view at source ↗

**Figure 2.** Figure 2: Threshold scans on two RoboTwin tasks. The preferred threshold [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Qualitative EqM inference trajectory on click alarmclock, showing early attraction, semantic shaping, and over-refinement. executability gap for the mismatch between numerical stationarity and physical utility: the iterate with the smallest residual need not execute best. Together, Proposition 1 and the threshold scans suggest a two-part reading of early stopping: residual thresholds monitor solver progre… view at source ↗

read the original abstract

Currently, Vision-Language-Action (VLA) models have become the most adopted paradigm for robotic manipulation for its great potential for task generalization. While most generative flow-matching action decoders for VLA control are often deployed with fixed sampling horizons, limiting state-dependent compute and temporal reuse across control cycles. We present $\pi_0$-EqM, which replaces the flow-matching expert in $\pi_0$ with an Equilibrium Matching (EqM) decoder while leaving the upstream VLA stack unchanged. Under a matched 300-step budget, $\pi_0$-EqM improves RoboTwin average success from 40.4% to 50.2% across 19 tasks and remains competitive on LIBERO, with its clearest gain on LIBERO-10 (87.0%). Two threshold scans reveal a task-dependent non-monotonic relation between residual and success, which we term the stationarity--executability gap. The results suggest that inference depth in iterative VLA control is part of policy design and introduce an energy-based VLA perspective that may inform future work on composable action generation across tasks and embodiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper reports a 9.8-point RoboTwin gain by swapping π₀'s flow-matching decoder for Equilibrium Matching under a fixed 300-step budget, but supplies no methods or variance to confirm the comparison is controlled.

read the letter

The headline result is a lift from 40.4% to 50.2% average success on RoboTwin across 19 tasks when the action decoder is changed to EqM while the rest of the π₀ VLA stack stays fixed. They also note a non-monotonic residual-success relation they call the stationarity-executability gap and suggest an energy-based framing for iterative VLA control. That is the concrete observation the abstract puts forward. The work applies an existing EqM technique to this particular pipeline and runs it on standard manipulation suites, which at least gives a data point on decoder choice under a matched step budget. The clearest reported gain is on LIBERO-10 at 87%. The abstract does not claim EqM itself is new, only that the substitution produces these numbers. The main weakness is that nothing in the provided text shows the experimental protocol, baseline step counts, variance across seeds, or statistical tests. The stress-test point about whether the original flow-matching run was also locked to exactly 300 steps with identical upstream settings is not addressed, so the delta cannot yet be attributed cleanly to the decoder swap. Without those details the numerical claim stays unverified. The gap observation comes from threshold scans, but again no supporting tables or prior citations appear here to show it has not been seen before. This paper is aimed at researchers tuning inference in closed-loop VLAs who want to see decoder variants on RoboTwin and LIBERO. A reader already working on iterative action generation might extract a usable idea from the energy perspective if the full methods confirm the setup. Right now the evidence is too thin for strong claims. I would send it to peer review only if the authors add the missing protocol, ablations, and variance numbers; otherwise it risks wasting referee time on unverifiable deltas.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes π₀-EqM by replacing the flow-matching action decoder in the π₀ VLA model with an Equilibrium Matching (EqM) decoder while leaving the upstream vision-language stack unchanged. It reports an improvement in average success rate on RoboTwin from 40.4% to 50.2% across 19 tasks under a matched 300-step budget, competitive performance on LIBERO (with 87.0% on LIBERO-10), and identifies a task-dependent non-monotonic relation between residual and success termed the stationarity--executability gap, suggesting an energy-based perspective for VLA control.

Significance. If the experimental claims are substantiated, the work shows that inference-time decoder replacement can improve closed-loop VLA performance without upstream retraining, highlighting inference depth as part of policy design. The explicit isolation of the decoder change and the identification of the gap provide a concrete starting point for future composable action generation across tasks and embodiments.

major comments (3)

[Abstract] Abstract: The abstract states numerical improvements (RoboTwin 40.4% → 50.2%) and introduces the stationarity--executability gap but supplies no experimental protocol, baseline details, variance measures, or statistical tests, preventing verification that the data support the claim.
[Experimental Setup / Results] The headline performance claim requires that the original π₀ flow-matching baseline was also run at precisely 300 steps with no other changes to the upstream VLA stack, observation normalization, or reward shaping. The manuscript must provide explicit confirmation that EqM's equilibrium iteration has equivalent per-step cost and that the fixed budget constitutes an apples-to-apples comparison isolating the decoder effect.
[Results] The stationarity--executability gap is defined via threshold scans on the authors' own EqM outputs; the manuscript should clarify whether this gap is an independent, falsifiable phenomenon or reduces to quantities defined by the EqM formulation itself.

minor comments (2)

[Method] The claim that the upstream VLA stack is left unchanged should be supported by a direct statement or ablation confirming no retraining occurred when swapping the action head.
[Abstract] The abstract could report the baseline value on LIBERO-10 for direct comparison with the 87.0% figure.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which help improve the clarity of our work. We address each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: The abstract states numerical improvements (RoboTwin 40.4% → 50.2%) and introduces the stationarity--executability gap but supplies no experimental protocol, baseline details, variance measures, or statistical tests, preventing verification that the data support the claim.

Authors: We agree that the abstract, being a concise summary, does not include full experimental details. These are elaborated in the Experimental Setup (Section 4) and Results (Section 5) sections, including the matched 300-step budget and task protocols. To address the concern, we will include variance measures (standard deviations across multiple seeds) and note the number of evaluation trials in the revised manuscript's results section. The abstract will be updated to reference the main text for protocol details if space permits. revision: partial
Referee: [Experimental Setup / Results] The headline performance claim requires that the original π₀ flow-matching baseline was also run at precisely 300 steps with no other changes to the upstream VLA stack, observation normalization, or reward shaping. The manuscript must provide explicit confirmation that EqM's equilibrium iteration has equivalent per-step cost and that the fixed budget constitutes an apples-to-apples comparison isolating the decoder effect.

Authors: The experiments were conducted with the π₀ flow-matching baseline run at exactly 300 steps under identical conditions to π₀-EqM, with no modifications to the upstream vision-language stack, observation normalization, or reward shaping. EqM is formulated to match the per-step computational cost of flow-matching iterations. We will add an explicit confirmation paragraph in the Experimental Setup section to highlight this controlled comparison isolating the decoder substitution. revision: yes
Referee: [Results] The stationarity--executability gap is defined via threshold scans on the authors' own EqM outputs; the manuscript should clarify whether this gap is an independent, falsifiable phenomenon or reduces to quantities defined by the EqM formulation itself.

Authors: The stationarity--executability gap is an empirical observation derived from threshold scans on residual values from EqM-generated actions, showing a task-dependent non-monotonic relationship with success rates. While it utilizes EqM residuals, the gap itself is not a tautological consequence of the EqM equations but rather an observed trade-off between achieving low residuals (stationarity) and achieving high task success (executability). This can be falsified by conducting similar analyses on alternative action decoders or policies, which we suggest as future work. We will revise the manuscript to explicitly discuss this distinction and its implications. revision: yes

Circularity Check

0 steps flagged

No significant circularity in method or claims

full rationale

The paper describes an empirical replacement of the flow-matching decoder in an existing π₀ VLA model with a new Equilibrium Matching decoder, reports benchmark success rates under a fixed step budget, and names an observed non-monotonic pattern as the stationarity--executability gap. No mathematical derivation, parameter fitting, or self-citation chain is presented that reduces the reported performance deltas or the new perspective to quantities defined by the paper's own inputs. The claims rest on external benchmark comparisons that remain falsifiable outside the authors' choices.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.0 · 5736 in / 1140 out tokens · 27336 ms · 2026-05-25T04:41:06.236813+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel / Jcost uniqueness matches

?

matches
MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.

EqM learns a time-invariant conditional vector field f_θ(A; c_t) whose roots correspond to the target equilibrium action chunks... LEqM = E[||f_θ(A_γ; c) - w(γ)(A - ε)||^{2}]... Inference as equilibrium solving: A^(k+1) = eA^(k) - η f_θ(eA^(k); c)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt / orbit refinement under J-cost echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Proposition 1 (Local Potential Descent). Assume f_θ(A; c) = ∇E_t(A) ... E_t(A^(k+1)) ≤ E_t(A^(k)) ... r_k bounds distance to equilibrium via contraction factor ρ
IndisputableMonolith/Cost.lean Jcost_pos_of_ne_one matches

?

matches
MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.

stationarity-executability gap: the iterate with the smallest residual need not execute best

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 7 internal anchors

[1]

RT-1: Robotics Transformer for Real-World Control at Scale

A. Brohan, N. Brown, J. Carbajal, Y . Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu,et al., “RT-1: Robotics transformer for real-world control at scale,”arXiv preprint arXiv:2212.06817, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[2]

RT-2: Vision-language-action models transfer web knowledge to robotic control,

B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid,et al., “RT-2: Vision-language-action models transfer web knowledge to robotic control,” inConference on Robot Learning. PMLR, 2023, pp. 2165–2183

work page 2023
[3]

Octo: An Open-Source Generalist Robot Policy

O. M. Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu,et al., “Octo: An open-source generalist robot policy,”arXiv preprint arXiv:2405.12213, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[4]

OpenVLA: An Open-Source Vision-Language-Action Model

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi,et al., “Open- VLA: An open-source vision-language-action model,”arXiv preprint arXiv:2406.09246, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[5]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter,et al., “π 0: A vision- language-action flow model for general robot control,”arXiv preprint arXiv:2410.24164, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[6]

Flow Matching for Generative Modeling

Y . Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow matching for generative modeling,”arXiv preprint arXiv:2210.02747, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[7]

Diffusion policy: Visuomotor policy learning via action diffusion,

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,”The International Journal of Robotics Research, vol. 44, no. 10-11, pp. 1684–1704, 2025

work page 2025
[8]

Equilibrium matching: Generative modeling with implicit energy-based models,

R. Wang and Y . Du, “Equilibrium matching: Generative modeling with implicit energy-based models,”arXiv preprint arXiv:2510.02300, 2025

work page arXiv 2025
[9]

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

T. Z. Zhao, V . Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,”arXiv preprint arXiv:2304.13705, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[10]

Control-limited differential dynamic programming,

Y . Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” in2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1168–1175

work page 2014
[11]

Differen- tiable mpc for end-to-end planning and control,

B. Amos, I. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, “Differen- tiable mpc for end-to-end planning and control,”Advances in neural information processing systems, vol. 31, 2018

work page 2018
[12]

Implicit generation and modeling with energy based models,

Y . Du and I. Mordatch, “Implicit generation and modeling with energy based models,”Advances in Neural Information Processing Systems, vol. 32, 2019

work page 2019
[13]

Implicit behavioral cloning,

P. Florence, C. Lynch, A. Zeng, O. A. Ramirez, A. Wahid, L. Downs, A. Wong, J. Lee, I. Mordatch, and J. Tompson, “Implicit behavioral cloning,” inConference on Robot Learning. PMLR, 2022, pp. 158– 168

work page 2022
[14]

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

T. Chen, Z. Chen, B. Chen, Z. Cai, Y . Liu, Z. Li, Q. Liang, X. Lin, Y . Ge, Z. Gu,et al., “RoboTwin 2.0: A scalable data generator and benchmark with strong domain randomization for robust bimanual robotic manipulation,”arXiv preprint arXiv:2506.18088, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[15]

LIBERO: Benchmarking knowledge transfer for lifelong robot learn- ing,

B. Liu, Y . Zhu, C. Gao, Y . Feng, Q. Liu, Y . Zhu, and P. Stone, “LIBERO: Benchmarking knowledge transfer for lifelong robot learn- ing,”Advances in Neural Information Processing Systems, vol. 36, pp. 44 776–44 791, 2023

work page 2023

[1] [1]

RT-1: Robotics Transformer for Real-World Control at Scale

A. Brohan, N. Brown, J. Carbajal, Y . Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu,et al., “RT-1: Robotics transformer for real-world control at scale,”arXiv preprint arXiv:2212.06817, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[2] [2]

RT-2: Vision-language-action models transfer web knowledge to robotic control,

B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid,et al., “RT-2: Vision-language-action models transfer web knowledge to robotic control,” inConference on Robot Learning. PMLR, 2023, pp. 2165–2183

work page 2023

[3] [3]

Octo: An Open-Source Generalist Robot Policy

O. M. Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu,et al., “Octo: An open-source generalist robot policy,”arXiv preprint arXiv:2405.12213, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[4] [4]

OpenVLA: An Open-Source Vision-Language-Action Model

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi,et al., “Open- VLA: An open-source vision-language-action model,”arXiv preprint arXiv:2406.09246, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[5] [5]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter,et al., “π 0: A vision- language-action flow model for general robot control,”arXiv preprint arXiv:2410.24164, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[6] [6]

Flow Matching for Generative Modeling

Y . Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow matching for generative modeling,”arXiv preprint arXiv:2210.02747, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[7] [7]

Diffusion policy: Visuomotor policy learning via action diffusion,

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,”The International Journal of Robotics Research, vol. 44, no. 10-11, pp. 1684–1704, 2025

work page 2025

[8] [8]

Equilibrium matching: Generative modeling with implicit energy-based models,

R. Wang and Y . Du, “Equilibrium matching: Generative modeling with implicit energy-based models,”arXiv preprint arXiv:2510.02300, 2025

work page arXiv 2025

[9] [9]

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

T. Z. Zhao, V . Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,”arXiv preprint arXiv:2304.13705, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[10] [10]

Control-limited differential dynamic programming,

Y . Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” in2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1168–1175

work page 2014

[11] [11]

Differen- tiable mpc for end-to-end planning and control,

B. Amos, I. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, “Differen- tiable mpc for end-to-end planning and control,”Advances in neural information processing systems, vol. 31, 2018

work page 2018

[12] [12]

Implicit generation and modeling with energy based models,

Y . Du and I. Mordatch, “Implicit generation and modeling with energy based models,”Advances in Neural Information Processing Systems, vol. 32, 2019

work page 2019

[13] [13]

Implicit behavioral cloning,

P. Florence, C. Lynch, A. Zeng, O. A. Ramirez, A. Wahid, L. Downs, A. Wong, J. Lee, I. Mordatch, and J. Tompson, “Implicit behavioral cloning,” inConference on Robot Learning. PMLR, 2022, pp. 158– 168

work page 2022

[14] [14]

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

T. Chen, Z. Chen, B. Chen, Z. Cai, Y . Liu, Z. Li, Q. Liang, X. Lin, Y . Ge, Z. Gu,et al., “RoboTwin 2.0: A scalable data generator and benchmark with strong domain randomization for robust bimanual robotic manipulation,”arXiv preprint arXiv:2506.18088, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[15] [15]

LIBERO: Benchmarking knowledge transfer for lifelong robot learn- ing,

B. Liu, Y . Zhu, C. Gao, Y . Feng, Q. Liu, Y . Zhu, and P. Stone, “LIBERO: Benchmarking knowledge transfer for lifelong robot learn- ing,”Advances in Neural Information Processing Systems, vol. 36, pp. 44 776–44 791, 2023

work page 2023