pith. machine review for the scientific record.

arxiv: 2605.10166 · v1 · submitted 2026-05-11 · 💻 cs.RO

Recognition: 2 theorem links


Data-Asymmetric Latent Imagination and Reranking for 3D Robotic Imitation Learning

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 03:25 UTC · model grok-4.3

classification 💻 cs.RO
keywords robotic imitation learning · latent world models · 3D point clouds · action reranking · diffusion policies · mixed-quality demonstrations · flow-matching policies

The pith

DALI-R improves 3D robot imitation policies by reranking actions using rollouts imagined from a latent world model trained on mixed-quality data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Robotic imitation learning often discards suboptimal or failed trajectories even though they contain useful information about dynamics and failure modes. The paper demonstrates that a latent world model over 3D point clouds, trained on these mixed-quality trajectories, can generate imagined future states. A task completion scorer then evaluates and reranks candidate action chunks produced by a base policy. When applied to diffusion and flow-matching policies, the combined system raises average success rates on standard manipulation benchmarks. The added inference cost remains below 0.7 times the base policy's inference cost.
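
As described, inference samples several candidate action chunks, imagines each forward with the latent world model, scores the imagined futures, and executes the top-ranked chunk. A minimal sketch of that loop, with hypothetical `policy`, `world_model`, `scorer`, and `perturb` interfaces standing in for the paper's components (per Figure 1, the perturbation is stochastic point dropout); `n_candidates` and `horizon` are illustrative, not the paper's settings:

```python
import numpy as np

def rerank_step(obs, policy, world_model, scorer, perturb,
                n_candidates=8, horizon=10):
    """Pick the best of several candidate action chunks via imagined rollouts.
    All interfaces here are hypothetical stand-ins, not the paper's API."""
    candidates = [policy(perturb(obs)) for _ in range(n_candidates)]
    scores = []
    for chunk in candidates:
        z = world_model.encode(obs)       # latent of the current point cloud
        for action in chunk[:horizon]:    # roll the chunk forward in latent space
            z = world_model.step(z, action)
        scores.append(scorer(z))          # predicted task-completion score
    return candidates[int(np.argmax(scores))]  # execute the top-ranked chunk
```

The added cost is n_candidates rollouts of cheap latent steps plus scoring, which is one reason a sub-0.7× overhead is plausible relative to full diffusion or flow-matching sampling.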

Core claim

The central claim is that a Latent World Model trained on mixed-quality 3D point-cloud trajectories can generate sufficiently accurate imagined rollouts to let a Task Completion Scorer rerank action chunks, thereby lifting task success rates for 3D base policies without any additional high-quality demonstrations.

What carries the argument

The Data-Asymmetric Latent Imagination and Reranking (DALI-R) framework, which trains the latent world model and scorer on the full mixed-quality dataset while restricting the base policy to high-quality data only.
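
Concretely, the asymmetry is a routing rule on the training data; a minimal sketch, assuming each trajectory carries a quality label (the field name and label values are illustrative, not the paper's schema):

```python
def split_by_quality(trajectories):
    """Base policy sees only successful expert data; the Latent World Model
    and Task Completion Scorer train on the full mixed-quality pool."""
    policy_data = [t for t in trajectories if t["quality"] == "expert"]
    wm_scorer_data = list(trajectories)  # includes imperfect-success and failed runs
    return policy_data, wm_scorer_data
```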

If this is right

  • Both diffusion-based and flow-matching 3D policies receive measurable success-rate gains on Adroit and MetaWorld tasks.
  • The method adds less than 0.7 times the original inference cost while using only existing mixed-quality data.
  • Failure modes and exploratory trajectories become assets rather than waste for improving decision quality.
  • The framework separates data quality requirements between the policy and the auxiliary models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation of data quality could be tested in real-robot settings where collecting optimal demonstrations is especially expensive.
  • Reranking might combine with uncertainty estimates to further reduce the impact of model errors in the imagined rollouts.
  • The approach could be extended to other sensor modalities if a corresponding latent world model can be trained on mixed data.
  • Success-rate gains may vary with the degree of suboptimality in the training trajectories; systematic sweeps would quantify that dependence, as in the sketch below.
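
A minimal sketch of the sweep suggested in the last bullet, with a hypothetical `train_and_evaluate(policy_data, mixed_data)` standing in for the full DALI-R training and evaluation pipeline:

```python
import random

def suboptimality_sweep(expert_trajs, failed_trajs, train_and_evaluate, seed=0):
    """Vary the failed-trajectory share of the mixed pool and record how the
    reranked success rate responds."""
    rng = random.Random(seed)
    results = {}
    for frac in (0.0, 0.25, 0.5, 0.75):
        n_failed = int(frac * len(failed_trajs))
        mixed = expert_trajs + rng.sample(failed_trajs, n_failed)
        results[frac] = train_and_evaluate(expert_trajs, mixed)
    return results
```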

Load-bearing premise

The latent world model produces imagined trajectories accurate enough that the scorer can reliably pick better actions than the base policy would choose on its own.

What would settle it

Run the base policy and the reranked version side-by-side on the same test episodes; if the reranked actions produce equal or lower success rates on the held-out tasks, the central claim is false.
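
A minimal sketch of that side-by-side test, assuming a hypothetical `run_episode(task, seed, reranked)` that returns success as 0/1. Pairing both arms on identical (task, seed) episodes makes a paired t-test across seeds (the test the simulated rebuttal proposes) well-defined:

```python
import numpy as np
from scipy import stats

def paired_evaluation(run_episode, tasks, seeds):
    """Compare base vs. reranked policies on identical held-out episodes."""
    base = np.array([[run_episode(t, s, reranked=False) for t in tasks]
                     for s in seeds], dtype=float)
    rerank = np.array([[run_episode(t, s, reranked=True) for t in tasks]
                       for s in seeds], dtype=float)
    base_rates = base.mean(axis=1)      # per-seed success rate, base policy
    rerank_rates = rerank.mean(axis=1)  # per-seed success rate, reranked
    t_stat, p_value = stats.ttest_rel(rerank_rates, base_rates)
    return rerank_rates.mean() - base_rates.mean(), p_value
```

If the mean difference is zero or negative on the held-out tasks, the central claim fails by the paper's own criterion.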

Figures

Figures reproduced from arXiv: 2605.10166 by Chufeng Tang, Hongbo Wang, Lianghao Luo, Qingqiu Huang, Ruyan Liu, Wei Li, Xiaoshuai Hao, Xizhou Bu.

Figure 1
Figure 1: Overview of DALI-R, our data-asymmetric latent imagination and reranking framework. (A) During training, the Base 3D Policy is trained only on successful expert data, while mixed-quality trajectories, including imperfect-success and failed data, are used to train the Latent World Model and Task Completion Scorer. (B) At inference time, stochastic point dropout produces perturbed point-cloud observations f… view at source ↗
Figure 2
Figure 2: Diagnostic visualization of the learned Task Completion Scorer and Latent World Model. Scorer plots compare predicted completion scores with Monte-Carlo targets, while WM plots compare scores from predicted and ground-truth future latents. Dashed/solid curves denote references/predictions, and blue/orange curves denote successful/failed trajectories. view at source ↗
Figure 3
Figure 3: Inference-time efficiency and candidate generation analysis. (a) Latency under different Ncand, with 2D Video + VLM shown only as a simulated latency reference and the dotted line denoting a 100 ms budget. (b) Single-seed candidate-scaling diagnostic on Disassemble for 3D DP + Ours and 3D FM + Ours. (c) Proposal-diversity visualization: clean candidates collapse near one action, while point dropout produce… view at source ↗
Figure 4
Figure 4: Representative rendered observations from the six evaluated simulation tasks. We evaluate two Adroit dexterous-hand tasks, Door and Pen, and four MetaWorld gripper manipulation tasks: Disassemble, Shelf-Place, Stick-Pull, and Pick-Place-Wall. view at source ↗
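
Figures 1 and 3 both attribute candidate diversity to stochastic point dropout on the input cloud: clean observations make the policy collapse onto one action, while perturbed clouds spread the proposals. A minimal sketch of that perturbation (the drop rate is illustrative, not the paper's setting):

```python
import numpy as np

def point_dropout(points, drop_rate=0.1, rng=None):
    """Randomly drop a fraction of an (N, 3) point cloud so that repeated
    policy calls see slightly different observations."""
    rng = rng or np.random.default_rng()
    keep = rng.random(len(points)) >= drop_rate  # boolean mask per point
    return points[keep]
```
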
Original abstract

Robotic imitation learning typically assumes access to optimal demonstrations, yet real-world data collection often yields suboptimal, exploratory, or even failed trajectories. Discarding such data wastes valuable information about environment dynamics and failure modes, which can instead be leveraged to improve decision-making. While 3D policies reduce reliance on high-quality demonstrations through strong spatial generalization, they still require large-scale data to achieve high task success. To address this, we propose DALI-R, a Data-Asymmetric Latent Imagination and Reranking framework for 3D robotic imitation learning from mixed-quality trajectories. It learns a Latent World Model over 3D point clouds for imagined rollouts and a Task Completion Scorer that reranks candidate action chunks, improving decision-making without additional high-quality demonstrations. We instantiate DALI-R with both diffusion and efficient flow-matching policies and evaluate it on Adroit and MetaWorld benchmarks. Across the two evaluated 3D base policies, DALI-R achieves an average $6.8$\% improvement in success rate while incurring less than $0.7\times$ additional inference overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes DALI-R, a Data-Asymmetric Latent Imagination and Reranking framework for 3D robotic imitation learning from mixed-quality trajectories. It trains a Latent World Model (LWM) on 3D point clouds to generate imagined rollouts and a Task Completion Scorer to rerank action chunks produced by base 3D policies (instantiated with both diffusion and flow-matching models). On Adroit and MetaWorld benchmarks, DALI-R reports an average 6.8% success-rate improvement over the base policies while adding less than 0.7× inference overhead.

Significance. If the central empirical claim holds under proper verification, the work would be significant for imitation learning: it demonstrates a practical way to extract value from suboptimal and failed trajectories via latent imagination and reranking, thereby lowering the data-quality barrier for high-performing 3D policies. The dual-policy instantiation and explicit overhead measurement are positive features that support broader applicability.

major comments (3)
  1. [§4 (Experiments) and Table 1] The 6.8% average success-rate improvement is presented without training hyperparameters, number of random seeds, statistical significance tests, or per-task variance; this absence makes it impossible to determine whether the reported gain is robust or could be explained by training stochasticity.
  2. [§3.1 (Latent World Model)] The claim that imagined rollouts from an LWM trained on mixed-quality point clouds are sufficiently accurate for the Task Completion Scorer to reliably improve decisions is load-bearing, yet the manuscript supplies no single-step or multi-step prediction error metrics, no rollout fidelity ablations, and no comparison of LWM performance when trained on high-quality versus mixed data.
  3. [§3.2 (Task Completion Scorer) and §4.3 (Ablations)] No quantitative breakdown is given of how often the scorer selects a better action chunk than the base policy versus cases where reranking degrades performance; without this, the 6.8% gain cannot be confidently attributed to the proposed components rather than other factors.
minor comments (2)
  1. [§4.2] The overhead claim (<0.7×) should be accompanied by a precise definition of the measurement (wall-clock time per action chunk, relative to which baseline, on which hardware) in the main text rather than only the abstract; a measurement sketch follows these comments.
  2. [§2] Notation for the latent state, point-cloud encoding, and action-chunk representation is introduced without a consolidated table of symbols, which would aid readability.
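
One unambiguous version of the measurement requested in minor comment 1: wall-clock latency per action chunk, averaged over warm runs, reported as added cost relative to the base policy. A minimal sketch, with hypothetical `base_step` and `reranked_step` callables standing in for the two inference paths:

```python
import time

def overhead_ratio(base_step, reranked_step, obs, n_runs=100):
    """Relative added latency per action chunk: (reranked - base) / base."""
    def mean_latency(step):
        step(obs)  # warm-up call (caches, JIT, allocator)
        t0 = time.perf_counter()
        for _ in range(n_runs):
            step(obs)
        return (time.perf_counter() - t0) / n_runs
    base_t, reranked_t = mean_latency(base_step), mean_latency(reranked_step)
    return (reranked_t - base_t) / base_t  # the paper claims this stays below 0.7
```

Hardware, batch size, and candidate count all move this number, so each should be fixed and stated alongside it.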

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful review and constructive suggestions. The comments correctly identify areas where additional experimental details and analyses would strengthen the presentation of our results. We address each point below and will incorporate the requested information in the revised manuscript.

point-by-point responses
  1. Referee: [§4 (Experiments) and Table 1] The 6.8% average success-rate improvement is presented without training hyperparameters, number of random seeds, statistical significance tests, or per-task variance; this absence makes it impossible to determine whether the reported gain is robust or could be explained by training stochasticity.

    Authors: We agree that these details are necessary to establish robustness. In the revision we will expand Section 4 to list all training hyperparameters, state the number of random seeds (we used 5), report per-task success rates with standard deviations, and include statistical significance tests (paired t-tests across seeds) comparing DALI-R to the base policies. Updated Table 1 will reflect these changes. revision: yes

  2. Referee: [§3.1 (Latent World Model)] The claim that imagined rollouts from an LWM trained on mixed-quality point clouds are sufficiently accurate for the Task Completion Scorer to reliably improve decisions is load-bearing, yet the manuscript supplies no single-step or multi-step prediction error metrics, no rollout fidelity ablations, and no comparison of LWM performance when trained on high-quality versus mixed data.

    Authors: The predictive fidelity of the LWM is indeed central. While end-to-end task improvements provide indirect evidence, we will add direct metrics in the revised Section 3.1: single-step and 10-step point-cloud prediction MSE, rollout visualizations, and an ablation comparing LWM variants trained on high-quality-only versus mixed-quality data. These additions will quantify the accuracy of imagined trajectories used by the scorer. revision: yes

  3. Referee: [§3.2 (Task Completion Scorer) and §4.3 (Ablations)] No quantitative breakdown is given of how often the scorer selects a better action chunk than the base policy versus cases where reranking degrades performance; without this, the 6.8% gain cannot be confidently attributed to the proposed components rather than other factors.

    Authors: We acknowledge that a per-decision breakdown would strengthen attribution. In the revised Section 4.3 we will add a quantitative analysis reporting (i) the fraction of timesteps where the scorer selects a higher-completion action chunk than the base policy and (ii) the fraction where it selects a lower one, together with the resulting success-rate delta in each case. This will be presented as a new table or bar plot; a sketch of such a breakdown follows these responses. revision: yes
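
A minimal sketch of the per-decision breakdown promised in response 3, assuming per-timestep logs of the scorer's value for the base policy's default chunk and for the reranked choice (the logging format is illustrative):

```python
def rerank_breakdown(decisions):
    """decisions: list of (base_score, reranked_score) pairs, one per timestep,
    where each score is the scorer's predicted completion for that chunk."""
    decisions = list(decisions)
    n = max(len(decisions), 1)
    improved = sum(r > b for b, r in decisions) / n  # reranking picked a better chunk
    degraded = sum(r < b for b, r in decisions) / n  # reranking picked a worse chunk
    return {"improved": improved, "degraded": degraded,
            "tied": 1.0 - improved - degraded}
```

Note the caveat: these are scorer-predicted comparisons; tying them to realized success-rate deltas, as the authors propose, still requires grouping episodes by outcome.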

Circularity Check

0 steps flagged

No circularity: new components and empirical gains are independent of fitted inputs

full rationale

The paper introduces a Latent World Model and Task Completion Scorer as additional modules trained on mixed-quality data, then reports empirical success-rate gains on Adroit and MetaWorld. No equations or self-citations are shown that define the reported 6.8% improvement as a direct algebraic consequence of the same data used to fit the base policy or the new modules. The derivation chain (train LWM on point clouds → generate imagined rollouts → score and rerank action chunks) remains an independent modeling choice whose validity is tested by external benchmarks rather than by construction. Minor self-citations to prior 3D policy work exist but are not load-bearing for the central claim.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unstated assumption that 3D point-cloud world models trained on mixed data remain predictive enough for reranking; no explicit free parameters, axioms, or invented entities are listed in the abstract.

pith-pipeline@v0.9.0 · 5511 in / 1015 out tokens · 40251 ms · 2026-05-12T03:25:54.196355+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches — The paper's claim is directly supported by a theorem in the formal canon.
  • supports — The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends — The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses — The paper appears to rely on the theorem as machinery.
  • contradicts — The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
