Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

Biwei Huang; Fan Feng; Kun Zhang; Minghao Fu; Selena Ge; Yingyao Hu; Yujia Zheng; Zeyu Tang; Zijian Li

arxiv: 2605.16054 · v1 · pith:ELEA6PW2new · submitted 2026-05-15 · 💻 cs.LG · cs.AI

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

Fan Feng , Selena Ge , Minghao Fu , Zijian Li , Yujia Zheng , Zeyu Tang , Yingyao Hu , Biwei Huang

show 1 more author

Kun Zhang

This is my paper

Pith reviewed 2026-05-20 20:49 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords diffusion modelslatent dynamicsdecision makingplanning and controladaptive policiescausal modelssequence modelinggenerative decision-making

0 comments

The pith

A diffusion model for decision-making identifies hidden latent dynamics from short observation sequences to adapt planning and control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that decision-making framed as sequence generation benefits from explicitly recovering and using evolving latent factors that drive environment transitions, rewards, and agent behavior. It proves that these latent processes can be identified from small temporal blocks of observations under mild conditions. From this, the authors build Ada-Diffuser, a causal diffusion model that jointly captures observed interaction patterns and the underlying latent dynamics, then deploys them for both planning and policy learning. A sympathetic reader would care because many control problems involve unobserved variables that change over time, and handling them directly could produce agents that adjust more reliably to shifts in dynamics or rewards.

Core claim

We theoretically show that under mild conditions, the latent process can be identified from small temporal blocks of observations. Building on this insight, we introduce Ada-Diffuser, a causal diffusion model that learns the temporal structure of observed interactions and the underlying latent dynamics simultaneously, and furthermore, leverages them for planning and control. With a modular design, Ada-Diffuser supports both planning and policy learning tasks, enabling adaptation to latent variations in dynamics, rewards, and latent actions.

What carries the argument

Ada-Diffuser, the causal diffusion model that jointly learns observed temporal structures and latent dynamics from minimal observations for adaptive planning and control.

If this is right

Ada-Diffuser can perform both planning and policy learning within the same modular framework.
The approach enables adaptation to changes in latent dynamics, rewards, and latent actions during execution.
Accurate latent inference supports more robust decision-making on simulated control and robotic tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the identification result holds, similar latent-recovery techniques could extend to partially observable real-world robotics where only brief trajectory snippets are available.
The causal diffusion structure may offer a route to handling non-stationary environments by treating latent shifts as identifiable changes rather than noise.
Combining this with other sequence models could test whether the small-block identification property transfers beyond diffusion.

Load-bearing premise

The latent process can be identified from small temporal blocks of observations under mild conditions.

What would settle it

A controlled benchmark where ground-truth latent factors are known but Ada-Diffuser's inferred latents do not improve planning success rate or sample efficiency over a standard diffusion baseline without latent modeling.

Figures

Figures reproduced from arXiv: 2605.16054 by Biwei Huang, Fan Feng, Kun Zhang, Minghao Fu, Selena Ge, Yingyao Hu, Yujia Zheng, Zeyu Tang, Zijian Li.

**Figure 2.** Figure 2: Overview of the Ada-Diffuser framework. The modular design consists of two main stages: latent context identification (Stage 1, Section 4.2), followed by a causal diffusion model (Stage 2, Section 4.3) that models the generative structure of the trajectories. The learned model is then used for planning or policy learning conditioned on the inferred latent context. Theorem 1 (Identifiability on Latent Facto… view at source ↗

**Figure 3.** Figure 3: zig-zag sampling (2 steps). Training: Given a noisy input x kt t with noise level kt, we first sample an initial latent context from the prior: cˆ prior t ∼ pϕ(ct | ct−1), and use it to denoise the observation: xˆ (0) t = ϵθ(x kt t , kt, cˆ prior t ). Then we infer the latent using the posterior network, conditioned on a broader temporal window including future observations (accessible in offline data): cˆ… view at source ↗

**Figure 4.** Figure 4: (a). Identification Results (i.e., Linear Probing MSE, [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Results on environments without explicitly designed latent factors. Complete results are [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Verification of the assumptions. (a) Transition separability in Cheetah under the hyperparameter setting (m, n) = (5, 0.5). (b) Transition separability under a weak–context setting (m, n) = (0.2, 0.2), where the context barely affects the dynamics. (c) Average reward drop when planning with vs. without conditioning on c, plotted against the transition separability. (d) k–distributions for (5, 0.5), (e) k–… view at source ↗

**Figure 7.** Figure 7: An illustration of the zig-zag sampling process with a block of 4 time steps. [PITH_FULL_IMAGE:figures/full_fig_p034_7.png] view at source ↗

**Figure 8.** Figure 8: Illustrations of the Benchmarks. From left to right: Half-Cheetah, Ant, Walker, FrankaKitchen, Maze2D, and LIBERO [PITH_FULL_IMAGE:figures/full_fig_p041_8.png] view at source ↗

**Figure 9.** Figure 9: Illustrations of RoboMimic Benchmark. function that challenges the policy to adapt to shifting goals. Specifically, we consider dt = σ(5 · sin(2πt/200)), where σ(·) denotes the sigmoid function, α controls the sharpness of the transition, and T determines the switching period. This formulation induces a smooth periodic change in the preferred direction of movement, requiring the policy to adapt to graduall… view at source ↗

**Figure 10.** Figure 10: Identification results (MSE of linear probing and R2 ) versus the length of temporal blocks. Left: Cheetah with time-varying wind; Right: Cheetah with time-varying rewards. Clustering We assess whether the learned latent space organizes states by the underlying context on the Cheetah wind-change task, where the ground-truth latent evolves as fw(t) = 5 + 5 sin(0.5t). We sample 1000 time steps, discretize f… view at source ↗

**Figure 11.** Figure 11: Clustering (t-SNE)results on Cheetah wind-change. [PITH_FULL_IMAGE:figures/full_fig_p046_11.png] view at source ↗

**Figure 12.** Figure 12: Results with different planning and execution horizons. We evaluate on Kitchen-partial and Libero-Long experiments. we do not explicitly impose latent variables, our model implicitly learns representations that can track stochasticity and support smooth control. 47 [PITH_FULL_IMAGE:figures/full_fig_p047_12.png] view at source ↗

read the original abstract

Recent work has framed decision-making as a sequence modeling problem using generative models such as diffusion models. Although promising, these approaches often overlook latent factors that exhibit evolving dynamics, elements that are fundamental to environment transitions, reward structures, and high-level agent behavior. Explicitly modeling these hidden processes is essential for both precise dynamics modeling and effective decision-making. In this paper, we propose a unified framework that explicitly incorporates latent dynamic inference into generative decision-making from minimal yet sufficient observations. We theoretically show that under mild conditions, the latent process can be identified from small temporal blocks of observations. Building on this insight, we introduce Ada-Diffuser, a causal diffusion model that learns the temporal structure of observed interactions and the underlying latent dynamics simultaneously, and furthermore, leverages them for planning and control. With a modular design, Ada-Diffuser supports both planning and policy learning tasks, enabling adaptation to latent variations in dynamics, rewards, and latent actions. Experiments on simulated control and robotic benchmarks demonstrate its effectiveness in accurate latent inference and adaptive policy learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adds latent dynamic inference to diffusion decision models with a claimed identifiability result from short observation blocks, but the closed-loop shift concern looks real and unaddressed in the abstract.

read the letter

The core contribution is a framework that infers evolving latent factors alongside a causal diffusion model for planning and control. It claims that under mild conditions the latents can be recovered from small temporal blocks of observations, then uses that to adapt to changes in dynamics and rewards. That combination is new enough to stand out from standard diffusion RL work, and the modular design that supports both planning and policy learning is a practical plus for robotics-style tasks. Experiments on simulated control and robotic benchmarks are mentioned, which at least gives a place to check whether the adaptation actually helps in non-stationary settings. The identifiability claim is the part that needs the most scrutiny. The abstract states it without showing conditions or a proof sketch, so it is impossible to tell whether the argument assumes passive observation sequences or already accounts for the policy feeding back into the data distribution. The stress-test note is on point here: once actions are chosen using the inferred latents, future observation blocks are no longer drawn from the same distribution, and any stationarity or independence assumptions in the proof could break. If the full paper does not add assumptions on the policy or value function to close that loop, the theoretical justification for simultaneous learning weakens. This work is aimed at people already using generative models for sequential decision making who also care about hidden non-stationarity. A reader who wants concrete mechanisms for latent adaptation in control will get something usable from the experiments even if the theory needs tightening. It is worth sending to peer review so the identifiability argument and any closed-loop handling can be checked properly.

Referee Report

2 major / 2 minor

Summary. The paper introduces Ada-Diffuser, a causal diffusion model for decision-making that integrates latent dynamic inference. It claims to theoretically demonstrate that under mild conditions, the latent process can be identified from small temporal blocks of observations, enabling simultaneous learning of temporal structures and latent dynamics for planning and control. Experiments on simulated control and robotic benchmarks demonstrate effectiveness in latent inference and adaptive policy learning.

Significance. If the identifiability result holds and extends to the closed-loop setting, the work could advance generative sequence models for decision-making by explicitly handling evolving latent factors that affect dynamics and rewards. The modular design supporting both planning and policy learning tasks is a concrete strength that allows adaptation to latent variations.

major comments (2)

[Abstract] Abstract (theoretical result paragraph): the claim that the latent process can be identified from small temporal blocks under mild conditions is load-bearing for the justification of Ada-Diffuser, yet the manuscript supplies neither the specific conditions, derivation steps, nor proof sketch. Without these, it is impossible to verify whether the identification is independent of the downstream diffusion component or survives the distribution shift induced by policy-dependent actions.
[Planning and control sections] Planning and control sections (e.g., the description of causal diffusion for adaptive decision-making): the identifiability argument is developed for passive observation sequences, but once the model is used for planning, actions are selected from inferred latents and create a feedback loop. This closed-loop effect can violate the stationarity or independence assumptions typical of block-identifiability results; the paper does not supply additional assumptions on the policy or value function that would restore validity.

minor comments (2)

[Introduction] Notation for temporal blocks and latent variables should be introduced with explicit definitions and an early figure or diagram to improve readability.
[Related Work] Related-work discussion would benefit from explicit comparison to prior latent-variable diffusion models in RL and to identifiability results for dynamical systems.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These highlight key points about the presentation of the theoretical identifiability result and its extension from passive observations to closed-loop planning and control. We respond to each major comment below and will make revisions to improve clarity and rigor without altering the core contributions.

read point-by-point responses

Referee: [Abstract] Abstract (theoretical result paragraph): the claim that the latent process can be identified from small temporal blocks under mild conditions is load-bearing for the justification of Ada-Diffuser, yet the manuscript supplies neither the specific conditions, derivation steps, nor proof sketch. Without these, it is impossible to verify whether the identification is independent of the downstream diffusion component or survives the distribution shift induced by policy-dependent actions.

Authors: We agree that the abstract states the identifiability claim at a high level without enumerating the precise conditions or including a proof sketch. In the revision we will expand the abstract to reference the specific mild conditions (sufficiently informative temporal blocks, latent observability, and block-wise independence) and add a concise proof outline to the main text. The result is derived solely from the causal structure of the observed sequences and holds independently of the subsequent diffusion model used for generation. The identification is performed on offline observational data; any distribution shift arising from policy-dependent actions during deployment is handled by the online latent inference module rather than by re-deriving the theorem under closed-loop data. revision: yes
Referee: [Planning and control sections] Planning and control sections (e.g., the description of causal diffusion for adaptive decision-making): the identifiability argument is developed for passive observation sequences, but once the model is used for planning, actions are selected from inferred latents and create a feedback loop. This closed-loop effect can violate the stationarity or independence assumptions typical of block-identifiability results; the paper does not supply additional assumptions on the policy or value function that would restore validity.

Authors: We acknowledge that the core identifiability theorem is stated for passive observation sequences. In the planning and control sections the model performs causal latent inference on recent observation blocks and then generates actions conditioned on the current latent estimate, which indeed introduces a feedback loop. The manuscript currently relies on empirical results across benchmarks to demonstrate that adaptation remains effective. In the revision we will add an explicit discussion paragraph that states the additional working assumptions needed for the closed-loop case (e.g., that the policy is Lipschitz in the latent estimate and that latent dynamics vary slowly relative to the planning horizon) and note the resulting bounded deviation from the original stationarity conditions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; theoretical identifiability presented as independent of model fitting

full rationale

The paper's central derivation begins with a theoretical claim that the latent process is identifiable from small temporal blocks under mild conditions, followed by introduction of the Ada-Diffuser model that builds on this insight for simultaneous learning of temporal structure and latent dynamics. No equations or derivations are provided in the abstract or described sections that define the latent process in terms of the diffusion outputs or that rename fitted parameters as predictions. The identification result is positioned as a prior, independent contribution rather than a self-referential fit or self-citation load-bearing step. The derivation chain therefore remains self-contained against external benchmarks, with the model leveraging the claimed identifiability without reducing to it by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the assumption that latent processes are identifiable from small temporal observation blocks under mild conditions; no free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption Under mild conditions, the latent process can be identified from small temporal blocks of observations.
This is the stated basis for the theoretical result that enables the rest of the framework.

pith-pipeline@v0.9.0 · 5732 in / 1141 out tokens · 39330 ms · 2026-05-20T20:49:51.360581+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We theoretically show that under mild conditions, the latent process can be identified from small temporal blocks of observations... Theorem 1 (Identifiability on Latent Factors). Under Assumptions 1-3, the posterior distribution of latent factor with consecutive observations p(ct | x t−2:t+1) can be identifiable up to an invertible transformation
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Ada-Diffuser, a causal diffusion model that learns the temporal structure of observed interactions and the underlying latent dynamics simultaneously

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

283 extracted references · 283 canonical work pages · 30 internal anchors

[1]

nature , volume=

Human-level control through deep reinforcement learning , author=. nature , volume=. 2015 , publisher=

work page 2015
[2]

arXiv preprint arXiv:2310.10625 , year=

Video Language Planning , author=. arXiv preprint arXiv:2310.10625 , year=

work page arXiv
[3]

Conference on robot learning , pages=

Accelerating reinforcement learning with learned skill priors , author=. Conference on robot learning , pages=. 2021 , organization=

work page 2021
[4]

Advances in Neural Information Processing Systems , volume=

Goal-conditioned predictive coding for offline reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[5]

Learning for Dynamics and Control , pages=

Planning from Images with Deep Latent Gaussian Process Dynamics , author=. Learning for Dynamics and Control , pages=. 2020 , organization=

work page 2020
[6]

Advances in neural information processing systems , volume=

Learning universal policies via text-guided video generation , author=. Advances in neural information processing systems , volume=

work page
[7]

Proceedings of the 27th IEEE Conference on Decision and Control , pages=

Receding horizon control of nonlinear systems , author=. Proceedings of the 27th IEEE Conference on Decision and Control , pages=. 1988 , organization=

work page 1988
[8]

International conference on machine learning , pages=

Efficient off-policy meta-reinforcement learning via probabilistic context variables , author=. International conference on machine learning , pages=. 2019 , organization=

work page 2019
[9]

Conference on Robot Learning (CoRL) , year=

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation , author=. Conference on Robot Learning (CoRL) , year=

work page
[10]

Journal of mathematical analysis and applications , volume=

Optimal control of Markov processes with incomplete state information I , author=. Journal of mathematical analysis and applications , volume=. 1965 , publisher=

work page 1965
[11]

Octo: An Open-Source Generalist Robot Policy

Octo: An open-source generalist robot policy , author=. arXiv preprint arXiv:2405.12213 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[12]

Advances in neural information processing systems , volume=

Denoising diffusion probabilistic models , author=. Advances in neural information processing systems , volume=

work page
[13]

Advances in neural information processing systems , volume=

Diffusion models beat gans on image synthesis , author=. Advances in neural information processing systems , volume=

work page
[14]

International Conference on Machine Learning , pages=

Metadiffuser: Diffusion model as conditional planner for offline meta-rl , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023
[15]

arXiv preprint arXiv:2302.01877 , year=

Adaptdiffuser: Diffusion models as adaptive self-evolving planners , author=. arXiv preprint arXiv:2302.01877 , year=

work page arXiv
[16]

arXiv preprint arXiv:2107.02729 , year=

Adarl: What, where, and how to adapt in transfer reinforcement learning , author=. arXiv preprint arXiv:2107.02729 , year=

work page arXiv
[17]

Advances in Neural Information Processing Systems , volume=

Learning dynamic attribute-factored world models for efficient multi-object reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[18]

arXiv preprint arXiv:2505.08361 , year=

Modeling unseen environments with language-guided composable causal components in reinforcement learning , author=. arXiv preprint arXiv:2505.08361 , year=

work page arXiv
[19]

The Thirteenth International Conference on Learning Representations , year=

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models , author=. The Thirteenth International Conference on Learning Representations , year=

work page
[20]

Advances in Neural Information Processing Systems , volume=

Madiff: Offline multi-agent learning with diffusion models , author=. Advances in Neural Information Processing Systems , volume=

work page
[21]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Motiondiffuser: Controllable multi-agent motion prediction using diffusion , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[22]

The Twelfth International Conference on Learning Representations , year=

AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model , author=. The Twelfth International Conference on Learning Representations , year=

work page
[23]

Advances in neural information processing systems , volume=

Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning , author=. Advances in neural information processing systems , volume=

work page
[24]

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Diffusion policies as an expressive policy class for offline reinforcement learning , author=. arXiv preprint arXiv:2208.06193 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[26]

Conference on robot learning , pages=

Xskill: Cross embodiment skill discovery , author=. Conference on robot learning , pages=. 2023 , organization=

work page 2023
[27]

International Conference on Machine Learning , pages=

Contrastive energy prediction for exact energy-guided diffusion sampling in offline reinforcement learning , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023
[28]

Advances in Neural Information Processing Systems , volume=

Latent plan transformer for trajectory abstraction: Planning as latent space inference , author=. Advances in Neural Information Processing Systems , volume=

work page
[30]

Open-Sora: Democratizing Efficient Video Production for All

Open-sora: Democratizing efficient video production for all , author=. arXiv preprint arXiv:2412.20404 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[31]

Advances in Neural Information Processing Systems , volume=

Video diffusion models , author=. Advances in Neural Information Processing Systems , volume=

work page
[32]

Advances in Neural Information Processing Systems , volume=

Diffusion forcing: Next-token prediction meets full-sequence diffusion , author=. Advances in Neural Information Processing Systems , volume=

work page
[33]

Advances in Neural Information Processing Systems , volume=

Ar-diffusion: Auto-regressive diffusion model for text generation , author=. Advances in Neural Information Processing Systems , volume=

work page
[34]

2025 , url=

MAGI-1: Autoregressive Video Generation at Scale , author=. 2025 , url=

work page 2025
[35]

arXiv preprint arXiv:2410.08151 , year=

Progressive autoregressive video diffusion models , author=. arXiv preprint arXiv:2410.08151 , year=

work page arXiv
[38]

arXiv preprint arXiv:2402.03570 , year=

Diffusion world model: Future modeling beyond step-by-step rollout for offline reinforcement learning , author=. arXiv preprint arXiv:2402.03570 , year=

work page arXiv
[39]

2023 , eprint=

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies , author=. 2023 , eprint=

work page 2023
[40]

The Thirteenth International Conference on Learning Representations , year=

Diffusion Policy Policy Optimization , author=. The Thirteenth International Conference on Learning Representations , year=

work page
[41]

International Conference on Machine Learning (ICML) , year=

Latent Diffusion Planning for Imitation Learning , author=. International Conference on Machine Learning (ICML) , year=

work page
[42]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Skilldiffuser: Interpretable hierarchical planning via skill abstractions in diffusion-based task execution , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[43]

IJCAI: proceedings of the conference , volume=

Hidden parameter markov decision processes: A semiparametric regression approach for discovering latent task parametrizations , author=. IJCAI: proceedings of the conference , volume=

work page
[44]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Generalized hidden parameter mdps: Transferable model-based rl in a handful of trials , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[45]

Journal of Artificial Intelligence Research , volume=

Efficient solution algorithms for factored MDPs , author=. Journal of Artificial Intelligence Research , volume=

work page
[46]

2002 , publisher=

Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes , author=. 2002 , publisher=

work page 2002
[47]

, author=

Some Bayesian decision problems in a Markov chain. , author=. 1965 , school=

work page 1965
[48]

The Twelfth International Conference on Learning Representations , year=

Reasoning with Latent Diffusion in Offline Reinforcement Learning , author=. The Twelfth International Conference on Learning Representations , year=

work page
[49]

Proceedings of the 38th International Conference on Machine Learning , pages =

Deep Reinforcement Learning amidst Continual Structured Non-Stationarity , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

work page 2021
[50]

The Twelfth International Conference on Learning Representations , year=

Efficient Planning with Latent Diffusion , author=. The Twelfth International Conference on Learning Representations , year=

work page
[51]

Classifier-Free Diffusion Guidance

Classifier-free diffusion guidance , author=. arXiv preprint arXiv:2207.12598 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[52]

1998 , publisher=

Reinforcement learning: An introduction , author=. 1998 , publisher=

work page 1998
[53]

2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=

Decomposing the generalization gap in imitation learning for visual robotic manipulation , author=. 2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=. 2024 , organization=

work page 2024
[54]

Advances in Neural Information Processing Systems , volume=

Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers , author=. Advances in Neural Information Processing Systems , volume=

work page
[56]

Artificial intelligence , volume=

Planning and acting in partially observable stochastic domains , author=. Artificial intelligence , volume=. 1998 , publisher=

work page 1998
[57]

Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets , author=. arXiv preprint arXiv:2504.02792 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[58]

Unified Video Action Model

Unified Video Action Model , author=. arXiv preprint arXiv:2503.00200 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[59]

IEEE Transactions on Robotics , volume=

Partially observable markov decision processes in robotics: A survey , author=. IEEE Transactions on Robotics , volume=. 2022 , publisher=

work page 2022
[60]

Artificial intelligence in medicine , volume=

Planning treatment of ischemic heart disease with partially observable Markov decision processes , author=. Artificial intelligence in medicine , volume=. 2000 , publisher=

work page 2000
[61]

NPJ Digital Medicine , volume=

Making machine learning matter to clinicians: model actionability in medical decision-making , author=. NPJ Digital Medicine , volume=. 2023 , publisher=

work page 2023
[62]

IEEE Robotics and Automation Letters , year=

Learning online belief prediction for efficient pomdp planning in autonomous driving , author=. IEEE Robotics and Automation Letters , year=

work page
[63]

Advances in Neural Information Processing Systems , volume=

Sequence model imitation learning with unobserved contexts , author=. Advances in Neural Information Processing Systems , volume=

work page
[64]

Advances in neural information processing systems , volume=

Data quality in imitation learning , author=. Advances in neural information processing systems , volume=

work page
[65]

arXiv preprint arXiv:2408.14037 , year=

Re-mix: Optimizing data mixtures for large scale imitation learning , author=. arXiv preprint arXiv:2408.14037 , year=

work page arXiv
[68]

Journal of artificial intelligence research , volume=

Value-function approximations for partially observable Markov decision processes , author=. Journal of artificial intelligence research , volume=

work page
[70]

International conference on machine learning , pages=

Deep variational reinforcement learning for POMDPs , author=. International conference on machine learning , pages=. 2018 , organization=

work page 2018
[72]

Learning Interactive Real-World Simulators

Learning Interactive Real-World Simulators , author=. arXiv preprint arXiv:2310.06114 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[73]

Advances in Neural Information Processing Systems , volume=

Predictive-state decoders: Encoding the future into recurrent networks , author=. Advances in Neural Information Processing Systems , volume=

work page
[74]

Advances in Neural Information Processing Systems , volume=

Parallel sampling of diffusion models , author=. Advances in Neural Information Processing Systems , volume=

work page
[75]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

\_0 : A Vision-Language-Action Flow Model for General Robot Control , author=. arXiv preprint arXiv:2410.24164 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[76]

, title =

Dunford, Nelson and Schwartz, Jacob T. , title =. 1971 , publisher =

work page 1971
[77]

$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

\_\ 0.5\ : a Vision-Language-Action Model with Open-World Generalization , author=. arXiv preprint arXiv:2504.16054 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[78]

International Conference on Machine Learning , pages=

Multi-task reinforcement learning with context-based representations , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[79]

arXiv preprint arXiv:2402.15957 , year=

DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning , author=. arXiv preprint arXiv:2402.15957 , year=

work page arXiv
[80]

POMDIFFUSER: LONG-MEMORY MEETS LONG-PLANNING FOR POMDPS , author=

work page
[81]

Journal of Machine Learning Research , volume=

Varibad: Variational bayes-adaptive deep rl via meta-learning , author=. Journal of Machine Learning Research , volume=

work page
[83]

Neural computation , volume=

Efficient training of artificial neural networks for autonomous navigation , author=. Neural computation , volume=. 1991 , publisher=

work page 1991
[84]

arXiv preprint arXiv:2311.01223 , year=

Diffusion models for reinforcement learning: A survey , author=. arXiv preprint arXiv:2311.01223 , year=

work page arXiv
[85]

Advances in Neural Information Processing Systems , volume=

Rl for latent mdps: Regret guarantees and a lower bound , author=. Advances in Neural Information Processing Systems , volume=

work page
[86]

Uncertainty in Artificial Intelligence , pages=

Probabilistic task modelling for meta-learning , author=. Uncertainty in Artificial Intelligence , pages=. 2021 , organization=

work page 2021
[87]

Advances in neural information processing systems , volume=

Decision transformer: Reinforcement learning via sequence modeling , author=. Advances in neural information processing systems , volume=

work page
[88]

international conference on machine learning , pages=

Online decision transformer , author=. international conference on machine learning , pages=. 2022 , organization=

work page 2022
[90]

Planning with Diffusion for Flexible Behavior Synthesis

Planning with diffusion for flexible behavior synthesis , author=. arXiv preprint arXiv:2205.09991 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[91]

ACM Computing Surveys , volume=

Reinforcement learning based recommender systems: A survey , author=. ACM Computing Surveys , volume=. 2022 , publisher=

work page 2022

Showing first 80 references.

[1] [1]

nature , volume=

Human-level control through deep reinforcement learning , author=. nature , volume=. 2015 , publisher=

work page 2015

[2] [2]

arXiv preprint arXiv:2310.10625 , year=

Video Language Planning , author=. arXiv preprint arXiv:2310.10625 , year=

work page arXiv

[3] [3]

Conference on robot learning , pages=

Accelerating reinforcement learning with learned skill priors , author=. Conference on robot learning , pages=. 2021 , organization=

work page 2021

[4] [4]

Advances in Neural Information Processing Systems , volume=

Goal-conditioned predictive coding for offline reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[5] [5]

Learning for Dynamics and Control , pages=

Planning from Images with Deep Latent Gaussian Process Dynamics , author=. Learning for Dynamics and Control , pages=. 2020 , organization=

work page 2020

[6] [6]

Advances in neural information processing systems , volume=

Learning universal policies via text-guided video generation , author=. Advances in neural information processing systems , volume=

work page

[7] [7]

Proceedings of the 27th IEEE Conference on Decision and Control , pages=

Receding horizon control of nonlinear systems , author=. Proceedings of the 27th IEEE Conference on Decision and Control , pages=. 1988 , organization=

work page 1988

[8] [8]

International conference on machine learning , pages=

Efficient off-policy meta-reinforcement learning via probabilistic context variables , author=. International conference on machine learning , pages=. 2019 , organization=

work page 2019

[9] [9]

Conference on Robot Learning (CoRL) , year=

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation , author=. Conference on Robot Learning (CoRL) , year=

work page

[10] [10]

Journal of mathematical analysis and applications , volume=

Optimal control of Markov processes with incomplete state information I , author=. Journal of mathematical analysis and applications , volume=. 1965 , publisher=

work page 1965

[11] [11]

Octo: An Open-Source Generalist Robot Policy

Octo: An open-source generalist robot policy , author=. arXiv preprint arXiv:2405.12213 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[12] [12]

Advances in neural information processing systems , volume=

Denoising diffusion probabilistic models , author=. Advances in neural information processing systems , volume=

work page

[13] [13]

Advances in neural information processing systems , volume=

Diffusion models beat gans on image synthesis , author=. Advances in neural information processing systems , volume=

work page

[14] [14]

International Conference on Machine Learning , pages=

Metadiffuser: Diffusion model as conditional planner for offline meta-rl , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023

[15] [15]

arXiv preprint arXiv:2302.01877 , year=

Adaptdiffuser: Diffusion models as adaptive self-evolving planners , author=. arXiv preprint arXiv:2302.01877 , year=

work page arXiv

[16] [16]

arXiv preprint arXiv:2107.02729 , year=

Adarl: What, where, and how to adapt in transfer reinforcement learning , author=. arXiv preprint arXiv:2107.02729 , year=

work page arXiv

[17] [17]

Advances in Neural Information Processing Systems , volume=

Learning dynamic attribute-factored world models for efficient multi-object reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[18] [18]

arXiv preprint arXiv:2505.08361 , year=

Modeling unseen environments with language-guided composable causal components in reinforcement learning , author=. arXiv preprint arXiv:2505.08361 , year=

work page arXiv

[19] [19]

The Thirteenth International Conference on Learning Representations , year=

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models , author=. The Thirteenth International Conference on Learning Representations , year=

work page

[20] [20]

Advances in Neural Information Processing Systems , volume=

Madiff: Offline multi-agent learning with diffusion models , author=. Advances in Neural Information Processing Systems , volume=

work page

[21] [21]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Motiondiffuser: Controllable multi-agent motion prediction using diffusion , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[22] [22]

The Twelfth International Conference on Learning Representations , year=

AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model , author=. The Twelfth International Conference on Learning Representations , year=

work page

[23] [23]

Advances in neural information processing systems , volume=

Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning , author=. Advances in neural information processing systems , volume=

work page

[24] [24]

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Diffusion policies as an expressive policy class for offline reinforcement learning , author=. arXiv preprint arXiv:2208.06193 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[25] [26]

Conference on robot learning , pages=

Xskill: Cross embodiment skill discovery , author=. Conference on robot learning , pages=. 2023 , organization=

work page 2023

[26] [27]

International Conference on Machine Learning , pages=

Contrastive energy prediction for exact energy-guided diffusion sampling in offline reinforcement learning , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023

[27] [28]

Advances in Neural Information Processing Systems , volume=

Latent plan transformer for trajectory abstraction: Planning as latent space inference , author=. Advances in Neural Information Processing Systems , volume=

work page

[28] [30]

Open-Sora: Democratizing Efficient Video Production for All

Open-sora: Democratizing efficient video production for all , author=. arXiv preprint arXiv:2412.20404 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[29] [31]

Advances in Neural Information Processing Systems , volume=

Video diffusion models , author=. Advances in Neural Information Processing Systems , volume=

work page

[30] [32]

Advances in Neural Information Processing Systems , volume=

Diffusion forcing: Next-token prediction meets full-sequence diffusion , author=. Advances in Neural Information Processing Systems , volume=

work page

[31] [33]

Advances in Neural Information Processing Systems , volume=

Ar-diffusion: Auto-regressive diffusion model for text generation , author=. Advances in Neural Information Processing Systems , volume=

work page

[32] [34]

2025 , url=

MAGI-1: Autoregressive Video Generation at Scale , author=. 2025 , url=

work page 2025

[33] [35]

arXiv preprint arXiv:2410.08151 , year=

Progressive autoregressive video diffusion models , author=. arXiv preprint arXiv:2410.08151 , year=

work page arXiv

[34] [38]

arXiv preprint arXiv:2402.03570 , year=

Diffusion world model: Future modeling beyond step-by-step rollout for offline reinforcement learning , author=. arXiv preprint arXiv:2402.03570 , year=

work page arXiv

[35] [39]

2023 , eprint=

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies , author=. 2023 , eprint=

work page 2023

[36] [40]

The Thirteenth International Conference on Learning Representations , year=

Diffusion Policy Policy Optimization , author=. The Thirteenth International Conference on Learning Representations , year=

work page

[37] [41]

International Conference on Machine Learning (ICML) , year=

Latent Diffusion Planning for Imitation Learning , author=. International Conference on Machine Learning (ICML) , year=

work page

[38] [42]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Skilldiffuser: Interpretable hierarchical planning via skill abstractions in diffusion-based task execution , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page

[39] [43]

IJCAI: proceedings of the conference , volume=

Hidden parameter markov decision processes: A semiparametric regression approach for discovering latent task parametrizations , author=. IJCAI: proceedings of the conference , volume=

work page

[40] [44]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Generalized hidden parameter mdps: Transferable model-based rl in a handful of trials , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[41] [45]

Journal of Artificial Intelligence Research , volume=

Efficient solution algorithms for factored MDPs , author=. Journal of Artificial Intelligence Research , volume=

work page

[42] [46]

2002 , publisher=

Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes , author=. 2002 , publisher=

work page 2002

[43] [47]

, author=

Some Bayesian decision problems in a Markov chain. , author=. 1965 , school=

work page 1965

[44] [48]

The Twelfth International Conference on Learning Representations , year=

Reasoning with Latent Diffusion in Offline Reinforcement Learning , author=. The Twelfth International Conference on Learning Representations , year=

work page

[45] [49]

Proceedings of the 38th International Conference on Machine Learning , pages =

Deep Reinforcement Learning amidst Continual Structured Non-Stationarity , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

work page 2021

[46] [50]

The Twelfth International Conference on Learning Representations , year=

Efficient Planning with Latent Diffusion , author=. The Twelfth International Conference on Learning Representations , year=

work page

[47] [51]

Classifier-Free Diffusion Guidance

Classifier-free diffusion guidance , author=. arXiv preprint arXiv:2207.12598 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[48] [52]

1998 , publisher=

Reinforcement learning: An introduction , author=. 1998 , publisher=

work page 1998

[49] [53]

2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=

Decomposing the generalization gap in imitation learning for visual robotic manipulation , author=. 2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=. 2024 , organization=

work page 2024

[50] [54]

Advances in Neural Information Processing Systems , volume=

Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers , author=. Advances in Neural Information Processing Systems , volume=

work page

[51] [56]

Artificial intelligence , volume=

Planning and acting in partially observable stochastic domains , author=. Artificial intelligence , volume=. 1998 , publisher=

work page 1998

[52] [57]

Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets , author=. arXiv preprint arXiv:2504.02792 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[53] [58]

Unified Video Action Model

Unified Video Action Model , author=. arXiv preprint arXiv:2503.00200 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[54] [59]

IEEE Transactions on Robotics , volume=

Partially observable markov decision processes in robotics: A survey , author=. IEEE Transactions on Robotics , volume=. 2022 , publisher=

work page 2022

[55] [60]

Artificial intelligence in medicine , volume=

Planning treatment of ischemic heart disease with partially observable Markov decision processes , author=. Artificial intelligence in medicine , volume=. 2000 , publisher=

work page 2000

[56] [61]

NPJ Digital Medicine , volume=

Making machine learning matter to clinicians: model actionability in medical decision-making , author=. NPJ Digital Medicine , volume=. 2023 , publisher=

work page 2023

[57] [62]

IEEE Robotics and Automation Letters , year=

Learning online belief prediction for efficient pomdp planning in autonomous driving , author=. IEEE Robotics and Automation Letters , year=

work page

[58] [63]

Advances in Neural Information Processing Systems , volume=

Sequence model imitation learning with unobserved contexts , author=. Advances in Neural Information Processing Systems , volume=

work page

[59] [64]

Advances in neural information processing systems , volume=

Data quality in imitation learning , author=. Advances in neural information processing systems , volume=

work page

[60] [65]

arXiv preprint arXiv:2408.14037 , year=

Re-mix: Optimizing data mixtures for large scale imitation learning , author=. arXiv preprint arXiv:2408.14037 , year=

work page arXiv

[61] [68]

Journal of artificial intelligence research , volume=

Value-function approximations for partially observable Markov decision processes , author=. Journal of artificial intelligence research , volume=

work page

[62] [70]

International conference on machine learning , pages=

Deep variational reinforcement learning for POMDPs , author=. International conference on machine learning , pages=. 2018 , organization=

work page 2018

[63] [72]

Learning Interactive Real-World Simulators

Learning Interactive Real-World Simulators , author=. arXiv preprint arXiv:2310.06114 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[64] [73]

Advances in Neural Information Processing Systems , volume=

Predictive-state decoders: Encoding the future into recurrent networks , author=. Advances in Neural Information Processing Systems , volume=

work page

[65] [74]

Advances in Neural Information Processing Systems , volume=

Parallel sampling of diffusion models , author=. Advances in Neural Information Processing Systems , volume=

work page

[66] [75]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

\_0 : A Vision-Language-Action Flow Model for General Robot Control , author=. arXiv preprint arXiv:2410.24164 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[67] [76]

, title =

Dunford, Nelson and Schwartz, Jacob T. , title =. 1971 , publisher =

work page 1971

[68] [77]

$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

\_\ 0.5\ : a Vision-Language-Action Model with Open-World Generalization , author=. arXiv preprint arXiv:2504.16054 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[69] [78]

International Conference on Machine Learning , pages=

Multi-task reinforcement learning with context-based representations , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021

[70] [79]

arXiv preprint arXiv:2402.15957 , year=

DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning , author=. arXiv preprint arXiv:2402.15957 , year=

work page arXiv

[71] [80]

POMDIFFUSER: LONG-MEMORY MEETS LONG-PLANNING FOR POMDPS , author=

work page

[72] [81]

Journal of Machine Learning Research , volume=

Varibad: Variational bayes-adaptive deep rl via meta-learning , author=. Journal of Machine Learning Research , volume=

work page

[73] [83]

Neural computation , volume=

Efficient training of artificial neural networks for autonomous navigation , author=. Neural computation , volume=. 1991 , publisher=

work page 1991

[74] [84]

arXiv preprint arXiv:2311.01223 , year=

Diffusion models for reinforcement learning: A survey , author=. arXiv preprint arXiv:2311.01223 , year=

work page arXiv

[75] [85]

Advances in Neural Information Processing Systems , volume=

Rl for latent mdps: Regret guarantees and a lower bound , author=. Advances in Neural Information Processing Systems , volume=

work page

[76] [86]

Uncertainty in Artificial Intelligence , pages=

Probabilistic task modelling for meta-learning , author=. Uncertainty in Artificial Intelligence , pages=. 2021 , organization=

work page 2021

[77] [87]

Advances in neural information processing systems , volume=

Decision transformer: Reinforcement learning via sequence modeling , author=. Advances in neural information processing systems , volume=

work page

[78] [88]

international conference on machine learning , pages=

Online decision transformer , author=. international conference on machine learning , pages=. 2022 , organization=

work page 2022

[79] [90]

Planning with Diffusion for Flexible Behavior Synthesis

Planning with diffusion for flexible behavior synthesis , author=. arXiv preprint arXiv:2205.09991 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[80] [91]

ACM Computing Surveys , volume=

Reinforcement learning based recommender systems: A survey , author=. ACM Computing Surveys , volume=. 2022 , publisher=

work page 2022