pith.

arxiv: 2605.16054 · v1 · pith:ELEA6PW2 · submitted 2026-05-15 · 💻 cs.LG · cs.AI

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

Pith reviewed 2026-05-20 20:49 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords diffusion models · latent dynamics · decision making · planning and control · adaptive policies · causal models · sequence modeling · generative decision-making

The pith

A diffusion model for decision-making identifies hidden latent dynamics from short observation sequences to adapt planning and control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that decision-making framed as sequence generation benefits from explicitly recovering and using evolving latent factors that drive environment transitions, rewards, and agent behavior. It proves that these latent processes can be identified from small temporal blocks of observations under mild conditions. From this, the authors build Ada-Diffuser, a causal diffusion model that jointly captures observed interaction patterns and the underlying latent dynamics, then deploys them for both planning and policy learning. A sympathetic reader would care because many control problems involve unobserved variables that change over time, and handling them directly could produce agents that adjust more reliably to shifts in dynamics or rewards.

Core claim

We theoretically show that under mild conditions, the latent process can be identified from small temporal blocks of observations. Building on this insight, we introduce Ada-Diffuser, a causal diffusion model that learns the temporal structure of observed interactions and the underlying latent dynamics simultaneously, and furthermore, leverages them for planning and control. With a modular design, Ada-Diffuser supports both planning and policy learning tasks, enabling adaptation to latent variations in dynamics, rewards, and latent actions.

What carries the argument

Ada-Diffuser, the causal diffusion model that jointly learns observed temporal structures and latent dynamics from minimal observations for adaptive planning and control.

If this is right

  • Ada-Diffuser can perform both planning and policy learning within the same modular framework.
  • The approach enables adaptation to changes in latent dynamics, rewards, and latent actions during execution.
  • Accurate latent inference supports more robust decision-making on simulated control and robotic tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the identification result holds, similar latent-recovery techniques could extend to partially observable real-world robotics where only brief trajectory snippets are available.
  • The causal diffusion structure may offer a route to handling non-stationary environments by treating latent shifts as identifiable changes rather than noise.
  • Combining this with other sequence models could test whether the small-block identification property transfers beyond diffusion.

Load-bearing premise

The latent process can be identified from small temporal blocks of observations under mild conditions.
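The premise concerns short windows of consecutive observations. Theorem 1, as quoted in the Lean-theorem section of this page, conditions on four of them, $x_{t-2:t+1}$. The NumPy sketch below shows one way to cut a trajectory into such blocks. The function name `temporal_blocks` is an illustrative assumption, and so is the default window of two past steps and one future step; neither comes from the paper's code. In offline data the future step is available, but online it would not be.

```python
import numpy as np


def temporal_blocks(x: np.ndarray, past: int = 2, future: int = 1) -> np.ndarray:
    """Return every full window x[t - past : t + future + 1] of a trajectory.

    x has shape (T, d). The result has shape (T - past - future, past + future + 1, d),
    and row i is the block around time step t = i + past.
    """
    width = past + future + 1
    if x.shape[0] < width:
        raise ValueError("trajectory is shorter than one temporal block")
    # sliding_window_view appends the window axis last: (T - width + 1, d, width)
    windows = np.lib.stride_tricks.sliding_window_view(x, window_shape=width, axis=0)
    # move the window axis next to the block index: (T - width + 1, width, d)
    return np.moveaxis(windows, -1, 1)


# Usage: a 10-step trajectory of 2-D observations yields 7 blocks of 4 steps each.
traj = np.arange(20, dtype=float).reshape(10, 2)
blocks = temporal_blocks(traj)
print(blocks.shape)  # (7, 4, 2)
```

Using a strided view avoids copying the trajectory. The block for time step t is `blocks[t - 2]`.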

What would settle it

A controlled benchmark where ground-truth latent factors are known but Ada-Diffuser's inferred latents do not improve planning success rate or sample efficiency over a standard diffusion baseline without latent modeling.

Figures

Figures reproduced from arXiv: 2605.16054 by Biwei Huang, Fan Feng, Kun Zhang, Minghao Fu, Selena Ge, Yingyao Hu, Yujia Zheng, Zeyu Tang, Zijian Li.

Figure 2
Figure 2: Overview of the Ada-Diffuser framework. The modular design consists of two main stages: latent context identification (Stage 1, Section 4.2), followed by a causal diffusion model (Stage 2, Section 4.3) that models the generative structure of the trajectories. The learned model is then used for planning or policy learning conditioned on the inferred latent context. Theorem 1 (Identifiability on Latent Facto…
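Figure 2 separates latent context identification (Stage 1) from a generator conditioned on that context (Stage 2). The sketch below illustrates that modularity: one Stage 1 output conditions either a planner or a policy. It is a hypothetical interface only. `TwoStagePipeline`, the mean encoder and both toy generators are assumptions for illustration, not the paper's networks, and a real Stage 2 would run a causal diffusion sampler.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

Array = np.ndarray


@dataclass
class TwoStagePipeline:
    """Hypothetical wrapper mirroring the two-stage layout described for Figure 2.

    infer_context: Stage 1, maps a block of recent observations (width, d) to a latent context c_t.
    generate:      Stage 2, maps (current observation, c_t) to a plan or a single action.
    """

    infer_context: Callable[[Array], Array]
    generate: Callable[[Array, Array], Array]

    def step(self, recent_obs: Array) -> Array:
        c_t = self.infer_context(recent_obs)
        # the newest row of the block is treated as the current observation
        return self.generate(recent_obs[-1], c_t)


# Toy stand-ins (assumptions, not the paper's networks).
def mean_encoder(block: Array) -> Array:
    return block.mean(axis=0)  # c_t = average observation in the block


def three_step_planner(obs: Array, c: Array) -> Array:
    return np.tile(obs - c, (3, 1))  # a 3-step "plan" that depends on c_t


def clipped_policy(obs: Array, c: Array) -> Array:
    return np.clip(obs - c, -1.0, 1.0)  # one action that depends on c_t


# The same Stage 1 output conditions either a planner or a policy.
planning = TwoStagePipeline(mean_encoder, three_step_planner)
control = TwoStagePipeline(mean_encoder, clipped_policy)
block = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
print(planning.step(block).shape, control.step(block))
```

Keeping the two stages as separate callables means Stage 1 can be trained and evaluated on its own (for example by linear probing) before Stage 2 is attached.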
Figure 3
Figure 3: Zig-zag sampling (2 steps). Training: Given a noisy input $x_t^{k_t}$ with noise level $k_t$, we first sample an initial latent context from the prior, $\hat{c}_t^{\text{prior}} \sim p_\phi(c_t \mid c_{t-1})$, and use it to denoise the observation: $\hat{x}_t^{(0)} = \epsilon_\theta(x_t^{k_t}, k_t, \hat{c}_t^{\text{prior}})$. Then we infer the latent using the posterior network, conditioned on a broader temporal window including future observations (accessible in offline data): ĉ…
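The prior half of the training step in the Figure 3 caption can be sketched in NumPy as follows. Every concrete choice here is an illustrative assumption: the linear DDPM noise schedule with 1000 levels, the Gaussian AR(1) stand-in for $p_\phi(c_t \mid c_{t-1})$, the placeholder denoiser and all function names. Following the caption, the denoiser returns an estimate of the clean observation $\hat{x}_t^{(0)}$ directly. The posterior refinement that the caption goes on to describe is truncated on this page and is not sketched.

```python
import numpy as np

K = 1000                                          # assumed number of noise levels
BETAS = np.linspace(1e-4, 0.02, K)                # assumed linear DDPM variance schedule
ALPHA_BAR = np.concatenate([[1.0], np.cumprod(1.0 - BETAS)])  # ALPHA_BAR[k] for k = 0..K


def add_noise(x0, k, rng):
    """Forward-noise a clean observation to level k (k = 0 returns it unchanged)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(ALPHA_BAR[k]) * x0 + np.sqrt(1.0 - ALPHA_BAR[k]) * eps


def sample_prior(c_prev, rng, rho=0.9, sigma=0.1):
    """Hypothetical Gaussian AR(1) stand-in for the prior p_phi(c_t | c_{t-1})."""
    return rho * c_prev + sigma * rng.standard_normal(c_prev.shape)


def prior_pass(x0_traj, c0, denoiser, rng):
    """Prior half of the training step: an independent noise level k_t per step,
    a prior latent per step, and an estimate x_hat_t^(0) of each clean observation."""
    T = x0_traj.shape[0]
    k = rng.integers(0, K + 1, size=T)            # each time step gets its own noise level
    c_prev, x_hat, c_prior = c0, [], []
    for t in range(T):
        x_noisy = add_noise(x0_traj[t], k[t], rng)
        c_t = sample_prior(c_prev, rng)           # latent drawn from the prior, not the posterior
        x_hat.append(denoiser(x_noisy, k[t], c_t))
        c_prior.append(c_t)
        c_prev = c_t
    return np.stack(x_hat), np.stack(c_prior), k


def rescale_denoiser(x_noisy, k, c):
    """Placeholder for the trained network: undoes only the signal scaling and ignores c."""
    return x_noisy / np.sqrt(ALPHA_BAR[k])


rng = np.random.default_rng(0)
x_hat, c_prior, k = prior_pass(rng.standard_normal((6, 3)), np.zeros(2), rescale_denoiser, rng)
print(x_hat.shape, c_prior.shape, k.shape)
```

Drawing a separate noise level per time step is what lets a causal model denoise earlier steps further than later ones during sampling.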
Figure 4
Figure 4: (a) Identification results (i.e., linear probing MSE, …
Figure 5
Figure 5: Results on environments without explicitly designed latent factors. Complete results are …
Figure 6
Figure 6: Verification of the assumptions. (a) Transition separability in Cheetah under the hyperparameter setting (m, n) = (5, 0.5). (b) Transition separability under a weak-context setting (m, n) = (0.2, 0.2), where the context barely affects the dynamics. (c) Average reward drop when planning with vs. without conditioning on $c$, plotted against the transition separability. (d) $k$-distributions for (5, 0.5), (e) $k$-…
Figure 7
Figure 7: An illustration of the zig-zag sampling process with a block of 4 time steps.
Figure 8
Figure 8: Illustrations of the benchmarks. From left to right: Half-Cheetah, Ant, Walker, Franka-Kitchen, Maze2D, and LIBERO.
Figure 9
Figure 9: Illustrations of the RoboMimic benchmark. …function that challenges the policy to adapt to shifting goals. Specifically, we consider $d_t = \sigma(5 \cdot \sin(2\pi t/200))$, i.e., $d_t = \sigma(\alpha \sin(2\pi t/T))$ with $\alpha = 5$ and $T = 200$, where $\sigma(\cdot)$ denotes the sigmoid function, $\alpha$ controls the sharpness of the transition, and $T$ determines the switching period. This formulation induces a smooth periodic change in the preferred direction of movement, requiring the policy to adapt to graduall…
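The goal schedule in the Figure 9 text can be evaluated directly. The sketch below assumes, as the surrounding sentence suggests, that the constants 5 and 200 play the roles of $\alpha$ and $T$. How $d_t$ enters the reward is cut off on this page and is not modeled. The function name `goal_schedule` is hypothetical.

```python
import numpy as np


def goal_schedule(t, alpha=5.0, period=200.0):
    """d_t = sigmoid(alpha * sin(2*pi*t / period)).

    The defaults reproduce d_t = sigma(5 * sin(2*pi*t / 200)). alpha sets how sharply d_t
    swings between 0 and 1, and period sets how often it switches.
    """
    phase = 2.0 * np.pi * np.asarray(t, dtype=float) / period
    return 1.0 / (1.0 + np.exp(-alpha * np.sin(phase)))


steps = np.array([0, 50, 100, 150])
print(np.round(goal_schedule(steps), 4))
```

With these defaults, $d_t$ is 0.5 at $t = 0$ and $t = 100$, about 0.993 at $t = 50$ and about 0.007 at $t = 150$. Raising $\alpha$ makes the switch between the two preferred directions more abrupt.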
Figure 10
Figure 10: Identification results (MSE of linear probing and $R^2$) versus the length of temporal blocks. Left: Cheetah with time-varying wind; right: Cheetah with time-varying rewards. Clustering. We assess whether the learned latent space organizes states by the underlying context on the Cheetah wind-change task, where the ground-truth latent evolves as $f_w(t) = 5 + 5\sin(0.5t)$. We sample 1000 time steps, discretize f…
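Figures 4 and 10 report identification as linear-probing MSE and $R^2$. The sketch below computes that metric on the wind signal quoted in the Figure 10 text. The probe design (affine least squares, scored in-sample for brevity) is an assumption, because the paper's exact protocol is not reproduced on this page. The synthetic "learned" latents are assumptions too. A linear probe only detects an affine relation, so it can understate recovery when the learned latent is a nonlinear invertible function of the true one.

```python
import numpy as np


def linear_probe(latents, targets):
    """Fit an affine least-squares map from learned latents to ground-truth factors.

    latents: (N,) or (N, m); targets: (N,). Returns (mse, r2), scored in-sample for brevity.
    """
    X = np.column_stack([latents, np.ones(len(targets))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    residual = targets - X @ coef
    mse = float(np.mean(residual ** 2))
    r2 = 1.0 - float(np.sum(residual ** 2) / np.sum((targets - targets.mean()) ** 2))
    return mse, r2


# Ground-truth wind from the Figure 10 text: f_w(t) = 5 + 5 sin(0.5 t), 1000 time steps.
t = np.arange(1000)
f_w = 5.0 + 5.0 * np.sin(0.5 * t)

rng = np.random.default_rng(0)
recovered = -2.0 * f_w + 1.0 + 1e-3 * rng.standard_normal(t.size)  # noisy affine map of f_w
unrelated = rng.standard_normal((t.size, 2))                       # latents with no wind information

print(linear_probe(recovered, f_w))  # MSE near 0, R^2 near 1
print(linear_probe(unrelated, f_w))  # R^2 near 0
```

For a real evaluation, the probe would be fitted on one split of time steps and scored on another.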
Figure 11
Figure 11: Clustering (t-SNE) results on Cheetah wind-change.
Figure 12
Figure 12: Results with different planning and execution horizons. We evaluate on Kitchen-partial and Libero-Long experiments. …we do not explicitly impose latent variables, our model implicitly learns representations that can track stochasticity and support smooth control.
Original abstract

Recent work has framed decision-making as a sequence modeling problem using generative models such as diffusion models. Although promising, these approaches often overlook latent factors that exhibit evolving dynamics, elements that are fundamental to environment transitions, reward structures, and high-level agent behavior. Explicitly modeling these hidden processes is essential for both precise dynamics modeling and effective decision-making. In this paper, we propose a unified framework that explicitly incorporates latent dynamic inference into generative decision-making from minimal yet sufficient observations. We theoretically show that under mild conditions, the latent process can be identified from small temporal blocks of observations. Building on this insight, we introduce Ada-Diffuser, a causal diffusion model that learns the temporal structure of observed interactions and the underlying latent dynamics simultaneously, and furthermore, leverages them for planning and control. With a modular design, Ada-Diffuser supports both planning and policy learning tasks, enabling adaptation to latent variations in dynamics, rewards, and latent actions. Experiments on simulated control and robotic benchmarks demonstrate its effectiveness in accurate latent inference and adaptive policy learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Ada-Diffuser, a causal diffusion model for decision-making that integrates latent dynamic inference. It claims to theoretically demonstrate that under mild conditions, the latent process can be identified from small temporal blocks of observations, enabling simultaneous learning of temporal structures and latent dynamics for planning and control. Experiments on simulated control and robotic benchmarks demonstrate effectiveness in latent inference and adaptive policy learning.

Significance. If the identifiability result holds and extends to the closed-loop setting, the work could advance generative sequence models for decision-making by explicitly handling evolving latent factors that affect dynamics and rewards. The modular design supporting both planning and policy learning tasks is a concrete strength that allows adaptation to latent variations.

major comments (2)
  1. [Abstract] Theoretical result paragraph: the claim that the latent process can be identified from small temporal blocks under mild conditions is load-bearing for the justification of Ada-Diffuser, yet the manuscript supplies neither the specific conditions, derivation steps, nor proof sketch. Without these, it is impossible to verify whether the identification is independent of the downstream diffusion component or survives the distribution shift induced by policy-dependent actions.
  2. [Planning and control sections] (e.g., the description of causal diffusion for adaptive decision-making): the identifiability argument is developed for passive observation sequences, but once the model is used for planning, actions are selected from inferred latents and create a feedback loop. This closed-loop effect can violate the stationarity or independence assumptions typical of block-identifiability results; the paper does not supply additional assumptions on the policy or value function that would restore validity.
minor comments (2)
  1. [Introduction] Notation for temporal blocks and latent variables should be introduced with explicit definitions and an early figure or diagram to improve readability.
  2. [Related Work] Related-work discussion would benefit from explicit comparison to prior latent-variable diffusion models in RL and to identifiability results for dynamical systems.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These highlight key points about the presentation of the theoretical identifiability result and its extension from passive observations to closed-loop planning and control. We respond to each major comment below and will make revisions to improve clarity and rigor without altering the core contributions.

  1. Referee: [Abstract] Theoretical result paragraph: the claim that the latent process can be identified from small temporal blocks under mild conditions is load-bearing for the justification of Ada-Diffuser, yet the manuscript supplies neither the specific conditions, derivation steps, nor proof sketch. Without these, it is impossible to verify whether the identification is independent of the downstream diffusion component or survives the distribution shift induced by policy-dependent actions.

    Authors: We agree that the abstract states the identifiability claim at a high level without enumerating the precise conditions or including a proof sketch. In the revision we will expand the abstract to reference the specific mild conditions (sufficiently informative temporal blocks, latent observability, and block-wise independence) and add a concise proof outline to the main text. The result is derived solely from the causal structure of the observed sequences and holds independently of the subsequent diffusion model used for generation. The identification is performed on offline observational data; any distribution shift arising from policy-dependent actions during deployment is handled by the online latent inference module rather than by re-deriving the theorem under closed-loop data. revision: yes

  2. Referee: [Planning and control sections] (e.g., the description of causal diffusion for adaptive decision-making): the identifiability argument is developed for passive observation sequences, but once the model is used for planning, actions are selected from inferred latents and create a feedback loop. This closed-loop effect can violate the stationarity or independence assumptions typical of block-identifiability results; the paper does not supply additional assumptions on the policy or value function that would restore validity.

    Authors: We acknowledge that the core identifiability theorem is stated for passive observation sequences. In the planning and control sections the model performs causal latent inference on recent observation blocks and then generates actions conditioned on the current latent estimate, which indeed introduces a feedback loop. The manuscript currently relies on empirical results across benchmarks to demonstrate that adaptation remains effective. In the revision we will add an explicit discussion paragraph that states the additional working assumptions needed for the closed-loop case (e.g., that the policy is Lipschitz in the latent estimate and that latent dynamics vary slowly relative to the planning horizon) and note the resulting bounded deviation from the original stationarity conditions. revision: yes
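The feedback loop at issue in the second exchange can be made explicit with a receding-horizon sketch. In each cycle the latent is re-inferred from the newest observations and a plan of `plan_horizon` actions is generated. Only the first `exec_horizon` actions are executed before the cycle repeats. Figure 12 varies exactly these two horizons. The function and all toy stand-ins are hypothetical, as are the default horizons and block width; a real planner would be the conditioned diffusion model.

```python
import numpy as np


def receding_horizon_rollout(env_step, x0, infer_context, plan, steps,
                             plan_horizon=8, exec_horizon=2, block_width=4):
    """Closed loop: re-infer c_t from the newest observations, plan plan_horizon actions,
    execute the first exec_horizon of them, and repeat until `steps` transitions are taken.

    env_step(x, a) -> next x; infer_context(block) -> c_t; plan(x, c_t, H) -> (H, act_dim).
    All three are caller-supplied, hypothetical stand-ins.
    """
    history, latents = [x0], []
    while len(history) - 1 < steps:
        # online, only past and current observations exist (early blocks are shorter)
        block = np.stack(history[-block_width:])
        c_t = infer_context(block)
        latents.append(c_t)
        for a in plan(history[-1], c_t, plan_horizon)[:exec_horizon]:
            if len(history) - 1 >= steps:
                break
            # actions chosen from c_t shape the next observations, and thus the next c_t
            history.append(env_step(history[-1], a))
    return np.stack(history), np.stack(latents)


def env_step(x, a):
    return x + a                                # toy additive dynamics


def infer_context(block):
    return block.mean(axis=0)                   # placeholder latent: average of recent observations


def plan(x, c, horizon):
    return np.full((horizon, x.size), 0.1)      # placeholder plan that ignores c


traj, lat = receding_horizon_rollout(env_step, np.zeros(2), infer_context, plan, steps=5)
print(traj.shape, lat.shape)
```

A shorter `exec_horizon` re-estimates the latent more often. This is where the proposed slow-variation assumption bites: if $c_t$ drifts noticeably within one executed chunk, those actions were conditioned on a stale estimate.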

Circularity Check

0 steps flagged

No significant circularity; theoretical identifiability presented as independent of model fitting

full rationale

The paper's central derivation begins with a theoretical claim that the latent process is identifiable from small temporal blocks under mild conditions, followed by introduction of the Ada-Diffuser model that builds on this insight for simultaneous learning of temporal structure and latent dynamics. No equations or derivations are provided in the abstract or described sections that define the latent process in terms of the diffusion outputs or that rename fitted parameters as predictions. The identification result is positioned as a prior, independent contribution rather than a self-referential fit or self-citation load-bearing step. The derivation chain therefore remains self-contained against external benchmarks, with the model leveraging the claimed identifiability without reducing to it by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the assumption that latent processes are identifiable from small temporal observation blocks under mild conditions; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption Under mild conditions, the latent process can be identified from small temporal blocks of observations.
    This is the stated basis for the theoretical result that enables the rest of the framework.
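In Theorem 1, as quoted in the Lean-theorem section of this page, "identified" is qualified as identifiable up to an invertible transformation. The usual reading of that phrase in the identifiability literature is sketched below. It is a standard definition offered as an assumption about intent, not the paper's exact statement. The paper's Assumptions 1-3 are not reproduced here, and $\theta, \hat{\theta}$ denote any two parameter settings of the model.

```latex
p_{\hat{\theta}}\left(x_{t-2:t+1}\right) = p_{\theta}\left(x_{t-2:t+1}\right)\ \text{for all } x_{t-2:t+1}
\quad\Longrightarrow\quad
\hat{c}_t = h\left(c_t\right)\ \text{for some invertible } h .
```

In words, any two models that match the observed block distribution recover the same latent up to a relabeling $h$. Learned latents can therefore be compared with ground truth only after fitting such a map, not coordinate by coordinate.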

pith-pipeline@v0.9.0 · 5732 in / 1141 out tokens · 39330 ms · 2026-05-20T20:49:51.360581+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

    Relation between the paper passage and the cited Recognition theorem.

    We theoretically show that under mild conditions, the latent process can be identified from small temporal blocks of observations... Theorem 1 (Identifiability on Latent Factors). Under Assumptions 1-3, the posterior distribution of latent factor with consecutive observations $p(c_t \mid x_{t-2:t+1})$ can be identifiable up to an invertible transformation

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    Ada-Diffuser, a causal diffusion model that learns the temporal structure of observed interactions and the underlying latent dynamics simultaneously

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

283 extracted references · 283 canonical work pages · 30 internal anchors
