MATE: Solving Contextual Markov Decision Processes with Memory of Accumulated Transition Embeddings

Frank Chongwoo Park; Gene Chung; Himchan Hwang; Hyeokju Jeong; Sangwoong Yoon; Seungyeon Kim

arxiv: 2605.17431 · v1 · pith:SKCF3QJInew · submitted 2026-05-17 · 💻 cs.LG · cs.AI

Himchan Hwang, Hyeokju Jeong, Gene Chung, Seungyeon Kim, Sangwoong Yoon, Frank Chongwoo Park

Pith reviewed 2026-05-20 13:35 UTC · model grok-4.3

keywords: Contextual Markov Decision Processes · Memory Architecture · Reinforcement Learning · Posterior Approximation · Permutation Invariance · Online Adaptation · Transition Embeddings · Sequence Models

The pith

A sum of transition embeddings can stand in for the full posterior over contexts in CMDPs while keeping enough information for near-optimal decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In contextual Markov decision processes an agent must adapt its actions to an unknown context that shapes the transition dynamics. Computing the exact posterior over possible contexts grows intractable with more observations. MATE keeps a running sum of embeddings computed from each observed transition instead. The sum works because the true posterior is unchanged by the order of those observations, so the aggregated memory stays expressive enough for good action choices. The resulting architecture runs with fixed per-step cost and sidesteps both the quadratic expense of attention models and the gradient problems of recurrent networks.
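As a rough illustration of the fixed per-step cost described above, the sketch below keeps a running sum of per-transition embeddings. The `embed` function is a hypothetical stand-in for a learned transition encoder, and the class, function, and variable names are assumptions made for this example rather than anything taken from the paper.

```python
import math
from typing import List, Sequence


def embed(transition: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a learned transition encoder."""
    total = sum(transition)
    return [math.tanh(total), math.sin(total), total * total]


class SumMemory:
    """Fixed-size memory that accumulates transition embeddings by addition."""

    def __init__(self, dim: int) -> None:
        self.m = [0.0] * dim

    def update(self, transition: Sequence[float]) -> List[float]:
        # One embedding and one vector addition per step,
        # regardless of how many transitions came before.
        e = embed(transition)
        self.m = [a + b for a, b in zip(self.m, e)]
        return self.m


def run(transitions) -> List[float]:
    memory = SumMemory(dim=3)
    for x in transitions:
        memory.update(x)
    return memory.m


transitions = [(0.1, 0.2, 0.3), (1.0, -0.5, 0.0), (0.4, 0.4, 0.4)]
forward = run(transitions)
backward = run(list(reversed(transitions)))
```

Each `update` touches only the fixed-size vector `m`, so the work per step does not grow with the history length, and `forward` and `backward` agree up to floating-point rounding.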

Core claim

The paper establishes that a memory formed by summing embeddings of successive transitions is provably sufficient to represent the posterior belief over contexts in a CMDP. This follows directly from the fact that the posterior distribution is invariant to the ordering of the observations. Consequently the sum serves as a fixed-size, constant-cost substitute for the growing belief state, enabling online adaptation that matches the returns of standard sequence models on benchmark CMDP tasks.
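One way to see why a sum can carry the posterior is a toy case with finitely many contexts. In the sketch below, each "transition" is reduced to a binary outcome whose probability depends on the context, and the embedding is the vector of per-context log-likelihoods. Both simplifications, and all names and numbers, are assumptions for illustration only; this is not the paper's construction. Under these assumptions, the posterior is the softmax of the log-prior plus the summed embeddings.

```python
import math

# Hypothetical toy model: p(outcome = 1 | context) for three contexts.
CONTEXT_SUCCESS_PROB = [0.2, 0.5, 0.9]
PRIOR = [1 / 3, 1 / 3, 1 / 3]


def log_lik_embedding(outcome: int) -> list:
    """Embedding of one observation: its log-likelihood under each context."""
    return [math.log(p if outcome == 1 else 1.0 - p) for p in CONTEXT_SUCCESS_PROB]


def posterior_from_sum(outcomes) -> list:
    """Posterior recovered from the summed embeddings alone."""
    m = [0.0] * len(PRIOR)
    for x in outcomes:
        m = [a + b for a, b in zip(m, log_lik_embedding(x))]
    logits = [math.log(p) + mi for p, mi in zip(PRIOR, m)]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    z = sum(weights)
    return [w / z for w in weights]


def posterior_sequential(outcomes) -> list:
    """Reference: ordinary step-by-step Bayes updates."""
    belief = list(PRIOR)
    for x in outcomes:
        lik = [p if x == 1 else 1.0 - p for p in CONTEXT_SUCCESS_PROB]
        unnorm = [b * l for b, l in zip(belief, lik)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
    return belief


observations = [1, 1, 0, 1]
shuffled = [0, 1, 1, 1]
```

In this toy setting, `posterior_from_sum(observations)` matches `posterior_sequential(observations)` and is unchanged when the observations are shuffled. Whether a learned encoder reaches such a representation in continuous-context problems is a separate question that this sketch does not address.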

What carries the argument

Sum-aggregated memory of transition embeddings, which exploits the permutation invariance of the context posterior to retain sufficient statistics for action selection.

Load-bearing premise

That simply adding up transition embeddings is enough to keep the information needed for near-optimal choices, with no need for order or other structure.

What would settle it

A CMDP in which optimal behavior requires remembering the exact sequence of transitions rather than only their aggregate statistics; the sum memory would then produce visibly lower returns than an order-sensitive model.
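The failure mode described above can be made concrete with a deliberately tiny example. The sketch below uses one-hot embeddings of symbols as a hypothetical stand-in for transition embeddings; the names are assumptions for this example. Two sequences that differ only in order produce the same sum, so any decision that depends on which symbol came first cannot be read off the summed memory.

```python
ALPHABET = "AB"


def one_hot(symbol: str) -> list:
    """Hypothetical embedding: one-hot code of the observed symbol."""
    return [1.0 if symbol == s else 0.0 for s in ALPHABET]


def sum_memory(sequence) -> list:
    """Order-insensitive memory: sum of the embeddings."""
    memory = [0.0] * len(ALPHABET)
    for symbol in sequence:
        memory = [m + e for m, e in zip(memory, one_hot(symbol))]
    return memory


def first_symbol(sequence) -> str:
    """Order-sensitive target that the sum cannot recover here."""
    return sequence[0]


seq_ab = ["A", "B"]
seq_ba = ["B", "A"]
```

Here `sum_memory(seq_ab)` equals `sum_memory(seq_ba)` even though `first_symbol` differs. If each transition carried a timestep feature, the order could in principle be recovered from the set of embeddings; whether that applies to the paper's transitions is not stated on this page.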

Figures

Figures reproduced from arXiv: 2605.17431 by Frank Chongwoo Park, Gene Chung, Himchan Hwang, Hyeokju Jeong, Sangwoong Yoon, Seungyeon Kim.

**Figure 1.** Overview of MATE. MATE represents the memory $m_t$ as a summation of transition embeddings, serving as a tractable substitute for the intractable posterior $p(c \mid x_{1:t})$. By preserving the permutation invariance of the posterior, $\pi_\theta$ is provably capable of representing the optimal policy $\pi^*$, despite its structural simplicity.

**Figure 2.** Comparison of memory architectures. MATE replaces the attention mechanisms of Transformers and the recurrent connections of RNNs with simple sum aggregation. Single-layer versions are illustrated for clarity. From the accompanying text: Memory of Accumulated Transition Embeddings (MATE) is a permutation-invariant memory defined as $m_t = \sum_{i=1}^{t} E_\psi(x_i)$ (Eq. 6). The transition encoder $E_\psi$ can be any network that maps each transition $x_i$ to an embedding.

**Figure 3.** Learning curves on MuJoCo (left, center) and Meta-World (right) benchmarks. Tasks are ordered by increasing difficulty from top to bottom. Solid lines and shaded regions represent the mean and standard deviation across 3 random seeds. All agents are trained using Soft Actor-Critic (SAC) (Haarnoja et al., 2018).

**Figure 4.** Performance comparison on the T-Maze tasks. The figures show the maximum average return on the (left) Passive T-Maze and (right) Active T-Maze environments. The reported values are the best results from 2 independent runs. All agents are trained using Double Q-Learning (DDQN) (Van Hasselt et al., 2016).

**Figure 5.** Memory-based RL framework used in our experiments. An episode trajectory consisting of observation, action, and reward history is processed by a sequence encoder to produce a latent memory representation $m_t$ that summarizes past interactions. This memory vector conditions standard off-policy RL components. For discrete-control tasks (e.g., T-Maze), $m_t$ is provided to a DDQN head to estimate action values. Fo…

Original abstract

We propose MATE, a simple yet effective memory architecture for solving Contextual Markov Decision Processes (CMDPs), a family of MDPs parameterized by an unobserved context. In CMDPs, an optimal agent can adapt online by maintaining the posterior belief over contexts. MATE replaces this intractable posterior with a sum-aggregated memory, leveraging the posterior's permutation invariance to retain provably sufficient expressiveness. Compared to prior memory architectures, MATE avoids the growing per-step rollout cost of Transformers and the gradient issues commonly associated with Recurrent Neural Networks (RNNs). Extensive evaluations across diverse benchmarks demonstrate that MATE provides clear computational advantages while achieving performance comparable to standard sequence-model baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes MATE, a memory architecture for Contextual Markov Decision Processes (CMDPs) that replaces the intractable posterior belief over contexts with a sum-aggregated memory of transition embeddings. It leverages the permutation invariance of the posterior to claim provably sufficient expressiveness, avoiding the per-step costs of Transformers and gradient problems of RNNs, and reports comparable performance to sequence-model baselines on diverse benchmarks.

Significance. If the theoretical justification for sufficiency holds, MATE could provide a simple and scalable memory mechanism for online adaptation in CMDPs, offering computational advantages in reinforcement learning settings with unobserved contexts. The empirical results suggest practical utility, though the lack of detailed ablations limits assessment of robustness.

major comments (2)

[Abstract and §3 (MATE architecture)] The claim that sum-aggregation of transition embeddings 'retains provably sufficient expressiveness' solely via the posterior's permutation invariance is load-bearing for the central contribution. Permutation invariance ensures order-independence but does not establish that the sum is injective or information-preserving w.r.t. context-distinguishing likelihoods. No theorem or explicit construction (e.g., moment-matching or injective feature map) is provided to rule out collisions when context effects are non-additive, so the sufficiency for near-optimal action selection in general CMDPs remains unsubstantiated.
[§4 or §5 (Experiments)] The reported performance comparability to sequence-model baselines lacks error bars, ablation results on the embedding function, or controls testing whether sum-aggregation actually preserves posterior distinguishability. Without these, it is unclear whether observed results support the sufficiency claim or arise from other implementation details.
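The collision concern in the first major comment can be illustrated with a minimal sketch. It compares two hypothetical embedding functions for scalar inputs: an identity embedding, under which two different multisets sum to the same vector, and a two-feature power embedding, which separates this particular pair. Both embeddings and all names are assumptions for illustration; neither is the paper's encoder.

```python
def sum_pool(multiset, embed) -> tuple:
    """Sum the embeddings of every element in the multiset."""
    dim = len(embed(multiset[0]))
    total = [0.0] * dim
    for x in multiset:
        total = [t + e for t, e in zip(total, embed(x))]
    return tuple(total)


def identity_embed(x: float) -> tuple:
    """One feature: the raw value. Not injective after summation."""
    return (x,)


def power_embed(x: float) -> tuple:
    """Two features: value and square. Separates multisets of size two."""
    return (x, x * x)


set_a = [1.0, 3.0]
set_b = [2.0, 2.0]

collide = sum_pool(set_a, identity_embed) == sum_pool(set_b, identity_embed)
separate = sum_pool(set_a, power_embed) != sum_pool(set_b, power_embed)
```

Both multisets sum to 4 under the identity embedding, so `collide` is true, while the squared feature tells them apart. Permutation invariance holds in both cases, which is the referee's point: order-independence alone does not guarantee that the sum preserves the information that distinguishes contexts.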

minor comments (2)

[Method] Clarify the precise per-step computation of the accumulated sum and the embedding network architecture to support reproducibility.
[Related Work] Add missing references to prior work on sufficient statistics for CMDPs or POMDPs to better situate the permutation-invariance argument.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below, providing clarifications on the theoretical foundations and committing to empirical enhancements in the revision.

Referee: [Abstract and §3 (MATE architecture)] The claim that sum-aggregation of transition embeddings 'retains provably sufficient expressiveness' solely via the posterior's permutation invariance is load-bearing for the central contribution. Permutation invariance ensures order-independence but does not establish that the sum is injective or information-preserving w.r.t. context-distinguishing likelihoods. No theorem or explicit construction (e.g., moment-matching or injective feature map) is provided to rule out collisions when context effects are non-additive, so the sufficiency for near-optimal action selection in general CMDPs remains unsubstantiated.

Authors: We appreciate the referee's focus on this foundational claim. The manuscript's argument rests on the fact that the posterior over contexts is exchangeable (hence permutation-invariant) with respect to the sequence of observed transitions. This symmetry permits the use of a sum aggregator without regard to order. To address potential collisions under non-additive context effects, we will add an explicit proposition in §3 that specifies sufficient conditions on the embedding function: namely, that it realizes an injective map from transition distributions to a feature space whose sums uniquely recover the posterior's sufficient statistics for policy optimality. This construction draws on known results for symmetric function approximation and will be accompanied by a proof sketch ruling out information loss for the relevant class of CMDPs.
Revision: yes

Referee: [§4 or §5 (Experiments)] The reported performance comparability to sequence-model baselines lacks error bars, ablation results on the embedding function, or controls testing whether sum-aggregation actually preserves posterior distinguishability. Without these, it is unclear whether observed results support the sufficiency claim or arise from other implementation details.

Authors: We agree that stronger empirical support is needed to link the observed performance to the sum-aggregation mechanism. In the revised manuscript we will augment the experimental section with (i) mean performance and standard-error bars computed over multiple independent seeds for every benchmark, (ii) an ablation varying the transition-embedding architecture, and (iii) a control that substitutes alternative permutation-invariant aggregators (e.g., mean or learned attention) while keeping all other components fixed. These additions will directly test whether the sum preserves the distinguishability required by the theoretical claim.
Revision: yes
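To indicate why an aggregator control such as (iii) could be informative, the sketch below contrasts sum and mean pooling in a toy setting. Each observation is a binary outcome, the embedding is the vector of per-context log-likelihoods, and the prior is uniform; all of these, and the names used, are assumptions for illustration and not the paper's setup. Mean pooling discards the number of observations, so one success and four successes look identical, whereas the sum lets the belief sharpen with more evidence.

```python
import math

# Hypothetical toy model: p(outcome = 1 | context) for two contexts.
SUCCESS_PROB = [0.3, 0.7]


def embedding(outcome: int) -> list:
    """Per-context log-likelihood of a single binary outcome."""
    return [math.log(p if outcome == 1 else 1.0 - p) for p in SUCCESS_PROB]


def pool(outcomes, mode: str) -> list:
    """Aggregate embeddings with either 'sum' or 'mean'."""
    total = [0.0] * len(SUCCESS_PROB)
    for x in outcomes:
        total = [a + b for a, b in zip(total, embedding(x))]
    if mode == "mean":
        return [v / len(outcomes) for v in total]
    return total


def softmax(logits) -> list:
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    z = sum(weights)
    return [w / z for w in weights]


short_run = [1]
long_run = [1, 1, 1, 1]

belief_sum_short = softmax(pool(short_run, "sum"))
belief_sum_long = softmax(pool(long_run, "sum"))
belief_mean_short = softmax(pool(short_run, "mean"))
belief_mean_long = softmax(pool(long_run, "mean"))
```

With sum pooling, the belief in the second context is higher after `long_run` than after `short_run`; with mean pooling the two beliefs coincide up to rounding. A mean aggregator could recover the count if the step index were supplied separately, so this sketch shows only what is lost by default.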

Circularity Check

0 steps flagged

No significant circularity detected

The paper's derivation claims that sum-aggregation of transition embeddings retains provably sufficient expressiveness for the context posterior by leveraging its permutation invariance. This rests on a standard mathematical property of posterior distributions over contexts rather than any self-definition, fitted input renamed as prediction, or load-bearing self-citation. No equations or steps in the abstract or described chain reduce the sufficiency claim to a tautology or prior author work by construction; the argument treats invariance as an external fact that permits the aggregation without additional mechanisms. The central claim therefore remains self-contained against external benchmarks and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unproven assertion that permutation invariance of the posterior makes sum aggregation sufficient; no free parameters or invented entities are named in the abstract.

axioms (1)

Domain assumption: The posterior belief over contexts in a CMDP is permutation-invariant with respect to the order of observed transitions. Invoked to justify that sum aggregation retains sufficient expressiveness.
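A short derivation can make the assumption concrete. The notation here is hypothetical and not taken from the paper: each transition is written $x_i = (s_i, a_i, r_i, s_i')$, and $T_c$ denotes the context-conditioned transition and reward kernel. The sketch further assumes that the initial-state distribution does not depend on $c$ and that actions depend on $c$ only through the observed history, so the action-selection factors cancel on normalization.

```latex
\begin{align}
p(c \mid x_{1:t})
  &\propto p(c) \prod_{i=1}^{t} T_c\big(s_i', r_i \mid s_i, a_i\big) \\
  &= p(c) \prod_{i=1}^{t} T_c\big(s_{\sigma(i)}', r_{\sigma(i)} \mid s_{\sigma(i)}, a_{\sigma(i)}\big)
  \quad \text{for any permutation } \sigma \text{ of } \{1, \dots, t\}.
\end{align}
```

The second line holds because multiplication is commutative and each factor depends on a single transition. This establishes order-independence only; it says nothing by itself about whether a particular summed embedding retains enough information to recover the posterior.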

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "MATE replaces this intractable posterior with a sum-aggregated memory, leveraging the posterior's permutation invariance to retain provably sufficient expressiveness."

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: "Proposition 3.1 ... summation $m_t = \sum E_\psi(x_i)$ is injective (Amir et al., 2023, Theorem 3.3)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 1 internal anchor

[1] Efficient off-policy meta-reinforcement learning via probabilistic context variables. International Conference on Machine Learning, 2019.
[2] VariBAD: a very good method for Bayes-adaptive deep RL via meta-learning. Proceedings of ICLR 2020.
[3] Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs. International Conference on Machine Learning, 2022.
[4] Contextual Markov Decision Processes. arXiv preprint arXiv:1502.02259.
[5] A survey of zero-shot generalisation in deep reinforcement learning. Journal of Artificial Intelligence Research.
[6] SplAgger: Split Aggregation for Meta-Reinforcement Learning. CoRR.
[7] Bridging State and History Representations: Understanding Self-Predictive RL. CoRR.
[8] Towards an information theoretic framework of context-based offline meta-reinforcement learning. Proceedings of the 38th International Conference on Neural Information Processing Systems.
[9] A tutorial on meta-reinforcement learning. Foundations and Trends in Machine Learning, 2025.
[10] Recurrent hypernetworks are surprisingly strong in meta-RL. Advances in Neural Information Processing Systems.
[11] Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 1994.
[12] On the difficulty of training recurrent neural networks. International Conference on Machine Learning, 2013.
[13] Deep sets. Advances in Neural Information Processing Systems.
[14] On the limitations of representing functions on sets. International Conference on Machine Learning, 2019.
[15] Universal representation of permutation-invariant functions on vectors and tensors. International Conference on Algorithmic Learning Theory, 2024.
[16] Off-policy meta-reinforcement learning with belief-based task inference. IEEE Access, 2022.
[17] Fast adaptation via policy-dynamics value functions. arXiv preprint arXiv:2007.02879.
[18] Exchangeable Models in Meta Reinforcement Learning. 4th Lifelong Machine Learning Workshop at ICML 2020.
[19] Relic: A recipe for 64k steps of in-context reinforcement learning for embodied AI. arXiv preprint arXiv:2410.02751.
[20] Structured state space models for in-context reinforcement learning. Advances in Neural Information Processing Systems.
[21] Bayesian decision problems and Markov chains.
[22] Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes. 2002.
[23] Why generalization in RL is difficult: Epistemic POMDPs and implicit partial observability. Advances in Neural Information Processing Systems.
[24] A survey of in-context reinforcement learning. arXiv preprint arXiv:2502.07978.
[25] Grigsby, Jake; Fan, Jim; Zhu, Yuke. AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents.
[26] Amago-2: Breaking the multi-task barrier in meta-reinforcement learning with transformers. Advances in Neural Information Processing Systems.
[27] Planning and acting in partially observable stochastic domains. Artificial Intelligence, 1998.
[28] In-context Reinforcement Learning with Algorithm Distillation. 2023.
[29] 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities. CoRR.
[30] Contextualize Me--The Case for Context in Reinforcement Learning. Transactions on Machine Learning Research.
[31] Acting optimally in partially observable stochastic domains. AAAI.
[32] Neural injective functions for multisets, measures and graphs via a finite witness theorem. Advances in Neural Information Processing Systems.
[33] Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 1989.
[34] Predictive Coding Enhances Meta-RL To Achieve Interpretable Bayes-Optimal Belief Representation Under Partial Observability. The Thirty-ninth Annual Conference on Neural Information Processing Systems.
[35] Root mean square layer normalization. Advances in Neural Information Processing Systems.
[36] MuJoCo: A physics engine for model-based control. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[37] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Proceedings of the 34th International Conference on Machine Learning, 2017.
[38] Language models are unsupervised multitask learners. OpenAI blog.
[39] Long short-term memory. Neural Computation, 1997.
[40] When do transformers shine in RL? Decoupling memory from credit assignment. Advances in Neural Information Processing Systems.
[41] Scaling Algorithm Distillation for Continuous Control with Mamba. arXiv preprint arXiv:2506.13892.
[42] A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks. Forty-second International Conference on Machine Learning.
[43] Deep reinforcement learning with double Q-learning. Proceedings of the AAAI Conference on Artificial Intelligence.
[44] Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. International Conference on Machine Learning, 2018.
[45] AMRL: Aggregated memory for reinforcement learning. International Conference on Learning Representations.
[46] Decision Mamba: Reinforcement learning via hybrid selective sequence modeling. Advances in Neural Information Processing Systems.
[47] Efficient Cross-Episode Meta-RL. The Thirteenth International Conference on Learning Representations.
[48] Duan, Yan; Schulman, John; Chen, Xi; Bartlett, Peter L.; Sutskever, Ilya; Abbeel, Pieter.
[49] Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, 2015.
[50] Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. Conference on Robot Learning, 2020.
[51] Mamba: Linear-time sequence modeling with selective state spaces. First Conference on Language Modeling.
[52] Context is Everything: Implicit Identification for Dynamics Adaptation. 2022 International Conference on Robotics and Automation (ICRA).
