Contextual Markov Decision Processes

Assaf Hallak, Dotan Di Castro, Shie Mannor · 2015 · stat.ML · arXiv 1502.02259

9 Pith papers cite this work. Polarity classification is still indexing.

abstract

We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts. The new model, called Contextual Markov Decision Process (CMDP), can model a customer's behavior when interacting with a website (the learner). The customer's behavior depends on gender, age, location, device, etc. Based on that behavior, the website's objective is to determine customer characteristics and to optimize the interaction between them. Our work focuses on one basic scenario: a finite horizon with a small, known number of possible contexts. We suggest a family of algorithms with provable guarantees that learn the underlying models and the latent contexts, and optimize the CMDPs. Bounds are obtained for specific naive implementations, and extensions of the framework are discussed, laying the ground for future research.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Curriculum reinforcement learning with measurable task representation learning

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

A VAE-based latent task representation enables automatic curriculum generation in CRL for non-Euclidean navigation tasks, outperforming interpolation and GAN-based methods in experiments.

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.

MDP modeling for multi-stage stochastic programs

cs.LG · 2025-09-26 · unverdicted · novelty 6.0

Extends policy graphs for decision-dependent uncertainty in MDPs and develops SDDP variants for multi-stage stochastic programs with continuous state and action spaces.

Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control

cs.RO · 2026-04-03 · unverdicted · novelty 6.0

A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.

MATE: Solving Contextual Markov Decision Processes with Memory of Accumulated Transition Embeddings

cs.LG · 2026-05-17 · unverdicted · novelty 5.0

MATE uses permutation-invariant sum-aggregated memory of transition embeddings to solve CMDPs with online adaptation and computational advantages over Transformers and RNNs.

Contextual Intelligence The Next Leap for Reinforcement Learning

cs.LG · 2026-02-17 · unverdicted · novelty 5.0

Reinforcement learning agents can generalize better by treating context as a first-class primitive that distinguishes slow-changing external factors from fast-changing internal ones and incorporates abstract high-level descriptors.

Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem

cs.LG · 2025-09-19 · unverdicted · novelty 5.0

DAC models fully decentralized cooperative MARL as a context modeling problem, using latent variables for joint policies to fix non-stationarity in value updates and relative overgeneralization in value estimation.

Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation

cs.LG · 2026-04-23 · unverdicted · novelty 5.0

Contextual multi-task RL for underwater navigation uses just 1.5% of network weights for task differentiation, mostly from context-variable connections to the first hidden layer.

Contextual Multi-Task Reinforcement Learning for Autonomous Reef Monitoring

cs.RO · 2026-04-14 · unverdicted · novelty 4.0

A context-dependent multi-task RL policy is trained and evaluated in HoloOcean simulation to solve multiple reef monitoring tasks with claimed improvements in sample efficiency, zero-shot generalization, and robustness to water currents.

citing papers explorer

Showing 9 of 9 citing papers.

Curriculum reinforcement learning with measurable task representation learning

cs.LG · 2026-05-22 · unverdicted · none · ref 22 · internal anchor

A VAE-based latent task representation enables automatic curriculum generation in CRL for non-Euclidean navigation tasks, outperforming interpolation and GAN-based methods in experiments.

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

cs.LG · 2026-05-15 · unverdicted · none · ref 299 · internal anchor

Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.

MDP modeling for multi-stage stochastic programs

cs.LG · 2025-09-26 · unverdicted · none · ref 24 · internal anchor

Extends policy graphs for decision-dependent uncertainty in MDPs and develops SDDP variants for multi-stage stochastic programs with continuous state and action spaces.

Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control

cs.RO · 2026-04-03 · unverdicted · none · ref 14

A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.

MATE: Solving Contextual Markov Decision Processes with Memory of Accumulated Transition Embeddings

cs.LG · 2026-05-17 · unverdicted · none · ref 4 · internal anchor

MATE uses permutation-invariant sum-aggregated memory of transition embeddings to solve CMDPs with online adaptation and computational advantages over Transformers and RNNs.

Contextual Intelligence The Next Leap for Reinforcement Learning

cs.LG · 2026-02-17 · unverdicted · none · ref 30 · internal anchor

Reinforcement learning agents can generalize better by treating context as a first-class primitive that distinguishes slow-changing external factors from fast-changing internal ones and incorporates abstract high-level descriptors.

Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem

cs.LG · 2025-09-19 · unverdicted · none · ref 6 · internal anchor

DAC models fully decentralized cooperative MARL as a context modeling problem, using latent variables for joint policies to fix non-stationarity in value updates and relative overgeneralization in value estimation.

Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation

cs.LG · 2026-04-23 · unverdicted · none · ref 14

Contextual multi-task RL for underwater navigation uses just 1.5% of network weights for task differentiation, mostly from context-variable connections to the first hidden layer.

Contextual Multi-Task Reinforcement Learning for Autonomous Reef Monitoring

cs.RO · 2026-04-14 · unverdicted · none · ref 13

A context-dependent multi-task RL policy is trained and evaluated in HoloOcean simulation to solve multiple reef monitoring tasks with claimed improvements in sample efficiency, zero-shot generalization, and robustness to water currents.
