Contextual Markov Decision Processes

13 Pith papers cite this work. Polarity classification is still indexing.

abstract

We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts. The new model, called the Contextual Markov Decision Process (CMDP), can model a customer's behavior when interacting with a website (the learner). The customer's behavior depends on gender, age, location, device, etc. Based on that behavior, the website's objective is to determine the customer's characteristics and to optimize the interaction between them. Our work focuses on one basic scenario: a finite horizon with a small, known number of possible contexts. We suggest a family of algorithms with provable guarantees that learn the underlying models and the latent contexts, and optimize the CMDPs. Bounds are obtained for specific naive implementations, and extensions of the framework are discussed, laying the ground for future research.
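
As a rough illustration of the model described in the abstract, a CMDP with a small, known number of contexts can be pictured as a collection of finite-horizon MDPs that share states and actions but differ in dynamics and rewards. The sketch below is not one of the paper's algorithms: it skips the learning of models and latent contexts entirely and only shows planning once the context is assumed known. The class name `ContextualMDP`, the array layout, and the toy example are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ContextualMDP:
    """Hypothetical container: one (P, R) pair per context, shared states/actions."""

    transitions: np.ndarray  # shape (C, S, A, S): P[c, s, a, s']
    rewards: np.ndarray      # shape (C, S, A):    R[c, s, a]
    horizon: int             # finite horizon H

    def plan(self, context: int) -> np.ndarray:
        """Backward induction for a single, known context.

        Returns an integer array of shape (H, S) holding the greedy
        action for every time step and state.
        """
        P = self.transitions[context]  # (S, A, S)
        R = self.rewards[context]      # (S, A)
        n_states = R.shape[0]
        value = np.zeros(n_states)     # value after the final step is zero
        policy = np.zeros((self.horizon, n_states), dtype=int)
        for t in reversed(range(self.horizon)):
            q = R + P @ value          # (S, A): reward plus expected future value
            policy[t] = q.argmax(axis=1)
            value = q.max(axis=1)
        return policy


def two_context_example() -> ContextualMDP:
    """Toy CMDP (an assumption for illustration): the context flips which action pays."""
    n_contexts, n_states, n_actions = 2, 2, 2
    P = np.zeros((n_contexts, n_states, n_actions, n_states))
    for s in range(n_states):
        P[:, s, :, s] = 1.0            # every action keeps the state unchanged
    R = np.zeros((n_contexts, n_states, n_actions))
    R[0, :, 0] = 1.0                   # context 0 rewards action 0
    R[1, :, 1] = 1.0                   # context 1 rewards action 1
    return ContextualMDP(P, R, horizon=3)


if __name__ == "__main__":
    cmdp = two_context_example()
    for c in range(2):
        print(f"context {c} policy:\n{cmdp.plan(c)}")
```

In this toy case the best action differs between the two contexts, which is why the learner has to identify the hidden context before it can act well.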

citation-role summary

background 2

citation-polarity summary

years

2026: 11 · 2025: 2

roles

background 2

polarities

background 2

representative citing papers

Formalizing Task-Space Complexity for Zero-Shot Generalization

cs.LG · 2026-06-18 · unverdicted · novelty 6.0

Introduces signed divergence to bound generalization gaps and defines task-space complexity as the minimum number of source contexts needed for ε-coverage under local smoothness, with a set-cover reduction and empirical validation on LQR and DRL systems.

MDP modeling for multi-stage stochastic programs

cs.LG · 2025-09-26 · unverdicted · novelty 6.0

Extends policy graphs for decision-dependent uncertainty in MDPs and develops SDDP variants for multi-stage stochastic programs with continuous state and action spaces.

Contextual Intelligence: The Next Leap for Reinforcement Learning

cs.LG · 2026-02-17 · unverdicted · novelty 5.0

Reinforcement learning agents can generalize better by treating context as a first-class primitive that distinguishes slow-changing external factors from fast-changing internal ones and incorporates abstract high-level descriptors.
