Mathematics of Operations Research

Robust Markov decision processes · 2013

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.

On the Complexity of Discounted Robust MDPs with $L_p$ Uncertainty Sets

cs.CC · 2026-05-08 · unverdicted · novelty 7.0

Policy iteration for discounted robust MDPs is strongly polynomial for L1 and L∞ uncertainty sets but hard for other Lp sets.

Bandit Learning in General Open Multi-agent Systems

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

A unified bandit framework for general open multi-agent systems with global-UCB algorithms and regret bounds linear in entry uncertainty and dependent on system stability and agent patterns.

Optimal Online and Offline Algorithms for Contextual MNL with Applications to Assortment and Pricing

math.OC · 2026-04-21 · unverdicted · novelty 6.0

New algorithms for joint contextual MNL assortment and pricing deliver improved online regret bounds of order W sqrt(d T log N)/L0 and local suboptimality guarantees offline.

Finite-Time Analysis of MCTS in Continuous POMDP Planning

cs.AI · 2026-05-08 · unverdicted · novelty 5.0

The paper proves finite-time probabilistic bounds on value estimates for MCTS in both discrete and continuous POMDPs and introduces Voro-POMCPOW with adaptive partitioning for guarantees.

citing papers explorer

Showing 5 of 5 citing papers.

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

cs.LG · 2026-05-09 · unverdicted · none · ref 49 · 2 links

SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.

On the Complexity of Discounted Robust MDPs with $L_p$ Uncertainty Sets

cs.CC · 2026-05-08 · unverdicted · none · ref 14

Policy iteration for discounted robust MDPs is strongly polynomial for L1 and L∞ uncertainty sets but hard for other Lp sets.

Bandit Learning in General Open Multi-agent Systems

cs.LG · 2026-05-07 · unverdicted · none · ref 14

A unified bandit framework for general open multi-agent systems with global-UCB algorithms and regret bounds linear in entry uncertainty and dependent on system stability and agent patterns.

Optimal Online and Offline Algorithms for Contextual MNL with Applications to Assortment and Pricing

math.OC · 2026-04-21 · unverdicted · none · ref 44

New algorithms for joint contextual MNL assortment and pricing deliver improved online regret bounds of order W sqrt(d T log N)/L0 and local suboptimality guarantees offline.

Finite-Time Analysis of MCTS in Continuous POMDP Planning

cs.AI · 2026-05-08 · unverdicted · none · ref 261

The paper proves finite-time probabilistic bounds on value estimates for MCTS in both discrete and continuous POMDPs and introduces Voro-POMCPOW with adaptive partitioning for guarantees.
