Mathematics of Operations Research.
Five Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
- Years: 2026 (5)
- Verdicts: unverdicted (5)
- Roles: background (1)
- Polarities: background (1)
Citing papers explorer
- Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
  SeqRejectron constructs a stopping rule from a small set of validator policies, achieving horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
- On the Complexity of Discounted Robust MDPs with $L_p$ Uncertainty Sets
  Policy iteration for discounted robust MDPs is strongly polynomial for $L_1$ and $L_\infty$ uncertainty sets but hard for other $L_p$ sets.
- Bandit Learning in General Open Multi-agent Systems
  A unified bandit framework for general open multi-agent systems, with global-UCB algorithms whose regret bounds are linear in entry uncertainty and depend on system stability and agent patterns.
- Optimal Online and Offline Algorithms for Contextual MNL with Applications to Assortment and Pricing
  New algorithms for joint contextual MNL assortment and pricing achieve improved online regret bounds of order $W\sqrt{dT\log N}/L_0$ and local suboptimality guarantees in the offline setting.
- Finite-Time Analysis of MCTS in Continuous POMDP Planning
  The paper proves finite-time probabilistic bounds on MCTS value estimates in both discrete and continuous POMDPs, and introduces Voro-POMCPOW, which uses adaptive partitioning to attain these guarantees.