Haichen Hu, Rui Ai, Stephen Bates, and David Simchi-Levi

URL: https://arxiv · arXiv 2603.14218

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

A novel log-barrier and log-determinant regularized algorithm achieves Õ(√T) regret in tabular MDPs with O(H log log T) oracle calls independent of |S|×|A| and extends to linear MDPs with infinite states for sublinear regret.

citing papers explorer


Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation cs.LG · 2026-05-01 · unverdicted · none · ref 7

