A provably efficient algorithm for linear Markov decision process with low switching cost. arXiv preprint arXiv:2101.00494

Minbo Gao, Tianle Xie, Simon S Du, Lin F Yang · arXiv 2101.00494

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

A novel log-barrier and log-determinant regularized algorithm achieves Õ(√T) regret in tabular MDPs with O(H log log T) oracle calls independent of |S|×|A| and extends to linear MDPs with infinite states for sublinear regret.

citing papers explorer

Showing 1 of 1 citing paper.

Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

cs.LG · 2026-05-01 · unverdicted · none · ref 6

A novel log-barrier and log-determinant regularized algorithm achieves Õ(√T) regret in tabular MDPs with O(H log log T) oracle calls independent of |S|×|A| and extends to linear MDPs with infinite states for sublinear regret.
