Convergence of finite memory Q-learning for POMDPs and near optimality of learned policies under filter stability.Mathematics of Operations Research, 48(4):2066–2093

Ali Devran Kara, Serdar Yüksel · 2066

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints

cs.AI · 2026-05-18 · unverdicted · novelty 8.0

Formalizes interface-constrained semi-Markov decision processes and proves a finite-sample bound for neural IC-Q that decomposes into neural approximation error, interface gap, and mixing-time residual, with experiments showing parity to centralized oracles.

citing papers explorer

Showing 1 of 1 citing paper.

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints cs.AI · 2026-05-18 · unverdicted · none · ref 24
Formalizes interface-constrained semi-Markov decision processes and proves a finite-sample bound for neural IC-Q that decomposes into neural approximation error, interface gap, and mixing-time residual, with experiments showing parity to centralized oracles.

Convergence of finite memory Q-learning for POMDPs and near optimality of learned policies under filter stability.Mathematics of Operations Research, 48(4):2066–2093

fields

years

verdicts

representative citing papers

citing papers explorer