{"paper":{"title":"Constrained Stochastic Optimal Control with a Baseline Performance Guarantee","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":[],"primary_cat":"math.OC","authors_text":"Mohammad Ghavamzadeh, Yinlam Chow","submitted_at":"2014-10-10T10:15:15Z","abstract_excerpt":"In this paper, we show how a simulated Markov decision process (MDP) built by the so-called \\emph{baseline} policies, can be used to compute a different policy, namely the \\emph{simulated optimal} policy, for which the performance of this policy is guaranteed to be better than the baseline policy in the real environment. This technique has immense applications in fields such as news recommendation systems, health care diagnosis and digital online marketing. Our proposed algorithm iteratively solves for a \"good\" policy in the simulated MDP in an offline setting. Furthermore, we provide a perfor"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"1410.2726","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}