{"paper":{"title":"On Gaussian approximation for entropy-regularized Q-learning with function approximation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Entropy-regularized Q-learning with linear function approximation yields a Gaussian approximation bound of order n to the minus one-fourth for Polyak-Ruppert averaged iterates.","cross_cats":["cs.LG"],"primary_cat":"stat.ML","authors_text":"Alexey Naumov, Artemy Rubtsov, Eric Moulines, Rahul Singh, Sergey Samsonov","submitted_at":"2026-05-17T22:23:25Z","abstract_excerpt":"In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak--Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize $k^{-\\omega}$, $\\omega \\in (1/2,1)$. Assuming that the sequence of observed triples $(s_k,a_k,s_{k+1})_{k \\geq 0}$ forms a uniformly geometrically ergodic Markov chain, and under suitable regularity conditions for the projected soft Bellman equation, we establish a Gaussian approximation bound in the convex distance with rate of order $n^{-1/4}$, up "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We establish a Gaussian approximation bound in the convex distance with rate of order n^{-1/4}, up to polylogarithmic factors in n, for the Polyak-Ruppert averaged iterates.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The sequence of observed triples (s_k, a_k, s_{k+1}) forms a uniformly geometrically ergodic Markov chain, together with suitable regularity conditions for the projected soft Bellman equation.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Establishes n^{-1/4} Gaussian approximation in convex distance for averaged entropy-regularized Q-learning with linear function approximation and polynomial stepsizes.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Entropy-regularized Q-learning with linear function approximation yields a Gaussian approximation bound of order n to the minus one-fourth for Polyak-Ruppert averaged iterates.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"0f31c0d62c8a0aaa0f1f08aafdbc3921456b4adbdcdb447dd79b4a85b5c3e0e7"},"source":{"id":"2605.17678","kind":"arxiv","version":1},"verdict":{"id":"1fb85ef0-8a8a-4508-b400-fe18cf6333be","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T22:08:20.724234Z","strongest_claim":"We establish a Gaussian approximation bound in the convex distance with rate of order n^{-1/4}, up to polylogarithmic factors in n, for the Polyak-Ruppert averaged iterates.","one_line_summary":"Establishes n^{-1/4} Gaussian approximation in convex distance for averaged entropy-regularized Q-learning with linear function approximation and polynomial stepsizes.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The sequence of observed triples (s_k, a_k, s_{k+1}) forms a uniformly geometrically ergodic Markov chain, together with suitable regularity conditions for the projected soft Bellman equation.","pith_extraction_headline":"Entropy-regularized Q-learning with linear function approximation yields a Gaussian approximation bound of order n to the minus one-fourth for Polyak-Ruppert averaged iterates."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.17678/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T22:31:19.443137Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T22:20:57.108737Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"cited_work_retraction","ran_at":"2026-05-19T21:51:58.976109Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"citation_quote_validity","ran_at":"2026-05-19T21:49:44.258233Z","status":"skipped","version":"0.1.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T21:33:23.530051Z","status":"skipped","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T21:21:57.442604Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"1058f9d7468c8f1729374a219f75350c86f515d40bfa1409982e837fb54b5cce"},"references":{"count":40,"sample":[{"doi":"","year":1995,"title":"Residual algorithms: Reinforcement learning with function approximation","work_id":"7740df1b-123f-43d5-b998-329a7ac59906","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1993,"title":"The reverse isoperimetric problem for gaussian measure.Discrete & Computational Geometry, 10(4):411–420, 1993","work_id":"e4f5d992-92af-405f-bd44-b7a85b0b2108","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1996,"title":"Bertsekas and John N","work_id":"16043d2a-fad1-4af5-87e2-d17f27834c32","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2026,"title":"Gaussian approximation for two-timescale linear stochastic approximation","work_id":"650395df-4448-4338-9dbb-5f6b8b6b0155","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning.Automatica, 146:110623, 2022","work_id":"fd03ab28-49e8-48e5-91ba-b2f06ab6e9d5","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":40,"snapshot_sha256":"0a9e7de1187c9a3d92c31e35262dcafa7453b07e3c7603b2645b25190b788e8f","internal_anchors":3},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}