{"paper":{"title":"When Does Non-Uniform Replay Matter in Reinforcement Learning?","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Non-uniform replay improves reinforcement learning sample efficiency mainly when replay volume is low, provided sampling entropy stays high.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Michal Korniak, Michal Nauman, Mikołaj Czarnecki, Pieter Abbeel, Piotr Miłoś, Yarden As","submitted_at":"2026-05-11T09:11:05Z","abstract_excerpt":"Modern off-policy reinforcement learning algorithms often rely on simple uniform replay sampling and it remains unclear when and why non-uniform replay improves over this strong baseline. Across diverse RL settings, we show that the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions per environment step; expected recency, how recent sampled transitions are; and the entropy of the replay sampling distribution. Our main contribution is clarifying when non-uniform replay is beneficial and providing practical guidance for replay desi"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions per environment step; expected recency, how recent sampled transitions are; and the entropy of the replay sampling distribution. ... 
non-uniform replay is most beneficial when replay volume is low, and that high-entropy sampling is important even at comparable expected recency.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the three identified factors comprehensively govern non-uniform replay effectiveness and that the observed benefits will generalize beyond the specific algorithms, benchmarks, and parallel-simulation regimes tested.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Non-uniform replay improves RL sample efficiency mainly in low replay-volume regimes, with high-entropy sampling being key even at comparable recency.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Non-uniform replay improves reinforcement learning sample efficiency mainly when replay volume is low, provided sampling entropy stays high.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"43df73cedae58c20bca3de2933d49bbd268b878f2e44eb1c7fa06e009cb362a9"},"source":{"id":"2605.10236","kind":"arxiv","version":3},"verdict":{"id":"7d7639d2-4b3a-4ee5-a83f-d74534203b2e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-13T06:22:47.019414Z","strongest_claim":"the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions per environment step; expected recency, how recent sampled transitions are; and the entropy of the replay sampling distribution. ... 
non-uniform replay is most beneficial when replay volume is low, and that high-entropy sampling is important even at comparable expected recency.","one_line_summary":"Non-uniform replay improves RL sample efficiency mainly in low replay-volume regimes, with high-entropy sampling being key even at comparable recency.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the three identified factors comprehensively govern non-uniform replay effectiveness and that the observed benefits will generalize beyond the specific algorithms, benchmarks, and parallel-simulation regimes tested.","pith_extraction_headline":"Non-uniform replay improves reinforcement learning sample efficiency mainly when replay volume is low, provided sampling entropy stays high."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.10236/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-19T15:38:10.202722Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T11:31:19.746771Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T09:33:47.881786Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"9ad6adbe4ce54f729447c63461f7ce37017431aba2fc990d1fa3e8848f8c5734"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}