{"paper":{"title":"Logging Policy Design for Off-Policy Evaluation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A unifying framework derives optimal logging policies that minimize off-policy evaluation error by balancing reward concentration against action coverage across known, unknown, and partial information regimes.","cross_cats":["cs.AI","cs.IR","cs.LG","stat.ME"],"primary_cat":"stat.ML","authors_text":"Connor Douglas, Foster Provost, Joel Persson","submitted_at":"2026-05-14T17:25:19Z","abstract_excerpt":"Off-policy evaluation (OPE) estimates the value of a target treatment policy (e.g., a recommender system) using data collected by a different logging policy. It enables high-stakes experimentation without live deployment, yet in practice accuracy depends heavily on the logging policy used to collect data for computing the estimate. We study how to design logging policies that minimize OPE error for given target policies. We characterize a fundamental reward-coverage tradeoff: concentrating probability mass on high-reward actions reduces variance but risks missing signal on actions the target p"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We propose a unifying framework for logging policy design and derive optimal policies in canonical informational regimes where the target policy and reward distribution are (i) known, (ii) unknown, and (iii) partially known through priors or noisy estimates at logging time.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The reward-coverage tradeoff and optimality results assume that the informational regimes (known, unknown, partial) accurately capture real-world knowledge at logging time and that standard OPE estimators behave according to the modeled variance and bias terms.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Derives optimal logging policies for off-policy evaluation by balancing reward concentration against action coverage in known, unknown, and partially known regimes of target policy and rewards.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A unifying framework derives optimal logging policies that minimize off-policy evaluation error by balancing reward concentration against action coverage across known, unknown, and partial information regimes.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"30a339e8b6e2090b83135beb540d1e7132c44d2118d37a0a66fd48fc44b36655"},"source":{"id":"2605.15108","kind":"arxiv","version":1},"verdict":{"id":"101b4b32-f868-40b1-a2ae-84c6ecd46a55","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T03:04:05.796307Z","strongest_claim":"We propose a unifying framework for logging policy design and derive optimal policies in canonical informational regimes where the target policy and reward distribution are (i) known, (ii) unknown, and (iii) partially known through priors or noisy estimates at logging time.","one_line_summary":"Derives optimal logging policies for off-policy evaluation by balancing reward concentration against action coverage in known, unknown, and partially known regimes of target policy and rewards.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The reward-coverage tradeoff and optimality results assume that the informational regimes (known, unknown, partial) accurately capture real-world knowledge at logging time and that standard OPE estimators behave according to the modeled variance and bias terms.","pith_extraction_headline":"A unifying framework derives optimal logging policies that minimize off-policy evaluation error by balancing reward concentration against action coverage across known, unknown, and partial information regimes."},"references":{"count":86,"sample":[{"doi":"","year":2022,"title":"Proceedings of the 39th International Conference on Machine Learning (ICML) , pages =","work_id":"ac9f865d-5eab-4c6b-afca-bca92f60bb4e","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"arXiv preprint arXiv:2402.08201 , year=","work_id":"17184f5e-3258-4432-961d-38ba813bbe78","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"The Annals of Statistics , volume=","work_id":"632ce3ed-68c7-4e69-b5af-29702996e417","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Advances in neural information processing systems , volume=","work_id":"5b25cb38-d44b-43f4-8522-0e8b0a687e61","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2014,"title":"Mathematics of Operations Research , volume=","work_id":"7852c905-4c0f-4397-9a1c-bc56a5968cd1","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":86,"snapshot_sha256":"de00ae94d228c99880f940da094ee3a4a4d0e95925011af04d35a932a19fa772","internal_anchors":2},"formal_canon":{"evidence_count":2,"snapshot_sha256":"d22e391e70a51d0877e84bc0dcc08e439bf9e66e61e6735dba3b70160bc2d1e1"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}