Pith Number

pith:XCPSYI2I

pith:2026:XCPSYI2IAXYG2QQVR2F7FQ6IIW
not attested · not anchored · not stored · refs resolved

Logging Policy Design for Off-Policy Evaluation

Connor Douglas, Foster Provost, Joel Persson

A unifying framework derives optimal logging policies that minimize off-policy evaluation error by balancing reward concentration against action coverage across known, unknown, and partial information regimes.

arxiv:2605.15108 v1 · 2026-05-14 · stat.ML · cs.AI · cs.IR · cs.LG · stat.ME

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XCPSYI2IAXYG2QQVR2F7FQ6IIW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
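
The page does not describe the merge algorithm itself, so the following is only a minimal sketch of what "deterministic" means here: a mirror applies events in a total order that does not depend on arrival order, so any two mirrors holding the same events reach the same state. The event fields (`ts`, `field`, `value`), the ordering rule, and the last-writer-wins fold are all hypothetical and are not taken from the Pith bundle format.

```python
import hashlib
import json


def event_key(event):
    """Total order for events: timestamp first, then a content hash as tie-breaker.

    Hypothetical rule; the real bundle may order events differently.
    """
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return (event["ts"], hashlib.sha256(payload).hexdigest())


def merge(canonical_record, events):
    """Fold events into a state dict in a fixed order, ignoring exact duplicates."""
    state = dict(canonical_record)
    seen = set()
    for event in sorted(events, key=event_key):
        key = event_key(event)
        if key in seen:
            continue  # the same event received twice has no extra effect
        seen.add(key)
        state[event["field"]] = event["value"]  # later events overwrite earlier ones
    return state


# Hypothetical record and events, for illustration only.
record = {"id": "2605.15108"}
events = [
    {"ts": "2026-05-17T21:40:01Z", "field": "citations", "value": "open"},
    {"ts": "2026-05-17T21:40:02Z", "field": "citations", "value": "resolved"},
    {"ts": "2026-05-17T21:40:02Z", "field": "replications", "value": "open"},
]

# Two mirrors receive the same events in different orders, one with a duplicate.
state_a = merge(record, events)
state_b = merge(record, list(reversed(events)) + events[:1])
print(state_a == state_b)
```

Sorting before folding is what makes the result independent of delivery order; the content hash only serves to break ties between events that share a timestamp.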

Claims

C1 · strongest claim

We propose a unifying framework for logging policy design and derive optimal policies in canonical informational regimes where the target policy and reward distribution are (i) known, (ii) unknown, and (iii) partially known through priors or noisy estimates at logging time.

C2 · weakest assumption

The reward-coverage tradeoff and optimality results assume that the informational regimes (known, unknown, partial) accurately capture real-world knowledge at logging time and that standard OPE estimators behave according to the modeled variance and bias terms.

C3 · one line summary

Derives optimal logging policies for off-policy evaluation by balancing reward concentration against action coverage in known, unknown, and partially known regimes of target policy and rewards.
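
To make the reward-concentration versus action-coverage tradeoff concrete, here is a minimal sketch for the simplest case only: a context-free bandit evaluated with inverse propensity scoring (IPS), where the target policy and the reward second moments are assumed known. It uses the textbook importance-sampling result that the per-sample IPS variance, sum_a pi(a)^2 E[r^2|a] / mu(a) - V^2, is minimized by mu(a) proportional to pi(a) * sqrt(E[r^2|a]). This is not presented as the paper's derivation; the four-action problem and all numbers are hypothetical.

```python
import numpy as np


def ips_variance(target, logging, second_moment, value):
    """Per-sample variance of the IPS term (target / logging) * reward
    when actions are drawn from the logging policy."""
    return float(np.sum(target**2 * second_moment / logging) - value**2)


# Hypothetical context-free bandit with 4 actions and Bernoulli rewards.
target = np.array([0.70, 0.20, 0.05, 0.05])       # target policy pi(a)
mean_reward = np.array([0.90, 0.50, 0.10, 0.05])  # E[r | a]
second_moment = mean_reward.copy()                # r in {0, 1}, so E[r^2 | a] = E[r | a]
value = float(target @ mean_reward)               # true target-policy value V

candidates = {
    "uniform": np.full(4, 0.25),   # maximal coverage, ignores rewards
    "on_policy": target.copy(),    # logs with the target policy itself
}
# Variance-minimizing logging policy for IPS in this known-information setting.
optimal = target * np.sqrt(second_moment)
candidates["optimal"] = optimal / optimal.sum()

variances = {
    name: ips_variance(target, mu, second_moment, value)
    for name, mu in candidates.items()
}
for name, var in variances.items():
    print(f"{name:>9}: {var:.4f}")
```

In this toy problem the optimal policy has the lowest variance, on-policy logging comes next, and uniform logging is worst. The 1/mu(a) factor is the coverage penalty: any action the target policy uses must keep nonzero logging probability, while the pi(a) * sqrt(E[r^2|a]) weighting concentrates logging where the estimate is most sensitive.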

References

86 extracted · 86 resolved · 2 Pith anchors

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-17T21:40:25.768637Z
Last reissued 2026-05-17T21:57:19.096160Z
Builder pith-number-builder-2026-05-17-v1
Signature unsigned_v0
Schema pith-number/v1.0

Canonical hash

b89f2c234805f06d42158e8bf2c3c845b35528d71928e2c46621ab3ef868bccd

Aliases

arxiv: 2605.15108 · arxiv_version: 2605.15108v1 · pith_short_12: XCPSYI2IAXYG · pith_short_16: XCPSYI2IAXYG2QQV · pith_short_8: XCPSYI2I

Agent API

Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XCPSYI2IAXYG2QQVR2F7FQ6IIW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b89f2c234805f06d42158e8bf2c3c845b35528d71928e2c46621ab3ef868bccd

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "9f7505eea36120ce3ad46d6a4c240d5359cf786c1048687aaab3e1a0196dfcaa",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.IR",
      "cs.LG",
      "stat.ME"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "stat.ML",
    "submitted_at": "2026-05-14T17:25:19Z",
    "title_canon_sha256": "c73506c0ad7842abfb1fe695bb315a9e053c3265d1179618095f86e06f82aec4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15108",
    "kind": "arxiv",
    "version": 1
  }
}
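
The shell pipeline above can also be reproduced offline. The sketch below applies the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) to the canonical record printed on this page and hashes it with SHA-256. The function name `canonical_sha256` is illustrative, and the script assumes the JSON shown here is exactly what the API returns under `.canonical_record`.

```python
import hashlib
import json


def canonical_sha256(record):
    """SHA-256 of the record serialized with sorted keys and no extra whitespace."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # UTF-8 by default
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "9f7505eea36120ce3ad46d6a4c240d5359cf786c1048687aaab3e1a0196dfcaa",
        "cross_cats_sorted": ["cs.AI", "cs.IR", "cs.LG", "stat.ME"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "stat.ML",
        "submitted_at": "2026-05-14T17:25:19Z",
        "title_canon_sha256": "c73506c0ad7842abfb1fe695bb315a9e053c3265d1179618095f86e06f82aec4",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15108", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the value listed under "Canonical hash"
```

Because keys are sorted before hashing, the order in which fields appear in the input does not affect the digest; only the values and key names do.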