Pith Number

pith:UFEZKARI

pith:2026:UFEZKARIFIPNHMH6773EP3JCTI
not attested · not anchored · not stored · refs pending

Off-Policy Learning with Limited Supply

Bushun Kawagishi, Koichi Tanaka, Nobuyuki Shimizu, Ren Kishimoto, Yasuo Yamamoto, Yusuke Narita, Yuta Saito

Greedy off-policy learning is suboptimal when supply is limited, and superior policies exist that allocate items based on relative expected rewards across users.

arxiv:2603.18702 v4 · 2026-03-19 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UFEZKARIFIPNHMH6773EP3JCTI}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Conventional greedy OPL approaches may fail to maximize policy performance, and policies with superior performance must exist in limited-supply settings.

C2 · weakest assumption

The approach assumes that logged data from an unconstrained behavior policy suffices to learn a policy that correctly accounts for future users' relative valuations under limited supply, without additional assumptions on the arrival process or reward distributions.

C3 · one-line summary

OPLS is a new off-policy method for contextual bandits with limited supply that outperforms greedy approaches by prioritizing items with higher relative expected rewards for the current user.
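
The gap between greedy and supply-aware allocation described in the claims above can be seen in a toy example. The sketch below is not the paper's OPLS algorithm: the reward table `Q`, the one-unit supply per item, and the `relative_score` heuristic (a user's expected reward for an item minus the average over all users) are assumptions made up for illustration.

```python
from itertools import permutations

# Hypothetical expected rewards Q[user][item]; the numbers are made up.
Q = [
    [10, 9],  # user 0 arrives first and likes both items almost equally
    [8, 1],   # user 1 arrives second and strongly prefers item 0
]
N_ITEMS = 2  # assumption: every item has exactly one unit of supply


def allocate(score):
    """Serve users in arrival order; each gets the in-stock item with the top score."""
    in_stock = set(range(N_ITEMS))
    total = 0
    for user, rewards in enumerate(Q):
        item = max(sorted(in_stock), key=lambda a: score(user, a))
        in_stock.remove(item)  # the unit is consumed and unavailable to later users
        total += rewards[item]
    return total


def greedy_score(user, item):
    """Greedy rule: rank items by this user's own expected reward."""
    return Q[user][item]


def relative_score(user, item):
    """Illustrative rule: this user's reward relative to the average user's reward."""
    mean_reward = sum(row[item] for row in Q) / len(Q)
    return Q[user][item] - mean_reward


def best_total():
    """Exhaustive optimum over all one-item-per-user assignments."""
    return max(
        sum(Q[user][item] for user, item in enumerate(assignment))
        for assignment in permutations(range(N_ITEMS), len(Q))
    )


if __name__ == "__main__":
    print("greedy:  ", allocate(greedy_score))
    print("relative:", allocate(relative_score))
    print("optimum: ", best_total())
```

With these made-up numbers, greedy allocation collects 11, while the relative-score heuristic and the exhaustive optimum both collect 17: the first user gives up little by taking item 1, which leaves item 0 for the user who values it far more than the alternative.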

Receipt and verification
First computed: 2026-05-20T00:04:28.780340Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

a1499502282a1ed3b0fefff647ed229a1be6d1385f3f244f84809a86e5e43767

Aliases

arxiv: 2603.18702 · arxiv_version: 2603.18702v4 · doi: 10.48550/arxiv.2603.18702 · pith_short_12: UFEZKARIFIPN · pith_short_16: UFEZKARIFIPNHMH6 · pith_short_8: UFEZKARI
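
The page does not say how the identifier is derived from the hash. For this record, though, the 26-character Pith Number equals the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and each `pith_short_N` alias is its first N characters. The sketch below checks that observation; treating it as the general rule is an assumption, and `pith_number_from_hash` and `short_alias` are hypothetical helper names.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "a1499502282a1ed3b0fefff647ed229a1be6d1385f3f244f84809a86e5e43767"
PITH_NUMBER = "UFEZKARIFIPNHMH6773EP3JCTI"


def pith_number_from_hash(hex_digest: str) -> str:
    """Assumed derivation: unpadded Base32 of the first 16 bytes of the digest."""
    prefix = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


def short_alias(pith_number: str, n: int) -> str:
    """Short aliases on this page are plain prefixes of the full identifier."""
    return pith_number[:n]


if __name__ == "__main__":
    print(pith_number_from_hash(CANONICAL_HASH) == PITH_NUMBER)
    for n in (8, 12, 16):
        print(n, short_alias(PITH_NUMBER, n))
```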
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UFEZKARIFIPNHMH6773EP3JCTI \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a1499502282a1ed3b0fefff647ed229a1be6d1385f3f244f84809a86e5e43767
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "040b1b4756ef9bc2f2dbfa4df5cf000a85c5a82ca2f54a3861f81356a0aa5c4e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-19T10:01:39Z",
    "title_canon_sha256": "2454d0a0cbbcd1c6815d4aac9e68115cfe1bed1301ce8fcaec3cdfa0072f2760"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.18702",
    "kind": "arxiv",
    "version": 4
  }
}
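
The shell one-liner under "Verify this Pith Number yourself" can also be written as a small offline script. The sketch below applies the same canonicalization (sorted keys, compact separators, UTF-8 without ASCII escaping) to the record shown above. The function names are illustrative, and whether the digest matches the published hash depends on the service hashing exactly this serialization, so the script reports a match or mismatch rather than assuming one.

```python
import hashlib
import json

# Published hash and canonical record, copied from this page.
EXPECTED = "a1499502282a1ed3b0fefff647ed229a1be6d1385f3f244f84809a86e5e43767"
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "040b1b4756ef9bc2f2dbfa4df5cf000a85c5a82ca2f54a3861f81356a0aa5c4e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-19T10:01:39Z",
    "title_canon_sha256": "2454d0a0cbbcd1c6815d4aac9e68115cfe1bed1301ce8fcaec3cdfa0072f2760"
  },
  "schema_version": "1.0",
  "source": {"id": "2603.18702", "kind": "arxiv", "version": 4}
}
"""


def canonical_bytes(record) -> bytes:
    """Serialize as the one-liner does: sorted keys, no whitespace, UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def record_digest(record) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = record_digest(json.loads(RECORD_JSON))
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the key order in which a mirror happens to store the record.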