Pith Number

pith:4MRX7NOB

pith:2026:4MRX7NOBILF4SALFR563EFL3K6
not attested · not anchored · not stored · refs resolved

A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs

Alicia Vidler, Erel Shtossel, Gal A. Kaminka, Uri Shaham

A modified harmonic mean operator correctly computes average reward rates in non-stationary semi-Markov decision processes.

arxiv:2605.04880 v1 · 2026-05-06 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4MRX7NOBILF4SALFR563EFL3K6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

This paper presents a novel modified harmonic mean operator that correctly computes reward rates even under non-stationarity. This yields model-free learning algorithms that work with SMDPs while remaining robust to reward and duration distributions that change over time.

C2 · weakest assumption

That the ratio of cumulative reward to cumulative duration becomes incorrect under non-stationarity in infinite-horizon SMDPs, and that the proposed harmonic-mean modification resolves this without introducing new biases or requiring additional assumptions.

C3 · one line summary

A modified harmonic mean operator correctly computes reward rates in non-stationary SMDPs for average-reward reinforcement learning.
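
The claims above refer to the ratio of cumulative reward to cumulative duration. As background only, the sketch below illustrates the classical identity that this ratio equals a reward-weighted harmonic mean of the per-transition rates, whereas a plain arithmetic mean of those rates generally differs. This is not the paper's modified operator, which this record does not specify; the function names and sample numbers are hypothetical, and the data are stationary toy values chosen for easy arithmetic.

```python
from fractions import Fraction


def reward_rate(rewards, durations):
    """Ratio of cumulative reward to cumulative duration."""
    return sum(rewards) / sum(durations)


def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean; assumes every value is nonzero."""
    return sum(weights) / sum(w / v for w, v in zip(weights, values))


# Hypothetical transitions: (reward, duration) pairs of an SMDP trajectory.
rewards = [Fraction(2), Fraction(6), Fraction(3)]
durations = [Fraction(1), Fraction(2), Fraction(3)]

# Per-transition reward rates r_i / tau_i: 2, 3, 1.
rates = [r / t for r, t in zip(rewards, durations)]

ratio = reward_rate(rewards, durations)            # 11/6
harmonic = weighted_harmonic_mean(rates, rewards)  # 11/6, weights are the rewards
arithmetic = sum(rates) / len(rates)               # 2, not the true rate

print(ratio, harmonic, arithmetic)
```

Exact fractions are used so the equality between the ratio and the weighted harmonic mean can be checked without floating-point noise.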

Receipt and verification
First computed: 2026-05-27T01:05:56.196465Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

e3237fb5c142cbc901658f7db2157b578a67911ba3a50e83bcc7beee2ceef328

Aliases

arxiv: 2605.04880 · arxiv_version: 2605.04880v1 · doi: 10.48550/arxiv.2605.04880 · pith_short_12: 4MRX7NOBILF4 · pith_short_16: 4MRX7NOBILF4SALF · pith_short_8: 4MRX7NOB
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4MRX7NOBILF4SALFR563EFL3K6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e3237fb5c142cbc901658f7db2157b578a67911ba3a50e83bcc7beee2ceef328
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "60af616fdd4a5193ed3e9f635c0f6ed443c4e093fdcbc3ed2ed8af45d4b67c00",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-06T13:16:42Z",
    "title_canon_sha256": "578a55a5a8058c617a340c9f17a80a454c1dde2440bd51654a5a83092e8d6b25"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.04880",
    "kind": "arxiv",
    "version": 1
  }
}
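
The verification one-liner above can also be run offline against the record as printed. The sketch below is a minimal version of that check, assuming the JSON shown on this page is the complete canonical record; the helper name `canonical_hash` and the constant `EXPECTED` are illustrative and not part of any Pith API. It applies the same canonicalization as the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) before hashing with SHA-256.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized with sorted keys and compact separators."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page (assumed complete and unmodified).
record = {
    "metadata": {
        "abstract_canon_sha256": "60af616fdd4a5193ed3e9f635c0f6ed443c4e093fdcbc3ed2ed8af45d4b67c00",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-06T13:16:42Z",
        "title_canon_sha256": "578a55a5a8058c617a340c9f17a80a454c1dde2440bd51654a5a83092e8d6b25",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.04880", "kind": "arxiv", "version": 1},
}

# Published canonical hash from the "Canonical hash" section of this page.
EXPECTED = "e3237fb5c142cbc901658f7db2157b578a67911ba3a50e83bcc7beee2ceef328"

digest = canonical_hash(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which the fields are typed does not affect the digest; a mismatch would suggest the record differs from what was signed or that the canonicalization assumed here is not exactly the one the builder uses.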