Pith Number

pith:CZU2QIMD

pith:2026:CZU2QIMDK4Y3SFM5RI27BTRR77
not attested · not anchored · not stored · refs resolved

Fast Rates for Inverse Reinforcement Learning

Andreas Schlaginhaufen, Maryam Kamgarpour

Min-Max-IRL with linear rewards achieves fast O(n^{-1}) rates for KL divergence and parameter error.

arxiv:2605.14599 v1 · 2026-05-14 · cs.LG · cs.AI · stat.ML

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CZU2QIMDK4Y3SFM5RI27BTRR77}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Exploiting pseudo-self-concordance of the Min-Max-IRL loss, the authors prove that both the trajectory-level KL divergence and the squared parameter error in the Hessian norm decay at the fast rate O(n^{-1}).

C2 weakest assumption

The Min-Max-IRL loss is pseudo-self-concordant (invoked to obtain the fast rates); the paper also relies on linear reward classes and finite-horizon structure.

C3 one-line summary

Entropy-regularized Min-Max-IRL achieves O(n^{-1}) rates for trajectory-level KL divergence and squared parameter error in the Hessian norm under misspecification in Borel MDPs.

References

14 extracted · 14 resolved · 0 Pith anchors

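[1] If $\beta > 0$, then $\pi = \pi^\star_r$ if and only if $A^\pi_{t,r}(s,a) = 0$ for all $(t,s)$ and $\nu$-a.e. $a \in A$.
[2] Proof. In part 1, the implication $\pi = \pi^\star_r \implies A^\pi_{t,r} = 0$ $\nu$-a.s.
[3] For all $\alpha \in [0,1]$, $e^{-\alpha S} H(\theta_0) \preceq H(\theta_\alpha) \preceq e^{\alpha S} H(\theta_0)$. (15)
[4] Then $\psi(-S)\,\|\Delta\|^2_{H(\theta_0)} \le D_{J^\star}(\theta_1,\theta_0) \le \psi(S)\,\|\Delta\|^2_{H(\theta_0)}$.
[5] Then $\chi(-S)\,\|\Delta\|^2_{H(\theta_0)} \le D_{J^\star}(\theta_1,\theta_0) + D_{J^\star}(\theta_0,\theta_1) = \langle \Delta, \nabla J^\star(\theta_1) - \nabla J^\star(\theta_0) \rangle \le \chi(S)\,\|\Delta\|^2_{H(\theta_0)}$.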

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-17T23:39:04.272455Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

1669a821835731b9159d8a35f0ce31ffd899485ca3491db02ba99ba1bd27975f

Aliases

arxiv: 2605.14599 · arxiv_version: 2605.14599v1 · doi: 10.48550/arxiv.2605.14599 · pith_short_12: CZU2QIMDK4Y3 · pith_short_16: CZU2QIMDK4Y3SFM5 · pith_short_8: CZU2QIMD
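
In this record, the identifier and its short aliases appear to derive from the canonical hash: Base32-encoding the SHA-256 digest (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number, and each `pith_short_*` alias is a prefix of it. The sketch below checks that relationship for this record only. The derivation rule is an inference from the values shown here, not a documented part of the schema, and the helper name `pith_from_hash` is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "1669a821835731b9159d8a35f0ce31ffd899485ca3491db02ba99ba1bd27975f"
PITH_NUMBER = "CZU2QIMDK4Y3SFM5RI27BTRR77"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier relates to the hash in this
    record; it is not taken from a published specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = pith_from_hash(CANONICAL_HASH)

# The short aliases listed above are plain prefixes of the full identifier.
short_aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```

If the relationship holds for other records too, a short alias can be checked against a canonical hash without contacting the service; that generalization is untested here.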

Agent API

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/CZU2QIMDK4Y3SFM5RI27BTRR77 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1669a821835731b9159d8a35f0ce31ffd899485ca3491db02ba99ba1bd27975f

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "1b54a95d9c9568780923126408f03216921fd62ad2ceaee949651603d7664232",
    "cross_cats_sorted": [
      "cs.AI",
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T09:07:31Z",
    "title_canon_sha256": "147f5f120acf21e3e740194c0d945d73eee02f7a461d526f4344029256caaeb2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14599",
    "kind": "arxiv",
    "version": 1
  }
}
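
The Python one-liner in the verification command can be unpacked into a small function for offline use. The sketch below assumes the served `canonical_record` is identical to the JSON shown here; it serializes the record with sorted keys, compact separators, and unescaped non-ASCII characters, then hashes the UTF-8 bytes with SHA-256. The function name `canonical_sha256` is hypothetical; the serialization settings are those of the command itself.

```python
import hashlib
import json

# Canonical record copied from the JSON shown in this record.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "1b54a95d9c9568780923126408f03216921fd62ad2ceaee949651603d7664232",
        "cross_cats_sorted": ["cs.AI", "stat.ML"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T09:07:31Z",
        "title_canon_sha256": "147f5f120acf21e3e740194c0d945d73eee02f7a461d526f4344029256caaeb2",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14599", "kind": "arxiv", "version": 1},
}

# Hash listed under "Canonical hash" in this record.
EXPECTED = "1669a821835731b9159d8a35f0ce31ffd899485ca3491db02ba99ba1bd27975f"


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON canonicalization as the shell command."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches listed hash:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the result does not depend on the order in which the JSON fields were written; a mismatch would point to a difference in content or in the canonicalization actually used by the service.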