Pith Number

pith:YMJNFF3G

pith:2026:YMJNFF3GKDF5ITPVHOUQKO6D7J
not attested · not anchored · not stored · refs pending

Process Reward Agents for Steering Knowledge-Intensive Reasoning

Jiwoong Sohn, Kenneth Styppa, Michael Moor, Tomasz Sternal, Torsten Hoefler

Process Reward Agents supply online step-wise rewards from external knowledge to steer reasoning in frozen language models.

arxiv:2604.09482 v2 · 2026-04-10 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YMJNFF3GKDF5ITPVHOUQKO6D7J}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
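The page does not specify the merge algorithm, so the following is only a minimal sketch of the property described above: two mirrors that hold the same events, received in different orders, arrive at the same state. The event fields (`at`, `field`, `value`), the sample events, and the last-writer-wins rule are all hypothetical and chosen for illustration.

```python
import hashlib
import json


def event_key(event):
    """Hypothetical event identity: SHA-256 of the event's canonical JSON form."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def merge_events(*event_sets):
    """Combine event lists into one duplicate-free, canonically ordered list."""
    unique = {}
    for events in event_sets:
        for event in events:
            unique[event_key(event)] = event
    # Order by (timestamp, content hash) so arrival order never matters.
    ordered_keys = sorted(unique, key=lambda k: (unique[k]["at"], k))
    return [unique[k] for k in ordered_keys]


def current_state(events):
    """Fold ordered events into a state dict; later events overwrite earlier ones."""
    state = {}
    for event in events:
        state[event["field"]] = event["value"]
    return state


# Hypothetical events, as two mirrors might have received them.
mirror_a = [
    {"at": "2026-01-01T00:00:00Z", "field": "status", "value": "a"},
    {"at": "2026-01-02T00:00:00Z", "field": "note", "value": "b"},
]
mirror_b = list(reversed(mirror_a)) + [
    {"at": "2026-01-03T00:00:00Z", "field": "status", "value": "c"},
]

state_ab = current_state(merge_events(mirror_a, mirror_b))
state_ba = current_state(merge_events(mirror_b, mirror_a))
print(state_ab == state_ba)
```

The sketch relies on a total order over events that depends only on event content, which is what makes the result independent of where or in what order the bundle was assembled.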

Claims

C1 · strongest claim

PRA consistently outperforms strong baselines, achieving 80.8% accuracy on MedQA with Qwen3-4B, a new state of the art at the 4B scale. Importantly, PRA generalizes to unseen frozen policy models ranging from 0.5B to 8B parameters, improving their accuracy by up to 25.7% without any policy model updates.

C2 · weakest assumption

That retrieval-augmented process rewards can be computed reliably and cheaply at every generation step from external knowledge sources without introducing undetected errors or prohibitive latency that would negate the search benefits.

C3 · one line summary

Process Reward Agents provide online step-wise guidance to frozen language models in medical reasoning, reaching 80.8% accuracy on MedQA with Qwen3-4B and improving frozen policy models from 0.5B to 8B parameters by up to 25.7% without policy updates.

Receipt and verification
First computed: 2026-06-02T02:04:52.913127Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

c312d2976650cbd44df53ba9053bc3fa6004c1ba2fd631103acb5c432caaf683

Aliases

arxiv: 2604.09482 · arxiv_version: 2604.09482v2 · doi: 10.48550/arxiv.2604.09482 · pith_short_12: YMJNFF3GKDF5 · pith_short_16: YMJNFF3GKDF5ITPV · pith_short_8: YMJNFF3G
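The short aliases above are prefixes of the full Pith Number. In addition, for the values shown on this page, the full number coincides with the first 26 characters of the base32 encoding of the canonical hash. That second relation is an observation from this one record, not a documented rule, and the helper names below are hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "c312d2976650cbd44df53ba9053bc3fa6004c1ba2fd631103acb5c432caaf683"
PITH_NUMBER = "YMJNFF3GKDF5ITPVHOUQKO6D7J"


def short_aliases(pith_number, lengths=(8, 12, 16)):
    """Return the pith_short_N aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


def base32_prefix(hex_digest, length=26):
    """Base32-encode the raw digest bytes (RFC 4648 alphabet) and truncate.

    Assumption: the 26-character truncation is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


print(short_aliases(PITH_NUMBER))
print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```

If the relation holds in general, a client could check that an identifier and a hash belong together without any network access, but that should be confirmed against the schema before relying on it.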
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YMJNFF3GKDF5ITPVHOUQKO6D7J \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c312d2976650cbd44df53ba9053bc3fa6004c1ba2fd631103acb5c432caaf683
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "3d68182b83fcba80a276f656f3b39509f5e035d32b9d9de46c8f8c9f974d5d13",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-04-10T16:45:44Z",
    "title_canon_sha256": "92b491970cb747dc463d80ee2cb8307cf3a866ffd7de9570096323fae0af1e83"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.09482",
    "kind": "arxiv",
    "version": 2
  }
}
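The shell one-liner above can be unpacked into a small offline check. This sketch repeats the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and applies them to the record as printed on this page; the function and variable names are hypothetical. Whether the digest matches depends on the printed record being byte-for-byte what the server hashes, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

# Canonical record as printed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "3d68182b83fcba80a276f656f3b39509f5e035d32b9d9de46c8f8c9f974d5d13",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-04-10T16:45:44Z",
    "title_canon_sha256": "92b491970cb747dc463d80ee2cb8307cf3a866ffd7de9570096323fae0af1e83"
  },
  "schema_version": "1.0",
  "source": {"id": "2604.09482", "kind": "arxiv", "version": 2}
}
"""
EXPECTED_HASH = "c312d2976650cbd44df53ba9053bc3fa6004c1ba2fd631103acb5c432caaf683"


def canonical_sha256(record):
    """Hash the record using the same settings as the shell one-liner."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED_HASH else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the JSON as displayed, only on its content.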