Pith Number

pith:Y7KS2YN2

pith:2025:Y7KS2YN22NQU7VINAEUXTHR2TV
not attested · not anchored · not stored · refs pending

Learning to Reason without External Rewards

Aosong Feng, Dawn Song, Sergey Levine, Xuandong Zhao, Zhewei Kang

Large language models can improve at reasoning by using only their own internal confidence as the reward signal.

arxiv:2505.19590 v4 · 2025-05-26 · cs.LG · cs.CL
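A minimal sketch of a confidence-based reward. It assumes the self-certainty formulation that Intuitor builds on: the average, over the generated tokens, of the KL divergence from a uniform distribution over the vocabulary to the model's next-token distribution. This page does not define the score, so treat that formula as an assumption. `kl_uniform_to` and `self_certainty` are hypothetical helper names. Real training code would compute the same quantity from logits in batched tensor operations, not from Python lists.

```python
import math
from typing import Sequence


def kl_uniform_to(p: Sequence[float]) -> float:
    """KL(U || p) for one next-token distribution p over a vocabulary of size len(p).

    Assumed definition: KL(U || p) = -(1/|V|) * sum_j log(|V| * p_j).
    It is 0 for a uniform p and grows as p becomes more peaked.
    """
    v = len(p)
    if v == 0 or any(x <= 0.0 for x in p):
        raise ValueError("p must be a non-empty list of strictly positive probabilities")
    return -sum(math.log(v * x) for x in p) / v


def self_certainty(step_dists: Sequence[Sequence[float]]) -> float:
    """Average KL(U || p_i) over the next-token distributions of a generated answer.

    step_dists[i] is the (assumed) model distribution used to generate token i,
    conditioned on the prompt and the tokens before it.
    """
    if not step_dists:
        raise ValueError("need at least one generated token")
    return sum(kl_uniform_to(p) for p in step_dists) / len(step_dists)


if __name__ == "__main__":
    flat = [[0.25, 0.25, 0.25, 0.25]] * 3   # maximally unsure at every step
    sharp = [[0.97, 0.01, 0.01, 0.01]] * 3  # confident at every step
    print(self_certainty(flat), self_certainty(sharp))
```

Under this assumption, an answer generated with peaked distributions scores higher than one generated near-uniformly. That is the signal used as a reward in place of a gold solution or test cases.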

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Y7KS2YN22NQU7VINAEUXTHR2TV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
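The page does not describe the merge algorithm or the event format. The sketch below is a purely hypothetical illustration of why a deterministic merge lets any mirror reach the same state. It deduplicates events by ID, orders them by (timestamp, event ID), and applies last-writer-wins per field. `Event`, `merge`, and all field names are invented for the example, and signature checking is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; Pith's real signed-event schema is not shown on this page.
    event_id: str    # assumed content-derived, so equal IDs mean identical events
    timestamp: str   # fixed-width ISO-8601 UTC string, so string order is time order
    field: str
    value: str


def merge(canonical: Dict[str, str], events: Iterable[Event]) -> Dict[str, str]:
    """Fold events onto the canonical record in a fixed total order.

    The result depends only on the set of events, not on the order or number of
    copies a mirror received them in.
    """
    unique = {e.event_id: e for e in events}  # drop duplicate copies of the same event
    state = dict(canonical)
    for e in sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id)):
        state[e.field] = e.value  # last writer wins; ties broken by event_id
    return state


if __name__ == "__main__":
    base = {"citations": "open"}
    e1 = Event("a", "2026-05-18T00:00:00Z", "citations", "1")
    e2 = Event("b", "2026-05-19T00:00:00Z", "citations", "2")
    print(merge(base, [e1, e2]), merge(base, [e2, e1, e1]))
```

The fixed sort key is the design choice that matters. Without it, two mirrors holding the same events could disagree on which write came last.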

Claims

C1 strongest claim

Experiments demonstrate that Intuitor matches GRPO's performance on mathematical benchmarks while achieving better generalization to out-of-domain tasks like code generation, without requiring gold solutions or test cases.

C2 weakest assumption

That the model's self-certainty score reliably indicates correct reasoning and does not encourage reward hacking or overconfidence in incorrect outputs.

C3 one-line summary

Intuitor enables LLMs to learn complex reasoning from self-certainty signals alone, matching supervised RL performance on math benchmarks while generalizing better to code generation without gold solutions or test cases.
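In GRPO, each prompt is answered by a group of sampled completions, and each completion's reward is normalized against the rest of its group. The sketch below shows that normalization step with self-certainty scores standing in for a verifier's reward, as the summary above describes. The helper name and the `eps` stabilizer are illustrative assumptions. Implementations also differ on whether they use the population or the sample standard deviation; this sketch uses the population version.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (r_i - mean(group)) / (std(group) + eps).

    `rewards` holds one score per sampled completion for the same prompt. Here
    those scores are assumed to be self-certainty values, not correctness checks.
    """
    if not rewards:
        raise ValueError("need at least one completion in the group")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical self-certainty scores for four completions of one prompt.
    print(group_relative_advantages([0.2, 0.5, 0.8, 0.5]))
```

Completions the model was more confident in than its own group average receive positive advantages. The clipped policy-gradient loss and KL penalty that consume these advantages are omitted here.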

Formal links

1 machine-checked theorem link

Cited by

26 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:50.140295Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

c7d52d61bad3614fd50d0129799e3a9d6ff56e1d00ca57a05aa0c5c065316e4a

Aliases

arxiv: 2505.19590 · arxiv_version: 2505.19590v4 · doi: 10.48550/arxiv.2505.19590 · pith_short_12: Y7KS2YN22NQU · pith_short_16: Y7KS2YN22NQU7VIN · pith_short_8: Y7KS2YN2
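The page does not say how the Pith Number is derived, but decoding it suggests a simple rule. The 26-character number matches the first 26 characters (130 bits) of the RFC 4648 base32 encoding of the canonical hash bytes, and the `pith_short_*` aliases are prefixes of it. The sketch below checks that inferred rule against this record. `pith_number_from_hash` is a hypothetical name, not part of any Pith API.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Inferred rule (not documented on this page): leading base32 chars of the digest."""
    digest = bytes.fromhex(canonical_hash_hex)
    # b32encode uses the RFC 4648 alphabet A-Z, 2-7 and pads with '='; the prefix is unaffected.
    encoded = base64.b32encode(digest).decode("ascii")
    return encoded[:length]


if __name__ == "__main__":
    h = "c7d52d61bad3614fd50d0129799e3a9d6ff56e1d00ca57a05aa0c5c065316e4a"
    print(pith_number_from_hash(h))  # → Y7KS2YN22NQU7VINAEUXTHR2TV
    for n in (8, 12, 16):
        print(n, pith_number_from_hash(h, n))
```

If the rule holds in general, anyone can recompute the Pith Number from the canonical hash without contacting the service.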
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y7KS2YN22NQU7VINAEUXTHR2TV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c7d52d61bad3614fd50d0129799e3a9d6ff56e1d00ca57a05aa0c5c065316e4a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a83a896d73f839a3f15411ac5d5918d65098f6a0030ce7b0c3a81486cd90ea90",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-05-26T07:01:06Z",
    "title_canon_sha256": "d21b544e2c7a446c1b6b4b60f6885a2ade25ecdf29f07a226c5c29afa33312ff"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.19590",
    "kind": "arxiv",
    "version": 4
  }
}
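The verification command above hashes a canonical serialization of the record. The sketch below is a standalone Python version of that step, using the same `json.dumps` options as the one-liner: sorted keys, no insignificant whitespace, and non-ASCII characters kept as UTF-8. If the JSON shown above is the complete canonical record, hashing it this way should reproduce the canonical hash. That result is expected here, not checked. This scheme resembles RFC 8785 (JCS) but is not guaranteed to match it, for example in number formatting.

```python
import hashlib
import json
from typing import Any


def canonical_hash(record: Any) -> str:
    """SHA-256 hex digest of a key-sorted, compact, UTF-8 JSON serialization."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


if __name__ == "__main__":
    record = {
        "metadata": {
            "abstract_canon_sha256": "a83a896d73f839a3f15411ac5d5918d65098f6a0030ce7b0c3a81486cd90ea90",
            "cross_cats_sorted": ["cs.CL"],
            "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
            "primary_cat": "cs.LG",
            "submitted_at": "2025-05-26T07:01:06Z",
            "title_canon_sha256": "d21b544e2c7a446c1b6b4b60f6885a2ade25ecdf29f07a226c5c29afa33312ff",
        },
        "schema_version": "1.0",
        "source": {"id": "2505.19590", "kind": "arxiv", "version": 4},
    }
    print(canonical_hash(record))
```

Because keys are sorted before hashing, two mirrors that store the same record with different key order still agree on the hash.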