Pith Number

pith:R2MVQSCX

pith:2026:R2MVQSCXWXV7LP42AYNCDGFLR6
Status: not attested · not anchored · not stored · refs pending

Reinforcement-aware Knowledge Distillation for LLM Reasoning

Dhananjay Ram, Shuli Jiang, Shuo Yang, Stefano Soatto, Wei Xia, Yantao Shen, Yuting Zhang, Zhaoyang Zhang, Zhuowen Tu

RLAD enables better distillation of reasoning LLMs by imitating the teacher selectively during policy updates.

arxiv:2602.22495 v3 · 2026-02-26 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{R2MVQSCXWXV7LP42AYNCDGFLR6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Across diverse logic reasoning and math benchmarks, RLAD consistently outperforms offline distillation, standard GRPO, and KL-based on-policy teacher-student knowledge distillation.

C2 · weakest assumption

That guiding the student toward the teacher only when it improves the current policy update will reliably avoid distribution mismatch and objective interference without introducing new instabilities or requiring additional hyperparameter tuning.

C3 · one-line summary

RLAD replaces standard KL-based distillation with Trust Region Ratio Distillation, a PPO-style likelihood ratio objective that performs advantage-aware imitation on student rollouts and outperforms offline KD, GRPO, and KL-based on-policy KD on logic and math benchmarks.
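The summary describes Trust Region Ratio Distillation only as "PPO-style". For readers unfamiliar with that term, the sketch below shows the standard clipped surrogate objective from PPO, which is what "PPO-style likelihood ratio objective" usually refers to. It is not the paper's TRRD objective: this page does not state which policies form the ratio, how advantages are computed, or how the teacher enters. The function and parameter names (`ppo_clipped_surrogate`, `eps`) are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clipped_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    Generic PPO, not the paper's TRRD: each element is one sampled
    token or action with its log-probability under the current policy
    (logp_new), under the sampling policy (logp_old), and its advantage.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have equal length")
    if not advantages:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # likelihood ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # trust-region clip
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a ratio of 1.5 and `eps=0.2`, a positive advantage of 1 yields 1.2 (the gain is capped by the clip), while a negative advantage of -1 yields -1.5 (the penalty is not capped). A training loop would minimize the negative of this value.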

Formal links

2 machine-checked theorem links

Cited by

4 papers in Pith

Receipt and verification
First computed 2026-06-19T16:12:18.946766Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

8e99584857b5ebf5bf9a061a2198ab8f92e0fefc8f5199cd514f398f85b9462c

Aliases

arxiv: 2602.22495 · arxiv_version: 2602.22495v3 · doi: 10.48550/arxiv.2602.22495 · pith_short_12: R2MVQSCXWXV7 · pith_short_16: R2MVQSCXWXV7LP42 · pith_short_8: R2MVQSCX
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/R2MVQSCXWXV7LP42AYNCDGFLR6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8e99584857b5ebf5bf9a061a2198ab8f92e0fefc8f5199cd514f398f85b9462c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9fc3211235a91ef9f4e24e5340f6e9b9d74918b26850234a2a52ae235f7949f6",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-26T00:20:39Z",
    "title_canon_sha256": "5c443d568afefd33c13d07cff1913eafe48bc17f95dc977340c2973acbd21553"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.22495",
    "kind": "arxiv",
    "version": 3
  }
}
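The verification one-liner can also be reproduced offline. The Python sketch below applies the same canonicalization it uses (sorted keys, compact `,`/`:` separators, `ensure_ascii=False`, UTF-8 encoding, SHA-256) to the canonical record shown above, embedded as a literal. `RECORD`, `canonical_bytes`, and `canonical_hash` are illustrative names. Assuming the displayed record is exactly what the server returns as `canonical_record`, the printed digest should equal the canonical hash listed on this page; the sketch itself fetches nothing.

```python
import hashlib
import json
from typing import Any

# The canonical record as displayed on this page, copied as a Python literal.
RECORD: dict[str, Any] = {
    "metadata": {
        "abstract_canon_sha256": "9fc3211235a91ef9f4e24e5340f6e9b9d74918b26850234a2a52ae235f7949f6",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-26T00:20:39Z",
        "title_canon_sha256": "5c443d568afefd33c13d07cff1913eafe48bc17f95dc977340c2973acbd21553",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.22495", "kind": "arxiv", "version": 3},
}


def canonical_bytes(record: dict[str, Any]) -> bytes:
    """Serialize with the same json.dumps options as the one-liner."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(record: dict[str, Any]) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    print(canonical_hash(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the key order of the input; it checks only the displayed copy, so the `curl` pipeline remains the way to verify what the server actually serves.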