pith:GY43OMZ3
Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning
Treating multiple reasoning paths for one question as counterfactual experiments trains LLMs to favor stable and transferable reasoning patterns over lucky guesses.
arxiv:2602.06475 v2 · 2026-02-06 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GY43OMZ3V66BX2ECGX6W2HOQP4}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We propose Group Causal Counterfactual Policy Optimization to explicitly train LLMs to learn generalizable reasoning patterns. The method defines an episodic causal counterfactual reward that jointly captures (i) robustness, encouraging the answer distribution induced by a reasoning step to remain stable under counterfactual perturbations; and (ii) effectiveness, enforcing sufficient variability so that the learned reasoning strategy can transfer across questions.
This rests on the assumption that multi-candidate reasoning trajectories for a fixed question can be validly interpreted as a family of counterfactual experiments with sufficient theoretical support, and that the resulting robustness and effectiveness reward will produce reasoning patterns that generalize without introducing new failure modes or biases.
Group Causal Counterfactual Policy Optimization trains LLMs on generalizable reasoning by defining episodic rewards for counterfactual robustness and transferability, then optimizing the policy with token-level advantages.
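To make the robustness idea in the claims above concrete, here is a minimal toy sketch. It is not the paper's reward: the function names, the use of total variation distance, and the `1 - mean distance` scoring are all assumptions chosen for illustration. It only shows one way "the answer distribution stays stable under perturbations" could be turned into a number. The effectiveness term is not sketched, because the claims do not specify how variability is measured.

```python
from collections import Counter


def answer_distribution(answers):
    """Empirical distribution over final answers (hypothetical helper)."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: count / total for answer, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def robustness_score(original_answers, perturbed_answer_sets):
    """Toy robustness score in [0, 1] (assumption, not the paper's formula).

    original_answers: answers sampled after an unperturbed reasoning step.
    perturbed_answer_sets: one list of sampled answers per perturbation.
    Returns 1.0 when every perturbed distribution matches the original
    and 0.0 when every perturbed distribution is disjoint from it.
    """
    base = answer_distribution(original_answers)
    distances = [
        total_variation(base, answer_distribution(answers))
        for answers in perturbed_answer_sets
    ]
    return 1.0 - sum(distances) / len(distances)


stable = robustness_score(["4", "4"], [["4", "4"]])
unstable = robustness_score(["4", "4"], [["5", "5"]])
print(stable, unstable)
```

A step whose perturbed variants still lead to the same answers scores high, while a step whose answers flip under perturbation scores low. That is the behaviour the robustness claim describes qualitatively.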
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:23.754563Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
3639b7333bafbc1be88235fd6d1dd07f24fd3d19aaa6381bb3febc6451ec41e9
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GY43OMZ3V66BX2ECGX6W2HOQP4 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3639b7333bafbc1be88235fd6d1dd07f24fd3d19aaa6381bb3febc6451ec41e9
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f2b0b5545aeada88c5c61f59a3fe22d06cfb6430a1b9d68a2b9b87d21a91f7dd",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-06T08:03:11Z",
    "title_canon_sha256": "6ff29da67c0f21410f70c05d05895ed21596719fc205159b6d48e4dd92eb0618"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.06475",
    "kind": "arxiv",
    "version": 2
  }
}
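The canonical record above can also be hashed offline, without `curl` or `jq`. The sketch below mirrors the serialization used in the shell one-liner on this page (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding). The helper name `canonical_sha256` is hypothetical, and the script assumes the record shown on this page is identical to what the API returns under `.canonical_record`.

```python
import hashlib
import json

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f2b0b5545aeada88c5c61f59a3fe22d06cfb6430a1b9d68a2b9b87d21a91f7dd",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-06T08:03:11Z",
        "title_canon_sha256": "6ff29da67c0f21410f70c05d05895ed21596719fc205159b6d48e4dd92eb0618",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.06475", "kind": "arxiv", "version": 2},
}


def canonical_sha256(obj):
    """SHA-256 of the compact, key-sorted JSON form (hypothetical helper)."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because keys are sorted and whitespace is stripped before hashing, the indentation and key order of the displayed JSON do not affect the digest.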