Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Boyu Zhu; Huiming Wang; Xinyu Zhou; Yingfa Chen; Yi Xu; Zhijiang Guo; Zhiwei Li

arxiv: 2606.11052 · v1 · pith:YBDWLXJZnew · submitted 2026-06-09 · 💻 cs.CL

Xinyu Zhou, Boyu Zhu, Yi Xu, Zhiwei Li, Yingfa Chen, Huiming Wang, Zhijiang Guo

Pith reviewed 2026-06-27 13:07 UTC · model grok-4.3

classification 💻 cs.CL

keywords chain-of-thought fine-tuning · long-context recall · attention amnesia · hybrid LLMs · query-key projections · Needle-In-A-Haystack · QK-Restore

The pith

CoT fine-tuning disrupts long-range recall in hybrid LLMs by changing query-key projections, but restoring only those projections recovers the capability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that chain-of-thought supervised fine-tuning, while improving reasoning, systematically degrades long-context recall in hybrid linear-attention models such as HypeNet and Jet-Nemotron. The degradation appears on Needle-In-A-Haystack tests and grows worse with longer contexts and harder retrieval settings, for instance dropping HypeNet-9B performance on NIAH-S2@256K from 67.2% to 9.4%. The authors trace the effect to CoT-SFT biasing attention gradients toward short-range patterns and thereby altering the query and key projections that handle long-range routing. They introduce QK-Restore, a training-free swap that returns only W_Q and W_K to their pre-SFT values while keeping every other post-SFT parameter, and show that this restores recall across models without harming the reasoning gains.

Core claim

CoT-SFT biases attention gradients toward short-range patterns, disrupting query-key projections (W_Q, W_K) that are responsible for long-range routing; restoring only these projections from the pre-SFT checkpoint recovers long-context capability.

What carries the argument

QK-Restore: a training-free method that restores only W_Q and W_K from the pre-SFT checkpoint while preserving all other post-SFT parameters (plus a Procrustes variant that balances routing preservation against reasoning adaptation).
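
The swap described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes checkpoints are available as name-to-weight mappings (for example, PyTorch state dicts) and that query and key projection weights can be recognized by the hypothetical name suffixes `q_proj.weight` and `k_proj.weight`. Real HypeNet or Jet-Nemotron checkpoints may name these parameters differently, and the page does not say which layers the paper restores.

```python
from typing import Any, Dict, Iterable

# Hypothetical name suffixes for query/key projection weights; real
# checkpoints may use different names, so adjust these to the model.
QK_PATTERNS = ("q_proj.weight", "k_proj.weight")


def qk_restore(pre_sft: Dict[str, Any],
               post_sft: Dict[str, Any],
               patterns: Iterable[str] = QK_PATTERNS) -> Dict[str, Any]:
    """Return post-SFT weights with only the Q/K projections taken from pre-SFT."""
    patterns = tuple(patterns)
    merged = dict(post_sft)  # shallow copy; neither input mapping is modified
    for name in post_sft:
        if any(name.endswith(p) for p in patterns):
            if name not in pre_sft:
                raise KeyError(f"{name} is missing from the pre-SFT checkpoint")
            merged[name] = pre_sft[name]
    return merged
```

With PyTorch, the returned mapping could be passed to `model.load_state_dict(...)`; no gradient step is involved, which is what makes the procedure training-free.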

If this is right

Long-context recall on NIAH is restored at zero training cost while reasoning performance is preserved.
The recovery holds across hybrid architectures including HypeNet and Jet-Nemotron.
Gains are largest under longer context windows and harder retrieval settings.
A Procrustes alignment variant offers a tunable trade-off between routing fidelity and adaptation.
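
The paper's exact Procrustes formulation is not reproduced on this page. For orientation only, the classical orthogonal Procrustes problem, which the variant's name suggests but which may differ from what the authors actually use, is shown below. The symbols $W^{\text{pre}}$ and $W^{\text{post}}$ are assumed notation for one projection matrix before and after SFT.

```latex
R^{\star} = \arg\min_{R^{\top} R = I} \bigl\lVert W^{\text{post}} R - W^{\text{pre}} \bigr\rVert_F ,
\qquad
R^{\star} = U V^{\top}
\quad \text{where} \quad
\bigl(W^{\text{post}}\bigr)^{\top} W^{\text{pre}} = U \Sigma V^{\top} .
```

How such an alignment would be turned into a tunable trade-off between routing fidelity and adaptation is not stated here.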

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Attention routing for different distance scales may be carried by partially separable weight components that can be edited independently.
Selective checkpoint restoration could be tested on other fine-tuning regimes that trade one capability for another.
Systematic measurement of which projection matrices affect which context lengths would clarify the scope of the effect.

Load-bearing premise

The observed NIAH degradation is caused specifically by changes to W_Q and W_K during CoT-SFT rather than by other simultaneous changes in the model or training dynamics.

What would settle it

An experiment in which only W_Q and W_K are updated during CoT-SFT yet NIAH performance does not degrade, or in which restoring those matrices fails to recover performance.

Original abstract

Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including HypeNet and Jet-Nemotron, retrieval performance on Needle-In-A-Haystack (NIAH) deteriorates substantially after CoT-SFT, and the degradation becomes more severe under harder retrieval settings and longer context windows. For example, HypeNet-9B on NIAH-S2@256K decreases from $67.2\%$ to $9.4\%$. We attribute this to CoT-SFT biasing attention gradients toward short-range patterns, disrupting query-key projections ($W_Q, W_K$) that are responsible for long-range routing. Motivated by this observation, we propose QK-Restore, a training-free method that restores only $W_Q$ and $W_K$ from the pre-SFT checkpoint while preserving all other post-SFT parameters. We further introduce a Procrustes variant to balance routing preservation and reasoning adaptation. Across architectures, QK-Restore consistently restores long-context capability at zero training cost while preserving reasoning performance; for instance, on HypeNet-5B it improves S3@256K from $65.4\%$ to $76.4\%$ while maintaining strong reasoning performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CoT-SFT hurts NIAH scores in hybrid models and restoring just W_Q/W_K recovers them, but the causal attribution rests on a single intervention without isolating ablations.

The paper reports that supervised fine-tuning on chain-of-thought data reliably drops needle-in-haystack performance in hybrid linear-attention models like HypeNet and Jet-Nemotron, with bigger drops at longer contexts. They trace this to shifts in the query and key projections and show that swapping those two matrices back to the pre-SFT checkpoint restores most of the recall while keeping the reasoning gains. A Procrustes variant is offered to trade off the two goals.

The concrete finding and the training-free fix are the clearest contributions. The before-and-after numbers are consistent across model sizes, and the restore step is simple enough to test. That part of the work is reproducible on its face and directly useful for anyone running CoT SFT on these architectures.

The attribution step is thinner. The claim that CoT gradients specifically bias W_Q and W_K toward short-range patterns is supported only by the restore experiment itself. No ablations restore other components while holding W_Q/W_K fixed, and there are no reported measurements of attention-score distributions or gradient norms on long versus short tokens. Without those controls, the recovery shows that restoring W_Q and W_K is enough to bring recall back, but not that changes to those two matrices are what caused the loss. The abstract also gives limited detail on baselines and on whether other parameters were checked.

The result is narrow but actionable for groups already working on hybrid long-context models and SFT pipelines. It deserves referee time because the empirical pattern is sharp enough to check, even if the mechanism needs tighter evidence. I would not cite it yet without seeing the full controls.

Referee Report

1 major / 2 minor

Summary. The paper claims that CoT supervised fine-tuning systematically degrades long-context recall (measured via Needle-In-A-Haystack) in hybrid linear-attention models such as HypeNet and Jet-Nemotron by biasing attention gradients toward short-range patterns and thereby altering the query-key projections W_Q and W_K. It supports this via before/after performance drops that worsen with harder retrieval settings and longer contexts, then introduces the training-free QK-Restore intervention (and a Procrustes variant) that swaps only the pre-SFT W_Q/W_K matrices back into the post-SFT model, reporting recovery of NIAH scores (e.g., HypeNet-5B S3@256K from 65.4% to 76.4%) while retaining reasoning gains.

Significance. If the attribution and intervention hold, the result identifies a previously under-appreciated side-effect of standard CoT-SFT on long-range routing in hybrid architectures and supplies a zero-cost, parameter-swap fix that preserves downstream reasoning. The reported consistency of degradation and recovery across multiple models, context lengths, and difficulty settings strengthens the practical relevance for fine-tuning pipelines that must balance reasoning and long-context capability.

major comments (1)

[attribution paragraph / abstract] Abstract and attribution paragraph: the central causal claim that degradation arises specifically from CoT-SFT-induced changes to W_Q and W_K (rather than simultaneous changes to W_V, feed-forward layers, or optimizer state) rests on the QK-Restore swap experiment. No ablation is described that restores alternative components while holding W_Q/W_K fixed, nor are direct measurements of attention-score distributions or gradient norms on long versus short tokens reported pre- and post-SFT; without these controls the restoration could be correlational rather than mechanistic.
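
The missing control can be sketched as a set of single-component restores, each evaluated on NIAH separately. This is an illustrative harness only: the group labels and name fragments (`q_proj.weight`, `k_proj.weight`, `v_proj.weight`, `mlp.`) are hypothetical and would need to match the actual parameter names of the model under test.

```python
from typing import Any, Dict, Tuple

# Hypothetical component groups, identified by fragments of parameter names.
COMPONENT_GROUPS: Dict[str, Tuple[str, ...]] = {
    "qk": ("q_proj.weight", "k_proj.weight"),
    "v": ("v_proj.weight",),
    "ffn": ("mlp.",),
}


def restore_group(pre_sft: Dict[str, Any],
                  post_sft: Dict[str, Any],
                  fragments: Tuple[str, ...]) -> Dict[str, Any]:
    """Post-SFT weights, with parameters matching `fragments` reset to pre-SFT."""
    merged = dict(post_sft)
    for name in post_sft:
        if any(fragment in name for fragment in fragments):
            merged[name] = pre_sft[name]
    return merged


def ablation_variants(pre_sft: Dict[str, Any],
                      post_sft: Dict[str, Any],
                      groups: Dict[str, Tuple[str, ...]] = COMPONENT_GROUPS
                      ) -> Dict[str, Dict[str, Any]]:
    """Build one hybrid checkpoint per component group for side-by-side evaluation."""
    return {label: restore_group(pre_sft, post_sft, fragments)
            for label, fragments in groups.items()}
```

If only the `qk` variant recovered NIAH scores while the `v` and `ffn` variants did not, that would support the attribution more directly than the swap experiment alone.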

minor comments (2)

Results sections would benefit from explicit enumeration of all baselines, whether other hyperparameters were ablated or held constant during SFT, and the precise definition of the NIAH-S2/S3 variants and context lengths used.
The Procrustes variant is introduced but its exact formulation, hyper-parameters, and comparison to plain QK-Restore are not detailed enough to allow reproduction from the text alone.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on strengthening the causal attribution. We address the concern point-by-point below and will revise the manuscript accordingly.

Referee: Abstract and attribution paragraph: the central causal claim that degradation arises specifically from CoT-SFT-induced changes to W_Q and W_K (rather than simultaneous changes to W_V, feed-forward layers, or optimizer state) rests on the QK-Restore swap experiment. No ablation is described that restores alternative components while holding W_Q/W_K fixed, nor are direct measurements of attention-score distributions or gradient norms on long versus short tokens reported pre- and post-SFT; without these controls the restoration could be correlational rather than mechanistic.

Authors: We agree the current evidence is primarily correlational and that targeted controls are needed to isolate the role of W_Q/W_K. In the revised version we will add: (i) ablations that restore only W_V or feed-forward layers (holding post-SFT W_Q/W_K fixed) and show these yield no NIAH recovery, unlike QK-Restore; (ii) pre-/post-SFT comparisons of attention-score distributions and gradient norms separated by long-range vs. short-range tokens, confirming the post-SFT bias toward short-range patterns. These experiments directly address the mechanistic gap while preserving the training-free nature of the intervention.

revision: yes
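
One way the attention-score comparison in (ii) could be quantified is the share of each query's attention that lands on distant keys. The sketch below is an assumption-laden illustration rather than the authors' measurement: it presumes an explicit causal attention matrix is available for a layer (rows are queries, columns are keys) and uses a hypothetical distance cutoff `threshold` to separate long-range from short-range keys.

```python
from typing import List, Sequence


def long_range_mass(attn: Sequence[Sequence[float]], threshold: int) -> List[float]:
    """For each query position i, return the fraction of its attention placed
    on keys j <= i whose distance i - j is at least `threshold`."""
    shares = []
    for i, row in enumerate(attn):
        total = sum(row[: i + 1])  # causal: only keys at or before position i
        far = sum(row[j] for j in range(i + 1) if i - j >= threshold)
        shares.append(far / total if total > 0 else 0.0)
    return shares
```

Comparing these shares for the same inputs before and after SFT would show whether attention mass actually moves toward nearby tokens.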

Circularity Check

0 steps flagged

No circularity: empirical before/after and intervention results stand on direct measurement.

The paper reports observed NIAH degradation after CoT-SFT, attributes it observationally to changes in W_Q/W_K, and demonstrates recovery via a parameter-swap intervention that restores only those matrices. No equations, fitted parameters, or self-citations reduce any claimed result to a quantity defined by the claim itself. The derivation chain consists of experimental comparisons and a training-free fix; it is self-contained against external benchmarks and does not invoke uniqueness theorems, ansatzes, or renamings that collapse into inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Empirical study; relies on standard assumptions of transformer training and evaluation without new mathematical derivations or postulated entities.

axioms (1)

domain assumption: Standard assumptions in transformer training and evaluation hold, including that NIAH measures long-range recall.
The paper uses NIAH as the primary metric without additional justification in the abstract.
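
For readers unfamiliar with the metric, a generic Needle-In-A-Haystack item can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's NIAH-S2 or NIAH-S3 setup, whose definitions are not given on this page: the function names, the repeated-filler haystack, and substring-match scoring are all hypothetical choices.

```python
from typing import Sequence


def build_niah_prompt(filler: str, needle: str, n_filler: int,
                      depth: float, question: str) -> str:
    """Place `needle` among `n_filler` copies of `filler` at relative `depth`
    (0.0 = start, 1.0 = end), then append the retrieval question."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must lie in [0, 1]")
    sentences = [filler] * n_filler
    sentences.insert(int(depth * n_filler), needle)
    return " ".join(sentences) + "\n\n" + question


def niah_accuracy(outputs: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of model outputs that contain the expected answer string."""
    if not answers:
        raise ValueError("answers must not be empty")
    hits = sum(1 for out, ans in zip(outputs, answers) if ans in out)
    return hits / len(answers)
```

Whether a score on such items reflects long-range recall in general, rather than this one retrieval pattern, is exactly the assumption the ledger entry records.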


2025

[10] [10]

Measuring Coding Challenge Competence With

Dan Hendrycks and Steven Basart and Saurav Kadavath and Mantas Mazeika and Akul Arora and Ethan Guo and Collin Burns and Samir Puranik and Horace He and Dawn Song and Jacob Steinhardt , editor =. Measuring Coding Challenge Competence With. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Ben...

2021

[11] [11]

NeurIPS , year=

Measuring Coding Challenge Competence With APPS , author=. NeurIPS , year=

[12] [12]

Evaluating Large Language Models Trained on Code , journal =

Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Pond. Evaluating Large Language Models Trained on Code , journal =. 2021 , url =. 2107.03374 , timestamp =

Pith/arXiv arXiv 2021

[13] [13]

Reddy , title =

Parshin Shojaee and Aneesh Jain and Sindhu Tipirneni and Chandan K. Reddy , title =. Trans. Mach. Learn. Res. , volume =. 2023 , url =

2023

[14] [14]

Forty-second International Conference on Machine Learning,

Jonas Gehring and Kunhao Zheng and Jade Copet and Vegard Mella and Taco Cohen and Gabriel Synnaeve , title =. Forty-second International Conference on Machine Learning,. 2025 , url =

2025

[15] [15]

CoRR , volume =

Shihan Dou and Yan Liu and Haoxiang Jia and Limao Xiong and Enyu Zhou and Wei Shen and Junjie Shan and Caishuang Huang and Xiao Wang and Xiaoran Fan and Zhiheng Xi and Yuhao Zhou and Tao Ji and Rui Zheng and Qi Zhang and Xuanjing Huang and Tao Gui , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2402.01391 , eprinttype =. 2402.01391 , timestamp =

work page doi:10.48550/arxiv.2402.01391 2024

[16] [16]

CoRR , volume =

Huimu Yu and Xing Wu and Weidong Yin and Debing Zhang and Songlin Hu , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2410.02229 , eprinttype =. 2410.02229 , timestamp =

work page doi:10.48550/arxiv.2410.02229 2024

[17] [17]

Joar Skalse and Nikolaus H. R. Howe and Dmitrii Krasheninnikov and David Krueger , editor =. Defining and Characterizing Reward Gaming , booktitle =. 2022 , url =

2022

[18] [18]

CoRR , volume =

Jiayi Fu and Xuandong Zhao and Chengyuan Yao and Heng Wang and Qi Han and Yanghua Xiao , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2502.18770 , eprinttype =. 2502.18770 , timestamp =

work page doi:10.48550/arxiv.2502.18770 2025

[19] [19]

Christiano and John Schulman and Dan Man

Dario Amodei and Chris Olah and Jacob Steinhardt and Paul F. Christiano and John Schulman and Dan Man. Concrete Problems in. CoRR , volume =. 2016 , url =. 1606.06565 , timestamp =

Pith/arXiv arXiv 2016

[20] [20]

Reinforcement Learning with a Corrupted Reward Channel , booktitle =

Tom Everitt and Victoria Krakovna and Laurent Orseau and Shane Legg , editor =. Reinforcement Learning with a Corrupted Reward Channel , booktitle =. 2017 , url =. doi:10.24963/IJCAI.2017/656 , timestamp =

work page doi:10.24963/ijcai.2017/656 2017

[21] [21]

the method of paired comparisons , author=

Rank analysis of incomplete block designs: I. the method of paired comparisons , author=. Biometrika , volume=. 1952 , publisher=

1952

[22] [22]

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization , journal =

Mingzhe Du and Luu Tuan Tuan and Yue Liu and Yuhao Qing and Dong Huang and Xinyi He and Qian Liu and Zejun Ma and See. Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization , journal =. 2025 , url =. doi:10.48550/ARXIV.2505.23387 , eprinttype =. 2505.23387 , timestamp =

work page doi:10.48550/arxiv.2505.23387 2025

[23] [23]

Hugging Face repository , howpublished =

CodeForces , author=. Hugging Face repository , howpublished =. 2025 , publisher =

2025

[24] [24]

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Qiying Yu and Zheng Zhang and Ruofei Zhu and Yufeng Yuan and Xiaochen Zuo and Yu Yue and Tiantian Fan and Gaohong Liu and Lingjun Liu and Xin Liu and Haibin Lin and Zhiqi Lin and Bole Ma and Guangming Sheng and Yuxuan Tong and Chi Zhang and Mofan Zhang and Wang Zhang and Hang Zhu and Jinhua Zhu and Jiaze Chen and Jiangjie Chen and Chengyi Wang and Hongli ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2503.14476 2025

[25] [25]

2025 , isbn =

Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu , title =. Proceedings of the Twentieth European Conference on Computer Systems, EuroSys 2025, Rotterdam, The Netherlands, 30 March 2025 - 3 April 2025 , pages =. 2025 , url =. doi:10.1145/3689031.3696075 , timestamp =

work page doi:10.1145/3689031.3696075 2025

[26] [26]

LiveBench:

Colin White and Samuel Dooley and Manley Roberts and Arka Pal and Benjamin Feuer and Siddhartha Jain and Ravid Shwartz. LiveBench:. The Thirteenth International Conference on Learning Representations,. 2025 , url =

2025

[27] [27]

Nye and Maarten Bosma and Henryk Michalewski and David Dohan and Ellen Jiang and Carrie J

Jacob Austin and Augustus Odena and Maxwell I. Nye and Maarten Bosma and Henryk Michalewski and David Dohan and Ellen Jiang and Carrie J. Cai and Michael Terry and Quoc V. Le and Charles Sutton , title =. CoRR , volume =. 2021 , url =. 2108.07732 , timestamp =

Pith/arXiv arXiv 2021

[28] [28]

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code , booktitle =

Naman Jain and King Han and Alex Gu and Wen. LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code , booktitle =. 2025 , url =

2025

[29] [29]

Science , volume=

Competition-level code generation with alphacode , author=. Science , volume=. 2022 , publisher=

2022

[30] [30]

CoRR , volume =

Yinjie Wang and Ling Yang and Ye Tian and Ke Shen and Mengdi Wang , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2506.03136 , eprinttype =. 2506.03136 , timestamp =

work page doi:10.48550/arxiv.2506.03136 2025

[31] [31]

2022 , eprint=

Emergent Abilities of Large Language Models , author=. 2022 , eprint=

2022

[32] [32]

2023 , eprint=

AceCoder: Utilizing Existing Code to Enhance Code Generation , author=. 2023 , eprint=

2023

[33] [33]

Evaluating In-Context Learning of Libraries for Code Generation , booktitle =

Arkil Patel and Siva Reddy and Dzmitry Bahdanau and Pradeep Dasigi , editor =. Evaluating In-Context Learning of Libraries for Code Generation , booktitle =. 2024 , url =. doi:10.18653/V1/2024.NAACL-LONG.161 , timestamp =

work page doi:10.18653/v1/2024.naacl-long.161 2024

[34] [34]

2026 , eprint=

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests , author=. 2026 , eprint=

2026

[35] [35]

CoRR , volume =

Codefuse and Wenting Cai and Yuchen Cao and Chaoyu Chen and Chen Chen and Siba Chen and Qing Cui and Peng Di and Junpeng Fang and Zi Gong and Ting Guo and Zhengyu He and Yang Huang and Cong Li and Jianguo Li and Zheng Li and Shijie Lian and Bingchang Liu and Songshan Luo and Shuo Mao and Min Shen and Jian Wu and Jiaolong Yang and Wenjie Yang and Tong Ye a...

work page doi:10.48550/arxiv.2503.17793 2025

[36] [36]

CoRR , volume =

Yifei Liu and Li Lyna Zhang and Yi Zhu and Bingcheng Dong and Xudong Zhou and Ning Shang and Fan Yang and Mao Yang , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2505.21297 , eprinttype =. 2505.21297 , timestamp =

work page doi:10.48550/arxiv.2505.21297 2025

[37] [37]

CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment

Xue Jiang and Yihong Dong and Mengyang Liu and Hongyi Deng and Tian Wang and Yongding Tao and Rongyu Cao and Binhua Li and Zhi Jin and Wenpin Jiao and Fei Huang and Yongbin Li and Ge Li , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2510.18471 , eprinttype =. 2510.18471 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.18471 2025

[38] [38]

CoRR , volume =

Rongao Li and Jie Fu and Bo. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2312.14852 , eprinttype =. 2312.14852 , timestamp =

work page doi:10.48550/arxiv.2312.14852 2023

[39] [39]

2025 , url=

SYNTHETIC-1: Two Million Collaboratively Generated Reasoning Traces from Deepseek-R1 , author=. 2025 , url=

2025

[40] [40]

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Chris Yuhao Liu and Liang Zeng and Yuzhen Xiao and Jujie He and Jiacai Liu and Chaojie Wang and Rui Yan and Wei Shen and Fuxiang Zhang and Jiacheng Xu and Yang Liu and Yahui Zhou , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2507.01352 , eprinttype =. 2507.01352 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2507.01352 2025

[41] [41]

Phi-4 Technical Report

Marah I Abdin and Jyoti Aneja and Harkirat S. Behl and S. Phi-4 Technical Report , journal =. 2024 , url =. doi:10.48550/ARXIV.2412.08905 , eprinttype =. 2412.08905 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2412.08905 2024

[42] [42]

Gemma 2: Improving Open Language Models at a Practical Size

Morgane Rivi. Gemma 2: Improving Open Language Models at a Practical Size , journal =. 2024 , url =. doi:10.48550/ARXIV.2408.00118 , eprinttype =. 2408.00118 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2408.00118 2024

[43] [43]

KodCode: A diverse, challenging, and verifiable synthetic dataset for coding

Zhangchen Xu and Yang Liu and Yueqin Yin and Mingyuan Zhou and Radha Poovendran , editor =. KodCode:. Findings of the Association for Computational Linguistics,. 2025 , url =. doi:10.18653/V1/2025.FINDINGS-ACL.365 , timestamp =

work page doi:10.18653/v1/2025.findings-acl.365 2025

[44] [44]

LIMO: Less is More for Reasoning

Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2502.03387 , eprinttype =. 2502.03387 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.03387 2025

[45] [45]

Forty-first International Conference on Machine Learning,

Zhengyang Tang and Xingxing Zhang and Benyou Wang and Furu Wei , title =. Forty-first International Conference on Machine Learning,. 2024 , url =

2024

[46] [46]

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

Dong Huang and Qingwen Bu and Jie M. Zhang and Michael Luck and Heming Cui , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2312.13010 , eprinttype =. 2312.13010 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2312.13010 2023

[47] [47]

2025 , eprint=

Qwen3 Technical Report , author=. 2025 , eprint=

2025

[48] [48]

5-Coder Technical Report , author=

Qwen2. 5-Coder Technical Report , author=. arXiv preprint arXiv:2409.12186 , year=

Pith/arXiv arXiv

[49] [49]

Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and Karthik Narasimhan and Ofir Press , editor =

John Yang and Carlos E. Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and Karthik Narasimhan and Ofir Press , editor =. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering , booktitle =. 2024 , url =

2024

[50] [50]

CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation , booktitle =

Qingyao Li and Xinyi Dai and Xiangyang Li and Weinan Zhang and Yasheng Wang and Ruiming Tang and Yong Yu , editor =. CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation , booktitle =. 2025 , url =

2025

[51] [51]

Yue Wang and Hung Le and Akhilesh Deepak Gotmare and Nghi D. Q. Bui and Junnan Li and Steven C. H. Hoi , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2305.07922 , eprinttype =. 2305.07922 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.07922 2023

[52] [52]

Joty, and Steven C

Yue Wang and Weishi Wang and Shafiq R. Joty and Steven C. H. Hoi , editor =. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation , booktitle =. 2021 , url =. doi:10.18653/V1/2021.EMNLP-MAIN.685 , timestamp =

work page doi:10.18653/v1/2021.emnlp-main.685 2021

[53] [53]

2024 , eprint=

StarCoder 2 and The Stack v2: The Next Generation , author=. 2024 , eprint=

2024

[54] [54]

2024 , eprint=

Code Llama: Open Foundation Models for Code , author=. 2024 , eprint=

2024

[55] [55]

2024 , eprint=

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence , author=. 2024 , eprint=

2024

[56] [56]

2023 , eprint=

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks , author=. 2023 , eprint=

2023

[57] [57]

2020 , eprint=

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search , author=. 2020 , eprint=

2020

[58] [58]

2025 , eprint=

A Large-scale Class-level Benchmark Dataset for Code Generation with LLMs , author=. 2025 , eprint=

2025

[59] [59]

2025 , eprint=

OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs , author=. 2025 , eprint=

2025

[60] [60]

2025 , eprint=

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance , author=. 2025 , eprint=

2025

[61] [61]

2022 , eprint=

Training language models to follow instructions with human feedback , author=. 2022 , eprint=

2022

[62] [62]

2017 , eprint=

Proximal Policy Optimization Algorithms , author=. 2017 , eprint=

2017

[63] [63]

2025 , eprint=

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback , author=. 2025 , eprint=

2025

[64] [64]

2025 , eprint=

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution , author=. 2025 , eprint=

2025

[65] [65]

arXiv preprint arXiv:2502.14382 , year=

S*: Test time scaling for code generation , author=. arXiv preprint arXiv:2502.14382 , year=

arXiv

[66] [66]

CodeT: Code Generation with Generated Tests , author=

[67] [67]

Jackson Petty and Sjoerd van Steenkiste and Tal Linzen , title =. Trans. Mach. Learn. Res. , volume =. 2025 , url =

2025

[68] [68]

arXiv preprint arXiv:2309.16298 , year=

At which training stage does code data help llms reasoning? , author=. arXiv preprint arXiv:2309.16298 , year=

arXiv

[69] [69]

arXiv preprint arXiv:2507.17512 , year=

Can one domain help others? a data-centric study on multi-domain reasoning via reinforcement learning , author=. arXiv preprint arXiv:2507.17512 , year=

arXiv

[70] [70]

The Thirteenth International Conference on Learning Representations,

Yantao Liu and Zijun Yao and Rui Min and Yixin Cao and Lei Hou and Juanzi Li , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =

2025

[71] [71]

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Maggie Huan and Yuetai Li and Tuney Zheng and Xiaoyu Xu and Seungone Kim and Minxin Du and Radha Poovendran and Graham Neubig and Xiang Yue , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2507.00432 , eprinttype =. 2507.00432 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2507.00432 2025

[72] [72]

Aho and Jeffrey D

Alfred V. Aho and Jeffrey D. Ullman , title =. 1972

1972

[73] [73]

Publications Manual , year = "1983", publisher =

1983

[74] [74]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

work page doi:10.1145/322234.322243 1981

[75] [75]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of

[76] [76]

Dan Gusfield , title =. 1997

1997

[77] [77]

Tetreault , title =

Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

2015

[78] [78]

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =

Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =

[79] [79]

2026 , eprint=

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts , author=. 2026 , eprint=

2026

[80] [80]

arXiv preprint arXiv:2506.02678 , year=

TL; DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression , author=. arXiv preprint arXiv:2506.02678 , year=

arXiv