Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It
Pith reviewed 2026-06-27 13:07 UTC · model grok-4.3
The pith
CoT fine-tuning disrupts long-range recall in hybrid LLMs by changing query-key projections, but restoring only those projections recovers the capability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CoT-SFT biases attention gradients toward short-range patterns, disrupting query-key projections (W_Q, W_K) that are responsible for long-range routing; restoring only these projections from the pre-SFT checkpoint recovers long-context capability.
What carries the argument
QK-Restore: a training-free method that restores only W_Q and W_K from the pre-SFT checkpoint while preserving all other post-SFT parameters (plus a Procrustes variant that balances routing preservation against reasoning adaptation).
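The swap itself needs no training. Below is a minimal sketch under stated assumptions. Checkpoints are treated as plain name-to-tensor mappings. Query and key projections are assumed to be the parameters whose names end in `q_proj.weight` and `k_proj.weight`. That is a common Hugging Face naming convention and is a hypothetical choice here, since HypeNet and Jet-Nemotron may name them differently. The sketch restores every matching parameter in every layer; this summary does not say whether the method is limited to particular layer types.

```python
from typing import Dict, Iterable, Mapping, TypeVar

T = TypeVar("T")

# Hypothetical naming convention for the query/key projection weights.
QK_SUFFIXES = ("q_proj.weight", "k_proj.weight")


def qk_restore(
    pre_sft: Mapping[str, T],
    post_sft: Mapping[str, T],
    suffixes: Iterable[str] = QK_SUFFIXES,
) -> Dict[str, T]:
    """Return post-SFT parameters with W_Q/W_K copied back from the pre-SFT checkpoint.

    This is a pure parameter swap: no gradients, no data, no training.
    Neither input mapping is modified.
    """
    suffixes = tuple(suffixes)
    if pre_sft.keys() != post_sft.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged = dict(post_sft)  # start from post-SFT weights (keeps reasoning gains)
    for name in post_sft:
        if name.endswith(suffixes):
            merged[name] = pre_sft[name]  # restore the pre-SFT routing weights
    return merged
```

With PyTorch, the two inputs could be the `state_dict()` outputs of the pre- and post-SFT models, and the merged result could be passed to the post-SFT model's `load_state_dict`.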
If this is right
- Long-context recall on NIAH is restored at zero training cost while reasoning performance is preserved.
- The recovery holds across hybrid architectures including HypeNet and Jet-Nemotron.
- Gains are largest under longer context windows and harder retrieval settings.
- A Procrustes alignment variant offers a tunable trade-off between routing fidelity and adaptation.
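This summary does not give the Procrustes variant's formulation. The sketch below therefore shows only the textbook orthogonal Procrustes problem that the name refers to, not the paper's variant. The problem is to find the orthogonal R that minimizes ||A R - B||_F. Its solution is the orthogonal polar factor U V^T of A^T B = U S V^T. To stay dependency-free and readable at toy sizes, the sketch computes that factor with a Newton-Schulz iteration rather than an SVD. At real weight sizes one would use a library SVD instead.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def procrustes_rotation(a: Matrix, b: Matrix, iters: int = 100) -> Matrix:
    """Orthogonal R (d x d) minimizing ||a @ R - b||_F for n x d matrices a, b.

    The minimizer is the orthogonal polar factor U V^T of M = a^T b = U S V^T.
    The Newton-Schulz step X <- X (3I - X^T X) / 2 maps each singular value
    s to s (3 - s^2) / 2, which converges to 1 for full-rank M when every s
    starts in (0, sqrt(3)). Dividing by the Frobenius norm puts them in (0, 1].
    """
    m = matmul(transpose(a), b)
    fro = sum(v * v for row in m for v in row) ** 0.5
    x = [[v / fro for v in row] for row in m]
    d = len(x)
    for _ in range(iters):
        xtx = matmul(transpose(x), x)
        inner = [[(3.0 if i == j else 0.0) - xtx[i][j] for j in range(d)]
                 for i in range(d)]
        x = [[0.5 * v for v in row] for row in matmul(x, inner)]
    return x
```

In a weight-editing setting, one hypothetical use would be to pass a post-SFT projection as `a` and its pre-SFT counterpart as `b`. Then `a @ R` is the post-SFT matrix rotated as close as possible to the pre-SFT one. Whether the paper does anything like this is not stated here.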
Where Pith is reading between the lines
- Attention routing for different distance scales may be carried by partially separable weight components that can be edited independently.
- Selective checkpoint restoration could be tested on other fine-tuning regimes that trade one capability for another.
- Systematic measurement of which projection matrices affect which context lengths would clarify the scope of the effect.
Load-bearing premise
The observed NIAH degradation is caused specifically by changes to W_Q and W_K during CoT-SFT rather than by other simultaneous changes in the model or training dynamics.
What would settle it
An experiment in which CoT-SFT updates only W_Q and W_K yet NIAH performance does not degrade, or one in which restoring those matrices fails to recover NIAH performance.
Original abstract
Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including HypeNet and Jet-Nemotron, retrieval performance on Needle-In-A-Haystack (NIAH) deteriorates substantially after CoT-SFT, and the degradation becomes more severe under harder retrieval settings and longer context windows. For example, HypeNet-9B on NIAH-S2@256K decreases from $67.2\%$ to $9.4\%$. We attribute this to CoT-SFT biasing attention gradients toward short-range patterns, disrupting query-key projections ($W_Q, W_K$) that are responsible for long-range routing. Motivated by this observation, we propose QK-Restore, a training-free method that restores only $W_Q$ and $W_K$ from the pre-SFT checkpoint while preserving all other post-SFT parameters. We further introduce a Procrustes variant to balance routing preservation and reasoning adaptation. Across architectures, QK-Restore consistently restores long-context capability at zero training cost while preserving reasoning performance; for instance, on HypeNet-5B it improves S3@256K from $65.4\%$ to $76.4\%$ while maintaining strong reasoning performance.
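For scale, the two headline numbers in the abstract work out as follows. Each line gives the absolute change in percentage points and the change relative to the starting score. The second line assumes that 65.4% is the post-SFT score from which QK-Restore starts.

```latex
\begin{aligned}
\text{HypeNet-9B, NIAH-S2@256K, after CoT-SFT:}\quad
  & 9.4 - 67.2 = -57.8\ \text{pts}, \qquad -57.8 / 67.2 \approx -86.0\% \\
\text{HypeNet-5B, S3@256K, with QK-Restore:}\quad
  & 76.4 - 65.4 = +11.0\ \text{pts}, \qquad +11.0 / 65.4 \approx +16.8\%
\end{aligned}
```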
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that CoT supervised fine-tuning systematically degrades long-context recall (measured via Needle-In-A-Haystack) in hybrid linear-attention models such as HypeNet and Jet-Nemotron by biasing attention gradients toward short-range patterns and thereby altering the query-key projections W_Q and W_K. It supports this via before/after performance drops that worsen with harder retrieval settings and longer contexts, then introduces the training-free QK-Restore intervention (and a Procrustes variant) that swaps only the pre-SFT W_Q/W_K matrices back into the post-SFT model, reporting recovery of NIAH scores (e.g., HypeNet-5B S3@256K from 65.4% to 76.4%) while retaining reasoning gains.
Significance. If the attribution and intervention hold, the result identifies a previously under-appreciated side-effect of standard CoT-SFT on long-range routing in hybrid architectures and supplies a zero-cost, parameter-swap fix that preserves downstream reasoning. The reported consistency of degradation and recovery across multiple models, context lengths, and difficulty settings strengthens the practical relevance for fine-tuning pipelines that must balance reasoning and long-context capability.
major comments (1)
- Abstract and attribution paragraph: the central causal claim is that the degradation arises specifically from CoT-SFT-induced changes to W_Q and W_K, rather than from simultaneous changes to W_V, the feed-forward layers, or optimizer state. This claim rests on the QK-Restore swap experiment alone. No ablation is described that restores alternative components while holding W_Q/W_K at their post-SFT values. Nor are direct measurements reported, before and after SFT, of attention-score distributions or gradient norms on long-range versus short-range tokens. Without these controls, the recovery could be correlational rather than mechanistic.
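The control the referee asks for can be set up as a small grid of component swaps. Below is a minimal sketch under stated assumptions. Checkpoints are name-to-tensor mappings, and each component group is identified by a hypothetical name pattern; the real patterns depend on how the model names its parameters. Every variant starts from the post-SFT weights and restores exactly one group from the pre-SFT checkpoint. QK-Restore is therefore the `restore_qk` variant, and the others are the alternative restorations.

```python
from typing import Dict, Mapping, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical name patterns for each component group; adjust to the model.
GROUPS: Dict[str, Tuple[str, ...]] = {
    "qk": ("q_proj.", "k_proj."),
    "v": ("v_proj.",),
    "ffn": ("mlp.",),
}


def restore_group(pre: Mapping[str, T], post: Mapping[str, T],
                  patterns: Tuple[str, ...]) -> Dict[str, T]:
    """Post-SFT weights, except that names containing any pattern come from pre-SFT."""
    return {name: (pre[name] if any(p in name for p in patterns) else value)
            for name, value in post.items()}


def ablation_variants(pre: Mapping[str, T],
                      post: Mapping[str, T]) -> Dict[str, Dict[str, T]]:
    """The post-SFT baseline plus one variant per group, each restoring only that group."""
    variants: Dict[str, Dict[str, T]] = {"post_sft": dict(post)}
    for group, patterns in GROUPS.items():
        variants[f"restore_{group}"] = restore_group(pre, post, patterns)
    return variants
```

Each variant would then be scored on the same NIAH settings. The attribution to W_Q/W_K is supported only if `restore_qk` recovers recall while `restore_v` and `restore_ffn` do not.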
minor comments (2)
- The results sections would benefit from an explicit list of all baselines, a statement of whether other SFT hyperparameters were ablated or held constant, and precise definitions of the NIAH-S2/S3 variants and the context lengths used.
- The Procrustes variant is introduced, but its exact formulation, its hyperparameters, and its comparison to plain QK-Restore are not described in enough detail to allow reproduction from the text alone.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on strengthening the causal attribution. We address the concern point-by-point below and will revise the manuscript accordingly.
Point-by-point responses
Referee: The causal attribution to W_Q/W_K rests on the QK-Restore swap alone. No ablation restores other components (W_V, feed-forward layers) while holding W_Q/W_K at their post-SFT values. No attention-score distributions or gradient norms are reported for long-range versus short-range tokens before and after SFT. The recovery could therefore be correlational rather than mechanistic.
Authors: We agree that the current evidence is primarily correlational and that targeted controls are needed to isolate the role of W_Q/W_K. In the revised version we will add two sets of experiments. (i) Ablations that restore only W_V or only the feed-forward layers, holding W_Q/W_K at their post-SFT values; we will report whether either yields NIAH recovery comparable to QK-Restore. (ii) Pre- and post-SFT comparisons of attention-score distributions and gradient norms, separated into long-range and short-range tokens, to test for the hypothesized post-SFT bias toward short-range patterns. These experiments address the mechanistic gap directly and do not affect the training-free nature of the intervention. Revision: yes.
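For item (ii), one simple statistic is the share of each query's attention mass that falls on keys more than a given distance away. Below is a minimal sketch. It assumes a causal attention-probability matrix for a single head, given as rows that each sum to 1. Such a matrix exists explicitly only in softmax-attention layers. The distance threshold is a free, hypothetical choice.

```python
from typing import List, Sequence


def long_range_mass(attn: Sequence[Sequence[float]], min_distance: int) -> float:
    """Mean fraction of attention mass on keys at least `min_distance` tokens back.

    attn[i][j] is the probability that query i attends to key j (causal: j <= i).
    Queries with no key that far back are skipped; returns 0.0 if none qualify.
    """
    fractions: List[float] = []
    for i, row in enumerate(attn):
        if i < min_distance:
            continue  # no key is far enough back for this query
        far = sum(row[j] for j in range(0, i - min_distance + 1))  # i - j >= min_distance
        total = sum(row[: i + 1])
        fractions.append(far / total)
    return sum(fractions) / len(fractions) if fractions else 0.0
```

Computing this on matching prompts for the pre- and post-SFT checkpoints, at several thresholds, would show whether CoT-SFT shifts attention mass toward short distances.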
Circularity Check
No circularity: empirical before/after and intervention results stand on direct measurement.
Full rationale
The paper reports NIAH degradation observed after CoT-SFT, attributes it on observational grounds to changes in W_Q/W_K, and demonstrates recovery via a parameter-swap intervention that restores only those matrices. No equations, fitted parameters, or self-citations reduce any claimed result to a quantity defined by the claim itself. The argument consists of experimental comparisons and a training-free fix evaluated on external benchmarks. It does not invoke uniqueness theorems, ansatzes, or renamings that collapse back into its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard assumptions in transformer training and evaluation hold, including that NIAH measures long-range recall.
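To make that assumption concrete: a generic needle-in-a-haystack item hides one key-value fact at a chosen depth in filler text, and a response counts as correct if it contains the value. The sketch below is a simplified, hypothetical construction that uses filler sentences as a stand-in for model tokens. It is not the definition of the NIAH-S2/S3 variants used in the paper, which this summary does not give.

```python
import random
from typing import Tuple

FILLER = "The grass is green and the sky is blue."


def make_niah_item(n_sentences: int, depth: float, seed: int = 0) -> Tuple[str, str, str]:
    """Build (context, question, answer) with one needle at fractional `depth`.

    depth = 0.0 puts the needle first; depth = 1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    rng = random.Random(seed)  # deterministic for a given seed
    key = f"item-{rng.randrange(10**6):06d}"
    value = str(rng.randrange(10**6, 10**7))  # 7-digit number to retrieve
    needle = f"The special magic number for {key} is {value}."
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    question = f"What is the special magic number for {key}?"
    return " ".join(sentences), question, value


def is_correct(model_answer: str, value: str) -> bool:
    """Lenient check: the answer counts as correct if it contains the value."""
    return value in model_answer
```

Sweeping `n_sentences` and `depth` produces the usual context-length-by-depth grid. Real benchmarks measure length in model tokens rather than sentences.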