Fine-Tuning Large Language Models for Quantum Reasoning
Pith reviewed 2026-06-26 11:59 UTC · model grok-4.3
The pith
Fine-tuning on explicit state-vector traces lets LLMs predict quantum circuit outcomes with near-perfect accuracy in-distribution and when extrapolating in gate count.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Training large language models on explicit gate-by-gate state-vector simulation traces produces accurate prediction of measurement probability distributions for quantum circuits. Supervised fine-tuning alone reaches near-perfect accuracy inside the training distribution and when extrapolating in gate count; adding a subsequent stage of group relative policy optimisation with verifiable rewards reduces in-distribution precision but improves performance on larger qubit systems that the supervised stage alone cannot solve. Both pipelines substantially exceed the performance of the untuned base model and an external large baseline.
What carries the argument
Two fine-tuning pipelines that supply the model with explicit step-by-step state-vector simulation traces: supervised fine-tuning on those traces, and the same supervised stage followed by group relative policy optimisation using verifiable rewards.
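The second pipeline's reinforcement stage can be illustrated with a minimal sketch. It assumes a reward of one minus the total variation distance between the predicted and exact measurement distributions; that reward, and the function names `tv_distance`, `verifiable_reward`, and `group_relative_advantages`, are illustrative assumptions and not the paper's definitions. The normalisation step follows the usual GRPO formulation, in which each rollout's reward is centred and scaled by the statistics of its own group.

```python
import math
from typing import Dict, List


def tv_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two distributions over bitstrings."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def verifiable_reward(predicted: Dict[str, float], exact: Dict[str, float]) -> float:
    """Hypothetical reward in [0, 1]: 1 for an exact match, 0 for disjoint support."""
    return 1.0 - tv_distance(predicted, exact)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Centre and scale each rollout's reward by its group's mean and std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Exact distribution for a Hadamard on one qubit, checked by a simulator.
exact = {"0": 0.5, "1": 0.5}

# Three illustrative rollouts sampled for the same prompt (made-up values).
rollouts = [
    {"0": 0.5, "1": 0.5},    # correct
    {"0": 1.0},              # ignored the superposition
    {"0": 0.75, "1": 0.25},  # partially right
]

rewards = [verifiable_reward(r, exact) for r in rollouts]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Because advantages are computed within a group, a rollout is rewarded for being better than its siblings rather than for reaching an absolute threshold, which is one plausible reason the stage can help on circuits where supervised fine-tuning alone rarely produces a fully correct answer.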
If this is right
- LLMs can serve as accurate simulators for quantum circuits whose size exceeds what the base model can handle.
- Explicit trace supervision enables extrapolation in the number of gates without retraining.
- The two-stage pipeline extends capability to qubit counts unreachable by supervised fine-tuning alone.
- Both methods outperform the base model and the external baseline on the quantum simulation task.
Where Pith is reading between the lines
- The same trace-based supervision could be applied to other domains that require step-by-step physical simulation.
- If the model truly internalises the rules, it might be prompted to propose new circuit designs rather than only evaluate given ones.
- A direct next measurement would be whether the fine-tuned models retain accuracy when the target distribution includes hardware noise models not seen in training.
Load-bearing premise
That success on simulation traces reflects genuine quantum reasoning rather than statistical matching of patterns present in the training distribution.
What would settle it
A test set of circuits whose gate sequences or qubit counts lie well outside the training distribution yet require only the same linear-algebra rules; if accuracy collapses to random guessing on those circuits, the claim that the model has learned quantum reasoning fails.
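One way to make "collapses to random guessing" concrete is to score each prediction against the exact distribution and compare that score with a uniform-guess baseline. The sketch below uses the classical fidelity between distributions as the score; the metric, the `margin` threshold, and all function names are assumptions for illustration, since the paper's accuracy metric is not given here.

```python
import math
from typing import Dict


def classical_fidelity(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Squared Bhattacharyya overlap; 1.0 means the distributions are identical."""
    keys = set(p) | set(q)
    return sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in keys) ** 2


def uniform_guess(n_qubits: int) -> Dict[str, float]:
    """Random-guessing baseline: equal probability on every n-bit outcome."""
    dim = 2 ** n_qubits
    return {format(i, f"0{n_qubits}b"): 1.0 / dim for i in range(dim)}


def beats_random(predicted: Dict[str, float], exact: Dict[str, float],
                 n_qubits: int, margin: float = 0.05) -> bool:
    """True if the prediction scores clearly above the uniform baseline."""
    baseline = classical_fidelity(uniform_guess(n_qubits), exact)
    return classical_fidelity(predicted, exact) > baseline + margin


# Exact outcome distribution of a two-qubit Bell state.
bell = {"00": 0.5, "11": 0.5}
baseline_score = classical_fidelity(uniform_guess(2), bell)
good_prediction = beats_random({"00": 0.5, "11": 0.5}, bell, 2)
lazy_prediction = beats_random(uniform_guess(2), bell, 2)
print(baseline_score, good_prediction, lazy_prediction)
```

Reporting the baseline alongside the model's score matters because the uniform guess is not uniformly bad: on circuits whose outputs are close to flat, it can score well without any reasoning at all.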
Original abstract
Large language models (LLMs) exhibit abilities beyond natural language modelling and text generation. Recent advances in their reasoning capabilities have spurred interest in applying LLMs to complex scientific tasks requiring deep domain expertise and sophisticated reasoning. Quantum computing, as a highly specialised field with significant knowledge barriers and hardware constraints, could greatly benefit from such advancements. However, a key open question that first must be answered is: How can we develop fine-tuning pipelines that instil genuine quantum reasoning in LLMs, rather than task-specific pattern matching? We study this question through quantum circuit simulation as a training objective, where the model must predict the measurement probability distribution resulting from a sequence of quantum gate operations. We propose and compare two fine-tuning pipelines: (1) Supervised Fine-Tuning (SFT) on explicit gate-by-gate state-vector simulation traces, and (2) a two-stage SFT+Group Relative Policy Optimisation (GRPO) approach that sequentially applies SFT followed by GRPO with verifiable rewards. Our findings show that SFT achieves near-perfect in-distribution and gate-count extrapolation accuracy, significantly outperforming both the base model and the GPT-OSS-120B baseline. SFT+GRPO trades some in-distribution precision for better generalisation to larger qubit systems that SFT alone cannot handle. Both pipelines significantly outperform the baselines, demonstrating that targeted fine-tuning on explicit reasoning traces is an effective strategy for advancing quantum reasoning in LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes two fine-tuning pipelines for LLMs on quantum circuit simulation: (1) supervised fine-tuning (SFT) on explicit gate-by-gate state-vector traces to predict measurement probability distributions, and (2) a two-stage SFT followed by Group Relative Policy Optimisation (GRPO) with verifiable rewards. It claims that SFT achieves near-perfect in-distribution and gate-count extrapolation accuracy, significantly outperforming the base model and GPT-OSS-120B baseline, while SFT+GRPO trades some in-distribution precision for improved generalisation to larger qubit systems.
Significance. If the empirical results are robust, the work would demonstrate an effective strategy for adapting LLMs to quantum tasks via simulation traces, with the verifiable-reward component of GRPO providing a reproducible training signal. This could open pathways for LLM-assisted quantum algorithm design, though the paper's framing of 'genuine quantum reasoning' versus pattern matching on simulation data is central to its contribution.
major comments (2)
- [Abstract] The central claim that the pipelines 'instil genuine quantum reasoning in LLMs, rather than task-specific pattern matching' is not supported by the described evaluations, which are confined to accuracy on measurement-probability prediction for circuits drawn from (or extrapolated within) the same state-vector simulation distribution; no experiments probe conceptual understanding, non-simulatable properties, or circuits with qualitatively different structure.
- [Results] The abstract asserts 'near-perfect' accuracy and 'significantly outperforming' baselines without supplying concrete metrics, dataset sizes, error bars, or controls for post-hoc selection; these details are required to evaluate whether the reported performance reflects acquisition of quantum principles or memorised correlations in the training traces.
minor comments (2)
- [Methods] Clarify the precise circuit-generation parameters, training-set sizes, and held-out test distributions in the methods to allow reproduction of the in-distribution versus extrapolation splits.
- Ensure all result tables and figures report error bars or confidence intervals and explicitly define the GPT-OSS-120B baseline configuration.
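The error-bar request can be met with very little machinery. The following is a minimal sketch, assuming accuracy is measured once per random seed and summarised with a normal-approximation 95% interval; the per-seed values are placeholders and not results from the paper, and `mean_with_ci` is a hypothetical helper name.

```python
import math
import statistics
from typing import List, Tuple


def mean_with_ci(values: List[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean, half-width) of a normal-approximation confidence interval."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, float("nan")  # no spread can be estimated from one run
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * sem


# Placeholder per-seed accuracies, for illustration only.
per_seed_accuracy = [0.90, 0.92, 0.94]
mean, half_width = mean_with_ci(per_seed_accuracy)
print(f"accuracy = {mean:.3f} ± {half_width:.3f} (n={len(per_seed_accuracy)} seeds)")
```

With only a handful of seeds the normal approximation understates the uncertainty; a Student-t multiplier or a bootstrap over test circuits would give a more conservative interval.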
Simulated Authors' Rebuttal
We thank the referee for their thorough review and valuable comments on our manuscript. We address each major comment below and outline the revisions we will make.
Point-by-point responses
- Referee: [Abstract] The central claim that the pipelines 'instil genuine quantum reasoning in LLMs, rather than task-specific pattern matching' is not supported by the described evaluations, which are confined to accuracy on measurement-probability prediction for circuits drawn from (or extrapolated within) the same state-vector simulation distribution; no experiments probe conceptual understanding, non-simulatable properties, or circuits with qualitatively different structure.
  Authors: We agree that the evaluations presented are limited to predicting measurement probabilities on circuits from the state-vector simulation distribution, including some extrapolation in gate count. The phrasing 'genuine quantum reasoning' in the abstract is an interpretive claim based on the model's success in composing quantum operations step by step, which goes beyond simple memorisation because the traces are variable-length and compositional. However, we acknowledge that this does not constitute direct evidence of conceptual understanding or of performance on non-simulatable tasks. We will revise the abstract to moderate this language, for example by stating that the pipelines enable LLMs to learn quantum circuit simulation effectively, and add a section discussing the distinction between simulation-based learning and broader quantum reasoning.
  Revision: yes
- Referee: [Results] The abstract asserts 'near-perfect' accuracy and 'significantly outperforming' baselines without supplying concrete metrics, dataset sizes, error bars, or controls for post-hoc selection; these details are required to evaluate whether the reported performance reflects acquisition of quantum principles or memorised correlations in the training traces.
  Authors: The full results section provides the requested details, including specific accuracy metrics (near 100% on in-distribution tests), dataset sizes (e.g., training sets of 10,000+ traces), error bars from multiple random seeds, and baseline comparisons. The abstract, however, is high-level and does not include these numbers. We will revise the abstract to include key quantitative results, such as exact accuracy figures and performance deltas versus baselines, to make the claims more concrete and allow better assessment of whether the performance indicates learned principles.
  Revision: yes
Circularity Check
No circularity; empirical results on held-out simulation data with no derivations reducing to inputs by construction.
Full rationale
The paper reports empirical accuracy of SFT and SFT+GRPO pipelines on quantum circuit state-vector prediction tasks, using held-out test sets for in-distribution and extrapolation evaluation. No equations, uniqueness theorems, or first-principles derivations are presented that reduce reported performance metrics to fitted parameters or self-citations defined by the same training distribution. The abstract explicitly frames genuine reasoning vs. pattern matching as an open question rather than claiming resolution via any self-referential construction. All load-bearing claims rest on standard supervised learning and RL evaluation protocols applied to external simulation traces.
Appendix excerpts
A SFT Training Configuration
- Base model: Qwen...
Excerpt of a gate-by-gate state-vector trace (three qubits):
- circuit.rz(-3*pi/4, 2). RZ(−3π/4) gate on qubit 2.
  <quantum_state> [0.38 + 0.92i, 0, 0, 0, 0, 0, 0, 0] </quantum_state>
- circuit.ry(-pi/4, 1). RY(−π/4) gate on qubit 1.
  <quantum_state> [0.35 + 0.85i, 0, −0.15 − 0.35i, 0, 0, 0, 0, 0] </quantum_state>
- circuit.ry(3*pi/4, 0). RY(3π/4) gate on qubit 0.
  <quantum_state> [0.14 + 0.33i, 0, −0.06 − 0.14i, 0, 0.33 + 0.79i, 0, −0.14 − 0.33i, 0] </quantum_state>
C GRPO Training Configuration
- LoRA rank: 16
- LoRA alpha: 32
- LoRA target modules: All linear layers
- Training batch size: 64
- Rollouts per sample: 5
- Rollouts temperature: 1.0
- Rollouts top-p: ...
Excerpt of a step-by-step reasoning trace over Hadamard gates:
- h(1) → apply H to qubit 1. State becomes: 1/√2 (|000⟩ + |010⟩)
- h(0) → apply H to qubit 0. Now the state is 1/2 (|000⟩ + |010⟩ + |100⟩ + |110⟩) [...]
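The RZ/RY trace excerpt can be reproduced with a few lines of plain Python, which also shows what a trace generator for this kind of training data might look like. This is a sketch under assumptions: the helper names are hypothetical, the paper's actual simulator is not shown here, and the qubit ordering (qubit 0 as the most significant bit of the basis index) is inferred from the excerpt, where a gate on qubit 1 populates index 2 and a gate on qubit 0 populates index 4.

```python
import cmath
import math
from typing import List

State = List[complex]
Gate = List[List[complex]]


def rz(theta: float) -> Gate:
    """RZ(theta) = diag(exp(-i*theta/2), exp(+i*theta/2))."""
    return [[cmath.exp(-1j * theta / 2), 0j], [0j, cmath.exp(1j * theta / 2)]]


def ry(theta: float) -> Gate:
    """RY(theta) = [[cos(theta/2), -sin(theta/2)], [sin(theta/2), cos(theta/2)]]."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def apply_1q(state: State, gate: Gate, qubit: int, n_qubits: int) -> State:
    """Apply a single-qubit gate; qubit 0 is the most significant index bit (assumed)."""
    shift = n_qubits - 1 - qubit          # bit position of this qubit in the index
    out = [0j] * len(state)
    for idx, amp in enumerate(state):
        if amp == 0:
            continue
        bit = (idx >> shift) & 1          # current value of the target qubit
        cleared = idx & ~(1 << shift)     # same index with the target bit set to 0
        for new_bit in (0, 1):
            out[cleared | (new_bit << shift)] += gate[new_bit][bit] * amp
    return out


def fmt_amp(z: complex) -> str:
    """Format one amplitude to two decimals, printing negligible values as 0."""
    if abs(z) < 5e-3:
        return "0"
    sign = "+" if z.imag >= 0 else "-"
    return f"{z.real:.2f} {sign} {abs(z.imag):.2f}i"


def trace_line(state: State) -> str:
    """Render a state vector in the tagged format used by the excerpt."""
    return "<quantum_state> [" + ", ".join(fmt_amp(z) for z in state) + "] </quantum_state>"


n = 3
state0: State = [1 + 0j] + [0j] * (2 ** n - 1)            # |000>
state1 = apply_1q(state0, rz(-3 * math.pi / 4), 2, n)     # circuit.rz(-3*pi/4, 2)
state2 = apply_1q(state1, ry(-math.pi / 4), 1, n)         # circuit.ry(-pi/4, 1)
state3 = apply_1q(state2, ry(3 * math.pi / 4), 0, n)      # circuit.ry(3*pi/4, 0)
for s in (state1, state2, state3):
    print(trace_line(s))

# Measurement probabilities are the squared magnitudes of the final amplitudes.
probs = [abs(z) ** 2 for z in state3]
```

The printed lines agree with the excerpt to two decimals (using ASCII minus signs), and `probs` is the measurement distribution the fine-tuned model is asked to predict at the end of a trace.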