
arxiv: 2605.21260 · v1 · pith:SYO4IKLPnew · submitted 2026-05-20 · 💻 cs.LG

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

Pith reviewed 2026-05-21 05:06 UTC · model grok-4.3

classification 💻 cs.LG
keywords chain of thought · reasoning risk · oracle-trajectory risk · trajectory-mismatch risk · stability · error accumulation · learning theory · domain adaptation

The pith

Chain-of-thought reasoning risk decomposes into oracle-trajectory benefit and trajectory-mismatch cost

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models chain-of-thought as the interaction between an answer map and an autoregressive chain rule that produces intermediate questions. It defines the reasoning risk of a hypothesis under this interaction and decomposes that risk into two terms: an oracle-trajectory risk, which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk, which captures the cost through error accumulation along mismatched trajectories. The work shows that if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the mismatch cost can become arbitrarily large even when the oracle term is zero and the hypothesis is uniformly close to the ground truth. Under stability, a tight upper bound on the mismatch risk is governed by an exact amplification factor that distinguishes bounded, linear, and exponential error-growth regimes. This supplies a precise account of when chain-of-thought improves accuracy and when it degrades it.

Core claim

We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. Under stability, we prove a tight upper bound on the TMR governed by an exact amplification factor that identifies bounded, linear, and exponential error-growth regimes.

What carries the argument

The canonical decomposition of reasoning risk into oracle-trajectory risk (OTR) and trajectory-mismatch risk (TMR), which separates CoT's benefit from its cost of error accumulation on mismatched trajectories.
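
As a rough illustration of how a split of this kind can arise, here is a sketch under assumed notation; none of the symbols below are taken from the paper, and its actual definitions and tightness statement may take a different form. Write $f^\star$ for the ground-truth answer map, $h$ for the hypothesis, $q_T^\star$ for the final question on the oracle trajectory (the chain rule fed ground-truth answers), and $q_T^{h}$ for the final question on the trajectory the chain rule produces from the hypothesis's own answers. If the loss $\ell$ is assumed to satisfy the triangle inequality, then:

```latex
\begin{align*}
R(h) &= \mathbb{E}\,\ell\bigl(h(q_T^{h}),\, f^\star(q_T^\star)\bigr) \\
     &\le \underbrace{\mathbb{E}\,\ell\bigl(h(q_T^\star),\, f^\star(q_T^\star)\bigr)}_{\text{oracle-trajectory term}}
      \;+\; \underbrace{\mathbb{E}\,\ell\bigl(h(q_T^{h}),\, h(q_T^\star)\bigr)}_{\text{trajectory-mismatch term}}
\end{align*}
```

In this reading, the first term evaluates the hypothesis only on questions drawn from the oracle trajectory, which is why it resembles a target-domain risk, while the second term measures how far the hypothesis's answers drift once it follows its own trajectory.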

Load-bearing premise

The modeling of CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively.

What would settle it

Construct an unstable chain rule and an accurate hypothesis with zero oracle-trajectory risk, then measure whether the trajectory-mismatch risk grows without bound.
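
A minimal toy version of such an experiment might look like the sketch below. Everything in it is an assumption made for illustration: questions are scalars, the ground-truth answer map is the identity, the hypothesis is off by a constant `eps` everywhere (so it is uniformly close to the ground truth), and the hypothetical chain rule multiplies the previous answer by a constant `L`. The sketch does not enforce zero oracle-trajectory risk; it only shows how a uniformly small answer error can compound through the chain rule.

```python
def trajectory_gap(L, eps, T, q0=1.0):
    """Gap between the hypothesis trajectory and the oracle trajectory after T steps.

    Toy assumptions (not from the paper):
      - ground-truth answer map: f_star(q) = q
      - hypothesis answer map:   h(q) = q + eps   (uniformly eps-close to f_star)
      - chain rule:              next question = L * previous answer
    """
    q_star = q0  # question on the oracle trajectory
    q_hyp = q0   # question on the hypothesis's own trajectory
    for _ in range(T):
        q_star = L * q_star          # chain rule fed the true answer f_star(q_star)
        q_hyp = L * (q_hyp + eps)    # chain rule fed the hypothesis answer h(q_hyp)
    return abs(q_hyp - q_star)


if __name__ == "__main__":
    # Compare a contracting, a neutral, and an expanding chain rule.
    for L in (0.5, 1.0, 2.0):
        gaps = [trajectory_gap(L, eps=0.01, T=T) for T in (5, 10, 20)]
        print(f"L={L}: gaps at T=5,10,20 -> {gaps}")
```

In this toy the gap has the closed form `eps * (L + L**2 + ... + L**T)`, so it stays below `eps` for `L = 0.5`, grows like `eps * T` for `L = 1`, and grows geometrically in `T` for `L = 2`.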

Original abstract

We develop a learning-theoretic framework for understanding Chain of Thought (CoT). We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. We then show that this cost is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero and the hypothesis is uniformly close to the ground truth. Conversely, under stability, we prove a tight upper bound on the TMR governed by an exact amplification factor that identifies bounded, linear, and exponential error-growth regimes. Together, these results give a precise theory of when CoT helps, when it hurts, and what controls the transition between the two.
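
To see where three growth regimes can come from, consider a generic one-step error recursion. This is an illustrative assumption only: the paper's exact amplification factor is not reproduced here and may differ. Let $d_t$ be the gap between the hypothesis trajectory and the oracle trajectory at step $t$, let $\varepsilon$ be a per-step answer error, and let $\lambda \ge 0$ be an assumed combined stability constant:

```latex
\[
d_0 = 0, \qquad d_{t+1} \le \lambda\, d_t + \varepsilon
\;\Longrightarrow\;
d_T \le \varepsilon \sum_{k=0}^{T-1} \lambda^{k}
= \begin{cases}
\varepsilon\,\dfrac{1-\lambda^{T}}{1-\lambda} \le \dfrac{\varepsilon}{1-\lambda}, & \lambda < 1 \quad \text{(bounded)} \\[1ex]
\varepsilon\, T, & \lambda = 1 \quad \text{(linear)} \\[1ex]
\varepsilon\,\dfrac{\lambda^{T}-1}{\lambda-1}, & \lambda > 1 \quad \text{(exponential)}
\end{cases}
\]
```

Under this simplified recursion, the sum $\sum_{k=0}^{T-1}\lambda^{k}$ plays the role of an amplification factor: it is the multiplier that turns a per-step error into an end-of-chain error.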

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper develops a learning-theoretic framework for understanding Chain of Thought (CoT). It models CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and defines the reasoning risk of a hypothesis under this interaction. The main results are a tight canonical decomposition of this risk into oracle-trajectory risk (OTR) capturing the benefit of CoT (reducing to target-domain risk in domain adaptation) and trajectory-mismatch risk (TMR) capturing the cost through error accumulation. It shows that without stability, TMR can be arbitrarily large even when OTR is zero, and under stability provides a tight upper bound governed by an amplification factor identifying bounded, linear, and exponential error-growth regimes.

Significance. This framework offers a precise theory of when CoT helps or hurts by identifying stability as the key factor. The OTR/TMR decomposition provides clear separation of benefits and costs, with the domain adaptation analogy adding interpretability. The error growth regimes could help predict and mitigate issues in long reasoning chains. These results, if verified, contribute to the theoretical foundations of reasoning in large models.

major comments (1)
  1. [Modeling of CoT and definition of reasoning risk] The canonical decomposition into OTR and TMR is a direct algebraic consequence of the modeling choice where CoT is the interaction between a fixed answer map and an independent autoregressive chain rule. This modeling enables the split but may not reflect the joint training dynamics typical in CoT, where a single model generates both the chain and the answer without explicit separation. Since this is the load-bearing assumption for defining the reasoning risk and enabling the subsequent analysis, the paper would benefit from additional discussion on how the results translate to jointly optimized models or why the separation is a reasonable abstraction.
minor comments (1)
  1. [Notation and definitions] Ensure that all symbols, such as the amplification factor, are clearly defined upon first use and that any assumptions on the hypothesis class are explicitly stated to aid readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive comments, as well as for recognizing the potential of the OTR/TMR framework. We address the single major comment below and will revise the manuscript accordingly to strengthen the discussion of modeling assumptions.

Point-by-point responses
  1. Referee: [Modeling of CoT and definition of reasoning risk] The canonical decomposition into OTR and TMR is a direct algebraic consequence of the modeling choice where CoT is the interaction between a fixed answer map and an independent autoregressive chain rule. This modeling enables the split but may not reflect the joint training dynamics typical in CoT, where a single model generates both the chain and the answer without explicit separation. Since this is the load-bearing assumption for defining the reasoning risk and enabling the subsequent analysis, the paper would benefit from additional discussion on how the results translate to jointly optimized models or why the separation is a reasonable abstraction.

    Authors: We agree that the separation of the answer map and chain rule is a deliberate modeling abstraction chosen to enable the clean algebraic decomposition of reasoning risk. This choice is reasonable because the autoregressive generation of intermediate steps followed by a final answer map is the functional form of CoT even when a single model is trained end-to-end; the parameters may be shared, but the roles remain distinct and the risk decomposition continues to hold formally under the same interaction. The framework thereby isolates the benefit (OTR, which reduces to target risk in a domain-adaptation view) from the cost (TMR due to trajectory mismatch), providing insight that remains relevant for jointly optimized models. To address the referee's suggestion, we will add a new paragraph in the Discussion section explaining this rationale, noting that the stability conditions and error-growth regimes apply directly to the composed hypothesis regardless of training procedure, and briefly relating the abstraction to modular versus monolithic reasoning architectures in the literature. This revision clarifies scope without changing any theorems or proofs.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

Full rationale

The paper defines a reasoning risk via the composition of an answer map and an autoregressive chain rule for generating intermediate questions. It then algebraically decomposes this defined quantity into an oracle-trajectory risk (OTR) term and a trajectory-mismatch risk (TMR) term. Subsequent stability-based bounds on TMR follow from additional assumptions on the loss, hypothesis, and chain rule rather than from any fitted parameters, self-citations, or reductions of the target claims to the inputs by construction. The framework remains self-contained as a sequence of definitions, identities, and conditional theorems without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the newly introduced definitions of reasoning risk, OTR, and TMR plus stability assumptions on loss, answer map, and chain rule. These are defined within the paper rather than taken from upstream literature without justification. No numerical free parameters or new physical entities are mentioned.

axioms (1)
  • standard math: Existence of probability distributions over input sequences and output labels for defining expectations in the risk terms.
    Invoked when defining reasoning risk and its decomposition into OTR and TMR.

pith-pipeline@v0.9.0 · 5744 in / 1430 out tokens · 51906 ms · 2026-05-21T05:06:23.201969+00:00 · methodology


