
arXiv: 2606.06840 · v1 · pith:VYPOSO4Anew · submitted 2026-06-05 · 💻 cs.CL · cs.AI · cs.LG

Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces

Pith reviewed 2026-06-27 22:21 UTC · model grok-4.3

classification 💻 cs.CL cs.AI cs.LG
keywords: mechanistic reasoning, distillation, large output spaces, shortlisting, multi-label classification, zero-shot performance

The pith

Reasoning in large output spaces proceeds via broad shortlisting followed by fine-grained evaluation over the narrowed set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Modern models achieve strong zero-shot results on multi-label tasks with hundreds of thousands or millions of candidate labels. The paper characterizes this as a two-phase process in which an initial broad shortlisting narrows the options and a subsequent fine-grained phase evaluates the shortlist. Evidence across datasets shows the phases can be isolated and are complementary. This separation is used to create a distillation procedure that outperforms standard distillation by addressing each phase distinctly.
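
As a rough illustration of the shortlist-then-evaluate pattern described above, here is a minimal sketch. It is a generic retrieve-then-rerank analogue, not the paper's mechanism (which concerns behavior inside a language model). The function names, the toy vectors, the extra labels "fracture" and "sprain", and the parameters `k` and `threshold` are all assumptions made for illustration.

```python
import heapq
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def shortlist(query, labels, k):
    """Phase 1 (coarse): keep the k labels with the highest cheap dot-product score."""
    return heapq.nlargest(k, labels, key=lambda name: dot(query, labels[name]))

def refine(query, labels, candidates, threshold):
    """Phase 2 (fine): apply a costlier score (cosine here) to shortlisted labels only."""
    scored = [(cosine(query, labels[name]), name) for name in candidates]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]

def predict(query, labels, k=3, threshold=0.8):
    return refine(query, labels, shortlist(query, labels, k), threshold)

# Hypothetical toy label space: label names mapped to made-up 3-d vectors.
LABELS = {
    "heart failure": [1.0, 0.1, 0.0],
    "pulmonary edema": [0.7, 0.7, 0.0],
    "pneumonia": [0.1, 1.0, 0.0],
    "fracture": [0.0, 0.0, 1.0],
    "sprain": [0.0, 0.1, 0.9],
}

print(predict([0.9, 0.3, 0.0], LABELS))  # prints ['heart failure', 'pulmonary edema']
```

In this toy, the coarse phase discards "fracture" and "sprain" cheaply, and the fine phase then rejects "pneumonia", which survived the shortlist but scores below the threshold.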

Core claim

Reasoning is a two-phase process of broad shortlisting of candidates followed by fine-grained reasoning over the resulting set. These steps can be isolated and are complementary. This characterization supports a mechanistic distillation strategy that consistently outperforms standard distillation.

What carries the argument

The two-phase reasoning process of broad candidate shortlisting followed by fine-grained reasoning over the shortlist.

If this is right

  • Shortlisting reduces the candidate pool from millions to a tractable size before detailed evaluation occurs.
  • The two phases are complementary, so both must function for strong overall performance.
  • Isolating the phases permits targeted knowledge transfer during distillation.
  • The resulting distillation method improves performance consistently across multiple datasets.
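
The idea of targeted knowledge transfer can be made concrete with a toy loss. In the sketch below, `standard_kd` is ordinary temperature-scaled distillation in the style of Hinton et al. The second function, `phase_split_kd`, is a hypothetical illustration only: the source does not give the paper's equations, so the shortlist size `k`, the mixing weight `alpha`, and the coverage term are all assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def standard_kd(teacher_logits, student_logits, temperature=2.0):
    """Temperature-scaled KL divergence over the full label space."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * kl(p, q)

def phase_split_kd(teacher_logits, student_logits, k, temperature=2.0, alpha=0.5):
    """Hypothetical two-term loss: shortlist coverage plus KD inside the shortlist."""
    # Teacher shortlist: indices of its k highest-scoring labels.
    top = sorted(range(len(teacher_logits)), key=lambda i: -teacher_logits[i])[:k]
    # Coarse term: negative log of the student's probability mass on that shortlist.
    student_probs = softmax(student_logits)
    coverage = -math.log(sum(student_probs[i] for i in top))
    # Fine term: standard KD restricted to (and renormalized over) the shortlist.
    fine = standard_kd([teacher_logits[i] for i in top],
                       [student_logits[i] for i in top], temperature)
    return alpha * coverage + (1 - alpha) * fine
```

The split means a student that ranks the right labels but spreads mass outside the shortlist is penalized by the coarse term, while one that finds the shortlist but misorders it is penalized by the fine term.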

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The two-phase separation may apply to other prediction tasks with very large label spaces.
  • Models could potentially be trained or prompted to make the shortlisting step more explicit.
  • Disrupting one phase independently might produce predictable drops in accuracy.

Load-bearing premise

The shortlisting and fine-grained reasoning phases can be reliably isolated from each other in a manner that directly yields a superior distillation procedure.

What would settle it

A demonstration that the proposed mechanistic distillation does not outperform standard distillation on the tested datasets would falsify the central claim.
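
One simple way to frame such a check is a paired comparison across datasets. The sketch below implements a one-sided exact sign test; it is an assumption about how one might test the claim, not a procedure the source attributes to the paper, and the function name and inputs are hypothetical.

```python
from math import comb

def sign_test_p_value(scores_a, scores_b):
    """One-sided exact sign test.

    Returns P(wins >= observed) under the null hypothesis that method A and
    method B are equally likely to win on any dataset. Ties are dropped.
    """
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses
    if n == 0:
        return 1.0  # no informative comparisons
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
```

For instance, if method A wins on all five of five non-tied datasets, the p-value is 1/32 ≈ 0.031; if it wins on none, the p-value is 1.0 and the claim of consistent improvement is not supported.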

Figures

Figures reproduced from arXiv: 2606.06840 by Byron C. Wallace, Debjyoti Saha Roy, Javed A. Aslam.

Figure 1: LLaMA3-70B exhibits early Focus buildup & late Confusion reduction over CoT. Mechanistic distillation recovers teacher trajectory while representative CoT distillations [He et al., 2025] fail. In very large-scale multi-label settings, a small set of relevant labels must be selected from hundreds of thousands to millions of candidates [Zhou et al., 2024, Zhang et al., 2025, Ortego et al., 2025]. Such setti…
Figure 2: LLM Reasoning. (Left) Early CoT progressively builds focus on the broad categories. (Right) Late CoT progressively rules out near-miss categories until only the true signals remain. … [Modarressi et al., 2025, Wang, 2025, Kuratov et al., 2024, Marjanović et al., 2026], by leveraging structured mechanisms such as long Chain-of-Thought (CoT) [Chen et al., 2025a, Yeo et al., 2025], tree-of-thoughts [Yao et a…
Figure 3: Phase 1: While generating early CoT, we measure whether attention heads attend to salient …
Figure 4: (Left) Top heads by CoarseScore in LLaMA3-70B model. Early-layer heads (L1–6) exhibiting anchor-focused attention and aligned residual updates during early CoT. (Right) Attention from a top early-layer head (L3H22), sharply focusing on key clinical anchors (red: cardiac phrases like “heart failure” and “EF 35%”; orange: respiratory/renal like “pulmonary edema” & “pneumonia”).
Figure 5: Early-layer attention heads causally control coarse filtering. (Left) Denoising patching progressively larger bins of top heads ranked by CoarseScore substantially restores reasoning focus toward semantic anchors. (Right) Noising these heads significantly degrades focus. … MI” (cardiac anchors), as well as related respiratory and renal anchors like “pulmonary edema” and “pneumonia”. This indicates early-laye…
Figure 6: Phase 2: During later CoT, attention heads refine predictions by suppressing near-misses, …
Figure 7: Later-layer heads showing iterative refinement. (Left) Mid-to-late attention heads ranked by QK preference for own prior shortlist keys over near-miss keys + OV updates widening margins. (Right) Over CoT, top refinement heads sharpen QK preference toward shortlist representations, downweight near-misses, and strengthen OV contributions for iterative margin widening.
Figure 8: Later-layer heads causally drive iterative refinement. (Left/Right) Denoising successively larger bins of top RefineScore-ranked heads sharply ↓↓ near-miss confusion & noising them ↑↑ it. High positive RefineScore identifies heads that re-attend to prior shortlist representations while avoiding near-misses, and write updates that widen the target–near-miss margin. As illustrated in …
Figure 9: Ablating phases & specific atten./write/interaction losses in Eqs. …
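
The captions for Figures 5 and 8 refer to denoising and noising patching of attention heads. As a rough illustration of what those two interventions measure, the sketch below applies activation patching to a toy additive model. Everything in it is hypothetical: the three "heads", their weights, and the restoration metric are assumptions chosen for readability, and nothing is taken from the paper's models or scores.

```python
import math

# Hypothetical toy model: each "head" reads the input and adds a scalar to the output.
HEAD_WEIGHTS = [[2.0, 0.0], [0.0, 0.5], [0.3, 0.3]]  # per-head input projections
READOUT = [1.0, 0.2, 0.1]                            # per-head output weights

def head_outputs(x):
    return [math.tanh(sum(w * xi for w, xi in zip(ws, x))) for ws in HEAD_WEIGHTS]

def run(x, patch=None):
    """Forward pass; `patch` maps a head index to the activation to substitute."""
    acts = head_outputs(x)
    for h, value in (patch or {}).items():
        acts[h] = value
    return sum(r * a for r, a in zip(READOUT, acts))

def denoise(clean_x, corrupt_x, heads):
    """Patch clean activations into the corrupted run; return the fraction restored."""
    clean_acts = head_outputs(clean_x)
    clean, corrupt = run(clean_x), run(corrupt_x)
    patched = run(corrupt_x, {h: clean_acts[h] for h in heads})
    return (patched - corrupt) / (clean - corrupt)

def noise(clean_x, corrupt_x, heads):
    """Patch corrupted activations into the clean run; return the fraction lost."""
    corrupt_acts = head_outputs(corrupt_x)
    clean, corrupt = run(clean_x), run(corrupt_x)
    patched = run(clean_x, {h: corrupt_acts[h] for h in heads})
    return (clean - patched) / (clean - corrupt)
```

Calling `denoise([1.0, 1.0], [0.0, 0.0], [0])` shows that head 0 alone restores most of the clean output in this toy, which is the kind of evidence a "top heads" ranking would be checked against. Because the toy readout is linear, denoising and noising give the same number here; in a real network they generally differ.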
Original abstract

Modern reasoning models offer surprisingly strong zero-shot performance on challenging multi-label tasks that require selecting a small set of relevant options from hundreds of thousands to millions of candidate labels. We investigate how they achieve this mechanistically. We characterize reasoning as a two-phase process: A broad "shortlisting" of candidates followed by fine-grained reasoning over the resulting set. We provide evidence across a range of datasets that these steps can be isolated and are complementary. Using this characterization, we develop a mechanistic distillation strategy that consistently outperforms standard distillation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that modern reasoning models achieve strong zero-shot performance on multi-label tasks with very large candidate sets via a two-phase process of broad shortlisting followed by fine-grained reasoning over the shortlist. It asserts that these phases can be isolated and shown to be complementary across datasets, and that a distillation procedure derived from this characterization consistently outperforms standard distillation.

Significance. If the two-phase characterization is valid and the isolation procedure is shown to be causal rather than post-hoc, the work could provide a useful mechanistic lens on how transformers manage large output spaces and yield practically better distillation recipes for such tasks. The absence of any equations, algorithms, datasets, or quantitative results in the manuscript prevents assessment of whether these benefits are realized or attributable to the proposed framing.

major comments (2)
  1. [Abstract] Abstract: the manuscript asserts that 'evidence across a range of datasets' exists for isolability and complementarity and that the resulting distillation 'consistently outperforms standard distillation,' yet supplies no methods section, no experimental protocol, no tables of results, and no description of baselines or controls. This renders the central empirical claims unverifiable from the text.
  2. [Abstract] Abstract: the claim that the phases 'can be isolated' is load-bearing for the distillation contribution, but no procedure, loss function, or intervention for performing the isolation is described, making it impossible to determine whether the reported gains are due to the mechanistic insight or to ancillary implementation choices.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments. We agree that the current manuscript text consists only of a high-level abstract and lacks the detailed methods, protocols, results, and isolation procedure needed to substantiate the claims. We will revise the manuscript accordingly to address these issues.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the manuscript asserts that 'evidence across a range of datasets' exists for isolability and complementarity and that the resulting distillation 'consistently outperforms standard distillation,' yet supplies no methods section, no experimental protocol, no tables of results, and no description of baselines or controls. This renders the central empirical claims unverifiable from the text.

    Authors: We accept this observation. The provided manuscript is limited to the abstract summarizing the claims without supporting details. In the revised version we will add a methods section, full experimental protocol, descriptions of datasets and baselines, and tables of quantitative results to make the claims verifiable.

    Revision: yes

  2. Referee: [Abstract] Abstract: the claim that the phases 'can be isolated' is load-bearing for the distillation contribution, but no procedure, loss function, or intervention for performing the isolation is described, making it impossible to determine whether the reported gains are due to the mechanistic insight or to ancillary implementation choices.

    Authors: We agree that the isolation procedure must be explicitly described for the contribution to be assessable. The revised manuscript will include a detailed account of the isolation method, including any loss functions or interventions used, to clarify its role in the distillation strategy.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and description characterize reasoning as a two-phase shortlisting plus fine-grained process, claim empirical isolation across datasets, and derive a distillation strategy from that characterization. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are supplied that would allow any load-bearing step to reduce to its own inputs by construction. The derivation therefore remains self-contained and relies on external empirical evidence rather than internal redefinition or self-referential fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone.

pith-pipeline@v0.9.1-grok · 5616 in / 911 out tokens · 23439 ms · 2026-06-27T22:21:04.752400+00:00 · methodology


Reference graph

Works this paper leans on

167 extracted references · 128 canonical work pages · 33 internal anchors

  1. [1] Kurtosis as peakedness, 1905–2014. RIP. The American Statistician, 2014.

  2. [2] Eliciting Latent Predictions from Transformers with the Tuned Lens. 2025.

  3. [3] Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. 2024.

  4. [4] Circuit Distillation. arXiv preprint arXiv:2509.25002.

  5. [5] Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948.

  6. [6] Test-time computing: from system-1 thinking to system-2 thinking. arXiv e-prints, 2025.

  7. [7] Introducing OpenAI o3 and o4-mini.

  8. [8] Introducing GPT-5.2.

  9. [9] Kamradt, Greg. 2023.

  10. [10] Bhatia, K. and Dahiya, K. and Jain, H. and Kar, P. and Mittal, A. and Prabhu, Y. and Varma, M.

  11. [11] MIMIC-IV, a freely accessible electronic health record dataset. Scientific data, 2023.

  12. [12] Bleu: a method for automatic evaluation of machine translation. Proceedings of the 40th annual meeting of the Association for Computational Linguistics.

  13. [13] Statistical comparisons of classifiers over multiple data sets. Journal of Machine learning research.

  14. [14] Wadhwa, Somin and Amir, Silvio and Wallace, Byron C. Investigating Mysteries of CoT-Augmented Distillation. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.349

  15. [15] WHO. 2025.

  16. [16] Towards semi-structured automatic ICD coding via tree-based contrastive learning. Advances in Neural Information Processing Systems.

  17. [17] SIM-CoT: Supervised Implicit Chain-of-Thought. arXiv preprint arXiv:2509.20317.

  18. [18] Softcot: Soft chain-of-thought for efficient reasoning with llms. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

  19. [19] Implicit chain of thought reasoning via knowledge distillation. arXiv preprint arXiv:2311.01460.

  20. [20] Cothink: Token-efficient reasoning via instruct models guiding reasoning models. arXiv e-prints.

  21. [21] Codi: Compressing chain-of-thought into continuous space via self-distillation. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.

  22. [22] Compressed chain of thought: Efficient reasoning through dense representations. arXiv preprint arXiv:2412.13171.

  23. [23] C3ot: Generating shorter chain-of-thought without compromising effectiveness. Proceedings of the AAAI Conference on Artificial Intelligence.

  24. [24] Keypoint-based progressive chain-of-thought distillation for llms. arXiv preprint arXiv:2405.16064.

  25. [25] Iteration head: A mechanistic study of chain-of-thought. Advances in Neural Information Processing Systems.

  26. [27] He, Yinhan and Zheng, Wendy and Zhu, Yaochen and Zheng, Zaiyi and Su, Lin and Vasudevan, Sriram and Guo, Qi and Hong, Liangjie and Li, Jundong. doi:10.48550/ARXIV.2510.24940

  27. [28] Chen, Xiaoshu and Zhou, Sihang and Liang, Ke and Sun, Xiaoyu and Liu, Xinwang. Skip-… Proceedings of the 2025… 2025. doi:10.18653/v1/2025.emnlp-main.610

  28. [29] Yan, JianZhi and Liu, Le and Pan, Youcheng and Chen, Shiwei and Xiang, Yang and Tang, Buzhou. Towards… Findings of the… 2025. doi:10.18653/v1/2025.findings-emnlp.413

  29. [30] Zhuang, Xianwei and Zhu, Zhihong and Wang, Zhichang and Cheng, Xuxin and Zou, Yuexian.

  30. [31] Probing to… arXiv.org.

  31. [32] Roy, Debjyoti Saha and Wallace, Byron C. and Aslam, Javed A. Don't… (Dec.) doi:10.48550/arXiv.2410.23066

  32. [33] Hinton, Geoffrey and Vinyals, Oriol and Dean, Jeff. Distilling the Knowledge in a Neural Network. (Mar.) doi:10.48550/arXiv.1503.02531

  33. [34] Kim, Jaehoon and Seo, Kwangwook and Lee, Dongha. In… (Sep.) doi:10.48550/arXiv.2509.22230

  34. [35] Bhambri, Siddhant and Biswas, Upasana and Kambhampati, Subbarao. Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation. (May) doi:10.48550/arXiv.2505.13792

  35. [36] Ramesh, Suhas Kamasetty and Sengupta, Ayan and Chakraborty, Tanmoy. On the… (Aug.) doi:10.48550/arXiv.2505.15442

  36. [37] Fang, Luyang and Yu, Xiaowei and Cai, Jiazhang and Chen, Yongkai and Wu, Shushan and Liu, Zhengliang and Yang, Zhenyuan and Lu, Haoran and Gong, Xilin and Liu, Yufang and Ma, Terry and Ruan, Wei and Abbasi, Ali and Zhang, Jing and Wang, Tao and Latif, Ehsan and You, Weihang and Jiang, Hanqi and Liu, Wei and Zhang, Wei and Kolouri, Soheil and Zhai, Xiaomin... Knowledge…

  37. [38] Belinkov, Yonatan. Probing… Computational Linguistics. (Mar.) doi:10.1162/coli_a_00422

  38. [39] Fu, Tianyu and Huang, Haofeng and Ning, Xuefei and Zhang, Genghan and Chen, Boju and Wu, Tianqi and Wang, Hongyi and Huang, Zixiao and Li, Shiyao and Yan, Shengen and Dai, Guohao and Yang, Huazhong and Wang, Yu. Mixture of… (Nov.) doi:10.48550/arXiv.2406.14909

  39. [40] Tang, Hanlin and Lin, Yang and Lin, Jing and Han, Qingsen and Hong, Shikuan and Yao, Yiwu and Wang, Gongyi. (Jul.) doi:10.48550/arXiv.2407.15891

  40. [41] Zheng, Zifan and Wang, Yezhaohui and Huang, Yuxin and Song, Shichao and Yang, Mingchuan and Tang, Bo and Xiong, Feiyu and Li, Zhiyu. Attention… (Dec.) doi:10.48550/arXiv.2409.03752

  41. [42] Wu, Wenhao and Wang, Yizhong and Xiao, Guangxuan and Peng, Hao and Fu, Yao. Retrieval… (Apr.) doi:10.48550/arXiv.2404.15574

  42. [43] A… arXiv.org.

  43. [44] Eliciting… arXiv.org.

  44. [45] Syed, Aaquib and Rager, Can and Conmy, Arthur. Attribution… Proceedings of the 7th… 2024. doi:10.18653/v1/2024.blackboxnlp-1.25

  45. [46] arXiv.org.

  46. [47] Cabannes, Vivien and Arnal, Charles and Bouaziz, Wassim and Yang, Alice and Charton, Francois and Kempe, Julia. Iteration… (Oct.) doi:10.48550/arXiv.2406.02128

  47. [48] Guan, Xinyu and Zhang, Li Lyna and Liu, Yifei and Shang, Ning and Sun, Youran and Zhu, Yi and Yang, Fan and Yang, Mao. rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking. (Jan.) doi:10.48550/arXiv.2501.04519

  48. [49] Lieberum, Tom and Rajamanoharan, Senthooran and Conmy, Arthur and Smith, Lewis and Sonnerat, Nicolas and Varma, Vikrant and Kramár, János and Dragan, Anca and Shah, Rohin and Nanda, Neel. Gemma… (Aug.) doi:10.48550/arXiv.2408.05147

  49. [50] Shu, Dong and Wu, Xuansheng and Zhao, Haiyan and Rai, Daking and Yao, Ziyu and Liu, Ninghao and Du, Mengnan. A… (Sep.) doi:10.48550/arXiv.2503.05613

  50. [51] Galichin, Andrey and Dontsov, Alexey and Druzhinina, Polina and Razzhigaev, Anton and Rogov, Oleg Y. and Tutubalina, Elena and Oseledets, Ivan. I… (Aug.) doi:10.48550/arXiv.2503.18878

  51. [52] Multimodal… Distill. 2021. doi:10.23915/distill.00030

  52. [53] High/… Distill. 2021. doi:10.23915/distill.00024.005

  53. [54] Feature… Distill. 2017. doi:10.23915/distill.00007

  54. [55] Zoom in: An introduction to circuits. Distill. 2020. doi:10.23915/distill.00024.001

  55. [56] Gao, Leo and Tour, Tom Dupré la and Tillman, Henk and Goh, Gabriel and Troll, Rajan and Radford, Alec and Sutskever, Ilya and Leike, Jan and Wu, Jeffrey. Scaling and evaluating sparse autoencoders. (Jun.) doi:10.48550/arXiv.2406.04093

  56. [57] Cunningham, Hoagy and Ewart, Aidan and Riggs, Logan and Huben, Robert and Sharkey, Lee. Sparse… (Oct.) doi:10.48550/arXiv.2309.08600

  57. [58] Zhang, Fred and Nanda, Neel. Towards… (Jan.) doi:10.48550/arXiv.2309.16042

  58. [59] Heimersheim, Stefan and Nanda, Neel. How to use and interpret activation patching. (Apr.) doi:10.48550/arXiv.2404.15255

  59. [60] Nanda, Neel. An… (Jul.)

  60. [61] Stolfo, Alessandro and Belinkov, Yonatan and Sachan, Mrinmaya. A… (Oct.) doi:10.48550/arXiv.2305.15054

  61. [62] Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi. Hierarchical Reasoning Model. (Aug.) doi:10.48550/arXiv.2506.21734

  62. [63] Ren, Zirui and Liu, Ziming. Are… (Jan.) doi:10.48550/arXiv.2601.10679

  63. [64] Zhang, Alex L. and Kraska, Tim and Khattab, Omar. Recursive Language Models. (Jan.) doi:10.48550/arXiv.2512.24601

  64. [65] Basu, Samyadeep and Morariu, Vlad I. and Rossi, Ryan A. and Zhao, Nanxuan and Wang, Zichao and Feizi, Soheil and Manjunatha, Varun. On… (Aug.)

  65. [66] Du, Hongzhe and Li, Weikai and Cai, Min and Saraipour, Karim and Zhang, Zimin and Lakkaraju, Himabindu and Sun, Yizhou and Zhang, Shichang. How… (Nov.) doi:10.48550/arXiv.2504.02904

  66. [67] Hanna, Michael and Pezzelle, Sandro and Belinkov, Yonatan. Have… (Jul.) doi:10.48550/arXiv.2403.17806

  67. [68] Yan, Jianzhi and Liu, Le and Pan, Youcheng and Chen, Shiwei and Xiang, Yang and Tang, Buzhou. Towards… (Sep.) doi:10.48550/arXiv.2509.23574

  68. [69] Chen, Wei-Rui and Kothapalli, Vignesh and Fatahibaarzi, Ata and Sang, Hejian and Tang, Shao and Song, Qingquan and Wang, Zhipeng and Abdul-Mageed, Muhammad. Distilling the… (Jan.) doi:10.48550/arXiv.2512.21002

  69. [70] Tian, Yijun and Han, Yikun and Chen, Xiusi and Wang, Wei and Chawla, Nitesh V. Beyond… (Nov.) doi:10.48550/arXiv.2402.04616

  70. [71] Dai, Chengwei and Li, Kun and Zhou, Wei and Hu, Songlin. Beyond… (May) doi:10.48550/arXiv.2405.19737

  71. [72] Hu, Yueqing and Peng, Xinyang and Peng, Shuting and Wang, Hanqi and Wang, Tianhong. Hán… (Jan.) doi:10.48550/arXiv.2601.05019

  72. [73] Li, Chenglin and Chen, Qianglong and Li, Liangyue and Wang, Caiyu and Li, Yicheng and Chen, Zulong and Zhang, Yin. Mixed… (Feb.) doi:10.48550/arXiv.2312.10730

  73. [74] Chen, Hongzhan and Wu, Siyue and Quan, Xiaojun and Wang, Rui and Yan, Ming and Zhang, Ji. (Dec.) doi:10.48550/arXiv.2310.14747

  74. [75] Chen, Qiguang and Du, Yantao and Li, Ziniu and Liu, Jinhao and Duan, Songyao and Guo, Jiarui and Liu, Minghao and Liu, Jiaheng and Yang, Tong and Zhang, Ge and Qin, Libo and Che, Wanxiang and Huang, Wenhao. The… (Jan.) doi:10.48550/arXiv.2601.06002

  75. [76] Li, Dacheng and Cao, Shiyi and Griggs, Tyler and Liu, Shu and Mo, Xiangxi and Tang, Eric and Hegde, Sumanth and Hakhamaneshi, Kourosh and Patil, Shishir G. and Zaharia, Matei and Gonzalez, Joseph E. and Stoica, Ion. (Feb.) doi:10.48550/arXiv.2502.07374

  76. [77] Yeo, Edward and Tong, Yuxuan and Niu, Morry and Neubig, Graham and Yue, Xiang. Demystifying Long Chain-of-Thought Reasoning in LLMs. (Feb.) doi:10.48550/arXiv.2502.03373

  77. [78] Feng, Kaituo and Li, Changsheng and Zhang, Xiaolu and Zhou, Jun and Yuan, Ye and Wang, Guoren. Keypoint-based… (May) doi:10.48550/arXiv.2405.16064

  78. [79] Chen, Xiao and Zhou, Sihang and Liang, Ke and Sun, Xiaoyu and Liu, Xinwang. Skip-… (May) doi:10.48550/arXiv.2505.18642

  79. [80] Michaud, Eric J. and Liao, Isaac and Lad, Vedang and Liu, Ziming and Mudide, Anish and Loughridge, Chloe and Guo, Zifan Carl and Kheirkhah, Tara Rezaei and Vukelić, Mateja and Tegmark, Max. Opening the… (Feb.) doi:10.48550/arXiv.2402.05110

  80. [81] Chuang, Yu-Neng and Wang, Guanchu and Chang, Chia-Yuan and Tang, Ruixiang and Zhong, Shaochen and Yang, Fan and Du, Mengnan and Cai, Xuanting and Braverman, Vladimir and Hu, Xia. (Oct.) doi:10.48550/arXiv.2402.04678

Showing first 80 references.