pith. machine review for the scientific record.

arxiv: 2602.03396 · v3 · submitted 2026-02-03 · 💻 cs.CL

Recognition: 2 theorem links · Lean Theorem

Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 08:13 UTC · model grok-4.3

classification 💻 cs.CL
keywords distillation resistance · conditional mutual information · LLM protection · logit transformation · model extraction · information-theoretic defense · API security

The pith

Minimizing conditional mutual information in LLM outputs via a learned linear transformation blocks logit-based distillation attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to defend large language models against logit-based distillation by characterizing the relevant information in their outputs using conditional mutual information. Specifically, it measures the CMI between teacher logits and input queries conditioned on ground-truth labels, which captures information that aids model extraction. By learning a transformation matrix to minimize this CMI, the approach purifies the outputs to reduce distillation success while maintaining the model's task performance. This is important because existing defenses do not address logit-based distillation, allowing adversaries to extract valuable knowledge from black-box APIs.
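In standard notation, with teacher logits Z, input query X, and ground-truth label Y, the quantity the defense targets is the textbook conditional mutual information; the identity below is standard information theory, not a formula quoted from the paper.

```latex
% Conditional mutual information between logits Z and queries X given labels Y.
% The defense learns a matrix T so that the purified logits Z' = T Z make this small.
I(Z; X \mid Y) \;=\; \mathbb{E}_{p(x,\,y)}\!\left[\, D_{\mathrm{KL}}\!\big(\, p(z \mid x, y) \,\big\|\, p(z \mid y) \,\big) \,\right]
```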

Core claim

We characterize distillation-relevant information in teacher outputs using the conditional mutual information (CMI) between teacher logits and input queries conditioned on ground-truth labels. Guided by this, we learn a transformation matrix that purifies the original outputs by minimizing a CMI-inspired objective, which removes distillation-relevant information while preserving output utility.

What carries the argument

The conditional mutual information (CMI) between teacher logits and input queries conditioned on ground-truth labels, minimized by optimizing a learned linear transformation matrix applied to the logits.
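To make that two-term structure concrete, here is a minimal PyTorch sketch. It is not the paper's code: the paper's CMI surrogate is not reproduced, and a crude within-batch cross-correlation penalty (which only removes linear query-dependence, a much weaker notion than CMI) stands in for it. All dimensions, tensors, and the trade-off weight alpha are hypothetical.

```python
# Minimal sketch of the purify-and-preserve objective, under toy assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim, batch = 32, 16, 64
teacher_logits = torch.randn(batch, vocab)     # frozen teacher outputs z
query_emb = torch.randn(batch, dim)            # query representations x
labels = torch.randint(0, vocab, (batch,))     # ground-truth next tokens y

T = torch.nn.Linear(vocab, vocab, bias=False)  # learned purification matrix
alpha = 0.5                                    # utility/defense trade-off (made up)
opt = torch.optim.Adam(T.parameters(), lr=1e-2)

for step in range(200):
    z = T(teacher_logits)                      # purified logits T·z
    # (1) Stand-in for the CMI term: cross-covariance between centered purified
    # logits and query features; driving it to zero removes linear query-dependence.
    zc = z - z.mean(dim=0)
    qc = query_emb - query_emb.mean(dim=0)
    cross_cov = (zc.T @ qc) / batch            # (vocab, dim)
    loss_cmi = cross_cov.pow(2).mean()
    # (2) Utility term: purified logits must still predict the labels.
    loss_ce = F.cross_entropy(z, labels)
    loss = loss_cmi + alpha * loss_ce
    opt.zero_grad()
    loss.backward()
    opt.step()
```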

If this is right

  • Distillation algorithms achieve significantly lower performance on the protected models.
  • Task accuracy on original benchmarks remains nearly unchanged.
  • The approach works across different large language models and multiple distillation techniques.
  • Model owners can deploy the transformation at the API level to safeguard intellectual property (a sketch of such a hook follows this list).
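On the deployment bullet: the purification is a single matrix multiply at response time, so a serving hook stays cheap. A hypothetical sketch follows; the file name and function shape are invented for illustration, and nothing here comes from the paper's code.

```python
# Hypothetical API-side hook: apply a pre-trained purification matrix T to
# logits before they leave the service. Assumes T is (vocab, vocab) and was
# trained offline; "purification_matrix.pt" is a made-up artifact name.
import torch

T = torch.load("purification_matrix.pt")

@torch.no_grad()
def serve_logits(raw_logits: torch.Tensor) -> torch.Tensor:
    """Purify a (batch, vocab) logit tensor for an API response."""
    return raw_logits @ T.T
```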

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This purification technique could be extended to output formats beyond logits, blocking additional extraction channels.
  • The linear nature of the transformation allows for efficient deployment in API services with minimal computational cost.
  • One might test whether the defense holds when attackers have partial knowledge of the transformation matrix.

Load-bearing premise

Minimizing the defined conditional mutual information via the learned linear transformation removes exactly the distillation-relevant information without introducing new vulnerabilities or causing unintended utility loss.

What would settle it

If experiments showed that strong distillation algorithms still reach student accuracy on the transformed outputs comparable to what they reach on untransformed outputs, across multiple LLMs, the central claim would be falsified.
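Read as an experiment, that criterion compares two students: one distilled from raw logits, one from purified logits. Below is a sketch of the test's shape, with hypothetical distill() and evaluate() helpers injected as arguments; none of this is the paper's evaluation harness.

```python
# Shape of the settling experiment under stated assumptions: distill() turns a
# logits function into a student model, evaluate() returns benchmark accuracy.
from typing import Any, Callable
import torch

def settling_experiment(
    teacher: Callable[[torch.Tensor], torch.Tensor],   # inputs -> raw logits
    T: torch.Tensor,                                   # purification matrix
    distill: Callable[[Callable], Any],                # logits fn -> student
    evaluate: Callable[[Any], float],                  # student -> accuracy
) -> float:
    student_raw = distill(lambda x: teacher(x))
    student_purified = distill(lambda x: teacher(x) @ T.T)
    # The central claim survives only if this gap is large and positive while
    # the teacher's own task accuracy stays essentially unchanged.
    return evaluate(student_raw) - evaluate(student_purified)
```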

Figures

Figures reproduced from arXiv: 2602.03396 by Bin Chen, Hao Fang, Jiawei Kong, Ke Xu, Kuofeng Gao, Leqi Zheng, Shu-Tao Xia, Tianqu Zhuang, Tianyi Zhang.

Figure 1: An illustration of the designed defense algorithm. We propose to learn a transformation matrix … view at source ↗
Figure 2: An illustration of the proposed method. In the left subfigure, we introduce a surrogate … view at source ↗
Figure 3: Performance of the teacher model under the defense and visualization comparison of … view at source ↗
Figure 4: Accuracy of teacher and distilled student models. view at source ↗
Figure 6: Ablation study on the role of L_CE in the teacher model's accuracy; experiments on Qwen2.5-7B (accuracy (%) against the coefficient, curves Vanilla / ABKD and Ours / ABKD). view at source ↗
Figure 8: Qualitative comparison between the teacher before and after applying the defensive matrix … view at source ↗
Figure 9: Qualitative comparison between the teacher before and after applying the defensive matrix … view at source ↗
Figure 10: Qualitative comparison between the student before and after applying the defensive matrix … view at source ↗
Figure 11: Qualitative comparison between the student before and after applying the defensive matrix … view at source ↗
Figure 12: Qualitative comparison between the student before and after applying the defensive matrix with Qwen2.5-1.5B on MMLU. view at source ↗
Figure 13: Qualitative comparison between the student before and after applying the defensive matrix … view at source ↗
Figure 14: Qualitative comparison between the student before and after applying the defensive matrix … view at source ↗
original abstract

Proprietary large language models (LLMs) embody substantial economic value and are generally exposed only as black-box APIs, yet adversaries can still exploit their outputs to extract knowledge via distillation. Existing defenses focus exclusively on text-based distillation, leaving the important logit-based distillation largely unexplored. In this work, we analyze this problem and present an effective solution from an information-theoretic perspective. We characterize distillation-relevant information in teacher outputs using the conditional mutual information (CMI) between teacher logits and input queries conditioned on ground-truth labels. This quantity captures contextual information beneficial for model extraction, motivating us to defend distillation via CMI minimization. Guided by our theoretical analysis, we propose learning a transformation matrix that purifies the original outputs to enhance distillation resistance. We further derive a CMI-inspired anti-distillation objective to optimize this transformation, which effectively removes distillation-relevant information while preserving output utility. Extensive experiments across multiple LLMs and strong distillation algorithms demonstrate that the proposed method significantly degrades distillation performance while preserving task accuracy, effectively protecting models' intellectual property.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that distillation-relevant information in LLM teacher logits can be characterized via the conditional mutual information (CMI) between logits and input queries conditioned on ground-truth labels. It proposes learning a linear transformation matrix optimized by a CMI-inspired anti-distillation objective that removes this information while preserving output utility, thereby degrading logit-based distillation performance without harming task accuracy. Experiments across multiple LLMs and strong distillation algorithms are reported to support the defense.

Significance. If the central claim holds, the work supplies a principled, information-theoretic defense against logit-based model extraction, filling a gap left by text-only defenses. The CMI framing and efficient linear transform are attractive for practical API protection of proprietary models, and the reported preservation of task accuracy alongside reduced distillability would be a notable contribution.

major comments (2)
  1. [§3.2] Derivation of the CMI objective: the central claim that minimizing I(T·logits; query | label) removes exactly the information exploitable by logit-based distillation is not supported by a bound or reduction argument relating the conditional MI to the marginal I(T·logits; query). Distillation operates on unlabeled queries and matches output distributions, so a linear T that only decorrelates the logits from the label could leave query-dependent structure intact; the manuscript must either derive that the unconditional MI is also controlled or provide a counter-example analysis.
  2. [§5] Experimental section: the reported degradation in distillation performance is presented without an ablation on the gap between the CMI-inspired surrogate loss and the true CMI, and without testing whether the learned T introduces new vulnerabilities (e.g., invertibility of the transform or leakage through the preserved marginal). Without these controls, it is unclear whether the observed resistance is robust to stronger or adaptive extractors.
minor comments (2)
  1. [§2] Notation for the transformation matrix T and the precise definition of the CMI estimator should be introduced earlier and used consistently to aid readability.
  2. [§5] Figure captions and axis labels in the experimental plots would benefit from explicit mention of the baseline (untransformed) distillation accuracy for direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below with clarifications and commit to revisions that strengthen the theoretical grounding and experimental validation without altering the core claims or results.

point-by-point responses
  1. Referee: [§3.2] Derivation of the CMI objective: the central claim that minimizing I(T·logits; query | label) removes exactly the information exploitable by logit-based distillation is not supported by a bound or reduction argument relating the conditional MI to the marginal I(T·logits; query). Distillation operates on unlabeled queries and matches output distributions, so a linear T that only decorrelates the logits from the label could leave query-dependent structure intact; the manuscript must either derive that the unconditional MI is also controlled or provide a counter-example analysis.

    Authors: We appreciate the referee highlighting the need to relate conditional and unconditional mutual information. The CMI formulation is motivated by the observation that ground-truth labels encode the primary task signal, so the residual dependence I(T·logits; query | label) isolates the contextual information that logit-based distillation exploits beyond label matching. While the current manuscript does not contain an explicit bound, a short derivation shows that I(T·logits; query) ≤ I(T·logits; query | label) + I(T·logits; label), where the second term is controlled by the utility-preserving regularizer. We will add this inequality and a brief proof sketch to §3.2, together with a short discussion of why the linear transform does not leave exploitable query structure intact under the proposed objective; a minimal version of the derivation is reproduced after these responses. This revision will be included in the next version. revision: yes

  2. Referee: [§5] Experimental section: the reported degradation in distillation performance is presented without an ablation on the gap between the CMI-inspired surrogate loss and the true CMI, and without testing whether the learned T introduces new vulnerabilities (e.g., invertibility of the transform or leakage through the preserved marginal). Without these controls, it is unclear whether the observed resistance is robust to stronger or adaptive extractors.

    Authors: We agree that additional controls would strengthen the experimental claims. The surrogate loss is employed for tractability, as direct CMI estimation scales poorly with logit dimensionality. In the revision we will add (i) an ablation on a subset of models comparing the surrogate objective against a Monte-Carlo estimate of true CMI, quantifying the approximation gap, and (ii) targeted experiments that test invertibility of the learned T (by attempting logit recovery) and evaluate leakage through the preserved marginal by training adaptive extractors that explicitly model the transformation. These results will be reported in an expanded §5. revision: yes
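Two commitments above can be pinned down further. The inequality quoted in response 1 is the standard chain-rule bound; writing Z = T·logits, X = query, Y = label:

```latex
% I(Z; X, Y) = I(Z; Y) + I(Z; X | Y) by the chain rule, and I(Z; X) <= I(Z; X, Y).
\begin{align*}
I(Z; X) &\le I(Z; X, Y)              && \text{(monotonicity of mutual information)} \\
        &= I(Z; Y) + I(Z; X \mid Y)  && \text{(chain rule)} \\
\Rightarrow\quad I(Z; X) &\le I(Z; X \mid Y) + I(Z; Y).
\end{align*}
```

And the invertibility probe promised in response 2 could take the following hypothetical form: given paired purified/raw logits an attacker might assemble, fit a linear inverse by least squares and measure the recovery error. The matrix below is a random stand-in, not the learned defense.

```python
# Least-squares invertibility probe: a low relative error means the purified
# logits linearly leak the raw logits. All quantities are synthetic stand-ins.
import torch

torch.manual_seed(0)
vocab, n = 32, 512
T = torch.randn(vocab, vocab) / vocab ** 0.5   # stand-in purification matrix
raw = torch.randn(n, vocab)                    # "collected" raw logits
purified = raw @ T.T                           # what the API would expose

# Solve min_W ||purified @ W - raw||; W approximates T^{-T} when T is invertible.
W = torch.linalg.lstsq(purified, raw).solution
rel_err = (purified @ W - raw).norm() / raw.norm()
print(f"relative recovery error: {rel_err:.3e}")  # near zero here: a random square T is invertible
```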

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper defines CMI from standard information theory as a characterization of distillation-relevant information in logits, then derives a linear transformation and CMI-inspired objective to minimize it. This is a standard theoretical motivation followed by an optimization procedure whose effectiveness is checked via external experiments on multiple LLMs and distillation algorithms. No step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled from prior work; the central claim that the transformed outputs degrade distillation while preserving utility is supported by empirical results rather than definitional equivalence.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the assumption that CMI precisely isolates distillation-relevant information and that a linear transformation suffices to minimize it; no free parameters are explicitly named but the matrix itself is learned from data.

free parameters (1)
  • transformation matrix entries
    Parameters of the learned purification matrix optimized via the CMI-inspired objective.
axioms (1)
  • domain assumption: Conditional mutual information between logits and queries given labels captures exactly the information useful for distillation
    Central modeling choice stated in the abstract as the basis for the defense.

pith-pipeline@v0.9.0 · 5499 in / 1180 out tokens · 45515 ms · 2026-05-16T08:13:42.395892+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 6 internal anchors

  1. [1]

    Healai: A healthcare llm for effective medical documentation

    Sagar Goyal, Eti Rastogi, Sree Prasanna Rajagopal, Dong Yuan, Fen Zhao, Jai Chintagunta, Gautam Naik, and Jeff Ward. Healai: A healthcare llm for effective medical documentation. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pages 1167–1168, 2024.

  2. [2]

    Autogen: Enabling next-gen llm applications via multi-agent conversation

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversation. In ICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024.

  3. [3]

    Improving factuality and reasoning in language models through multiagent debate

    Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. In International Conference on Machine Learning, pages 11733–11763. PMLR, 2024.

  4. [4]

    Distilling the Knowledge in a Neural Network

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

  5. [5]

    Sequence-level knowledge distillation

    Yoon Kim and Alexander M Rush. Sequence-level knowledge distillation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1317–1327, 2016.

  6. [6]

    Minillm: Knowledge distillation of large language models

    Yuxian Gu, Li Dong, Furu Wei, and Minlie Huang. Minillm: Knowledge distillation of large language models. In The Twelfth International Conference on Learning Representations, 2024.

  7. [7]

    A watermark for large language models

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In International Conference on Machine Learning, pages 17061–17084. PMLR, 2023.

  8. [8]

    Securing large language models: A survey of watermarking and fingerprinting techniques

    Peigen Ye, Huali Ren, Zhengdao Li, Anli Yan, Hongyang Yan, Shaowei Wang, and Jin Li. Securing large language models: A survey of watermarking and fingerprinting techniques. ACM Computing Surveys, 2025.

  9. [9]

    Watermarking techniques for large language models: A survey

    Yuqing Liang, Jiancheng Xiao, Wensheng Gan, and Philip S Yu. Watermarking techniques for large language models: A survey. Artificial Intelligence Review, 2026.

  10. [10]

    D-dae: Defense-penetrating model extraction attacks

    Yanjiao Chen, Rui Guan, Xueluan Gong, Jianshuo Dong, and Meng Xue. D-dae: Defense-penetrating model extraction attacks. In 2023 IEEE Symposium on Security and Privacy (SP), pages 382–399. IEEE, 2023.

  11. [11]

    Artificial fingerprinting for generative models: Rooting deepfake attribution in training data

    Ning Yu, Vladislav Skripniuk, Sahar Abdelnabi, and Mario Fritz. Artificial fingerprinting for generative models: Rooting deepfake attribution in training data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14448–14457, 2021.

  12. [12]

    Fingerprinting deep neural networks globally via universal adversarial perturbations

    Zirui Peng, Shaofeng Li, Guoxing Chen, Cheng Zhang, Haojin Zhu, and Minhui Xue. Fingerprinting deep neural networks globally via universal adversarial perturbations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13430–13439, 2022.

  13. [13]

    Antidistillation sampling

    Yash Savani, Asher Trockman, Zhili Feng, Yixuan Even Xu, Avi Schwarzschild, Alexander Robey, Marc Finzi, and J Zico Kolter. Antidistillation sampling. arXiv preprint arXiv:2504.13146, 2025.

  14. [14]

    Doge: Defensive output generation for llm protection against knowledge distillation

    Pingzhi Li, Zhen Tan, Mohan Zhang, Huaizhi Qu, Huan Liu, and Tianlong Chen. Doge: Defensive output generation for llm protection against knowledge distillation. arXiv preprint arXiv:2505.19504, 2025.

  15. [15]

    Alphanet: Improved training of supernets with alpha-divergence

    Dilin Wang, Chengyue Gong, Meng Li, Qiang Liu, and Vikas Chandra. Alphanet: Improved training of supernets with alpha-divergence. In International Conference on Machine Learning, pages 10760–10771. PMLR, 2021.

  16. [16]

    Abkd: Pursuing a proper allocation of the probability mass in knowledge distillation via α-β-divergence

    Guanghui Wang, Zhiyong Yang, Zitai Wang, Shi Wang, Qianqian Xu, and Qingming Huang. Abkd: Pursuing a proper allocation of the probability mass in knowledge distillation via α-β-divergence. In Forty-second International Conference on Machine Learning, 2025.

  17. [17]

    Bayes conditional distribution estimation for knowledge distillation based on conditional mutual information

    Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, and En-Hui Yang. Bayes conditional distribution estimation for knowledge distillation based on conditional mutual information. In The Twelfth International Conference on Learning Representations, 2024.

  18. [18]

    Opening the black box of deep neural networks via information

    Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information. arXiv, 2017.

  19. [19]

    Instructional fingerprinting of large language models

    Jiashu Xu, Fei Wang, Mingyu Ma, Pang Wei Koh, Chaowei Xiao, and Muhao Chen. Instructional fingerprinting of large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3277–3306, 2024.

  20. [20]

    The information bottleneck method

    Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottleneck method. arXiv, 2000.

  21. [21]

    Deep learning and the information bottleneck principle

    Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. Information Theory Workshop (ITW), 2015.

  22. [22]

    Deep variational information bottleneck

    Alexander A. Alemi, Ian Fischer, and Joshua V. Dillon. Deep variational information bottleneck. International Conference on Learning Representations (ICLR), 2017.

  23. [23]

    Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.

  24. [24]

    Qwen2.5 Technical Report

    An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxin Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, ...

  25. [25]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

  26. [26]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.

  27. [27]

    Measuring Massive Multitask Language Understanding

    Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020.

  28. [28]

    Measuring Mathematical Problem Solving With the MATH Dataset

    Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the math dataset. arXiv preprint arXiv:2103.03874, 2021.

  29. [29]

    Tokenskip: Controllable chain-of-thought compression in llms

    Heming Xia, Chak Tou Leong, Wenjie Wang, Yongqi Li, and Wenjie Li. Tokenskip: Controllable chain-of-thought compression in llms. arXiv preprint arXiv:2502.12067, 2025.

  30. [30]

    Let's verify step by step

    Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In The Twelfth International Conference on Learning Representations, 2023.

  31. [31]

    Distillm: Towards streamlined distillation for large language models

    Jongwoo Ko, Sungnyun Kim, Tianyi Chen, and Se-Young Yun. Distillm: Towards streamlined distillation for large language models. In International Conference on Machine Learning, pages 24872–24895. PMLR, 2024.
