Knowledge Distillation Must Account for What It Loses
Pith reviewed 2026-05-08 03:27 UTC · model grok-4.3
The pith
Distillation often lets students match teacher task scores while losing the capabilities that make those scores reliable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that current evaluation assumes retained task scores imply retained teacher capabilities, and that reframing distillation as a lossy projection shows students can match selected observables without preserving the capabilities that make teacher behavior reliable. Existing studies already contain concrete, recurring, measurable off-metric losses that go unaccounted for when only retention is reported.
What carries the argument
Reframing knowledge distillation as a lossy projection, together with a taxonomy of off-metric distillation losses and the proposed Distillation Loss Statement that reports preserved elements, lost elements, and acceptable remaining losses.
If this is right
- Evaluations will need to check preservation of specific teacher capabilities beyond headline task metrics.
- Different deployment scenarios will require distinct preservation targets rather than uniform score matching.
- A Distillation Loss Statement will document what was kept, what was lost, and the justification for remaining losses.
- Studies will shift from reporting only retained performance to also quantifying and accepting off-metric losses.
Where Pith is reading between the lines
- Benchmark suites could add capability probes that are independent of the original training task to expose hidden losses.
- In regulated domains such as healthcare or autonomous systems, the statement could become part of model release documentation.
- The same logic may apply to other compression methods like pruning or quantization where performance metrics can mask behavioral drift.
Load-bearing premise
That retained task scores do not reliably indicate preserved teacher capabilities, and that off-metric losses are concrete enough to be identified and measured in practice.
What would settle it
A controlled distillation experiment in which students match the teacher's task scores while also showing no measurable differences on capability tests for robustness, calibration, or out-of-distribution behavior would weaken the claim.
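As a toy sketch (not from the paper) of what such a capability test could look like: two synthetic "models" are constructed to have identical task accuracy while only one remains calibrated, and expected calibration error (ECE) separates them where the headline score cannot. All distributions and numbers here are invented for illustration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error: bin predictions by confidence, then
    average |bin accuracy - bin confidence| weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
n = 20_000

# Both "models" are constructed to answer 80% of items correctly,
# so the headline task score is identical by design.
correct = rng.random(n) < 0.8

# Hypothetical teacher: confidence tracks the true 80% accuracy.
teacher_conf = np.clip(rng.normal(0.80, 0.05, n), 0.0, 1.0)
# Hypothetical student: same accuracy, systematically overconfident.
student_conf = np.clip(rng.normal(0.95, 0.03, n), 0.0, 1.0)

print(f"task accuracy (both): {correct.mean():.3f}")
print(f"teacher ECE: {expected_calibration_error(teacher_conf, correct):.3f}")
print(f"student ECE: {expected_calibration_error(student_conf, correct):.3f}")
```

A null result on probes like this one, across robustness and OOD tests as well, is the kind of evidence that would count against the paper's position.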
read the original abstract
This position paper argues that knowledge distillation must account for what it loses: student models should be judged not only by retained task scores, but by whether they preserve the teacher capabilities that make those scores reliable. This matters because distillation is increasingly used to turn large teacher models into deployable students, yet headline metrics can obscure losses in the capabilities that make teacher behavior reliable. Conceptually, we show that current evaluation often assumes retained task scores imply retained teacher capabilities. Reframing distillation as a lossy projection exposes this flaw: students may match selected teacher observables without preserving the capabilities that make them reliable. We then synthesize existing evidence into a taxonomy of off-metric distillation losses, showing that such losses are concrete, recurring, and measurable, yet often unaccounted for when studies report what students retain rather than what they lose. To make the position actionable, we propose scenario-specific preservation targets and a Distillation Loss Statement that reports what was preserved, what was lost, and why the remaining losses are acceptable. The goal is not lossless distillation, but accountable distillation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This position paper argues that knowledge distillation must account for what it loses: student models should be judged not only by retained task scores, but by whether they preserve the teacher capabilities that make those scores reliable. It reframes distillation as a lossy projection to expose the flaw in assuming retained task scores imply retained capabilities. It then synthesizes existing evidence into a taxonomy of off-metric distillation losses, showing they are concrete, recurring, and measurable yet often unaccounted for, and proposes scenario-specific preservation targets along with a Distillation Loss Statement that reports what was preserved, what was lost, and why remaining losses are acceptable.
Significance. If the position holds, this work could shift evaluation norms in distillation research toward more accountable reporting of capability losses, particularly for deployed student models in reliability-sensitive settings. The synthesis of prior evidence into a structured taxonomy and the introduction of concrete tools (preservation targets and the Distillation Loss Statement) provide a practical framework that builds directly on existing literature without introducing new parameters or ungrounded entities.
major comments (2)
- [Proposal for preservation targets and Distillation Loss Statement] The proposal for scenario-specific preservation targets and the Distillation Loss Statement is central to the claim of actionability. The manuscript does not supply a template, example format, or worked illustration of the Statement (e.g., what fields it would contain or how it would be populated for a concrete distillation scenario), which is load-bearing for readers to assess its feasibility.
- [Taxonomy of off-metric distillation losses] The taxonomy of off-metric distillation losses asserts that such losses 'are concrete, recurring, and measurable, yet often unaccounted for.' Because this synthesis underpins the reframing and the call for change, the manuscript should include at least one specific citation or brief summary per category that demonstrates an observed loss in prior work that was omitted from standard task-score reporting.
minor comments (1)
- [Abstract] The abstract introduces the term 'Distillation Loss Statement' without a one-sentence definition or parenthetical gloss; a brief clarification on first use would improve readability.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our position paper and the recommendation for minor revision. We address each major comment below, agreeing to incorporate concrete additions that strengthen the actionability and evidentiary support of our proposals without altering the core arguments.
read point-by-point responses
- Referee: [Proposal for preservation targets and Distillation Loss Statement] The proposal for scenario-specific preservation targets and the Distillation Loss Statement is central to the claim of actionability. The manuscript does not supply a template, example format, or worked illustration of the Statement (e.g., what fields it would contain or how it would be populated for a concrete distillation scenario), which is load-bearing for readers to assess its feasibility.
Authors: We agree that an explicit template and worked example are necessary to demonstrate feasibility. In the revised manuscript, we will add a new subsection providing a clear template for the Distillation Loss Statement with fields including Scenario Description, Preservation Targets, Measured Losses (with methods), and Justification for Acceptability of Remaining Losses. We will populate this template with a worked illustration drawn from a standard distillation scenario in the literature (e.g., distilling a vision transformer for image classification), showing how the fields would be completed based on patterns from existing studies. This addition will be placed in the section on making the position actionable. revision: yes
- Referee: [Taxonomy of off-metric distillation losses] The taxonomy of off-metric distillation losses asserts that such losses 'are concrete, recurring, and measurable, yet often unaccounted for.' Because this synthesis underpins the reframing and the call for change, the manuscript should include at least one specific citation or brief summary per category that demonstrates an observed loss in prior work that was omitted from standard task-score reporting.
Authors: We acknowledge the value of grounding each taxonomy category with specific evidence. We will revise the taxonomy section to include, for every loss category, at least one citation to prior work accompanied by a brief summary of the observed off-metric loss that was not reported via standard task scores. These citations and summaries will be selected from the existing literature synthesized in the paper, ensuring the additions remain within the scope of a position paper and do not require new experiments. This will directly support the claim that such losses are recurring yet unaccounted for. revision: yes
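For concreteness, the Statement template promised in the first response could be sketched as a simple structured record. The field names follow the rebuttal's list; the scenario and all populated values below are hypothetical, not drawn from the paper.

```python
from dataclasses import dataclass, asdict

@dataclass
class DistillationLossStatement:
    """Sketch of the proposed Statement; field names mirror the
    rebuttal's proposal, values below are invented for illustration."""
    scenario_description: str
    preservation_targets: list[str]
    measured_losses: dict[str, str]  # capability -> method and observed loss
    justification: str

# Hypothetical worked example for a vision-transformer distillation scenario.
statement = DistillationLossStatement(
    scenario_description="ViT teacher distilled into a small on-device image classifier",
    preservation_targets=[
        "top-1 accuracy within 1 point of teacher",
        "ECE within 0.02 of teacher on held-out data",
    ],
    measured_losses={
        "calibration": "ECE rose from 0.03 (teacher) to 0.09 (student); reliability diagrams",
        "OOD robustness": "accuracy under common corruptions dropped 6 points vs. teacher",
    },
    justification="Downstream decisions gate on a confidence threshold re-tuned for the student",
)
print(asdict(statement)["measured_losses"]["OOD robustness"])
```

The point of such a record is accountability: every off-metric loss either appears under measured_losses with a method attached, or its absence is itself reportable.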
Circularity Check
No significant circularity detected
full rationale
This is a position paper that reframes knowledge distillation conceptually as a lossy projection and synthesizes existing literature into a taxonomy of off-metric losses, without introducing equations, derivations, fitted parameters, or quantitative predictions. The central claims rest on references to prior external evidence rather than internal self-citations, self-definitions, or renamings that reduce to the paper's own inputs by construction. No load-bearing step equates a claimed result to a fitted input or prior author work in a circular manner; the argument remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Current distillation evaluation assumes retained task scores imply retained teacher capabilities
invented entities (1)
- Distillation Loss Statement (no independent evidence)
Reference graph
Works this paper leans on
- [1] Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv preprint arXiv:1503.02531
- [2] Gou, J., Yu, B., Maybank, S. J., and Tao, D. (2021). Knowledge Distillation: A Survey. International Journal of Computer Vision, 129(6):1789–1819
- [3] Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv preprint arXiv:1910.01108
- [4]
- [5] DeepSeek-AI (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv:2501.12948
- [6] Kang, M., Jeong, J., Lee, S., Cho, J., and Hwang, S. J. (2025). Distilling LLM Agent into Small Models with Retrieval and Code Tools. OpenReview, NeurIPS 2025 Conference (Spotlight)
- [7] Stanton, S., Izacard, G., and Roux, N. L. (2021). Does Knowledge Distillation Really Work? In Advances in Neural Information Processing Systems
- [8] Ojha, U., Li, Y., Sundara Rajan, A., Liang, Y., and Lee, Y. J. (2023). What Knowledge Gets Distilled in Knowledge Distillation? In Advances in Neural Information Processing Systems, 36:11037–11048
- [9]
- [10] Hebbalaguppe, R., Baranwal, M., Prakash, J., Madan, N., Anand, K., and Arora, C. (2024). Understanding Calibration Transfer in Knowledge Distillation. OpenReview, ICLR 2024 withdrawn submission
- [11] Stacey, J. and Rei, M. (2024). Distilling Robustness into Natural Language Inference Models with Domain-Targeted Augmentation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 2239–2258
- [12] Jagielski, M., Nasr, M., Lee, K., Choquette-Choo, C. A., Carlini, N., and Tramèr, F. (2023). Students Parrot Their Teachers: Membership Inference on Model Distillation. In Advances in Neural Information Processing Systems, 36
- [13] Zhang, Z., Shamsabadi, A. S., Lu, H., Cai, Y., and Haddadi, H. (2025). Membership and Memorization in LLM Knowledge Distillation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 20074–20084
- [14] Chen, Y., Benton, J., Radhakrishnan, A., Uesato, J., Denison, C., Schulman, J., Somani, A., Hase, P., Wagner, M., Roger, F., Mikulik, V., Bowman, S. R., Leike, J., Kaplan, J., and Perez, E. (2025). Reasoning Models Don't Always Say What They Think. arXiv preprint arXiv:2505.05410
- [15] Song, M. and Zheng, M. (2026). A Survey of On-Policy Distillation for Large Language Models. arXiv preprint arXiv:2604.00626
- [16] Shumailov, I., Shumaylov, Z., Zhao, Y., Papernot, N., Anderson, R., and Gal, Y. (2024). AI Models Collapse When Trained on Recursively Generated Data. Nature, 631:755–759
- [17] Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., and Gebru, T. (2019). Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 220–229
- [18] Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., and Crawford, K. (2021). Datasheets for Datasets. Communications of the ACM, 64(12):86–92
- [19] Zhao, D., Andrews, J. T. A., Papakyriakopoulos, O., and Xiang, A. (2024). Position: Measure Dataset Diversity, Don't Just Claim It. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 60644–60673
- [20] Tramèr, F., Kamath, G., and Carlini, N. (2024). Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 48453–48467
- [21]
- [22] Mohammadshahi, A. and Ioannou, Y. (2025). What Is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias. Transactions on Machine Learning Research
- [23] Zhang, M., Liu, D., Zhang, K., Franco, J., and Liu, H. (2026). Response-Based Knowledge Distillation for Multilingual Jailbreak Prevention Unwittingly Compromises Safety. arXiv preprint arXiv:2602.11157
- [24] Menon, A. K., Rawat, A. S., Reddi, S. J., Kim, S., and Kumar, S. (2021). A Statistical Perspective on Distillation. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 7632–7642
- [25] Gu, Y., Dong, L., Wei, F., and Huang, M. (2024). MiniLLM: Knowledge Distillation of Large Language Models. In International Conference on Learning Representations
- [26] Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., and Bengio, Y. (2015). FitNets: Hints for Thin Deep Nets. In International Conference on Learning Representations
- [27] Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., and Zhou, M. (2020). MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. In Advances in Neural Information Processing Systems
- [28] Lukasik, M., Bhojanapalli, S., Menon, A. K., and Kumar, S. (2021). Teacher's Pet: Understanding and Mitigating Biases in Distillation. arXiv preprint arXiv:2106.10494
- [29] Borkar, J., Chadha, K., Mireshghallah, N., Zhang, Y., Veliche, I.-E., Mitra, A., Smith, D. A., Xu, Z., and Garcia-Olano, D. (2026). Memorization Dynamics in Knowledge Distillation for Language Models. arXiv preprint arXiv:2601.15394
- [30] Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1321–1330
- [31] Fan, H., Jiang, Z., Lei, J., and Zhang, M. (2024). Revisit the Essence of Distilling Knowledge Through Calibration. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 12882–12894
- [32] Geng, J., Cai, F., Wang, Y., Koeppl, H., Nakov, P., and Gurevych, I. (2024). A Survey of Confidence Estimation and Calibration in Large Language Models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6577–6595
- [33] Kapoor, S., Gruver, N., Roberts, M., Collins, K., Pal, A., Bhatt, U., Weller, A., Dooley, S., Goldblum, M., and Wilson, A. G. (2024). Large Language Models Must Be Taught to Know What They Don't Know. In Advances in Neural Information Processing Systems, 37
- [34] Wen, B., Yao, J., Feng, S., Xu, C., Tsvetkov, Y., Howe, B., and Wang, L. L. (2025). Know Your Limits: A Survey of Abstention in Large Language Models. Transactions of the Association for Computational Linguistics, 13
- [35] Hsieh, C.-Y., Li, C.-L., Yeh, C.-K., Nakhost, H., Fujii, Y., Ratner, A., Krishna, R., Lee, C.-Y., and Pfister, T. (2023). Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8003–8017
- [36]
- [37] Turpin, M., Michael, J., Perez, E., and Bowman, S. R. (2023). Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. In Advances in Neural Information Processing Systems, 36:74952–74965
- [38] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems, 33
- [39] Jia, P., Xu, D., Li, X., Du, Z., Li, X., Wang, Y., Wang, Y., Liu, Q., Wang, M., Guo, H., Tang, R., and Zhao, X. (2025). Bridging Relevance and Reasoning: Rationale Distillation in Retrieval-Augmented Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 4242–4256
- [40] Huang, L., Feng, X., Ma, W., Gu, Y., Zhong, W., Feng, X., Yu, W., Peng, W., Tang, D., Tu, D., and Qin, B. (2024). Learning Fine-Grained Grounded Citations for Attributed Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 14095–14113
- [41]
- [42] Muhamed, A., Ribeiro, L. F. R., Dreyer, M., Smith, V., and Diab, M. T. (2026). RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6811–6856
- [43] Gerstgrasser, M., Schaeffer, R., Dey, A., Rafailov, R., Korbak, T., Sleight, H., Agrawal, R., Hughes, J., Pai, D. B., Gromov, A., Roberts, D., Yang, D., Donoho, D. L., and Koyejo, S. (2024). Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data. arXiv preprint arXiv:2404.01413
- [44] Awal, M. A., Rochan, M., and Roy, C. K. (2025). A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher? arXiv preprint arXiv:2511.05476
- [45] Tabassi, E. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1, National Institute of Standards and Technology. https://doi.org/10.6028/NIST.AI.100-1
- [46] Bucila, C., Caruana, R., and Niculescu-Mizil, A. (2006). Model Compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 535–541
- [47] Wang, X., Wei, J., Schuurmans, D., Le, Q. V., Chi, E. H., Narang, S., Chowdhery, A., and Zhou, D. (2023). Self-Consistency Improves Chain of Thought Reasoning in Language Models. In International Conference on Learning Representations
- [48] Furlanello, T., Lipton, Z. C., Tschannen, M., Itti, L., and Anandkumar, A. (2018). Born Again Neural Networks. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1607–1616
- [49] Phuong, M. and Lampert, C. (2019). Towards Understanding Knowledge Distillation. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 5142–5151
- [50] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q. V., and Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems, 35:24824–24837
- [51] Li, Y., Zhang, H., Cao, J., Ma, X., and Gao, J. (2023). Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2665–2679
- [52] Lanham, T., Garriga-Alonso, A., Cooper, A. F., Hill, K., Greenblatt, R., Noble, R., Birch, A., and others (2023). Measuring Faithfulness in Chain-of-Thought Reasoning. arXiv preprint arXiv:2307.13702
- [53] Madsen, A., Chandar, S., and Reddy, S. (2024). Are Self-Explanations from Large Language Models Faithful? In Findings of the Association for Computational Linguistics: ACL 2024, pages 295–337
- [54] Cao, L. (2024). Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 3628–3646
A Representative Reporting-Pattern Checklist
Table 4 is a representative 50-paper checklist used...
Work · evidence type · role in our argument
- [55] Retention and distribution: Introduces soft targets as information beyond hard labels
- [56] Method taxonomy: Distinguishes response-, feature-, and relation-based KD
- [57] Retention evidence: Shows successful compression and retained language-understanding performance
- [58] LLM KD background: Documents the diversity of modern LLM distillation settings
- [59] Reasoning distillation: Illustrates the contemporary importance of distilled reasoning students
- [60] Agent/tool distillation: Shows that tool behavior can become a distillation target
- [61] Distribution loss: Shows student predictive distributions may diverge from teachers
- [62] Distribution loss: Explains why teacher probability estimates can matter beyond accuracy
- [63] Generative distribution: Studies how distribution-matching choices affect LLM KD
- [64] Theory: Analyzes why KD can work without reducing success to score retention
- [65] Property transfer: Studies which off-task properties are inherited by students
- [66] Loss study: Directly studies information loss between teacher and student
- [67] Representation preservation: Uses intermediate hints, showing outputs alone may be insufficient
- [68] Relation preservation: Transfers attention and value relations, not only final outputs
- [69] Counterpoint: Shows students may outperform teachers on some metrics
- [70] Robustness loss: Shows adversarial robustness may fail to transfer under KD
- [71] OOD loss: Shows in-distribution gains do not guarantee target robustness
- [72] Subgroup behavior: Studies uneven group-wise effects of distillation
- [73] Fairness loss: Examines fairness and bias after knowledge transfer
- [74] Calibration metric: Establishes confidence calibration as distinct from accuracy
- [75] Calibration transfer: Studies whether calibration transfers through KD
- [76] Calibration as KD: Treats calibration as central to distilling knowledge
- [77] Uncertainty background: Surveys confidence estimation and calibration in LLMs
- [78] Uncertainty behavior: Argues models must learn what they do not know
- [79] Abstention: Surveys abstention as a distinct LLM capability
- [80] Rationale distillation: Shows rationales can improve small-model learning