Towards Distillation Guarantees under Algorithmic Alignment for Combinatorial Optimization

Melanie Weber; Thien Le

arxiv: 2605.20074 · v1 · pith:WUVIJCRJnew · submitted 2026-05-19 · 💻 cs.LG

Towards Distillation Guarantees under Algorithmic Alignment for Combinatorial Optimization

Thien Le , Melanie Weber This is my paper

Pith reviewed 2026-05-20 06:25 UTC · model grok-4.3

classification 💻 cs.LG

keywords distillationgraph neural networksdynamic programmingcombinatorial optimizationalgorithmic alignmentlinear representation hypothesisdecision trees

0 comments

The pith

When the source model satisfies the linear representation hypothesis, distilling to a graph neural network aligned with dynamic programming solves combinatorial optimization efficiently in the parameters of the DP transition function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies knowledge distillation from a large source model to a smaller graph neural network for combinatorial optimization problems. The target network architecture is chosen to match a dynamic programming algorithm that solves the task. Under the assumption that the source is sufficiently rich, formalized by the linear representation hypothesis, the authors prove that distillation succeeds efficiently, with runtime governed by the complexity parameters of the DP transition function when it is represented as a decision tree. This supplies a rigorous sufficient condition for successful distillation in structured settings. A reader would care because it identifies concrete conditions under which smaller, deployable models can reliably inherit performance from larger ones without losing the benefits of algorithmic alignment.

Core claim

Assuming the source model is sufficiently rich under the linear representation hypothesis, the distillation problem can be solved efficiently in the complexity parameters of the DP transition function, represented as a decision tree.

What carries the argument

Algorithmic alignment between a graph neural network architecture and a dynamic programming algorithm, with the DP transition function encoded as a decision tree that governs the efficiency of distillation.

If this is right

Distillation succeeds whenever the target graph neural network is aligned with a dynamic programming algorithm for the underlying combinatorial task.
Runtime of successful distillation is controlled by the complexity parameters of the DP transition function expressed as a decision tree.
The result extends prior decision-tree distillation guarantees to graph neural networks in combinatorial optimization settings.
A sufficient condition now exists for when algorithmic alignment produces rigorous distillation success.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar efficiency guarantees may hold for other neural architectures aligned with known dynamic programming recurrences beyond graphs.
Practitioners could systematically design target models around explicit DP transition structures to improve distillation reliability in optimization pipelines.
The framework could be tested on additional structured prediction tasks where decision-tree representations of transitions are feasible to construct.

Load-bearing premise

The source model satisfies the linear representation hypothesis and is therefore sufficiently rich for the distillation guarantees to apply.

What would settle it

A concrete combinatorial optimization instance, such as shortest paths on graphs of increasing size, where the source model meets the linear representation hypothesis yet measured distillation error or runtime fails to scale with the decision-tree size of the DP transition function.

Figures

Figures reproduced from arXiv: 2605.20074 by Melanie Weber, Thien Le.

**Figure 1.** Figure 1: Illustration of Algorithm 1. (Left) A GNN that aggregates neighboring information at each iteration. Its architecture reflects the inductive bias of our local-iteration algorithms. (Middle) In our setting, the aggregation function g is represented by a decision tree. (Right) A root-prefix path in the decision tree. The path in this example can be viewed as a degenerate decision tree that outputs 0 (the lea… view at source ↗

**Figure 3.** Figure 3: The dynamic program that optimizes for the in [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

Distillation transfers knowledge from a large model trained on broad data to a smaller, more efficient model suitable for deployment. In structured prediction settings, prior knowledge about the task can guide the choice of a target architecture that is algorithmically aligned with the underlying problem. Building on recent learning-theoretic analyses of decision-tree (DT) distillation (Boix-Adsera, 2024), we study when distillation succeeds for combinatorial optimization tasks. We focus on the case where the target model is a graph neural network whose architecture is aligned with a dynamic programming (DP) algorithm for the task. Assuming that the source model is sufficiently rich, formalized through the linear representation hypothesis (LRH) (Elhage et al., 2022; Park et al., 2024), we show that the distillation problem can be solved efficiently in the complexity parameters of the DP transition function, represented as a DT. Our results provide a rigorous sufficient condition for successful distillation in the flavour of algorithmic alignment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper derives a sufficient condition for efficient distillation to DP-aligned GNNs in combinatorial optimization under the linear representation hypothesis, extending decision-tree distillation analysis.

read the letter

The main thing to know is that this work shows how, if the source model satisfies the linear representation hypothesis, distillation to a graph neural network aligned with a dynamic programming algorithm succeeds with runtime polynomial in the parameters of the DP transition function when that function is cast as a decision tree. It takes the distillation guarantees from Boix-Adsera 2024 and carries them over to combinatorial optimization tasks where the target architecture matches the underlying DP structure. The paper does a clean job of stating the assumption up front and deriving the efficiency claim from it without claiming more than a sufficient condition. That framing matches the title's cautious 'Towards' wording and keeps the argument internally consistent with the cited prior results on LRH and algorithmic alignment. The connection to structured prediction in optimization is a reasonable next step from the decision-tree setting. The central limitation is the dependence on LRH holding for the source model. That assumption is strong and not trivial to verify in practice, so the result stays conditional rather than general. The derivation appears to follow directly from the referenced works with no obvious circularity or hidden exponential blow-up in the DP representation, but the full proof steps would need checking to confirm there are no overlooked dependencies on decision-tree size. Overall the math looks like a straightforward extension rather than a deep new technique. This paper is aimed at researchers who already follow distillation theory and algorithmic alignment for GNNs on combinatorial problems. Someone working on model compression for optimization tasks or on DP-based architectures would get the most direct value from the sufficient condition and the complexity bound. It is worth sending to peer review because the idea connects existing pieces in a new combination and the efficiency claim is stated clearly enough to invite concrete feedback on the assumption and the DP representation.

Referee Report

0 major / 2 minor

Summary. The manuscript claims that, assuming the source model satisfies the linear representation hypothesis (LRH), the distillation problem for combinatorial optimization tasks can be solved efficiently when the target is a graph neural network algorithmically aligned with a dynamic programming (DP) algorithm. Specifically, the runtime is polynomial in the complexity parameters of the DP transition function when the latter is represented as a decision tree. This yields a rigorous sufficient condition for successful distillation, building on prior analyses of decision-tree distillation and LRH.

Significance. If the derivation holds, the result supplies a conditional but formally grounded link between algorithmic alignment, the linear representation hypothesis, and efficient distillation in structured prediction settings. It extends learning-theoretic tools from decision-tree distillation to combinatorial optimization and appropriately frames the contribution as a sufficient condition rather than an unconditional guarantee. The explicit invocation of LRH and the focus on DP transition complexity parameters are strengths that make the claim falsifiable in principle.

minor comments (2)

[Abstract] The abstract and introduction invoke 'complexity parameters of the DP transition function' without an early, self-contained definition or small illustrative example; adding this would improve accessibility for readers unfamiliar with the specific DP formulation.
The manuscript cites Boix-Adsera (2024), Elhage et al. (2022), and Park et al. (2024) for the DT distillation and LRH components; a brief recap of the precise statements from those works that are being invoked would help readers verify the reduction steps without external lookup.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The manuscript is framed as providing a sufficient condition under the stated assumptions, and we appreciate the recognition that the result is falsifiable in principle via the explicit use of LRH and DP transition complexity parameters.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper frames its main result as a sufficient-condition theorem: under the explicit upfront assumption that the source model satisfies the linear representation hypothesis (LRH) drawn from external citations (Elhage et al. 2022; Park et al. 2024), distillation to a DP-aligned GNN succeeds with runtime polynomial in the parameters of the DP transition function when represented as a decision tree. This builds directly on the external DT-distillation analysis of Boix-Adsera (2024) without any internal reduction of a claimed prediction to a fitted parameter, self-defined quantity, or load-bearing self-citation chain. The derivation chain remains conditional on the stated assumption and does not equate any output to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the linear representation hypothesis for the source model and the representation of the DP transition as a decision tree; no free parameters or invented entities are introduced.

axioms (2)

domain assumption Linear representation hypothesis (LRH) holds for the source model
Invoked to formalize that the source is sufficiently rich; cited from Elhage et al. 2022 and Park et al. 2024
domain assumption DP transition function can be represented as a decision tree
Used to measure complexity parameters for the efficiency result

pith-pipeline@v0.9.0 · 5693 in / 1323 out tokens · 23720 ms · 2026-05-20T06:25:58.305425+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1]

Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , chapter =

Caleb Koch and Carmen Strassle and Li-Yang Tan , title =. Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , chapter =. doi:10.1137/1.9781611977554.ch75 , URL =. https://epubs.siam.org/doi/pdf/10.1137/1.9781611977554.ch75 , abstract =

work page doi:10.1137/1.9781611977554.ch75 2023
[2]

2024 , eprint=

Towards a theory of model distillation , author=. 2024 , eprint=

work page 2024
[3]

Valiant, L. G. , title =. Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing , pages =. 1984 , isbn =. doi:10.1145/800057.808710 , abstract =

work page doi:10.1145/800057.808710 1984
[4]

2022 , journal=

Toy Models of Superposition , author=. 2022 , journal=

work page 2022
[5]

B lack is to Criminal as C aucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings

Manzini, Thomas and Yao Chong, Lim and Black, Alan W and Tsvetkov, Yulia. B lack is to Criminal as C aucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings. Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Pape...

work page doi:10.18653/v1/n19-1062 2019
[6]

2023 , eprint=

Debiasing Vision-Language Models via Biased Prompts , author=. 2023 , eprint=

work page 2023
[7]

Proceedings of the 31st International Conference on Algorithmic Learning Theory , pages =

Distribution Free Learning with Local Queries , author =. Proceedings of the 31st International Conference on Algorithmic Learning Theory , pages =. 2020 , editor =

work page 2020
[8]

Towards Debiasing Sentence Representations

Liang, Paul Pu and Li, Irene Mengze and Zheng, Emily and Lim, Yao Chong and Salakhutdinov, Ruslan and Morency, Louis-Philippe. Towards Debiasing Sentence Representations. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.488

work page doi:10.18653/v1/2020.acl-main.488 2020
[9]

Proceedings of the 30th International Conference on Neural Information Processing Systems , pages =

Bolukbasi, Tolga and Chang, Kai-Wei and Zou, James and Saligrama, Venkatesh and Kalai, Adam , title =. Proceedings of the 30th International Conference on Neural Information Processing Systems , pages =. 2016 , isbn =

work page 2016
[10]

International Conference on Learning Representations , year=

What Can Neural Networks Reason About? , author=. International Conference on Learning Representations , year=

work page
[11]

Exact learning when irrelevant variables abound , year =

Guijarro, David and Lav\'. Exact learning when irrelevant variables abound , year =. doi:10.1016/S0020-0190(99)00063-0 , journal =

work page doi:10.1016/s0020-0190(99)00063-0
[12]

Mehta and Vijay Raghavan , title=

Dinesh P. Mehta and Vijay Raghavan , title=. Theor. Comput. Sci. , volume=. 2002 , cdate=

work page 2002
[13]

Advances in Neural Information Processing Systems (NeurIPS) , volume=

ImageNet classification with deep convolutional neural networks , author=. Advances in Neural Information Processing Systems (NeurIPS) , volume=. 2012 , publisher=

work page 2012
[14]

Nature , volume=

Deep learning , author=. Nature , volume=. 2015 , publisher=

work page 2015
[15]

Proceedings of the IEEE , volume=

Gradient-based learning applied to document recognition , author=. Proceedings of the IEEE , volume=. 1998 , publisher=

work page 1998
[16]

MIT Press , year=

Deep Learning (book) , author=. MIT Press , year=

work page
[17]

Gregor Bachmann and Sotiris Anagnostidis and Thomas Hofmann , booktitle=. Scaling. 2023 , url=

work page 2023
[18]

Thirty-seventh Conference on Neural Information Processing Systems , year=

The Exact Sample Complexity Gain from Invariances for Kernel Regression , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

work page
[19]

Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

Bietti, Alberto and Venturi, Luca and Bruna, Joan , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

work page 2021
[20]

Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

Elesedy, Bryn , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

work page 2021
[21]

The Twelfth International Conference on Learning Representations , year=

On the hardness of learning under symmetries , author=. The Twelfth International Conference on Learning Representations , year=

work page
[22]

and Riley, Patrick F

Gilmer, Justin and Schoenholz, Samuel S. and Riley, Patrick F. and Vinyals, Oriol and Dahl, George E. , title =. Proceedings of the 34th International Conference on Machine Learning - Volume 70 , pages =. 2017 , publisher =

work page 2017
[23]

International Conference on Learning Representations , year=

Semi-Supervised Classification with Graph Convolutional Networks , author=. International Conference on Learning Representations , year=

work page
[24]

and Nerem, Robert R

Kahng, Andrew B. and Nerem, Robert R. and Wang, Yusu and Yang, Chien-Yi , title =. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence , articleno =. 2024 , isbn =. doi:10.160...

work page doi:10.1609/aaai.v38i12.29200 2024
[25]

2025 , eprint=

Graph neural networks extrapolate out-of-distribution for shortest paths , author=. 2025 , eprint=

work page 2025
[26]

Forty-second International Conference on Machine Learning , year=

Primal-Dual Neural Algorithmic Reasoning , author=. Forty-second International Conference on Machine Learning , year=

work page
[27]

Exact combinatorial optimization with graph convolutional neural networks , year =

Gasse, Maxime and Ch\'. Exact combinatorial optimization with graph convolutional neural networks , year =. Proceedings of the 33rd International Conference on Neural Information Processing Systems , articleno =

work page
[28]

Graph Neural Networks are Dynamic Programmers , url =

Dudzik, Andrew J and Veli. Graph Neural Networks are Dynamic Programmers , url =. Advances in Neural Information Processing Systems , editor =

work page
[29]

Proceedings of the Second Learning on Graphs Conference , pages =

Asynchronous Algorithmic Alignment With Cocycles , author =. Proceedings of the Second Learning on Graphs Conference , pages =. 2024 , editor =

work page 2024
[30]

Depth-first search and linear graph algorithms , year=

Tarjan, Robert , booktitle=. Depth-first search and linear graph algorithms , year=

work page
[31]

Transactions on Machine Learning Research , issn=

Does equivariance matter at scale? , author=. Transactions on Machine Learning Research , issn=. 2025 , url=

work page 2025
[32]

Distributed Representations of Words and Phrases and their Compositionality , url =

Mikolov, Tomas and Sutskever, Ilya and Chen, Kai and Corrado, Greg S and Dean, Jeff , booktitle =. Distributed Representations of Words and Phrases and their Compositionality , url =

work page
[33]

Proceedings of the 41st International Conference on Machine Learning , articleno =

Park, Kiho and Choe, Yo Joong and Veitch, Victor , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =

work page 2024
[34]

2015 , eprint=

Distilling the Knowledge in a Neural Network , author=. 2015 , eprint=

work page 2015
[35]

In: Cohn, T., He, Y., Liu, Y

Jiao, Xiaoqi and Yin, Yichun and Shang, Lifeng and Jiang, Xin and Chen, Xiao and Li, Linlin and Wang, Fang and Liu, Qun. T iny BERT : Distilling BERT for Natural Language Understanding. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.372

work page doi:10.18653/v1/2020.findings-emnlp.372 2020

[1] [1]

Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , chapter =

Caleb Koch and Carmen Strassle and Li-Yang Tan , title =. Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , chapter =. doi:10.1137/1.9781611977554.ch75 , URL =. https://epubs.siam.org/doi/pdf/10.1137/1.9781611977554.ch75 , abstract =

work page doi:10.1137/1.9781611977554.ch75 2023

[2] [2]

2024 , eprint=

Towards a theory of model distillation , author=. 2024 , eprint=

work page 2024

[3] [3]

Valiant, L. G. , title =. Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing , pages =. 1984 , isbn =. doi:10.1145/800057.808710 , abstract =

work page doi:10.1145/800057.808710 1984

[4] [4]

2022 , journal=

Toy Models of Superposition , author=. 2022 , journal=

work page 2022

[5] [5]

B lack is to Criminal as C aucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings

Manzini, Thomas and Yao Chong, Lim and Black, Alan W and Tsvetkov, Yulia. B lack is to Criminal as C aucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings. Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Pape...

work page doi:10.18653/v1/n19-1062 2019

[6] [6]

2023 , eprint=

Debiasing Vision-Language Models via Biased Prompts , author=. 2023 , eprint=

work page 2023

[7] [7]

Proceedings of the 31st International Conference on Algorithmic Learning Theory , pages =

Distribution Free Learning with Local Queries , author =. Proceedings of the 31st International Conference on Algorithmic Learning Theory , pages =. 2020 , editor =

work page 2020

[8] [8]

Towards Debiasing Sentence Representations

Liang, Paul Pu and Li, Irene Mengze and Zheng, Emily and Lim, Yao Chong and Salakhutdinov, Ruslan and Morency, Louis-Philippe. Towards Debiasing Sentence Representations. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.488

work page doi:10.18653/v1/2020.acl-main.488 2020

[9] [9]

Proceedings of the 30th International Conference on Neural Information Processing Systems , pages =

Bolukbasi, Tolga and Chang, Kai-Wei and Zou, James and Saligrama, Venkatesh and Kalai, Adam , title =. Proceedings of the 30th International Conference on Neural Information Processing Systems , pages =. 2016 , isbn =

work page 2016

[10] [10]

International Conference on Learning Representations , year=

What Can Neural Networks Reason About? , author=. International Conference on Learning Representations , year=

work page

[11] [11]

Exact learning when irrelevant variables abound , year =

Guijarro, David and Lav\'. Exact learning when irrelevant variables abound , year =. doi:10.1016/S0020-0190(99)00063-0 , journal =

work page doi:10.1016/s0020-0190(99)00063-0

[12] [12]

Mehta and Vijay Raghavan , title=

Dinesh P. Mehta and Vijay Raghavan , title=. Theor. Comput. Sci. , volume=. 2002 , cdate=

work page 2002

[13] [13]

Advances in Neural Information Processing Systems (NeurIPS) , volume=

ImageNet classification with deep convolutional neural networks , author=. Advances in Neural Information Processing Systems (NeurIPS) , volume=. 2012 , publisher=

work page 2012

[14] [14]

Nature , volume=

Deep learning , author=. Nature , volume=. 2015 , publisher=

work page 2015

[15] [15]

Proceedings of the IEEE , volume=

Gradient-based learning applied to document recognition , author=. Proceedings of the IEEE , volume=. 1998 , publisher=

work page 1998

[16] [16]

MIT Press , year=

Deep Learning (book) , author=. MIT Press , year=

work page

[17] [17]

Gregor Bachmann and Sotiris Anagnostidis and Thomas Hofmann , booktitle=. Scaling. 2023 , url=

work page 2023

[18] [18]

Thirty-seventh Conference on Neural Information Processing Systems , year=

The Exact Sample Complexity Gain from Invariances for Kernel Regression , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

work page

[19] [19]

Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

Bietti, Alberto and Venturi, Luca and Bruna, Joan , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

work page 2021

[20] [20]

Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

Elesedy, Bryn , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

work page 2021

[21] [21]

The Twelfth International Conference on Learning Representations , year=

On the hardness of learning under symmetries , author=. The Twelfth International Conference on Learning Representations , year=

work page

[22] [22]

and Riley, Patrick F

Gilmer, Justin and Schoenholz, Samuel S. and Riley, Patrick F. and Vinyals, Oriol and Dahl, George E. , title =. Proceedings of the 34th International Conference on Machine Learning - Volume 70 , pages =. 2017 , publisher =

work page 2017

[23] [23]

International Conference on Learning Representations , year=

Semi-Supervised Classification with Graph Convolutional Networks , author=. International Conference on Learning Representations , year=

work page

[24] [24]

and Nerem, Robert R

Kahng, Andrew B. and Nerem, Robert R. and Wang, Yusu and Yang, Chien-Yi , title =. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence , articleno =. 2024 , isbn =. doi:10.160...

work page doi:10.1609/aaai.v38i12.29200 2024

[25] [25]

2025 , eprint=

Graph neural networks extrapolate out-of-distribution for shortest paths , author=. 2025 , eprint=

work page 2025

[26] [26]

Forty-second International Conference on Machine Learning , year=

Primal-Dual Neural Algorithmic Reasoning , author=. Forty-second International Conference on Machine Learning , year=

work page

[27] [27]

Exact combinatorial optimization with graph convolutional neural networks , year =

Gasse, Maxime and Ch\'. Exact combinatorial optimization with graph convolutional neural networks , year =. Proceedings of the 33rd International Conference on Neural Information Processing Systems , articleno =

work page

[28] [28]

Graph Neural Networks are Dynamic Programmers , url =

Dudzik, Andrew J and Veli. Graph Neural Networks are Dynamic Programmers , url =. Advances in Neural Information Processing Systems , editor =

work page

[29] [29]

Proceedings of the Second Learning on Graphs Conference , pages =

Asynchronous Algorithmic Alignment With Cocycles , author =. Proceedings of the Second Learning on Graphs Conference , pages =. 2024 , editor =

work page 2024

[30] [30]

Depth-first search and linear graph algorithms , year=

Tarjan, Robert , booktitle=. Depth-first search and linear graph algorithms , year=

work page

[31] [31]

Transactions on Machine Learning Research , issn=

Does equivariance matter at scale? , author=. Transactions on Machine Learning Research , issn=. 2025 , url=

work page 2025

[32] [32]

Distributed Representations of Words and Phrases and their Compositionality , url =

Mikolov, Tomas and Sutskever, Ilya and Chen, Kai and Corrado, Greg S and Dean, Jeff , booktitle =. Distributed Representations of Words and Phrases and their Compositionality , url =

work page

[33] [33]

Proceedings of the 41st International Conference on Machine Learning , articleno =

Park, Kiho and Choe, Yo Joong and Veitch, Victor , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =

work page 2024

[34] [34]

2015 , eprint=

Distilling the Knowledge in a Neural Network , author=. 2015 , eprint=

work page 2015

[35] [35]

In: Cohn, T., He, Y., Liu, Y

Jiao, Xiaoqi and Yin, Yichun and Shang, Lifeng and Jiang, Xin and Chen, Xiao and Li, Linlin and Wang, Fang and Liu, Qun. T iny BERT : Distilling BERT for Natural Language Understanding. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.372

work page doi:10.18653/v1/2020.findings-emnlp.372 2020