pith. machine review for the scientific record.

arxiv: 2605.09345 · v1 · submitted 2026-05-10 · 💻 cs.LG

Recognition: 2 Lean theorem links

Selection Plateau and a Sparsity-Dependent Hierarchy of Pruning Features

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:16 UTC · model grok-4.3

classification 💻 cs.LG
keywords neural network pruning · one-shot pruning · selection plateau · feature complexity · sparsity · weight scoring · SICS hypothesis · pruning features

The pith

All rank-monotone weight scorers converge to identical accuracy at fixed sparsity in one-shot pruning, independent of their specific form.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a Selection Plateau in which any rank-monotone scorer for pruning weights produces the same final accuracy once the target sparsity is held constant. This convergence occurs regardless of the scorer's detailed mathematical shape, implying that the ranking order alone determines the outcome under one-shot pruning. The authors introduce the SICS hypothesis to explain when and how this plateau can be escaped: a minimum feature complexity level kappa rises with sparsity, requiring progressively richer non-monotone signals at higher compression rates. Experiments on a Vision Transformer (ViT-Small) trained on CIFAR-10 show that smooth non-monotone features improve accuracy by 6.6 percent at 70 percent sparsity, while only high-frequency raw features succeed at 80 percent sparsity (a 2.6 percent gain). The hypothesis unifies why many existing pruning methods cluster together in performance and indicates that future scorers should match their complexity to the intended sparsity.
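The mechanism behind the plateau is worth making concrete: one-shot pruning keeps the top-k weights by score, and any strictly increasing transform of a score leaves that top-k set unchanged. A minimal sketch in numpy (an editorial illustration, not the paper's code; the magnitude scorer and the transforms are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=10_000)              # stand-in for a trained weight tensor
sparsity = 0.7                           # fraction of weights removed
k = int(round(len(w) * (1 - sparsity)))  # number of weights kept

def keep_mask(scores: np.ndarray, k: int) -> np.ndarray:
    """One-shot selection: keep the k highest-scoring weights."""
    mask = np.zeros(scores.shape, dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask

base = np.abs(w)  # magnitude scorer
# Strictly increasing transforms of the base score are all rank-monotone.
variants = [base, base ** 2, np.sqrt(base), np.log1p(base)]
masks = [keep_mask(s, k) for s in variants]
assert all(np.array_equal(masks[0], m) for m in masks[1:])
# Same mask -> same pruned network -> same accuracy: the Selection Plateau
# follows from rank preservation alone, before data or training enter.
```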

Core claim

In one-shot neural network pruning, all rank-monotone weight scorers converge to identical accuracy at fixed sparsity independent of functional form. The Sparsity-Information-Complexity Spectrum hypothesis states that a sparsity-dependent minimum feature complexity kappa(S) governs plateau escape, with kappa equal to zero sufficient below 65 percent sparsity, kappa equal to one dominant near 70 percent, and kappa equal to two required above 75 percent.
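Stated as a schedule, the hypothesized thresholds reduce to a small piecewise function. A sketch using the paper's reported boundaries; how kappa behaves exactly at S = 0.65 and S = 0.75 is not specified in the abstract, so the boundary handling below is an assumption:

```python
def kappa_min(S: float) -> int:
    """Hypothesized minimum feature complexity to escape the plateau at
    sparsity S, per the SICS thresholds (boundary handling assumed)."""
    if S < 0.65:
        return 0  # any rank-monotone feature suffices
    if S <= 0.75:
        return 1  # smooth non-monotone features dominate near S ~ 0.7
    return 2      # high-frequency raw features required at extreme sparsity
```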

What carries the argument

The Sparsity-Information-Complexity Spectrum (SICS) hypothesis, which asserts that escaping the selection plateau requires a minimum information complexity kappa(S) that increases with the target sparsity level.

If this is right

  • Below 65 percent sparsity, any rank-monotone feature suffices because kappa equals zero.
  • Near 70 percent sparsity, smooth non-monotone features with kappa equal to one deliver measurable accuracy gains over monotone baselines.
  • Above 75 percent sparsity, only raw features carrying high-frequency non-monotonicity with kappa equal to two can escape the plateau (illustrative scorer shapes for the three regimes are sketched after this list).
  • A synthetic non-monotone scorer derived purely from weight magnitude underperforms the gradient baseline, indicating that the requirement is magnitude-independent non-monotonicity.
  • Handcrafted Gaussian features achieve far smaller gains than chaos-derived features, showing that rank alignment is necessary but not sufficient without added complexity.
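Taken together, these points imply three qualitatively different scorer shapes over normalized weight rank. The stand-ins below are editorial illustrations only; the paper's actual feature classes (V1a peak, V1b var, A5 raw chaos) are constructed differently:

```python
import numpy as np

def monotone(r):
    # kappa = 0: strictly increasing in rank; any such scorer sits on the plateau.
    return r

def smooth_bump(r, center=0.7, width=0.05):
    # kappa = 1: smooth non-monotone structure. A bump at the keep/drop
    # boundary (rank ~ 0.7 at S = 0.7) is the kind of rank-aligned shape
    # Figure 3 credits with discriminating near the boundary.
    return r + 0.3 * np.exp(-((r - center) ** 2) / (2 * width ** 2))

def high_frequency(r):
    # kappa = 2: high-frequency non-monotonicity, a crude stand-in for the
    # chaos-derived raw features the paper says are needed above S = 0.75.
    return r + 0.1 * np.sin(40 * np.pi * r)

r = np.linspace(0.0, 1.0, 1001)  # normalized weight rank
scores = {f.__name__: f(r) for f in (monotone, smooth_bump, high_frequency)}
```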

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Pruning pipelines could adaptively select feature complexity according to the desired sparsity target rather than using a fixed scorer.
  • The same hierarchy may apply to other one-shot compression methods such as quantization or low-rank approximation when sparsity-like constraints are imposed.
  • Extending the tests to larger models and datasets would reveal whether the kappa thresholds remain stable or depend on model scale.
  • If the hypothesis holds, extreme-sparsity regimes may demand entirely new classes of irregular, non-smooth scoring functions.

Load-bearing premise

Observed performance differences across the nine tested feature classes arise from their information-complexity levels rather than from the specific model architecture, dataset, or the particular construction of those feature classes.

What would settle it

Running the identical nine feature classes on ResNet-50 trained on ImageNet and finding that rank-monotone scorers no longer converge to identical accuracy at fixed sparsity, or that the required kappa thresholds shift, would directly challenge the plateau and SICS claims.

Figures

Figures reproduced from arXiv: 2605.09345 by Guangqi Li and Yongxin Li.

Figure 1. Extends this verification to all nine feature classes at S=0.6: rank-monotone-equivalent features cluster within ±0.013 of Π(0.6), several non-monotone features actively degrade below the plateau (consistent with the prediction that S=0.6 lies in the κ=0 zone where escape is impossible), and the magnitude-derived PureMag4 control falls below the gradient baseline.
Figure 2. Main empirical result: SICS hierarchy across sparsity. Held-out accuracy for the plateau anchor (V1a peak), single-indicator κ=1 (V1b var), full κ=1 combo (V1), κ=2 raw chaos (A5), and the NL_sin non-chaos non-monotone control. The continuous background shading indicates the qualitative regime assignment for the four sparsity points actually tested (S ∈ {0.5, 0.6, 0.7, 0.8}); the precise boundary locations bet…
Figure 3. Illustrates the rank-alignment hypothesis directly in rank space: a synthetic var_smooth-like shape exhibits a non-smooth bump near rank ≈ 0.7, which aligns with the channel-keep boundary at S=0.7 (the most important 30% of channels are retained, so the kept/dropped boundary lies at rank 0.7). The monotone reference scorer (peak, dashed) provides no discriminative structure at this boundary.
original abstract

We identify a Selection Plateau phenomenon in one-shot neural network pruning: all rank-monotone weight scorers converge to identical accuracy at fixed sparsity, independent of functional form. We propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis: a sparsity-dependent minimum feature complexity kappa(S) governs plateau escape, with kappa=0 sufficient at low sparsity (S<0.65), kappa=1 dominant at critical sparsity (S~0.7), and kappa=2 necessary at extreme sparsity (S>0.75). On ViT-Small/CIFAR-10, testing nine feature classes across four sparsities, smooth non-monotone features provide +6.6% escape at S=0.7, while only raw features with high-frequency wiggle escape at S=0.8 (+2.6%). A fake non-monotone scorer underperforms the gradient baseline, indicating the requirement is magnitude-independent non-monotonicity. A handcrafted Gaussian bump achieves only +0.006 escape vs. chaos-derived +0.046, indicating rank-alignment is necessary but insufficient. SICS provides a unifying explanation for the performance clustering of diverse pruning methods and suggests that future selection algorithms should adapt feature complexity to target sparsity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to identify a 'Selection Plateau' phenomenon in one-shot neural network pruning, asserting that all rank-monotone weight scorers converge to identical accuracy at any fixed sparsity level, independent of their specific functional form. It introduces the Sparsity-Information-Complexity Spectrum (SICS) hypothesis, which posits a sparsity-dependent minimum feature complexity threshold kappa(S) that governs escape from the plateau: kappa=0 suffices for S<0.65, kappa=1 dominates near S~0.7, and kappa=2 is required for S>0.75. These claims are supported by experiments testing nine feature classes (including non-monotone and handcrafted variants) on ViT-Small/CIFAR-10 at four sparsity levels, reporting gains such as +6.6% escape for smooth non-monotone features at S=0.7 and +2.6% for high-frequency raw features at S=0.8.

Significance. If the central claims hold under broader validation, the work would offer a unifying lens on why diverse pruning methods often cluster in performance at moderate sparsities and could guide the design of sparsity-adaptive selection algorithms. The concrete distinctions drawn between monotonicity requirements, rank-alignment, and complexity (e.g., chaos-derived vs. Gaussian bump features) represent a useful empirical contribution. However, the current evidence base is narrow, limiting immediate impact.

major comments (3)
  1. [Abstract] The Selection Plateau claim—that all rank-monotone scorers converge to identical accuracy independent of functional form—rests on experiments with only nine hand-selected feature classes evaluated on a single model (ViT-Small) and dataset (CIFAR-10) at four sparsity levels. No formal definition of rank-monotonicity is supplied, nor is there an argument that these classes exhaustively or representatively sample the space of rank-preserving monotone functions.
  2. [Abstract] The SICS hypothesis assigns specific kappa(S) thresholds (0.65, 0.7, 0.75) and complexity levels (0, 1, 2) that align exactly with the sparsity regimes where the tested feature classes begin to show performance transitions in the reported experiments. This makes the governing relation appear post-hoc and descriptive of the observed data rather than independently derived or prospectively tested.
  3. [Abstract] Reported escape gains (e.g., +6.6% at S=0.7 for smooth non-monotone features, +2.6% at S=0.8 for raw high-frequency features) are presented without error bars, statistical significance tests, or details on run-to-run variance, undermining assessment of whether the differences between feature classes are reliable or could arise from uncontrolled factors in architecture, dataset, or feature construction.
minor comments (2)
  1. [Abstract] The distinction between 'fake non-monotone scorer' and 'handcrafted Gaussian bump' would benefit from a brief description of their explicit functional forms or construction methods to allow reproducibility.
  2. [Abstract] The phrase 'chaos-derived' features is used without a reference or prior definition in the provided summary, which could confuse readers unfamiliar with the specific generation process.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the insightful comments on our work. We address each major comment below, indicating where revisions will be made to improve clarity and rigor.

point-by-point responses
  1. Referee: [Abstract] The Selection Plateau claim—that all rank-monotone scorers converge to identical accuracy independent of functional form—rests on experiments with only nine hand-selected feature classes evaluated on a single model (ViT-Small) and dataset (CIFAR-10) at four sparsity levels. No formal definition of rank-monotonicity is supplied, nor is there an argument that these classes exhaustively or representatively sample the space of rank-preserving monotone functions.

    Authors: We will add a formal definition of rank-monotonicity to the manuscript, defining it as a property where the scorer strictly preserves the ranking order of absolute weight values. The nine feature classes were deliberately chosen to include both monotone and non-monotone variants across different complexity levels to test the core claims. While we recognize that this does not constitute an exhaustive sampling of all possible rank-monotone functions, the consistent results across this diverse set provide supporting evidence for the Selection Plateau. We will expand the discussion to include this caveat and suggest directions for more comprehensive sampling in future studies. revision: yes

  2. Referee: [Abstract] The SICS hypothesis assigns specific kappa(S) thresholds (0.65, 0.7, 0.75) and complexity levels (0, 1, 2) that align exactly with the sparsity regimes where the tested feature classes begin to show performance transitions in the reported experiments. This makes the governing relation appear post-hoc and descriptive of the observed data rather than independently derived or prospectively tested.

    Authors: We agree that the specific thresholds in the SICS hypothesis appear closely tied to the experimental observations. The hypothesis was developed based on theoretical intuition about how feature complexity needs to scale with sparsity to capture higher-order information, with the values refined through preliminary tests before the main experiments. To address the post-hoc concern, we will revise the text to present the derivation process more transparently and position SICS as an empirically grounded hypothesis open to further testing, rather than a definitive governing law. revision: partial

  3. Referee: [Abstract] Reported escape gains (e.g., +6.6% at S=0.7 for smooth non-monotone features, +2.6% at S=0.8 for raw high-frequency features) are presented without error bars, statistical significance tests, or details on run-to-run variance, undermining assessment of whether the differences between feature classes are reliable or could arise from uncontrolled factors in architecture, dataset, or feature construction.

    Authors: We will include error bars, run-to-run variance details, and statistical significance tests in the revised figures and tables. Specifically, we plan to report standard deviations from multiple random seeds and perform significance testing to confirm the reliability of the reported accuracy differences. revision: yes
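The statistics promised in response 3 are straightforward to operationalize. A minimal sketch, with placeholder five-seed accuracies rather than the paper's numbers, using a paired test because each seed fixes the run for both scorers:

```python
import numpy as np
from scipy import stats

# Held-out accuracy per seed for a candidate feature and the plateau baseline.
# These arrays are hypothetical placeholders, not the paper's results.
acc_feature  = np.array([0.842, 0.839, 0.845, 0.841, 0.843])
acc_baseline = np.array([0.776, 0.781, 0.779, 0.774, 0.780])

print(f"feature:  {acc_feature.mean():.3f} ± {acc_feature.std(ddof=1):.3f}")
print(f"baseline: {acc_baseline.mean():.3f} ± {acc_baseline.std(ddof=1):.3f}")

# Paired test: each seed fixes initialization and data order for both scorers,
# so the per-seed accuracy differences are the natural unit of analysis.
t_stat, p_value = stats.ttest_rel(acc_feature, acc_baseline)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```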

standing simulated objections (1 unresolved)
  • The experimental evaluation is restricted to a single model (ViT-Small) and dataset (CIFAR-10), and expanding this would require additional computational resources and time not available for the current revision.

Circularity Check

1 step flagged

SICS hypothesis thresholds assigned post-hoc to match observed escape regimes in experiments

specific steps
  1. fitted input called prediction [Abstract]
    "We propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis: a sparsity-dependent minimum feature complexity kappa(S) governs plateau escape, with kappa=0 sufficient at low sparsity (S<0.65), kappa=1 dominant at critical sparsity (S~0.7), and kappa=2 necessary at extreme sparsity (S>0.75). On ViT-Small/CIFAR-10, testing nine feature classes across four sparsities, smooth non-monotone features provide +6.6% escape at S=0.7, while only raw features with high-frequency wiggle escape at S=0.8 (+2.6%)."

    The kappa levels (0,1,2) and sparsity thresholds (0.65, 0.7, 0.75) are defined to coincide precisely with the sparsity points where the nine tested feature classes first exhibit escape from the plateau in the reported experiments. The hypothesis is therefore constructed directly from the observed clustering rather than derived from first principles or an independent complexity measure.

full rationale

The paper identifies the Selection Plateau from empirical tests on nine hand-selected feature classes and then proposes the SICS hypothesis with specific kappa(S) values and sparsity boundaries that align exactly with the sparsity levels at which those same classes show performance divergence. This renders the claimed governing relation a re-description of the input data rather than an independent derivation. The generality claim for all rank-monotone scorers lacks a formal definition or exhaustive argument, but the central circularity is in the hypothesis construction itself.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

The central claim rests on experimental observations from a single architecture and dataset, with the complexity levels and sparsity thresholds appearing to be determined from the results rather than derived independently.

free parameters (1)
  • kappa(S) thresholds
    Sparsity boundaries 0.65, 0.7, 0.75 and complexity values 0, 1, 2 are assigned to match observed performance transitions.
axioms (2)
  • domain assumption: All rank-monotone weight scorers converge to identical accuracy at fixed sparsity, independent of functional form.
    Stated as the core identified phenomenon that the hypothesis explains.
  • ad hoc to paper: A sparsity-dependent minimum feature complexity kappa(S) governs plateau escape.
    The SICS hypothesis itself, introduced to account for when different feature classes succeed.
invented entities (2)
  • Sparsity-Information-Complexity Spectrum (SICS): no independent evidence
    purpose: Unifying explanation for performance clustering of pruning methods via sparsity-dependent feature complexity.
    New hypothesis without external validation beyond the reported experiments.
  • kappa(S): no independent evidence
    purpose: Minimum feature complexity required to escape the plateau at sparsity S.
    Postulated quantity whose values are set to fit the experimental regimes.

pith-pipeline@v0.9.0 · 5518 in / 1902 out tokens · 76142 ms · 2026-05-12T04:16:42.513561+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

