pith. machine review for the scientific record.

arxiv: 2605.09345 · v1 · submitted 2026-05-10 · 💻 cs.LG

Recognition: 2 Lean theorem links

Selection Plateau and a Sparsity-Dependent Hierarchy of Pruning Features

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:16 UTC · model grok-4.3

classification 💻 cs.LG
keywords neural network pruning · one-shot pruning · selection plateau · feature complexity · sparsity · weight scoring · SICS hypothesis · pruning features

The pith

All rank-monotone weight scorers converge to identical accuracy at fixed sparsity in one-shot pruning, independent of their specific form.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a Selection Plateau in which any rank-monotone scorer for pruning weights produces the same final accuracy once the target sparsity is held constant. This convergence occurs regardless of the scorer's detailed mathematical shape, implying that the ranking order alone determines the outcome under one-shot pruning. The authors introduce the SICS hypothesis to explain when and how this plateau can be escaped: a minimum feature complexity level kappa rises with sparsity, requiring progressively richer non-monotone signals at higher compression rates. Experiments on a Vision Transformer (ViT-Small) trained on CIFAR-10 show that smooth non-monotone features improve accuracy by 6.6 percent at 70 percent sparsity, while only high-frequency raw features succeed at 80 percent sparsity (a 2.6 percent gain). The hypothesis unifies why many existing pruning methods cluster together in performance and indicates that future scorers should match their complexity to the intended sparsity.
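The mechanism behind the plateau is worth making concrete: one-shot pruning keeps the top-k weights by score, and any strictly increasing transform of a score leaves that top-k set unchanged. A minimal sketch in numpy (an editorial illustration, not the paper's code; the magnitude scorer and the transforms are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=10_000)              # stand-in for a trained weight tensor
sparsity = 0.7                           # fraction of weights removed
k = int(round(len(w) * (1 - sparsity)))  # number of weights kept

def keep_mask(scores: np.ndarray, k: int) -> np.ndarray:
    """One-shot selection: keep the k highest-scoring weights."""
    mask = np.zeros(scores.shape, dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask

base = np.abs(w)  # magnitude scorer
# Strictly increasing transforms of the base score are all rank-monotone.
variants = [base, base ** 2, np.sqrt(base), np.log1p(base)]
masks = [keep_mask(s, k) for s in variants]
assert all(np.array_equal(masks[0], m) for m in masks[1:])
# Same mask -> same pruned network -> same accuracy: the Selection Plateau
# follows from rank preservation alone, before data or training enter.
```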

Core claim

In one-shot neural network pruning, all rank-monotone weight scorers converge to identical accuracy at fixed sparsity independent of functional form. The Sparsity-Information-Complexity Spectrum hypothesis states that a sparsity-dependent minimum feature complexity kappa(S) governs plateau escape, with kappa equal to zero sufficient below 65 percent sparsity, kappa equal to one dominant near 70 percent, and kappa equal to two required above 75 percent.
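Stated as a schedule, the hypothesized thresholds reduce to a small piecewise function. A sketch using the paper's reported boundaries; how kappa behaves exactly at S = 0.65 and S = 0.75 is not specified in the abstract, so the boundary handling below is an assumption:

```python
def kappa_min(S: float) -> int:
    """Hypothesized minimum feature complexity to escape the plateau at
    sparsity S, per the SICS thresholds (boundary handling assumed)."""
    if S < 0.65:
        return 0  # any rank-monotone feature suffices
    if S <= 0.75:
        return 1  # smooth non-monotone features dominate near S ~ 0.7
    return 2      # high-frequency raw features required at extreme sparsity
```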

What carries the argument

The Sparsity-Information-Complexity Spectrum (SICS) hypothesis, which asserts that escaping the selection plateau requires a minimum information complexity kappa(S) that increases with the target sparsity level.

If this is right

  • Below 65 percent sparsity, any rank-monotone feature suffices because kappa equals zero.
  • Near 70 percent sparsity, smooth non-monotone features with kappa equal to one deliver measurable accuracy gains over monotone baselines.
  • Above 75 percent sparsity, only raw features carrying high-frequency non-monotonicity with kappa equal to two can escape the plateau (illustrative scorer shapes for the three regimes are sketched after this list).
  • A synthetic non-monotone scorer derived purely from weight magnitude underperforms the gradient baseline, indicating that the requirement is magnitude-independent non-monotonicity.
  • Handcrafted Gaussian features achieve far smaller gains than chaos-derived features, showing that rank alignment is necessary but not sufficient without added complexity.
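Taken together, these points imply three qualitatively different scorer shapes over normalized weight rank. The stand-ins below are editorial illustrations only; the paper's actual feature classes (V1a peak, V1b var, A5 raw chaos) are constructed differently:

```python
import numpy as np

def monotone(r):
    # kappa = 0: strictly increasing in rank; any such scorer sits on the plateau.
    return r

def smooth_bump(r, center=0.7, width=0.05):
    # kappa = 1: smooth non-monotone structure. A bump at the keep/drop
    # boundary (rank ~ 0.7 at S = 0.7) is the kind of rank-aligned shape
    # Figure 3 credits with discriminating near the boundary.
    return r + 0.3 * np.exp(-((r - center) ** 2) / (2 * width ** 2))

def high_frequency(r):
    # kappa = 2: high-frequency non-monotonicity, a crude stand-in for the
    # chaos-derived raw features the paper says are needed above S = 0.75.
    return r + 0.1 * np.sin(40 * np.pi * r)

r = np.linspace(0.0, 1.0, 1001)  # normalized weight rank
scores = {f.__name__: f(r) for f in (monotone, smooth_bump, high_frequency)}
```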

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Pruning pipelines could adaptively select feature complexity according to the desired sparsity target rather than using a fixed scorer.
  • The same hierarchy may apply to other one-shot compression methods such as quantization or low-rank approximation when sparsity-like constraints are imposed.
  • Extending the tests to larger models and datasets would reveal whether the kappa thresholds remain stable or depend on model scale.
  • If the hypothesis holds, extreme-sparsity regimes may demand entirely new classes of irregular, non-smooth scoring functions.

Load-bearing premise

Observed performance differences across the nine tested feature classes arise from their information-complexity levels rather than from the specific model architecture, dataset, or the particular construction of those feature classes.

What would settle it

Running the identical nine feature classes on ResNet-50 trained on ImageNet and finding that rank-monotone scorers no longer converge to identical accuracy at fixed sparsity, or that the required kappa thresholds shift, would directly challenge the plateau and SICS claims.

Figures

Figures reproduced from arXiv: 2605.09345 by Guangqi Li and Yongxin Li.

Figure 1. Extends this verification to all nine feature classes at S=0.6: rank-monotone-equivalent features cluster within ±0.013 of Π(0.6), several non-monotone features actively degrade below the plateau (consistent with the prediction that S=0.6 lies in the κ=0 zone where escape is impossible), and the magnitude-derived PureMag4 control falls below the gradient baseline.
Figure 2. Main empirical result: SICS hierarchy across sparsity. Held-out accuracy for the plateau anchor (V1a peak), single-indicator κ=1 (V1b var), full κ=1 combo (V1), κ=2 raw chaos (A5), and the NL_sin non-chaos non-monotone control. The continuous background shading indicates the qualitative regime assignment for the four sparsity points actually tested (S ∈ {0.5, 0.6, 0.7, 0.8}); the precise boundary locations bet…
Figure 3. Illustrates the rank-alignment hypothesis directly in rank space: a synthetic var_smooth-like shape exhibits a non-smooth bump near rank ≈ 0.7, which aligns with the channel-keep boundary at S=0.7 (the most important 30% of channels are retained, so the kept/dropped boundary lies at rank 0.7). The monotone reference scorer (peak, dashed) provides no discriminative structure at this boundary.
original abstract

We identify a Selection Plateau phenomenon in one-shot neural network pruning: all rank-monotone weight scorers converge to identical accuracy at fixed sparsity, independent of functional form. We propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis: a sparsity-dependent minimum feature complexity kappa(S) governs plateau escape, with kappa=0 sufficient at low sparsity (S<0.65), kappa=1 dominant at critical sparsity (S~0.7), and kappa=2 necessary at extreme sparsity (S>0.75). On ViT-Small/CIFAR-10, testing nine feature classes across four sparsities, smooth non-monotone features provide +6.6% escape at S=0.7, while only raw features with high-frequency wiggle escape at S=0.8 (+2.6%). A fake non-monotone scorer underperforms the gradient baseline, indicating the requirement is magnitude-independent non-monotonicity. A handcrafted Gaussian bump achieves only +0.006 escape vs. chaos-derived +0.046, indicating rank-alignment is necessary but insufficient. SICS provides a unifying explanation for the performance clustering of diverse pruning methods and suggests that future selection algorithms should adapt feature complexity to target sparsity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to identify a 'Selection Plateau' phenomenon in one-shot neural network pruning, asserting that all rank-monotone weight scorers converge to identical accuracy at any fixed sparsity level, independent of their specific functional form. It introduces the Sparsity-Information-Complexity Spectrum (SICS) hypothesis, which posits a sparsity-dependent minimum feature complexity threshold kappa(S) that governs escape from the plateau: kappa=0 suffices for S<0.65, kappa=1 dominates near S~0.7, and kappa=2 is required for S>0.75. These claims are supported by experiments testing nine feature classes (including non-monotone and handcrafted variants) on ViT-Small/CIFAR-10 at four sparsity levels, reporting gains such as +6.6% escape for smooth non-monotone features at S=0.7 and +2.6% for high-frequency raw features at S=0.8.

Significance. If the central claims hold under broader validation, the work would offer a unifying lens on why diverse pruning methods often cluster in performance at moderate sparsities and could guide the design of sparsity-adaptive selection algorithms. The concrete distinctions drawn between monotonicity requirements, rank-alignment, and complexity (e.g., chaos-derived vs. Gaussian bump features) represent a useful empirical contribution. However, the current evidence base is narrow, limiting immediate impact.

major comments (3)
  1. [Abstract] The Selection Plateau claim—that all rank-monotone scorers converge to identical accuracy independent of functional form—rests on experiments with only nine hand-selected feature classes evaluated on a single model (ViT-Small) and dataset (CIFAR-10) at four sparsity levels. No formal definition of rank-monotonicity is supplied, nor is there an argument that these classes exhaustively or representatively sample the space of rank-preserving monotone functions.
  2. [Abstract] The SICS hypothesis assigns specific kappa(S) thresholds (0.65, 0.7, 0.75) and complexity levels (0, 1, 2) that align exactly with the sparsity regimes where the tested feature classes begin to show performance transitions in the reported experiments. This makes the governing relation appear post-hoc and descriptive of the observed data rather than independently derived or prospectively tested.
  3. [Abstract] Reported escape gains (e.g., +6.6% at S=0.7 for smooth non-monotone features, +2.6% at S=0.8 for raw high-frequency features) are presented without error bars, statistical significance tests, or details on run-to-run variance, undermining assessment of whether the differences between feature classes are reliable or could arise from uncontrolled factors in architecture, dataset, or feature construction.
minor comments (2)
  1. [Abstract] The distinction between 'fake non-monotone scorer' and 'handcrafted Gaussian bump' would benefit from a brief description of their explicit functional forms or construction methods to allow reproducibility.
  2. [Abstract] The phrase 'chaos-derived' features is used without a reference or prior definition in the provided summary, which could confuse readers unfamiliar with the specific generation process.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the insightful comments on our work. We address each major comment below, indicating where revisions will be made to improve clarity and rigor.

point-by-point responses
  1. Referee: [Abstract] The Selection Plateau claim—that all rank-monotone scorers converge to identical accuracy independent of functional form—rests on experiments with only nine hand-selected feature classes evaluated on a single model (ViT-Small) and dataset (CIFAR-10) at four sparsity levels. No formal definition of rank-monotonicity is supplied, nor is there an argument that these classes exhaustively or representatively sample the space of rank-preserving monotone functions.

    Authors: We will add a formal definition of rank-monotonicity to the manuscript, defining it as a property where the scorer strictly preserves the ranking order of absolute weight values. The nine feature classes were deliberately chosen to include both monotone and non-monotone variants across different complexity levels to test the core claims. While we recognize that this does not constitute an exhaustive sampling of all possible rank-monotone functions, the consistent results across this diverse set provide supporting evidence for the Selection Plateau. We will expand the discussion to include this caveat and suggest directions for more comprehensive sampling in future studies. revision: yes

  2. Referee: [Abstract] The SICS hypothesis assigns specific kappa(S) thresholds (0.65, 0.7, 0.75) and complexity levels (0, 1, 2) that align exactly with the sparsity regimes where the tested feature classes begin to show performance transitions in the reported experiments. This makes the governing relation appear post-hoc and descriptive of the observed data rather than independently derived or prospectively tested.

    Authors: We agree that the specific thresholds in the SICS hypothesis appear closely tied to the experimental observations. The hypothesis was developed based on theoretical intuition about how feature complexity needs to scale with sparsity to capture higher-order information, with the values refined through preliminary tests before the main experiments. To address the post-hoc concern, we will revise the text to present the derivation process more transparently and position SICS as an empirically grounded hypothesis open to further testing, rather than a definitive governing law. revision: partial

  3. Referee: [Abstract] Reported escape gains (e.g., +6.6% at S=0.7 for smooth non-monotone features, +2.6% at S=0.8 for raw high-frequency features) are presented without error bars, statistical significance tests, or details on run-to-run variance, undermining assessment of whether the differences between feature classes are reliable or could arise from uncontrolled factors in architecture, dataset, or feature construction.

    Authors: We will include error bars, run-to-run variance details, and statistical significance tests in the revised figures and tables. Specifically, we plan to report standard deviations from multiple random seeds and perform significance testing to confirm the reliability of the reported accuracy differences. revision: yes
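The statistics promised in response 3 are straightforward to operationalize. A minimal sketch, with placeholder five-seed accuracies rather than the paper's numbers, using a paired test because each seed fixes the run for both scorers:

```python
import numpy as np
from scipy import stats

# Held-out accuracy per seed for a candidate feature and the plateau baseline.
# These arrays are hypothetical placeholders, not the paper's results.
acc_feature  = np.array([0.842, 0.839, 0.845, 0.841, 0.843])
acc_baseline = np.array([0.776, 0.781, 0.779, 0.774, 0.780])

print(f"feature:  {acc_feature.mean():.3f} ± {acc_feature.std(ddof=1):.3f}")
print(f"baseline: {acc_baseline.mean():.3f} ± {acc_baseline.std(ddof=1):.3f}")

# Paired test: each seed fixes initialization and data order for both scorers,
# so the per-seed accuracy differences are the natural unit of analysis.
t_stat, p_value = stats.ttest_rel(acc_feature, acc_baseline)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```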

standing simulated objections (1 unresolved)
  • The experimental evaluation is restricted to a single model (ViT-Small) and dataset (CIFAR-10), and expanding this would require additional computational resources and time not available for the current revision.

Circularity Check

1 step flagged

SICS hypothesis thresholds assigned post-hoc to match observed escape regimes in experiments

specific steps
  1. fitted input called prediction [Abstract]
    "We propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis: a sparsity-dependent minimum feature complexity kappa(S) governs plateau escape, with kappa=0 sufficient at low sparsity (S<0.65), kappa=1 dominant at critical sparsity (S~0.7), and kappa=2 necessary at extreme sparsity (S>0.75). On ViT-Small/CIFAR-10, testing nine feature classes across four sparsities, smooth non-monotone features provide +6.6% escape at S=0.7, while only raw features with high-frequency wiggle escape at S=0.8 (+2.6%)."

    The kappa levels (0,1,2) and sparsity thresholds (0.65, 0.7, 0.75) are defined to coincide precisely with the sparsity points where the nine tested feature classes first exhibit escape from the plateau in the reported experiments. The hypothesis is therefore constructed directly from the observed clustering rather than derived from first principles or an independent complexity measure.

full rationale

The paper identifies the Selection Plateau from empirical tests on nine hand-selected feature classes and then proposes the SICS hypothesis with specific kappa(S) values and sparsity boundaries that align exactly with the sparsity levels at which those same classes show performance divergence. This renders the claimed governing relation a re-description of the input data rather than an independent derivation. The generality claim for all rank-monotone scorers lacks a formal definition or exhaustive argument, but the central circularity is in the hypothesis construction itself.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

The central claim rests on experimental observations from a single architecture and dataset, with the complexity levels and sparsity thresholds appearing to be determined from the results rather than derived independently.

free parameters (1)
  • kappa(S) thresholds
    Sparsity boundaries 0.65, 0.7, 0.75 and complexity values 0, 1, 2 are assigned to match observed performance transitions.
axioms (2)
  • domain assumption: All rank-monotone weight scorers converge to identical accuracy at fixed sparsity, independent of functional form.
    Stated as the core identified phenomenon that the hypothesis explains.
  • ad hoc to paper: A sparsity-dependent minimum feature complexity kappa(S) governs plateau escape.
    The SICS hypothesis itself, introduced to account for when different feature classes succeed.
invented entities (2)
  • Sparsity-Information-Complexity Spectrum (SICS): no independent evidence
    purpose: Unifying explanation for performance clustering of pruning methods via sparsity-dependent feature complexity.
    New hypothesis without external validation beyond the reported experiments.
  • kappa(S): no independent evidence
    purpose: Minimum feature complexity required to escape the plateau at sparsity S.
    Postulated quantity whose values are set to fit the experimental regimes.

pith-pipeline@v0.9.0 · 5518 in / 1902 out tokens · 76142 ms · 2026-05-12T04:16:42.513561+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

