TOAST: Transformer Optimization using Adaptive and Simple Transformations
Pith reviewed 2026-05-23 19:43 UTC · model grok-4.3
The pith
Large portions of transformer depth can be replaced by trivial functions like linear maps or the identity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TOAST is a framework that uses intra-network representation similarities to approximate entire transformer blocks with lightweight closed-form mappings such as linear transformations or the identity function. Applied to pretrained vision models including ViT, DINOv2, and DeiT, it reduces parameters and computation across datasets from MNIST to ImageNet-1k while preserving or improving downstream performance, without any additional training.
What carries the argument
Adaptive replacement of transformer blocks by simple closed-form functions (linear or identity) selected based on representation similarities.
If this is right
- Model size decreases substantially by removing or simplifying multiple blocks.
- Computational cost during inference drops due to fewer operations.
- Downstream task performance remains comparable or better on tested vision datasets.
- No retraining or fine-tuning is needed for the optimization.
Where Pith is reading between the lines
- Similar redundancies might exist in other architectures like language transformers, allowing broader application.
- Models could be trained with awareness of these approximations to optimize depth from the start.
- Dynamic selection of which blocks to approximate could adapt to different inputs or tasks.
- Exploring the limits on more challenging benchmarks would clarify how far the replacement can go.
Load-bearing premise
The internal representations in the transformer are similar enough across blocks that a closed-form linear map or identity can substitute for a full block without performance loss.
What would settle it
Observing a significant accuracy drop on ImageNet-1k after applying TOAST to a standard ViT model would falsify the central claim.
Figures
read the original abstract
Foundation models achieve state-of-the-art performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining or finetuning, limiting their practicality. Recent findings suggest that deep neural networks exhibit internal representation similarities. While such similarities across different models have been exploited for enabling techniques such as model stitching and merging, intra-network redundancy remains underexplored as a source for efficiency gains. In this paper, we introduce Transformer Optimization using Adaptive and Simple Transformations (TOAST), a framework that exploits these redundancies to approximate entire transformer blocks with lightweight closed-form mappings, such as linear transformations or even the identity function, without any additional training. Across state-of-the-art pretrained vision models (e.g., ViT, DINOv2, DeiT) and datasets ranging from MNIST to ImageNet-1k, TOAST reduces parameters and computation while preserving, and in some cases improving, downstream performance. These results show that large portions of transformer depth can be replaced by trivial functions, opening a new perspective on efficient foundation models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TOAST, a framework that exploits intra-network representation similarities in pretrained vision transformers (ViT, DINOv2, DeiT) to replace selected blocks with closed-form lightweight mappings such as linear transformations or the identity function. These replacements are performed without retraining or finetuning. Experiments across datasets from MNIST to ImageNet-1k report that parameter count and computation can be reduced while downstream performance is preserved or improved, supporting the claim that large portions of transformer depth can be replaced by trivial functions.
Significance. If the central empirical claim holds after proper controls, the result would be significant for efficient foundation-model deployment: it would demonstrate that intra-network redundancy can be exploited via simple, training-free substitutions rather than distillation or pruning, and would open a new direction for depth reduction that relies on closed-form mappings instead of learned approximations. The absence of retraining is a notable practical strength.
major comments (4)
- [§3.2] §3.2 (Mapping Computation): the procedure for deriving the linear map is not specified. It is unclear whether the least-squares fit is performed on activations from a held-out validation split, the training set, or the evaluation distribution; without this, it is impossible to rule out that block selection or map fitting overfits the reported test metrics.
- [§4.2, Table 3] §4.2 and Table 3: no error bars, standard deviations across seeds, or statistical significance tests are reported for the accuracy deltas. Several entries claim small improvements (e.g., +0.3 % on ImageNet) that cannot be distinguished from run-to-run variance, undermining the claim that performance is “preserved or improved.”
- [§4.3] §4.3 (Block Selection): the criterion used to decide which blocks are replaced by identity versus linear map versus left unchanged is not described. If selection is performed after measuring downstream accuracy on the test set, the reported compression ratios are post-hoc and the central claim of automatic, similarity-driven replacement is not supported.
- [§5] §5 (Ablations): there is no control experiment that applies random block replacement or random linear maps of the same rank; without this baseline it is impossible to determine whether the observed preservation of accuracy is due to the intra-network similarity hypothesis or simply to the robustness of the remaining network.
minor comments (3)
- [§3.1] Notation for the linear map (e.g., the matrix W and bias b) is introduced without an explicit equation; add Eq. (X) defining the replacement operation.
- [Figure 2] Figure 2 caption does not state the number of models or random seeds used to generate the similarity heatmaps.
- [Abstract vs §4.1] The abstract states “across state-of-the-art pretrained vision models” but the experimental section only reports three families; clarify the exact model list in the main text.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and indicate where revisions will be made to strengthen the manuscript.
read point-by-point responses
-
Referee: [§3.2] §3.2 (Mapping Computation): the procedure for deriving the linear map is not specified. It is unclear whether the least-squares fit is performed on activations from a held-out validation split, the training set, or the evaluation distribution; without this, it is impossible to rule out that block selection or map fitting overfits the reported test metrics.
Authors: We will revise §3.2 to explicitly describe the mapping procedure. The linear maps are obtained via least-squares regression on activations collected from the training set (with no access to validation or test data). This detail was omitted for brevity but will be added along with the precise optimization objective to eliminate any ambiguity regarding data leakage or overfitting. revision: yes
-
Referee: [§4.2, Table 3] §4.2 and Table 3: no error bars, standard deviations across seeds, or statistical significance tests are reported for the accuracy deltas. Several entries claim small improvements (e.g., +0.3 % on ImageNet) that cannot be distinguished from run-to-run variance, undermining the claim that performance is “preserved or improved.”
Authors: We agree that variability measures are important for interpreting small deltas. In the revision we will rerun the key ImageNet experiments across multiple random seeds, report means and standard deviations in Table 3, and add a note on whether the observed changes exceed typical run-to-run variance. The primary claim remains preservation rather than consistent improvement, but the added statistics will allow readers to assess this directly. revision: yes
-
Referee: [§4.3] §4.3 (Block Selection): the criterion used to decide which blocks are replaced by identity versus linear map versus left unchanged is not described. If selection is performed after measuring downstream accuracy on the test set, the reported compression ratios are post-hoc and the central claim of automatic, similarity-driven replacement is not supported.
Authors: Block selection is performed solely on the basis of intra-block representation similarity measured on training-set activations (via reconstruction error or cosine similarity between the original block output and the candidate mapping). No test-set accuracy is used at any stage. We will expand §4.3 to state the exact similarity threshold and decision rule, making the automatic, training-only nature of the procedure explicit. revision: yes
-
Referee: [§5] §5 (Ablations): there is no control experiment that applies random block replacement or random linear maps of the same rank; without this baseline it is impossible to determine whether the observed preservation of accuracy is due to the intra-network similarity hypothesis or simply to the robustness of the remaining network.
Authors: We will add the requested control in the revised §5. Specifically, we will report results for random block selection followed by either identity substitution or random linear maps of matching rank. Preliminary checks indicate that such random replacements cause clear accuracy degradation relative to the similarity-driven choices; the new table will quantify this gap and thereby support that the performance retention stems from the identified redundancies. revision: yes
Circularity Check
No circularity: TOAST presents empirical approximations without self-referential reductions
full rationale
The abstract and description introduce TOAST as a new framework that exploits observed intra-network representation similarities (from prior literature) to replace blocks with closed-form linear maps or identity functions. No equations, fitting procedures, or derivation steps are described that would reduce a claimed prediction or result to its own inputs by construction. The central contribution is an empirical method and evaluation across ViT/DINOv2/DeiT models on MNIST-to-ImageNet, which stands as independent content rather than a renaming, self-citation load-bearing premise, or fitted parameter presented as a prediction. No self-citation chains or ansatzes are invoked in the provided text to justify uniqueness or force the outcome.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Unified data-free compression: Pruning and quantization without fine-tuning
Shipeng Bai, Jun Chen, Xintian Shen, Yixuan Qian, and Yong Liu. Unified data-free compression: Pruning and quantization without fine-tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 5876--5885, 2023
work page 2023
-
[2]
Topological data analysis for neural network analysis: A comprehensive survey
Rubén Ballester, Carles Casacuberta, and Sergio Escalera. Topological data analysis for neural network analysis: A comprehensive survey. arXiv preprint arXiv:2312.05840, December 2023
-
[3]
Representation topology divergence: A method for comparing neural network representations
Serguei Barannikov, Ilya Trofimov, Nikita Balabin, and Evgeny Burnaev. Representation topology divergence: A method for comparing neural network representations. arXiv preprint arXiv:2201.00058, 2021
-
[4]
Bootstrapping parallel anchors for relative representations
Irene Cannistraci, Luca Moschella, Valentino Maiorca, Marco Fumero, Antonio Norelli, and Emanuele Rodol \` a . Bootstrapping parallel anchors for relative representations. In Krystal Maughan, Rosanne Liu, and Thomas F. Burns (eds.), The First Tiny Papers Track at ICLR 2023, Tiny Papers @ ICLR 2023, Kigali, Rwanda, May 5, 2023 . OpenReview.net, 2023. URL h...
work page 2023
-
[5]
From bricks to bridges: Product of invariances to enhance latent space communication
Irene Cannistraci, Luca Moschella, Marco Fumero, Valentino Maiorca, and Emanuele Rodol \`a . From bricks to bridges: Product of invariances to enhance latent space communication. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=vngVydDWft
work page 2024
-
[6]
From charts to atlas: Merging latent spaces into one
Donato Crisostomi, Irene Cannistraci, Luca Moschella, Pietro Barbiero, Marco Ciccone, Pietro Lio, and Emanuele Rodol \`a . From charts to atlas: Merging latent spaces into one. In NeurIPS 2023 Workshop on Symmetry and Geometry in Neural Representations, 2023. URL https://openreview.net/forum?id=ZFu7CPtznY
work page 2023
-
[7]
Reliability of cka as a similarity measure in deep learning
MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, and Eugene Belilovsky. Reliability of cka as a similarity measure in deep learning. arXiv preprint arXiv:2210.16156, 2022
-
[8]
The mnist database of handwritten digit images for machine learning research
Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29 0 (6): 0 141--142, 2012
work page 2012
-
[9]
An image is worth 16x16 words: Transformers for image recognition at scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations, ICLR 2021, ...
work page 2021
-
[10]
Latent functional maps: a spectral framework for representation alignment
Marco Fumero, Marco Pegoraro, Valentino Maiorca, Francesco Locatello, and Emanuele Rodol\` a . Latent functional maps: a spectral framework for representation alignment. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (eds.), Advances in Neural Information Processing Systems, volume 37, pp.\ 66178--66203. Curran Associ...
work page 2024
-
[11]
Relations between two sets of variates
Harold Hotelling. Relations between two sets of variates. Breakthroughs in statistics: methodology and distribution, pp.\ 162--190, 1992
work page 1992
-
[12]
Similarity of neural network models: A survey of functional and representational measures
Max Klabunde, Tobias Schumacher, Markus Strohmaier, and Florian Lemmerich. Similarity of neural network models: A survey of functional and representational measures. arXiv preprint arXiv:2305.06329, 2023
-
[13]
Similarity of neural network representations revisited
Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In International Conference on Machine Learning, pp.\ 3519--3529. PMLR, 2019
work page 2019
-
[14]
Learning multiple layers of features from tiny images
Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009
work page 2009
-
[16]
Internal representations of vision models through the lens of frames on data manifolds
Henry Kvinge, Grayson Jorgenson, Davis Brown, Charles Godfrey, and Tegan Emerson. Internal representations of vision models through the lens of frames on data manifolds. In NeurIPS 2023 Workshop on Symmetry and Geometry in Neural Representations, 2022
work page 2023
-
[17]
On the direct alignment of latent spaces
Zorah L\"ahner and Michael Moeller. On the direct alignment of latent spaces. In Marco Fumero, Emanuele Rodolá, Clementine Domine, Francesco Locatello, Karolina Dziugaite, and Caron Mathilde (eds.), Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, volume 243 of Proceedings of Machine Learning Research, pp.\ 158--169...
work page 2024
-
[18]
Zhu Liao, Victor Qu \'e tu, Van-Tam Nguyen, and Enzo Tartaglione. Can unstructured pruning reduce the depth in deep neural networks? In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 1402--1406, 2023
work page 2023
-
[19]
Llm-pruner: On the structural pruning of large language models
Xinyin Ma, Gongfan Fang, and Xinchao Wang. Llm-pruner: On the structural pruning of large language models. Advances in neural information processing systems, 36: 0 21702--21720, 2023
work page 2023
-
[20]
Latent space translation via semantic alignment
Valentino Maiorca, Luca Moschella, Antonio Norelli, Marco Fumero, Francesco Locatello, and Emanuele Rodol \`a . Latent space translation via semantic alignment. Advances in Neural Information Processing Systems, 36, 2024
work page 2024
-
[21]
Insights on representational similarity in neural networks with canonical correlation
Ari Morcos, Maithra Raghu, and Samy Bengio. Insights on representational similarity in neural networks with canonical correlation. Advances in Neural Information Processing Systems, 31, 2018
work page 2018
-
[22]
Relative representations enable zero-shot latent space communication
Luca Moschella, Valentino Maiorca, Marco Fumero, Antonio Norelli, Francesco Locatello, and Emanuele Rodol \`a . Relative representations enable zero-shot latent space communication. In Proc. ICLR, 2023
work page 2023
-
[23]
Thao Nguyen, Maithra Raghu, and Simon Kornblith. Do wide and deep networks learn the same things? uncovering how neural network representations vary with width and depth. arXiv preprint arXiv:2010.15327, 2020
-
[24]
Asif: Coupled data turns unimodal models to multimodal without training
Antonio Norelli, Marco Fumero, Valentino Maiorca, Luca Moschella, Emanuele Rodola, and Francesco Locatello. Asif: Coupled data turns unimodal models to multimodal without training. Advances in Neural Information Processing Systems, 36: 0 15303--15319, 2023
work page 2023
-
[25]
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab, Timoth \'e e Darcet, Th \'e o Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[26]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pp.\ 8748--8763. PMLR, 2021
work page 2021
-
[27]
Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in neural information processing systems, 30, 2017
work page 2017
-
[28]
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge . International Journal of Computer Vision (IJCV), 115 0 (3): 0 211--252, 2015. doi:10.1007/s11263-015-0816-y
-
[29]
On the effect of dropping layers of pre-trained transformer models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, and Preslav Nakov. On the effect of dropping layers of pre-trained transformer models. Computer Speech & Language, 77: 0 101429, 2023
work page 2023
-
[30]
Laion-5b: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. Laion-5b: An open large-scale dataset for training next generation image-text models....
work page 2022
-
[31]
Woodfisher: Efficient second-order approximation for neural network compression
Sidak Pal Singh and Dan Alistarh. Woodfisher: Efficient second-order approximation for neural network compression. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp.\ 18098--18109. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/202...
work page 2020
-
[32]
You need multiple exiting: Dynamic early exiting for accelerating unified vision language model
Shengkun Tang, Yaqing Wang, Zhenglun Kong, Tianchi Zhang, Yao Li, Caiwen Ding, Yanzhi Wang, Yi Liang, and Dongkuan Xu. You need multiple exiting: Dynamic early exiting for accelerating unified vision language model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 10781--10791, 2023
work page 2023
-
[33]
Training data-efficient image transformers & distillation through attention
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv \'e J \'e gou. Training data-efficient image transformers & distillation through attention. arxiv 2020. arXiv preprint arXiv:2012.12877, 2 0 (3), 2020
-
[34]
The geometry of hidden representations of large transformer models
Lucrezia Valeriani, Diego Doimo, Francesca Cuturello, Alessandro Laio, Alessio Ansuini, and Alberto Cazzaniga. The geometry of hidden representations of large transformer models. Advances in Neural Information Processing Systems, 36, 2024
work page 2024
-
[35]
Convolutional networks with adaptive inference graphs
Andreas Veit and Serge Belongie. Convolutional networks with adaptive inference graphs. In Proceedings of the European Conference on Computer Vision (ECCV), September 2018
work page 2018
-
[36]
Residual networks behave like ensembles of relatively shallow networks
Andreas Veit, Michael J Wilber, and Serge Belongie. Residual networks behave like ensembles of relatively shallow networks. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016. URL https://proceedings.neurips.cc/paper_files/paper/2016/file/37bc2f75...
work page 2016
-
[37]
Skip-attention: Improving vision transformers by paying less attention
Shashanka Venkataramanan, Amir Ghodrati, Yuki M Asano, Fatih Porikli, and Amir Habibian. Skip-attention: Improving vision transformers by paying less attention. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=vI95kcLAoU
work page 2024
-
[38]
Practical network acceleration with tiny sets
Guo-Hua Wang and Jianxin Wu. Practical network acceleration with tiny sets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
work page 2023
-
[39]
Davis, Kristen Grauman, and Rogerio Feris
Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S. Davis, Kristen Grauman, and Rogerio Feris. Blockdrop: Dynamic inference paths in residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
work page 2018
-
[40]
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[41]
DeeBERT: Dynam ic Early Exiting for Accelerating BERT Inference,
Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, and Jimmy Lin. Deebert: Dynamic early exiting for accelerating bert inference. arXiv preprint arXiv:2004.12993, 2020
-
[42]
Width & depth pruning for vision transformers
Fang Yu, Kun Huang, Meng Wang, Yuan Cheng, Wei Chu, and Li Cui. Width & depth pruning for vision transformers. In Proc. AAAI, 2022
work page 2022
-
[43]
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, et al. A large-scale study of representation learning with the visual task adaptation benchmark. arXiv preprint arXiv:1910.04867, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1910
-
[44]
Dense vision transformer compression with few samples
Hanxiao Zhang, Yifan Zhou, and Guo-Hua Wang. Dense vision transformer compression with few samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 15825--15834, June 2024
work page 2024
-
[45]
Accelerating training of transformer-based language models with progressive layer dropping
Minjia Zhang and Yuxiong He. Accelerating training of transformer-based language models with progressive layer dropping. Advances in neural information processing systems, 33: 0 14011--14023, 2020
work page 2020
-
[46]
Bert loses patience: Fast and robust inference with early exit
Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, and Furu Wei. Bert loses patience: Fast and robust inference with early exit. Advances in Neural Information Processing Systems, 33: 0 18330--18341, 2020
work page 2020
-
[47]
\@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...
-
[48]
\@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...
-
[49]
, " * write output.state after.block = add.period write newline
@open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...
-
[50]
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.