Recognition: 2 theorem links
· Lean Theorem: Spectral Condition for μP under Width-Depth Scaling
Pith reviewed 2026-05-15 17:59 UTC · model grok-4.3
The pith
A spectral framework for maximal update parameterization shows that the scaling rules derived for residual blocks with k ≥ 2 transformations stabilize feature learning in residual networks under joint width and depth growth.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For residual networks whose blocks contain k transformations, the spectral conditions on weight norms and per-step updates yield a μP formulation that, when k ≥ 2, produces stable feature learning and robust hyperparameter transfer under simultaneous width-depth scaling; the k = 1 formulation and standard parameterization do not.
What carries the argument
A spectral framework that converts constraints on the norms of weights and their per-step updates into explicit width and depth scaling rules for residual blocks containing k transformations.
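To make the object of these conditions concrete, here is a minimal sketch assuming, for illustration only, the two-transformation (k = 2) block form suggested by the Condition 3.1 excerpt quoted further below; the general case replaces the pair of weight norms with the product over all k transformations in the branch:

$$h_{l+1} = h_l + \alpha_l\, W_l^{(2)}\,\phi\bigl(W_l^{(1)} h_l\bigr), \qquad \alpha_l\,\|W_l^{(2)}\|_R\,\|W_l^{(1)}\|_R = \Theta(1/L),$$

so each residual branch contributes on the order of $1/L$ to the forward signal and the sum over $L$ blocks stays bounded as depth grows.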
If this is right
- The k ≥ 2 scaling rules recover existing μP results for a broad class of optimizers and extend them to additional ones.
- Practical architectures such as Transformers, whose blocks contain multiple transformations, align with the stable regime identified by the framework.
- Hyperparameter transfer from small to large models becomes reliable once the spectral conditions for k ≥ 2 are followed.
- Standard parameterization and the k = 1 formulation lose stability once width and depth are increased together.
Where Pith is reading between the lines
- The same spectral lens could be applied to non-residual architectures to test whether a comparable transition appears.
- If the framework holds, it predicts that attention-based models will continue to benefit from the k ≥ 2 rules even at extreme scales.
- The approach supplies a way to derive μP variants for new optimizers without re-deriving the entire theory from scratch.
Load-bearing premise
The mapping from weight and update norms to stable feature learning holds across joint width-depth regimes without requiring further justification of the spectral-radius conditions inside the residual blocks.
What would settle it
Training a GPT-2-style model at two different widths and depths using the k ≥ 2 μP rules and checking whether the same learning-rate schedule still produces stable loss curves; if the k = 1 rules transfer equally well, the claimed distinction collapses.
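A hypothetical sketch of how such a transfer check is usually wired up. Every specific exponent below (the 1/width Adam-style learning-rate scaling, the 1/√width init scaling, the 1/depth residual multiplier) is an illustrative assumption standing in for the paper's Condition 3.1 recipe, not the paper's actual parameterization.

# Sketch of a width-depth HP-transfer check; scaling exponents are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ScaledHPs:
    hidden_lr: float       # learning rate for hidden (width x width) matrices
    init_std: float        # init scale for hidden matrices
    residual_alpha: float  # multiplier on each residual branch

def scale_hps(base_lr: float, base_std: float, base_width: int,
              width: int, depth: int) -> ScaledHPs:
    """Map base hyperparameters tuned on a small proxy to a larger width/depth.

    The exponents here are stand-ins for the paper's spectral rules: the point of
    the check is that base_lr and base_std stay fixed while only these deterministic
    rescalings change with model size.
    """
    m = width / base_width
    return ScaledHPs(
        hidden_lr=base_lr / m,          # assumed Adam-style muP width scaling
        init_std=base_std / m ** 0.5,   # keep forward activations O(1) in width
        residual_alpha=1.0 / depth,     # assumed Theta(1/L) branch scaling (k >= 2 case)
    )

if __name__ == "__main__":
    # Sweep base_lr on the small proxy, pick the best value, then reuse it unchanged
    # at larger (width, depth). If loss curves stay stable and the optimum does not
    # drift, HP transfer holds; repeating the sweep with k = 1 rules tests whether
    # the claimed distinction survives.
    for width, depth in [(256, 6), (1024, 24)]:
        hps = scale_hps(base_lr=3e-3, base_std=0.02, base_width=256,
                        width=width, depth=depth)
        print(width, depth, hps)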
read the original abstract
Generative foundation models are increasingly scaled in both width and depth, posing significant challenges for stable feature learning and reliable hyperparameter (HP) transfer across model sizes. While maximal update parameterization ($\mu$P) has provided a principled solution to both problems for width scaling, existing extensions to the joint width-depth scaling regime remain fragmented, architecture- and optimizer-specific, and often rely on technically involved theories. In this work, we develop a simple and unified spectral framework for $\mu$P under joint width-depth scaling. For deep residual networks whose residual blocks contain $k$ transformations, the framework specifies how the norms of weights and their per-step updates should scale with width and depth. It reveals a fundamental transition from $k=1$ to $k\geq 2$, unifying previously disparate $\mu$P formulations and identifying the $k\geq 2$ case as more appropriate for practical architectures with multi-transformation branches such as Transformers. Building on this framework, we derive a general recipe for implementing $\mu$P across a broad class of optimizers by mapping spectral constraints to concrete HP parameterizations, recovering existing results and extending them to additional optimizers. Finally, experiments on GPT-2 style language models show that the $\mu$P formulation derived from the $k\geq 2$ case achieves stable feature learning and robust HP transfer under width-depth scaling, whereas standard parameterization and $\mu$P in the $k=1$ case often fail to do so. These results support the practical effectiveness of the proposed spectral framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a spectral framework for maximal update parameterization (μP) under joint width-depth scaling in deep residual networks. For residual blocks containing k transformations, it derives scaling rules for the norms of weights and per-step updates, identifies a transition from the k=1 to k≥2 regime, unifies prior μP formulations, and provides a general recipe mapping spectral constraints to hyperparameter choices across optimizers. GPT-2 experiments are presented to show that the k≥2 formulation yields stable feature learning and robust HP transfer, while standard parameterization and the k=1 case do not.
Significance. If the spectral conditions correctly capture feature-learning dynamics in multi-branch architectures, the framework supplies a simple, architecture-aware route to μP that extends beyond width-only scaling and recovers existing results while covering additional optimizers. The GPT-2 validation, if the residual-block modeling holds, would constitute concrete evidence that the k≥2 rules improve stability and transfer under simultaneous width-depth growth.
major comments (2)
- [Spectral framework derivation (around the k-transition)] The central claim that the k≥2 spectral rules are the appropriate choice for Transformers rests on the assumption that the general residual block with k transformations accurately reproduces the effective Jacobian spectral radius of a self-attention + FFN block. The manuscript does not supply an explicit linearization or eigenvalue-multiplication argument showing how the depth-multiplication factor for k≥2 maps onto the attention-MLP composition; without this step the experimental interpretation that the observed stability is due to the derived parameterization rather than other implementation details remains open.
- [Experimental section on GPT-2 models] In the GPT-2 experiments, the paper reports that the k≥2 μP achieves stable feature learning and HP transfer while k=1 and standard parameterization fail. To make this comparison load-bearing, the manuscript should include a direct check that the implemented residual structure matches the k≥2 model (e.g., by measuring the empirical spectral radius of the block Jacobian or by ablating the number of transformations inside each residual unit).
minor comments (1)
- [Notation and definitions] Notation for the per-step update norm scaling should be introduced once and used consistently; the current presentation mixes “update norm” and “ΔW norm” without a single defining equation.
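A single defining line would resolve this, for example (a suggested notation, not the manuscript's): $\Delta W_l^{(t)} := W_l^{(t+1)} - W_l^{(t)}$, with $\|\Delta W_l^{(t)}\|_R$ measured in the same norm $\|\cdot\|_R$ used for the weights, so that "update norm" and "ΔW norm" name the same quantity.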
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We agree that additional derivations and empirical checks will strengthen the connection between the theoretical framework and the Transformer experiments, and we will incorporate these in the revised manuscript.
read point-by-point responses
- Referee: [Spectral framework derivation (around the k-transition)] The central claim that the k≥2 spectral rules are the appropriate choice for Transformers rests on the assumption that the general residual block with k transformations accurately reproduces the effective Jacobian spectral radius of a self-attention + FFN block. The manuscript does not supply an explicit linearization or eigenvalue-multiplication argument showing how the depth-multiplication factor for k≥2 maps onto the attention-MLP composition; without this step the experimental interpretation that the observed stability is due to the derived parameterization rather than other implementation details remains open.
Authors: We agree that an explicit linearization argument would make the mapping more rigorous. In the revision we will add a dedicated subsection deriving the effective Jacobian spectral radius for a residual block composed of self-attention followed by an FFN. Under the standard assumption that the individual Jacobians have spectral radii controlled by the width scaling, their product yields the depth-multiplication factor matching the k=2 case, thereby justifying the choice of the k≥2 rules for Transformers and clarifying that the observed stability arises from the parameterization. revision: yes
- Referee: [Experimental section on GPT-2 models] In the GPT-2 experiments, the paper reports that the k≥2 μP achieves stable feature learning and HP transfer while k=1 and standard parameterization fail. To make this comparison load-bearing, the manuscript should include a direct check that the implemented residual structure matches the k≥2 model (e.g., by measuring the empirical spectral radius of the block Jacobian or by ablating the number of transformations inside each residual unit).
Authors: We accept that a direct verification is needed to make the comparison conclusive. In the revised experimental section we will report measurements of the empirical spectral radius of the residual-block Jacobians computed on the trained GPT-2 models, confirming that the implemented architecture aligns with the k≥2 regime. We will also add an ablation that varies the number of transformations per residual unit and shows the corresponding change in stability and transfer behavior. revision: yes
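To make the promised measurement concrete, here is a hedged sketch of estimating the spectral norm of a residual branch's input-output Jacobian by power iteration with JVP/VJP products. The toy two-transformation branch, the width, and the 1/depth multiplier are illustrative assumptions, not the GPT-2 blocks or the authors' exact procedure.

# Sketch: empirical spectral norm of a residual-branch Jacobian via power iteration.
import torch

torch.manual_seed(0)
width = 64
alpha = 1.0 / 12  # hypothetical 1/depth branch multiplier (k >= 2-style choice)

W1 = torch.randn(width, width) / width ** 0.5
W2 = torch.randn(width, width) / width ** 0.5

def branch(x: torch.Tensor) -> torch.Tensor:
    """Toy k = 2 residual branch: alpha * W2 @ phi(W1 @ x)."""
    return alpha * (torch.tanh(x @ W1.T) @ W2.T)

def jacobian_spectral_norm(f, x: torch.Tensor, iters: int = 50) -> float:
    """Power iteration for the largest singular value of the Jacobian of f at x."""
    v = torch.randn_like(x)
    v = v / v.norm()
    for _ in range(iters):
        _, jv = torch.autograd.functional.jvp(f, x, v)      # J v
        _, jtjv = torch.autograd.functional.vjp(f, x, jv)   # J^T (J v)
        v = jtjv / (jtjv.norm() + 1e-12)
    _, jv = torch.autograd.functional.jvp(f, x, v)
    return jv.norm().item()

x = torch.randn(width)
print("estimated ||J_branch||_2 ~", jacobian_spectral_norm(branch, x))
# The full residual block is h + branch(h), so its Jacobian is I + J_branch; keeping
# ||J_branch|| on the order of 1/L is what the quoted spectral condition asks of
# every block in a depth-L stack.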
Circularity Check
No significant circularity detected
full rationale
The paper derives its spectral framework for μP under width-depth scaling from first-principles analysis of residual blocks with k transformations, specifying weight-norm and update scaling rules that unify prior μP variants. No equations or steps in the abstract reduce predictions to fitted inputs by construction, nor do they rely on self-citations for load-bearing uniqueness claims. The k≥2 transition and resulting HP recipes are presented as outputs of the spectral radius conditions rather than inputs, and the GPT-2 experiments serve as external validation rather than tautological confirmation. The derivation chain remains self-contained against the stated assumptions.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "Condition 3.1 (Spectral condition for μP under joint width-depth scaling) … $\alpha_l\,\|W_l^{(2)}\|_R\,\|W_l^{(1)}\|_R = \Theta(1/L)$"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · tagged unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "For residual blocks of depth k … product of the α_l and the norms of the k hidden weights to scale as Θ(1/L)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. CoRR, abs/1607.06450, 2016.
- [2] Laura Balzano, Tianjiao Ding, Benjamin D. Haeffele, Soo Min Kwon, Qing Qu, Peng Wang, Zhangyang Wang, and Can Yaras. An overview of low-rank structures in the training and adaptation of large models. CoRR, abs/2503.19859, 2025.
- [3] Charlie Blake, Constantin Eichenberg, Josef Dean, Lukas Balles, Luke Yuri Prince, Björn Deiseroth, Andrés Felipe Cruz-Salinas, Carlo Luschi, Samuel Weinbach, and Douglas Orr. u-μP: The unit-scaled maximal update parametrization. In ICLR, 2025.
- [4] Blake Bordelon and Cengiz Pehlevan. Self-consistent dynamical field theory of kernel evolution in wide neural networks. In NeurIPS, 2022.
- [5] Blake Bordelon, Hamza Tahir Chaudhry, and Cengiz Pehlevan. Infinite limits of multi-head transformer dynamics. In NeurIPS, 2024.
- [6] Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, and Cengiz Pehlevan. Depthwise hyperparameter transfer in residual networks: Dynamics and scaling limit. In ICLR, 2024.
- [7] Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, and Quoc V. Le. Symbolic discovery of optimization algorithms. In NeurIPS, 2023.
- [8] Nolan Dey, Gurpreet Gosal, Zhiming Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, and Joel Hestness. Cerebras-GPT: Open compute-optimal language models trained on the Cerebras wafer-scale cluster. CoRR, abs/2304.03208, 2023.
- [9] Nolan Dey, Shane Bergsma, and Joel Hestness. Sparse maximal update parameterization: A holistic approach to sparse training dynamics. In NeurIPS, 2024.
- [10] Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Bill Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, and Joel Hestness. Don't be lazy: CompleteP enables compute-efficient deep transformers. CoRR, abs/2505.01618, 2025.
- [11] Aaron Gokaslan and Vanya Cohen. OpenWebText corpus. http://Skylion007.github.io/OpenWebTextCorpus, 2019.
- [12] Vineet Gupta, Tomer Koren, and Yoram Singer. Shampoo: Preconditioned stochastic tensor optimization. In International Conference on Machine Learning, pages 1842–1850. PMLR, 2018.
- [13] Moritz Haas, Jin Xu, Volkan Cevher, and Leena Chennuru Vankadara. μP²: Effective sharpness aware minimization requires layerwise perturbation scaling. In NeurIPS, 2024.
- [14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV, pages 1026–1034, 2015.
- [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
- [16] Alex Henry, Prudhvi Raj Dachapally, Shubham Shantaram Pawar, and Yuxuan Chen. Query-key normalization for transformers. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4246–4253, 2020.
- [17] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. CoRR, abs/2203.15556, 2022.
- [18] Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, et al. MiniCPM: Unveiling the potential of small language models with scalable training strategies. arXiv preprint arXiv:2404.06395, 2024.
- [19] Satoki Ishikawa and Ryo Karakida. On the parameterization of second-order optimization effective towards the infinite width. In ICLR, 2024.
- [20] Arthur Jacot, Clément Hongler, and Franck Gabriel. Neural tangent kernel: Convergence and generalization in neural networks. In NeurIPS, pages 8580–8589, 2018.
- [21] Keller Jordan, Yuchen Jin, Vlado Boza, You Jiacheng, Franz Cecista, Laker Newhouse, and Jeremy Bernstein. Muon: An optimizer for hidden layers in neural networks. https://kellerjordan.github.io/posts/muon, 2024.
- [22] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. CoRR, abs/2001.08361, 2020.
- [23] Andrej Karpathy. nanoGPT. https://github.com/karpathy/nanoGPT, 2022.
- [24] Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024.
- [25] Hong Liu, Zhiyuan Li, David Leo Wright Hall, Percy Liang, and Tengyu Ma. Sophia: A scalable stochastic second-order optimizer for language model pre-training. In ICLR, 2024.
- [26] Jingyuan Liu, Jianlin Su, Xingcheng Yao, Zhejun Jiang, Guokun Lai, Yulun Du, Yidao Qin, Weixin Xu, Enzhe Lu, Junjie Yan, et al. Muon is scalable for LLM training. arXiv preprint arXiv:2502.16982, 2025.
- [27] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In ICLR, 2019.
- [28] Yurii Nesterov. A method for solving the convex programming problem with convergence rate O(1/k²). In Dokl. Akad. Nauk SSSR, volume 269, page 543, 1983.
- [29] Marieme Ngom, Sam Foreman, Venkatram Vishwanath, et al. Extending μP: Spectral conditions for feature learning across optimizers. In OPT 2025: Optimization for Machine Learning, 2025.
- [30] Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models. CoRR, abs/2502.09992, 2025.
- [31] Shikai Qiu, Zixi Chen, Hoang Phan, Qi Lei, and Andrew Gordon Wilson. Hyperparameter transfer enables consistent gains of matrix-preconditioned optimizers across scales. arXiv preprint arXiv:2512.05620, 2025.
- [32] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- [33] Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, and Jascha Sohl-Dickstein. Deep information propagation. In ICLR, 2017.
- [34] Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. OpenAI GPT-5 system card. arXiv preprint arXiv:2601.03267, 2025.
- [35] Kimi Team, Yifan Bai, Yiping Bao, Guanduo Chen, Jiahao Chen, Ningxin Chen, Ruijue Chen, Yanru Chen, Yuankun Chen, Yutian Chen, et al. Kimi K2: Open agentic intelligence. arXiv preprint arXiv:2507.20534, 2025.
- [36] Leena Chennuru Vankadara, Jin Xu, Moritz Haas, and Volkan Cevher. On feature learning in structured state space models. In NeurIPS, 2024.
- [37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, pages 5998–6008, 2017.
- [38] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47. Cambridge University Press, 2018.
- [39] Nikhil Vyas, Depen Morwani, Rosie Zhao, Mujin Kwun, Itai Shapira, David Brandfonbrener, Lucas Janson, and Sham Kakade. SOAP: Improving and stabilizing Shampoo using Adam. arXiv preprint arXiv:2409.11321, 2024.
- [40] Tian Xie, Haoming Luo, Haoyu Tang, Yiwen Hu, Jason Klein Liu, Qingnan Ren, Yang Wang, Wayne Xin Zhao, Rui Yan, Bing Su, et al. Controlled LLM training on spectral sphere. arXiv preprint arXiv:2601.08393, 2026.
- [41] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
- [42] Greg Yang. Tensor Programs III: Neural matrix laws. CoRR, abs/2009.10685, 2020.
- [43] Greg Yang and Edward J. Hu. Tensor Programs IV: Feature learning in infinite-width neural networks. In ICML, volume 139, pages 11727–11737. PMLR, 2021.
- [44] Greg Yang and Etai Littwin. Tensor Programs IVb: Adaptive optimization in the infinite-width limit. CoRR, abs/2308.01814, 2023.
- [45] Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, and Jianfeng Gao. Tensor Programs V: Tuning large neural networks via zero-shot hyperparameter transfer. CoRR, abs/2203.03466, 2022.
- [46] Greg Yang, James B. Simon, and Jeremy Bernstein. A spectral condition for feature learning. CoRR, abs/2310.17813, 2023.
- [47] Greg Yang, Dingli Yu, Chen Zhu, and Soufiane Hayou. Tensor Programs VI: Feature learning in infinite depth neural networks. In ICLR, 2024.
- [48] Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, and Yuandong Tian. GaLore: Memory-efficient LLM training by gradient low-rank projection. In ICML, 2024.
- [49] Chenyu Zheng, Xinyu Zhang, Rongzhen Wang, Wei Huang, Zhi Tian, Weilin Huang, Jun Zhu, and Chongxuan Li. Scaling diffusion transformers efficiently via μP. CoRR, abs/2505.15270, 2025.