{"total":15,"items":[{"citing_arxiv_id":"2606.28486","ref_index":20,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Spectral phase transitions and trainability in neural network learning dynamics","primary_cat":"cond-mat.dis-nn","submitted_at":"2026-06-26T18:00:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SGD on neural network weights induces a BBP phase transition that detaches signal eigenvalues from the random bulk, yielding an analytically solvable phase diagram for trainability in a linear teacher-student model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20299","ref_index":159,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Statistical Properties of Training & Generalization","primary_cat":"stat.ML","submitted_at":"2026-06-18T14:35:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":1.0,"formal_verification":"none","one_line_summary":"Review of neural scaling laws and their relation to constraints and inductive biases when applying machine learning to physics problems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08167","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Explaining Data Mixing Scaling Laws","primary_cat":"cs.LG","submitted_at":"2026-06-06T13:31:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A framework using capacity competition and noise reduction under an overlapping-skills assumption explains multi-domain loss behaviors and extrapolates optimal mixtures to large scales from small-scale fits with fewer 
parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01292","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression","primary_cat":"cs.LG","submitted_at":"2026-05-31T15:24:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Unified spectral analysis shows knowledge transfer efficacy arises from spectral horizon expansion in KD and spectral denoising in W2S, governed by implicit regularization and heterogeneous spectral learning speeds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29548","ref_index":134,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention","primary_cat":"cs.LG","submitted_at":"2026-05-28T08:02:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Larger models succeed on rare and complex tasks by reducing gradient interference from common tasks, allowing rare-task features to accumulate, as shown via synthetic task mixtures and OLMo pretraining from 4M to 4B parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24316","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression","primary_cat":"cs.LG","submitted_at":"2026-05-23T00:48:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Derives mini-batch scaling laws for sketched linear 
regression, with shared approximation terms and protocol-specific variance/fluctuation scalings under power-law spectrum and source condition.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23591","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Asymmetric Scaling Laws from Sparse Features","primary_cat":"stat.ML","submitted_at":"2026-05-22T13:00:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A sparse-activation model predicts double-descent loss with distinct under- and over-parameterized scaling exponents set by sparsity, plus a compute-optimal frontier favoring dataset growth.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14567","ref_index":42,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model","primary_cat":"stat.ML","submitted_at":"2026-05-14T08:37:28+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A solvable hierarchical model with power-law feature strengths yields explicit power-law scaling of prediction error through sequential recovery of latent directions by a layer-wise spectral algorithm.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07870","ref_index":65,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer","primary_cat":"cond-mat.dis-nn","submitted_at":"2026-05-08T15:28:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A two-level DMFT tracks bulk and 
outlier spectral dynamics in wide networks, predicting width-consistent outlier growth and hyperparameter transfer under muP scaling for deep linear nets while noting bulk restructuring for large-output tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[63] Clémentine CJ Dominé, Nicolas Anguita, Alexandra M Proca, Lukas Braun, Daniel Kunin, Pedro AM Mediano, and Andrew M Saxe. From lazy to rich: Exact learning dynamics in deep linear networks. arXiv preprint arXiv:2409.14623, 2024. [64] Blake Bordelon and Cengiz Pehlevan. Deep linear network training dynamics from random initialization: Data, width, depth, and hyperparameter transfer. arXiv preprint arXiv:2502.02531, 2025. [65] Alexander B Atanasov, Jacob A Zavatone-Veth, and Cengiz Pehlevan. Scaling and renormalization in high-dimensional regression. arXiv preprint arXiv:2405.00592, 2024. [66] Greg Yang, Edward J Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, and Jianfeng Gao. Tuning large neural networks via zero-shot hyperparameter transfer."},{"citing_arxiv_id":"2604.24037","ref_index":85,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Limit Theory of Foundation Models: A Mathematical Approach to Understanding Emergent Intelligence and Scaling Laws","primary_cat":"cs.LG","submitted_at":"2026-04-27T04:43:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Formalizes emergent intelligence in foundation models as the limit of E(N,P,K) as N,P,K approach infinity, proves existence conditions via nonlinear Lipschitz operators, and derives scaling laws from covering numbers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[83] Maloney, A., Roberts, D.A., Sully, J.: A solvable model of neural scaling laws. 
arXiv preprint arXiv:2210.16859 (2022) [84] Defilippis, L., Loureiro, B., Misiakiewicz, T.: Dimension-free deterministic equivalents and scaling laws for random feature regression. In: Advances in Neural Information Processing Systems, vol. 37, pp. 104630-104693 (2024) [85] Atanasov, A., Zavatone-Veth, J.A., Pehlevan, C.: Scaling and renormalization in high-dimensional regression. arXiv preprint arXiv:2405.00592 (2024) [86] Bordelon, B., Atanasov, A., Pehlevan, C.: A dynamical model of neural scaling laws. In: International Conference on Machine Learning, pp. 4345-4382 (2024). PMLR [87] Lin, L., Wu, J., Kakade, S.M., Bartlett, P."},{"citing_arxiv_id":"2604.21691","ref_index":89,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"There Will Be a Scientific Theory of Deep Learning","primary_cat":"stat.ML","submitted_at":"2026-04-23T13:58:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18450","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Random Matrix Theory of Early-Stopped Gradient Flow: A Transient BBP Scenario","primary_cat":"stat.ML","submitted_at":"2026-04-20T16:05:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"In an anisotropic random-matrix model of gradient flow, the teacher signal produces a transient BBP transition where the outlier eigenvalue emerges only in an intermediate time window before 
overfitting.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17202","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Double Descent in Quantum Kernel Ridge Regression","primary_cat":"quant-ph","submitted_at":"2026-04-19T02:09:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Quantum kernel ridge regression shows double descent in test risk, with the interpolation peak suppressible by regularization, via random matrix theory asymptotics in the high-dimensional limit.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.13883","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Renormalization group for spectral collapse in random matrices with power-law variance profiles","primary_cat":"cond-mat.stat-mech","submitted_at":"2025-12-15T20:36:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A renormalization group scheme with running normalization collapses eigenvalue spectra of Wigner and Wishart matrices modified by power-law variance profiles, confirmed via fixed-point equations and simulations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.05074","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models","primary_cat":"cond-mat.dis-nn","submitted_at":"2025-02-07T16:45:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Derives a novel two-point deterministic equivalence for random matrix resolvents to obtain unified asymptotics for SGD-trained 
linear regression, kernel regression, and random feature models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}