Factor Augmented High-Dimensional SGD

Shubo Li; Xiufan Yu; Yuefeng Han

arxiv: 2605.19291 · v1 · pith:ZBCOYK3Lnew · submitted 2026-05-19 · 📊 stat.ML · cs.LG· math.ST· stat.TH

Factor Augmented High-Dimensional SGD

Shubo Li , Yuefeng Han , Xiufan Yu This is my paper

Pith reviewed 2026-05-20 03:27 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.STstat.TH

keywords factor-augmented SGDhigh-dimensional optimizationstreaming datalatent factorsconvergence analysisstochastic gradient descentmoment convergence

0 comments

The pith

Factor-Augmented SGD incorporates latent factor estimation error directly into streaming optimization analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Factor-Augmented SGD (FSGD) to optimize high-dimensional models using latent factor representations from streaming data alone. Standard approaches require offline dimension reduction and full data storage, but FSGD updates factors and parameters on the fly. It develops a new convergence theory that folds the error from estimating the latent factors into the SGD moment bounds under decaying steps and mini-batches. A sympathetic reader would care because this removes a practical barrier for applying SGD to massive, high-dimensional streaming problems where hidden low-rank structure is common.

Core claim

We propose Factor-Augmented SGD (FSGD), a new optimization method that leverages latent factor representations in high-dimensional learning tasks. Unlike standard two-stage dimension reduction approaches that rely on offline representation learning and full data storage, a key novelty of FSGD is that it operates purely on streaming data, making it scalable to large-scale and high-dimensional problems. Furthermore, we establish the first theoretical framework that explicitly incorporates latent factor estimation error into the analysis of SGD, and provide moment convergence in ℓ^s norm under decaying step sizes and mini-batch updates. Our results provide a new foundation for employing SGDreli

What carries the argument

Factor-Augmented SGD (FSGD), which augments the SGD update with an online estimate of the latent factor structure to reduce effective dimension while propagating estimation error into the convergence bound.

If this is right

Convergence is guaranteed in moments of order s for the parameter iterates when step sizes decay appropriately.
The method works with mini-batch updates without requiring full dataset access.
Latent factor estimation error is treated as an explicit additive term in the error bound rather than hidden in assumptions.
Scalability follows from processing data in a single pass without offline precomputation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This framework might be adapted to other stochastic optimizers such as Adam or RMSprop by similar error propagation.
Connections to online PCA or streaming matrix factorization could yield sharper bounds when the factor model is estimated jointly.
Practical implementations could test whether the added factor step improves wall-clock performance on real high-dimensional datasets like image or text streams.

Load-bearing premise

The data admits a low-dimensional latent factor structure whose estimation error can be bounded and fed directly into the SGD convergence analysis without additional assumptions on the factor loading matrix or the streaming arrival process.

What would settle it

Generate synthetic data with a known low-rank factor model, run FSGD while varying the factor estimation accuracy, and check whether the observed moment errors match the rates predicted by the theorem when estimation error increases.

Figures

Figures reproduced from arXiv: 2605.19291 by Shubo Li, Xiufan Yu, Yuefeng Han.

**Figure 2.** Figure 2: Evaluation of Empirical Performance of FSGD [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

read the original abstract

Stochastic gradient descent (SGD) is a fundamental optimization algorithm widely used in modern machine learning. In this paper, we propose Factor-Augmented SGD (FSGD), a new optimization method that leverages latent factor representations in high-dimensional learning tasks. Unlike standard two-stage dimension reduction approaches that rely on offline representation learning and full data storage, a key novelty of FSGD is that it operates purely on streaming data, making it scalable to large-scale and high-dimensional problems. Furthermore, we establish the first theoretical framework that explicitly incorporates latent factor estimation error into the analysis of SGD, and provide moment convergence in $\ell^s$ norm under decaying step sizes and mini-batch updates. Our results provide a new foundation for employing SGD reliably and scalably in high-dimensional machine learning systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FSGD adds online factor estimation to high-dim SGD with explicit error terms in the l^s convergence bounds, but the clean separation of that error from the streaming process is the assumption to watch.

read the letter

The main point is a streaming version of SGD that estimates latent factors on the fly and builds moment convergence results that include the factor estimation error rather than treating dimension reduction as error-free. The algorithm runs with mini-batches and decaying steps entirely on streaming data, skipping the usual offline two-stage setup that requires storing everything first. That practical angle is useful for large-scale high-dimensional problems where full data access is impossible. The paper does a solid job stating the algorithm clearly and laying out a proof strategy that adds the factor error as an extra term inside the standard SGD analysis. The derivations look formally grounded and the assumptions are written out so a reader can see where the bounds come from. The soft spot is exactly where the stress-test note flags it. The analysis needs the factor estimation error to admit a uniform bound that stays additive and does not depend on the particular streaming arrival sequence or extra conditions on the loading matrix. Because the online factor estimator and the SGD updates draw from the same data stream, that separation is not automatic. The paper derives the bounds under the low-rank factor model, but the coupling could make the error term depend on the iterates in ways the current argument does not fully control. If that happens the claimed rates do not follow directly from the stated hypotheses. This is aimed at researchers in stochastic optimization and high-dimensional statistics who care about theoretical guarantees for combined online estimation and optimization. A reader looking for concrete ways to propagate factor-model error into SGD bounds would get value from the setup. The work shows clear thinking and honest engagement with the relevant literature, so it deserves a serious referee even though the advance is incremental rather than revolutionary. I would send it out for peer review and ask the referees to focus on whether the streaming error separation holds under the given assumptions.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Factor-Augmented SGD (FSGD), an optimization method that incorporates latent factor representations for high-dimensional learning tasks. It operates purely on streaming data without requiring offline representation learning or full data storage. The central contribution is a theoretical framework that explicitly folds latent factor estimation error into the SGD analysis, establishing moment convergence in the ℓ^s norm under decaying step sizes and mini-batch updates.

Significance. If the claimed separation between online factor estimation error and SGD iterates holds, the work would supply a new analytic foundation for reliable SGD in high-dimensional streaming regimes. The explicit incorporation of estimation error into ℓ^s moment bounds is a potentially useful technical step beyond standard two-stage approaches.

major comments (2)

[§4] §4 (Convergence Analysis): The proof that the factor estimation error remains additive (or Lipschitz) with respect to the SGD iterates and does not couple to the mini-batch gradients is load-bearing for the ℓ^s moment bound. The manuscript must supply an explicit uniform bound on this error term that holds under the stated streaming arrival process and does not invoke extra regularity on the loading matrix beyond what is already used for the SGD analysis.
[Assumption set] Assumption set (e.g., Assumption 3.2 or 4.1): The claim that the data-generating process admits a low-rank factor model whose estimation error admits a bound independent of the particular streaming realization appears to be the weakest link. If the online factor estimator shares mini-batches with the SGD updates or if the arrival process violates the moment conditions needed for the factor error bound, the stated convergence rate no longer follows directly from the hypotheses.

minor comments (2)

[§2] Notation for the factor loading matrix and the online estimator should be introduced with a clear distinction between population quantities and their streaming estimates.
[Introduction] The abstract states 'first theoretical framework'; a brief comparison paragraph in the introduction citing the closest prior works on online factor models and SGD with estimation error would improve context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments on our paper 'Factor Augmented High-Dimensional SGD'. The feedback helps us strengthen the presentation of the convergence analysis. We address each major comment below, indicating where revisions will be made to the manuscript.

read point-by-point responses

Referee: [§4] §4 (Convergence Analysis): The proof that the factor estimation error remains additive (or Lipschitz) with respect to the SGD iterates and does not couple to the mini-batch gradients is load-bearing for the ℓ^s moment bound. The manuscript must supply an explicit uniform bound on this error term that holds under the stated streaming arrival process and does not invoke extra regularity on the loading matrix beyond what is already used for the SGD analysis.

Authors: We appreciate this observation. In our analysis, the factor estimation error is treated as an additive perturbation in the SGD recursion. Its Lipschitz property with respect to the iterates follows directly from the bounded feature maps under the assumed factor model. To make this explicit, we will add a dedicated lemma in the revised Section 4 deriving a uniform bound on the factor estimation error that holds for the streaming arrival process. The bound uses only the existing moment conditions and bounded operator norm of the loading matrix from the SGD analysis, without extra regularity assumptions. The proof relies on martingale concentration inequalities applied to the online estimator. revision: yes
Referee: [Assumption set] Assumption set (e.g., Assumption 3.2 or 4.1): The claim that the data-generating process admits a low-rank factor model whose estimation error admits a bound independent of the particular streaming realization appears to be the weakest link. If the online factor estimator shares mini-batches with the SGD updates or if the arrival process violates the moment conditions needed for the factor error bound, the stated convergence rate no longer follows directly from the hypotheses.

Authors: The low-rank factor model is a core data-generating assumption, and the estimation error bound is derived uniformly over realizations via concentration that averages over the probability space under the weak dependence of arrivals. The online factor estimator operates on the same streaming data (including possible mini-batch overlap for efficiency), but the analysis decouples the errors: the factor error enters as an additive term whose moments are controlled independently of the current SGD parameter due to the linear factor structure. We will revise the assumption section to include an explicit remark on this decoupling and on mini-batch sharing. The stated rate holds precisely when the moment conditions are met, as hypothesized; violations fall outside the theorem. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation builds from explicit assumptions to convergence bounds without reduction to inputs or self-citations.

full rationale

The paper derives moment convergence for FSGD by starting from standard SGD analysis under decaying steps and mini-batches, then adding an explicit additive term for latent factor estimation error. This error is bounded via the low-rank structure assumption and inserted into the ℓ^s-norm bounds; the steps are forward derivations from stated hypotheses rather than any fitted parameter renamed as prediction or any self-citation chain that forces the result. The framework is self-contained once the factor-error bound is granted as an independent modeling choice, with no evidence that any central equation equals its input by construction or that uniqueness is smuggled via prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations or proofs, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5658 in / 1105 out tokens · 44320 ms · 2026-05-20T03:27:03.870634+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

92 extracted references · 92 canonical work pages · 3 internal anchors

[1]

The Thirty-Ninth Annual Conference on Neural Information Processing Systems , year=

Statistical Guarantees for High-Dimensional Stochastic Gradient Descent , author=. The Thirty-Ninth Annual Conference on Neural Information Processing Systems , year=

work page
[2]

Streaming

Huang, De and Niles-Weed, Jonathan and Ward, Rachel , booktitle=. Streaming. 2021 , organization=

work page 2021
[3]

Liu, Xiyang and Kong, Weihao and Jain, Prateek and Oh, Sewoong , booktitle =

work page
[4]

SIAM Journal on Matrix Analysis and Applications , volume=

New perturbation bounds for the unitary polar factor , author=. SIAM Journal on Matrix Analysis and Applications , volume=. 1995 , publisher=

work page 1995
[5]

2018 , journal=

Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics , author=. 2018 , journal=

work page 2018
[6]

SIAM Journal on Numerical Analysis , volume=

Perturbation bounds for the QR factorization of a matrix , author=. SIAM Journal on Numerical Analysis , volume=. 1977 , publisher=

work page 1977
[7]

2018 , publisher=

High-dimensional probability: An introduction with applications in data science , author=. 2018 , publisher=

work page 2018
[8]

2012 , journal=

Statistical analysis of factor models of high dimension , author=. 2012 , journal=

work page 2012
[9]

Journal of the American Statistical Association , volume=

Factor augmented sparse throughput deep relu neural networks for high dimensional regression , author=. Journal of the American Statistical Association , volume=. 2024 , publisher=

work page 2024
[10]

The Annals of Mathematical Statistics , volume=

A stochastic approximation method , author=. The Annals of Mathematical Statistics , volume=. 1951 , publisher=

work page 1951
[11]

The Annals of Statistics , volume=

Stochastic approximation , author=. The Annals of Statistics , volume=. 2003 , publisher=

work page 2003
[12]

Wu, Lei and Ma, Chao , booktitle=. How

work page
[13]

Advances in Neural Information Processing Systems , volume=

Train longer, generalize better: closing the generalization gap in large batch training of neural networks , author=. Advances in Neural Information Processing Systems , volume=

work page
[14]

arXiv preprint arXiv:2103.00065 , year=

Gradient descent on neural networks typically occurs at the edge of stability , author=. arXiv preprint arXiv:2103.00065 , year=

work page arXiv
[15]

Econometrica , volume=

Determining the number of factors in approximate factor models , author=. Econometrica , volume=. 2002 , publisher=

work page 2002
[16]

The Annals of Statistics , pages=

Factor modeling for high-dimensional time series: inference for the number of factors , author=. The Annals of Statistics , pages=. 2012 , publisher=

work page 2012
[17]

Journal of Mathematical Biology , volume=

Simplified neuron model as a principal component analyzer , author=. Journal of Mathematical Biology , volume=. 1982 , publisher=

work page 1982
[18]

Journal of Mathematical Analysis and Applications , volume=

On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix , author=. Journal of Mathematical Analysis and Applications , volume=. 1985 , publisher=

work page 1985
[19]

Journal of the American Statistical Association , volume=

Forecasting using principal components from a large number of predictors , author=. Journal of the American Statistical Association , volume=. 2002 , publisher=

work page 2002
[20]

The Journal of Machine Learning Research , volume=

Optimal distributed online prediction using mini-batches , author=. The Journal of Machine Learning Research , volume=. 2012 , publisher=

work page 2012
[21]

Proceedings of the 29th International Coference on International Conference on Machine Learning , pages=

Making gradient descent optimal for strongly convex stochastic optimization , author=. Proceedings of the 29th International Coference on International Conference on Machine Learning , pages=

work page
[22]

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method

A simpler approach to obtaining an O (1/t) convergence rate for the projected stochastic subgradient method , author=. arXiv preprint arXiv:1212.2002 , year=

work page internal anchor Pith review Pith/arXiv arXiv 2002
[23]

Advances in Neural Information Processing Systems , volume=

Better mini-batch algorithms via accelerated gradient methods , author=. Advances in Neural Information Processing Systems , volume=

work page
[24]

2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton) , pages=

Distributed stochastic optimization and learning , author=. 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton) , pages=. 2014 , organization=

work page 2014
[25]

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining , pages=

Efficient mini-batch training for stochastic optimization , author=. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining , pages=

work page
[26]

Advances in Neural Information Processing Systems , volume=

Tight high probability bounds for linear stochastic approximation with fixed stepsize , author=. Advances in Neural Information Processing Systems , volume=

work page
[27]

Journal of Machine Learning Research , volume=

Online stochastic gradient descent on non-convex losses from high-dimensional inference , author=. Journal of Machine Learning Research , volume=

work page
[28]

Nonparametric regression using deep neural networks with

Schmidt-Hieber, Johannes , journal=. Nonparametric regression using deep neural networks with

work page
[29]

Journal of Machine Learning Research , volume=

Community detection and stochastic block models: recent developments , author=. Journal of Machine Learning Research , volume=

work page
[30]

Nature genetics , volume=

Principal component analysis of genetic data , author=. Nature genetics , volume=. 2008 , publisher=

work page 2008
[31]

PLoS medicine , volume=

UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age , author=. PLoS medicine , volume=. 2015 , publisher=

work page 2015
[32]

Streaming

Jain, Prateek and Jin, Chi and Kakade, Sham M and Netrapalli, Praneeth and Sidford, Aaron , booktitle=. Streaming. 2016 , organization=

work page 2016
[33]

Remote Sensing , volume=

Implementation of the principal component analysis onto high-performance computer facilities for hyperspectral dimensionality reduction: Results and comparisons , author=. Remote Sensing , volume=. 2018 , publisher=

work page 2018
[34]

Nature communications , volume=

Efficient and accurate frailty model approach for genome-wide survival association analysis in large-scale biobanks , author=. Nature communications , volume=. 2022 , publisher=

work page 2022
[35]

IEEE signal processing magazine , volume=

Federated learning: Challenges, methods, and future directions , author=. IEEE signal processing magazine , volume=. 2020 , publisher=

work page 2020
[36]

Artificial intelligence and statistics , pages=

Communication-efficient learning of deep networks from decentralized data , author=. Artificial intelligence and statistics , pages=. 2017 , organization=

work page 2017
[37]

Clinical and translational science , volume=

Principles of human subjects protections applied in an opt-out, de-identified biobank , author=. Clinical and translational science , volume=. 2010 , publisher=

work page 2010
[38]

Cell Reports Medicine , volume=

Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture , author=. Cell Reports Medicine , volume=. 2024 , publisher=

work page 2024
[39]

Federated Learning for Mobile Keyboard Prediction

Federated learning for mobile keyboard prediction , author=. arXiv preprint arXiv:1811.03604 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[40]

2021 , organization=

Kasiviswanathan, Shiva Prasad , booktitle=. 2021 , organization=

work page 2021
[41]

Stochastic Subspace Descent

Stochastic subspace descent , author=. arXiv preprint arXiv:1904.01145 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1904
[42]

arXiv preprint arXiv:2410.11227 , year=

Guarantees for nonlinear representation learning: Non-identical covariates, dependent data, fewer samples , author=. arXiv preprint arXiv:2410.11227 , year=

work page arXiv
[43]

International Conference on Artificial Intelligence and Statistics , pages=

Freeze then train: Towards provable representation learning under spurious correlations and feature noise , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=

work page 2023
[44]

Information and Inference: A Journal of the IMA , volume=

Nonparametric regression on low-dimensional manifolds using deep ReLU networks: Function approximation and statistical recovery , author=. Information and Inference: A Journal of the IMA , volume=. 2022 , publisher=

work page 2022
[45]

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=

First efficient convergence for streaming k-pca: a global, gap-free, and near-optimal rate , author=. 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2017 , organization=

work page 2017
[46]

IEEE Transactions on Information Theory , year=

Theoretical guarantees for sparse principal component analysis based on the elastic net , author=. IEEE Transactions on Information Theory , year=

work page
[47]

SIAM Journal on Control and Optimization , volume=

Acceleration of stochastic approximation by averaging , author=. SIAM Journal on Control and Optimization , volume=. 1992 , publisher=

work page 1992
[48]

2020 , journal=

Bridging the gap between constant step size stochastic gradient descent and Markov chains , author=. 2020 , journal=

work page 2020
[49]

2019 , publisher=

High-dimensional statistics: A non-asymptotic viewpoint , author=. 2019 , publisher=

work page 2019
[50]

Journal of the American Statistical Association , volume=

Variable selection via nonconcave penalized likelihood and its oracle properties , author=. Journal of the American Statistical Association , volume=. 2001 , publisher=

work page 2001
[51]

The Annals of Applied Probability , volume=

Concentration of contractive stochastic approximation: Additive and multiplicative noise , author=. The Annals of Applied Probability , volume=. 2025 , publisher=

work page 2025
[52]

SIAM Review , volume=

Optimization methods for large-scale machine learning , author=. SIAM Review , volume=. 2018 , publisher=

work page 2018
[53]

Econometrica , volume=

Inferential theory for factor models of large dimensions , author=. Econometrica , volume=. 2003 , publisher=

work page 2003
[54]

Vogels, Thijs and Karimireddy, Sai Praneeth and Jaggi, Martin , booktitle=. Power

work page
[55]

Handbook of convergence theorems for (stochastic) gradient methods,

Handbook of convergence theorems for (stochastic) gradient methods , author=. arXiv preprint arXiv:2301.11235 , year=

work page arXiv
[56]

Galore: Memory-efficient

Zhao, Jiawei and Zhang, Zhenyu and Chen, Beidi and Wang, Zhangyang and Anandkumar, Anima and Tian, Yuandong , journal=. Galore: Memory-efficient

work page
[57]

A useful variant of the

Yu, Yi and Wang, Tengyao and Samworth, Richard J , journal=. A useful variant of the. 2015 , publisher=

work page 2015
[58]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Factor Augmented Tensor-on-Tensor Neural Networks , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[59]

arXiv preprint arXiv:2505.20536 , year=

Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models , author=. arXiv preprint arXiv:2505.20536 , year=

work page arXiv
[60]

arXiv preprint arXiv:2508.06548 , year=

Factor Augmented Supervised Learning with Text Embeddings , author=. arXiv preprint arXiv:2508.06548 , year=

work page arXiv
[61]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Supervised dynamic dimension reduction with deep neural network , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[62]

Journal of Econometrics , volume=

Sufficient forecasting using factor models , author=. Journal of Econometrics , volume=. 2017 , publisher=

work page 2017
[63]

Biometrika , volume=

Inverse moment methods for sufficient forecasting using high-dimensional predictors , author=. Biometrika , volume=. 2022 , publisher=

work page 2022
[64]

Journal of Business & Economic Statistics , volume=

Nonparametric estimation and conformal inference of the sufficient forecasting with a diverging number of factors , author=. Journal of Business & Economic Statistics , volume=. 2022 , publisher=

work page 2022
[65]

Power enhancement for testing multi-factor asset pricing models via

Yu, Xiufan and Yao, Jiawei and Xue, Lingzhou , journal=. Power enhancement for testing multi-factor asset pricing models via. 2024 , publisher=

work page 2024
[66]

The Annals of Statistics , volume=

Tensor factor model estimation by iterative projection , author=. The Annals of Statistics , volume=. 2024 , publisher=

work page 2024
[67]

2024 , publisher=

Han, Yuefeng and Yang, Dan and Zhang, Cun-Hui and Chen, Rong , journal=. 2024 , publisher=

work page 2024
[68]

Journal of the American Statistical Association , volume=

Simultaneous decorrelation of matrix time series , author=. Journal of the American Statistical Association , volume=. 2024 , publisher=

work page 2024
[69]

IEEE Transactions on Information Theory , volume=

Tensor principal component analysis in high dimensional CP models , author=. IEEE Transactions on Information Theory , volume=. 2022 , publisher=

work page 2022
[70]

arXiv preprint arXiv:2407.05624 , year=

Dynamic matrix factor models for high dimensional time series , author=. arXiv preprint arXiv:2407.05624 , year=

work page arXiv
[71]

Journal of Econometrics , volume=

Diffusion index forecasting with tensor data , author=. Journal of Econometrics , volume=. 2026 , publisher=

work page 2026
[72]

Journal of Econometrics , volume=

Estimation and inference for CP tensor factor models , author=. Journal of Econometrics , volume=. 2026 , publisher=

work page 2026
[73]

Journal of the Royal Statistical Society

Factor analysis as a statistical method , author=. Journal of the Royal Statistical Society. Series D (The Statistician) , volume=. 1962 , publisher=

work page 1962
[74]

Foundations and Trends

Large dimensional factor analysis , author=. Foundations and Trends. 2008 , publisher=

work page 2008
[75]

Journal of Business & Economic Statistics , volume=

Macroeconomic forecasting using diffusion indexes , author=. Journal of Business & Economic Statistics , volume=. 2002 , publisher=

work page 2002
[76]

Journal of the American Statistical Association , volume=

Prediction by supervised principal components , author=. Journal of the American Statistical Association , volume=. 2006 , publisher=

work page 2006
[77]

Advances in Neural Information Processing Systems , volume=

Atomo: Communication-efficient learning via atomic sparsification , author=. Advances in Neural Information Processing Systems , volume=

work page
[78]

Advances in Neural Information Processing Systems , volume=

Practical low-rank communication compression in decentralized deep learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[79]

Zhengbo Wang and Jian Liang and Ran He and Zilei Wang and Tieniu Tan , booktitle=. Lo. 2025 , url=

work page 2025
[80]

2021 , organization=

Paquette, Courtney and Lee, Kiwon and Pedregosa, Fabian and Paquette, Elliot , booktitle=. 2021 , organization=

work page 2021

Showing first 80 references.

[1] [1]

The Thirty-Ninth Annual Conference on Neural Information Processing Systems , year=

Statistical Guarantees for High-Dimensional Stochastic Gradient Descent , author=. The Thirty-Ninth Annual Conference on Neural Information Processing Systems , year=

work page

[2] [2]

Streaming

Huang, De and Niles-Weed, Jonathan and Ward, Rachel , booktitle=. Streaming. 2021 , organization=

work page 2021

[3] [3]

Liu, Xiyang and Kong, Weihao and Jain, Prateek and Oh, Sewoong , booktitle =

work page

[4] [4]

SIAM Journal on Matrix Analysis and Applications , volume=

New perturbation bounds for the unitary polar factor , author=. SIAM Journal on Matrix Analysis and Applications , volume=. 1995 , publisher=

work page 1995

[5] [5]

2018 , journal=

Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics , author=. 2018 , journal=

work page 2018

[6] [6]

SIAM Journal on Numerical Analysis , volume=

Perturbation bounds for the QR factorization of a matrix , author=. SIAM Journal on Numerical Analysis , volume=. 1977 , publisher=

work page 1977

[7] [7]

2018 , publisher=

High-dimensional probability: An introduction with applications in data science , author=. 2018 , publisher=

work page 2018

[8] [8]

2012 , journal=

Statistical analysis of factor models of high dimension , author=. 2012 , journal=

work page 2012

[9] [9]

Journal of the American Statistical Association , volume=

Factor augmented sparse throughput deep relu neural networks for high dimensional regression , author=. Journal of the American Statistical Association , volume=. 2024 , publisher=

work page 2024

[10] [10]

The Annals of Mathematical Statistics , volume=

A stochastic approximation method , author=. The Annals of Mathematical Statistics , volume=. 1951 , publisher=

work page 1951

[11] [11]

The Annals of Statistics , volume=

Stochastic approximation , author=. The Annals of Statistics , volume=. 2003 , publisher=

work page 2003

[12] [12]

Wu, Lei and Ma, Chao , booktitle=. How

work page

[13] [13]

Advances in Neural Information Processing Systems , volume=

Train longer, generalize better: closing the generalization gap in large batch training of neural networks , author=. Advances in Neural Information Processing Systems , volume=

work page

[14] [14]

arXiv preprint arXiv:2103.00065 , year=

Gradient descent on neural networks typically occurs at the edge of stability , author=. arXiv preprint arXiv:2103.00065 , year=

work page arXiv

[15] [15]

Econometrica , volume=

Determining the number of factors in approximate factor models , author=. Econometrica , volume=. 2002 , publisher=

work page 2002

[16] [16]

The Annals of Statistics , pages=

Factor modeling for high-dimensional time series: inference for the number of factors , author=. The Annals of Statistics , pages=. 2012 , publisher=

work page 2012

[17] [17]

Journal of Mathematical Biology , volume=

Simplified neuron model as a principal component analyzer , author=. Journal of Mathematical Biology , volume=. 1982 , publisher=

work page 1982

[18] [18]

Journal of Mathematical Analysis and Applications , volume=

On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix , author=. Journal of Mathematical Analysis and Applications , volume=. 1985 , publisher=

work page 1985

[19] [19]

Journal of the American Statistical Association , volume=

Forecasting using principal components from a large number of predictors , author=. Journal of the American Statistical Association , volume=. 2002 , publisher=

work page 2002

[20] [20]

The Journal of Machine Learning Research , volume=

Optimal distributed online prediction using mini-batches , author=. The Journal of Machine Learning Research , volume=. 2012 , publisher=

work page 2012

[21] [21]

Proceedings of the 29th International Coference on International Conference on Machine Learning , pages=

Making gradient descent optimal for strongly convex stochastic optimization , author=. Proceedings of the 29th International Coference on International Conference on Machine Learning , pages=

work page

[22] [22]

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method

A simpler approach to obtaining an O (1/t) convergence rate for the projected stochastic subgradient method , author=. arXiv preprint arXiv:1212.2002 , year=

work page internal anchor Pith review Pith/arXiv arXiv 2002

[23] [23]

Advances in Neural Information Processing Systems , volume=

Better mini-batch algorithms via accelerated gradient methods , author=. Advances in Neural Information Processing Systems , volume=

work page

[24] [24]

2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton) , pages=

Distributed stochastic optimization and learning , author=. 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton) , pages=. 2014 , organization=

work page 2014

[25] [25]

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining , pages=

Efficient mini-batch training for stochastic optimization , author=. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining , pages=

work page

[26] [26]

Advances in Neural Information Processing Systems , volume=

Tight high probability bounds for linear stochastic approximation with fixed stepsize , author=. Advances in Neural Information Processing Systems , volume=

work page

[27] [27]

Journal of Machine Learning Research , volume=

Online stochastic gradient descent on non-convex losses from high-dimensional inference , author=. Journal of Machine Learning Research , volume=

work page

[28] [28]

Nonparametric regression using deep neural networks with

Schmidt-Hieber, Johannes , journal=. Nonparametric regression using deep neural networks with

work page

[29] [29]

Journal of Machine Learning Research , volume=

Community detection and stochastic block models: recent developments , author=. Journal of Machine Learning Research , volume=

work page

[30] [30]

Nature genetics , volume=

Principal component analysis of genetic data , author=. Nature genetics , volume=. 2008 , publisher=

work page 2008

[31] [31]

PLoS medicine , volume=

UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age , author=. PLoS medicine , volume=. 2015 , publisher=

work page 2015

[32] [32]

Streaming

Jain, Prateek and Jin, Chi and Kakade, Sham M and Netrapalli, Praneeth and Sidford, Aaron , booktitle=. Streaming. 2016 , organization=

work page 2016

[33] [33]

Remote Sensing , volume=

Implementation of the principal component analysis onto high-performance computer facilities for hyperspectral dimensionality reduction: Results and comparisons , author=. Remote Sensing , volume=. 2018 , publisher=

work page 2018

[34] [34]

Nature communications , volume=

Efficient and accurate frailty model approach for genome-wide survival association analysis in large-scale biobanks , author=. Nature communications , volume=. 2022 , publisher=

work page 2022

[35] [35]

IEEE signal processing magazine , volume=

Federated learning: Challenges, methods, and future directions , author=. IEEE signal processing magazine , volume=. 2020 , publisher=

work page 2020

[36] [36]

Artificial intelligence and statistics , pages=

Communication-efficient learning of deep networks from decentralized data , author=. Artificial intelligence and statistics , pages=. 2017 , organization=

work page 2017

[37] [37]

Clinical and translational science , volume=

Principles of human subjects protections applied in an opt-out, de-identified biobank , author=. Clinical and translational science , volume=. 2010 , publisher=

work page 2010

[38] [38]

Cell Reports Medicine , volume=

Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture , author=. Cell Reports Medicine , volume=. 2024 , publisher=

work page 2024

[39] [39]

Federated Learning for Mobile Keyboard Prediction

Federated learning for mobile keyboard prediction , author=. arXiv preprint arXiv:1811.03604 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[40] [40]

2021 , organization=

Kasiviswanathan, Shiva Prasad , booktitle=. 2021 , organization=

work page 2021

[41] [41]

Stochastic Subspace Descent

Stochastic subspace descent , author=. arXiv preprint arXiv:1904.01145 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1904

[42] [42]

arXiv preprint arXiv:2410.11227 , year=

Guarantees for nonlinear representation learning: Non-identical covariates, dependent data, fewer samples , author=. arXiv preprint arXiv:2410.11227 , year=

work page arXiv

[43] [43]

International Conference on Artificial Intelligence and Statistics , pages=

Freeze then train: Towards provable representation learning under spurious correlations and feature noise , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=

work page 2023

[44] [44]

Information and Inference: A Journal of the IMA , volume=

Nonparametric regression on low-dimensional manifolds using deep ReLU networks: Function approximation and statistical recovery , author=. Information and Inference: A Journal of the IMA , volume=. 2022 , publisher=

work page 2022

[45] [45]

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=

First efficient convergence for streaming k-pca: a global, gap-free, and near-optimal rate , author=. 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2017 , organization=

work page 2017

[46] [46]

IEEE Transactions on Information Theory , year=

Theoretical guarantees for sparse principal component analysis based on the elastic net , author=. IEEE Transactions on Information Theory , year=

work page

[47] [47]

SIAM Journal on Control and Optimization , volume=

Acceleration of stochastic approximation by averaging , author=. SIAM Journal on Control and Optimization , volume=. 1992 , publisher=

work page 1992

[48] [48]

2020 , journal=

Bridging the gap between constant step size stochastic gradient descent and Markov chains , author=. 2020 , journal=

work page 2020

[49] [49]

2019 , publisher=

High-dimensional statistics: A non-asymptotic viewpoint , author=. 2019 , publisher=

work page 2019

[50] [50]

Journal of the American Statistical Association , volume=

Variable selection via nonconcave penalized likelihood and its oracle properties , author=. Journal of the American Statistical Association , volume=. 2001 , publisher=

work page 2001

[51] [51]

The Annals of Applied Probability , volume=

Concentration of contractive stochastic approximation: Additive and multiplicative noise , author=. The Annals of Applied Probability , volume=. 2025 , publisher=

work page 2025

[52] [52]

SIAM Review , volume=

Optimization methods for large-scale machine learning , author=. SIAM Review , volume=. 2018 , publisher=

work page 2018

[53] [53]

Econometrica , volume=

Inferential theory for factor models of large dimensions , author=. Econometrica , volume=. 2003 , publisher=

work page 2003

[54] [54]

Vogels, Thijs and Karimireddy, Sai Praneeth and Jaggi, Martin , booktitle=. Power

work page

[55] [55]

Handbook of convergence theorems for (stochastic) gradient methods,

Handbook of convergence theorems for (stochastic) gradient methods , author=. arXiv preprint arXiv:2301.11235 , year=

work page arXiv

[56] [56]

Galore: Memory-efficient

Zhao, Jiawei and Zhang, Zhenyu and Chen, Beidi and Wang, Zhangyang and Anandkumar, Anima and Tian, Yuandong , journal=. Galore: Memory-efficient

work page

[57] [57]

A useful variant of the

Yu, Yi and Wang, Tengyao and Samworth, Richard J , journal=. A useful variant of the. 2015 , publisher=

work page 2015

[58] [58]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Factor Augmented Tensor-on-Tensor Neural Networks , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[59] [59]

arXiv preprint arXiv:2505.20536 , year=

Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models , author=. arXiv preprint arXiv:2505.20536 , year=

work page arXiv

[60] [60]

arXiv preprint arXiv:2508.06548 , year=

Factor Augmented Supervised Learning with Text Embeddings , author=. arXiv preprint arXiv:2508.06548 , year=

work page arXiv

[61] [61]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Supervised dynamic dimension reduction with deep neural network , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[62] [62]

Journal of Econometrics , volume=

Sufficient forecasting using factor models , author=. Journal of Econometrics , volume=. 2017 , publisher=

work page 2017

[63] [63]

Biometrika , volume=

Inverse moment methods for sufficient forecasting using high-dimensional predictors , author=. Biometrika , volume=. 2022 , publisher=

work page 2022

[64] [64]

Journal of Business & Economic Statistics , volume=

Nonparametric estimation and conformal inference of the sufficient forecasting with a diverging number of factors , author=. Journal of Business & Economic Statistics , volume=. 2022 , publisher=

work page 2022

[65] [65]

Power enhancement for testing multi-factor asset pricing models via

Yu, Xiufan and Yao, Jiawei and Xue, Lingzhou , journal=. Power enhancement for testing multi-factor asset pricing models via. 2024 , publisher=

work page 2024

[66] [66]

The Annals of Statistics , volume=

Tensor factor model estimation by iterative projection , author=. The Annals of Statistics , volume=. 2024 , publisher=

work page 2024

[67] [67]

2024 , publisher=

Han, Yuefeng and Yang, Dan and Zhang, Cun-Hui and Chen, Rong , journal=. 2024 , publisher=

work page 2024

[68] [68]

Journal of the American Statistical Association , volume=

Simultaneous decorrelation of matrix time series , author=. Journal of the American Statistical Association , volume=. 2024 , publisher=

work page 2024

[69] [69]

IEEE Transactions on Information Theory , volume=

Tensor principal component analysis in high dimensional CP models , author=. IEEE Transactions on Information Theory , volume=. 2022 , publisher=

work page 2022

[70] [70]

arXiv preprint arXiv:2407.05624 , year=

Dynamic matrix factor models for high dimensional time series , author=. arXiv preprint arXiv:2407.05624 , year=

work page arXiv

[71] [71]

Journal of Econometrics , volume=

Diffusion index forecasting with tensor data , author=. Journal of Econometrics , volume=. 2026 , publisher=

work page 2026

[72] [72]

Journal of Econometrics , volume=

Estimation and inference for CP tensor factor models , author=. Journal of Econometrics , volume=. 2026 , publisher=

work page 2026

[73] [73]

Journal of the Royal Statistical Society

Factor analysis as a statistical method , author=. Journal of the Royal Statistical Society. Series D (The Statistician) , volume=. 1962 , publisher=

work page 1962

[74] [74]

Foundations and Trends

Large dimensional factor analysis , author=. Foundations and Trends. 2008 , publisher=

work page 2008

[75] [75]

Journal of Business & Economic Statistics , volume=

Macroeconomic forecasting using diffusion indexes , author=. Journal of Business & Economic Statistics , volume=. 2002 , publisher=

work page 2002

[76] [76]

Journal of the American Statistical Association , volume=

Prediction by supervised principal components , author=. Journal of the American Statistical Association , volume=. 2006 , publisher=

work page 2006

[77] [77]

Advances in Neural Information Processing Systems , volume=

Atomo: Communication-efficient learning via atomic sparsification , author=. Advances in Neural Information Processing Systems , volume=

work page

[78] [78]

Advances in Neural Information Processing Systems , volume=

Practical low-rank communication compression in decentralized deep learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[79] [79]

Zhengbo Wang and Jian Liang and Ran He and Zilei Wang and Tieniu Tan , booktitle=. Lo. 2025 , url=

work page 2025

[80] [80]

2021 , organization=

Paquette, Courtney and Lee, Kiwon and Pedregosa, Fabian and Paquette, Elliot , booktitle=. 2021 , organization=

work page 2021