SMART: A Spectral Transfer Approach to Multi-Task Learning
Pith reviewed 2026-05-10 00:34 UTC · model grok-4.3
The pith
SMART transfers spectral subspaces from a source model to estimate the target coefficient matrix with near-minimax error rates in multi-task linear regression.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SMART estimates the target coefficient matrix through structured regularization that incorporates spectral information from a source study. The method assumes that the target left and right singular subspaces lie within the corresponding source subspaces and are sparsely aligned with the source singular bases. It requires only a fitted source model, and an ADMM algorithm is developed for the resulting nonconvex problem. General non-asymptotic error bounds, together with a minimax lower bound in the noiseless-source regime, yield near-minimax Frobenius error rates up to logarithmic factors under additional regularity conditions.
What carries the argument
Structured regularization that encodes the source singular subspaces and sparse alignment with the source singular bases inside the target estimation problem.
If this is right
- Knowledge transfer occurs using only the fitted source model, without access to raw source observations.
- The method applies in regimes where source and target differ by more than a bounded difference yet share spectral structure.
- An ADMM procedure solves the nonconvex optimization in practice.
- Estimation accuracy improves and negative transfer is avoided under the stated spectral conditions.
- Near-minimax Frobenius rates hold up to logarithmic factors once regularity conditions are met.
Where Pith is reading between the lines
- The same spectral-regularization idea could extend to other matrix-variate estimation tasks that share latent factor structure.
- Only model parameters need to be shared, which supports transfer in privacy-restricted domains.
- When multiple sources are available, their singular subspaces might be combined to tighten the regularization further.
Load-bearing premise
The target's left and right singular subspaces lie within the source subspaces and are sparsely aligned with the source singular bases.
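The premise can be written down directly. A minimal numpy sketch, in which the alignment matrices are simple column selections; the dimensions, ranks, and 1-sparse alignment are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, r_s, r_t = 20, 15, 6, 3  # dimensions and ranks (illustrative)

# Source singular bases: orthonormal columns.
U_s = np.linalg.qr(rng.standard_normal((p, r_s)))[0]
V_s = np.linalg.qr(rng.standard_normal((q, r_s)))[0]

# Sparse alignment: each target singular vector is a (here 1-sparse)
# combination of source singular vectors; column selection is the
# simplest sparse orthonormal alignment matrix.
A = np.eye(r_s)[:, [0, 2, 4]]   # left alignment, r_s x r_t
C = np.eye(r_s)[:, [1, 2, 5]]   # right alignment
d = np.array([5.0, 3.0, 1.0])   # target singular values

U_t, V_t = U_s @ A, V_s @ C
B_target = U_t @ np.diag(d) @ V_t.T

# Containment check: the target bases are invariant under projection
# onto the source subspaces.
assert np.allclose(U_s @ U_s.T @ U_t, U_t)
assert np.allclose(V_s @ V_s.T @ V_t, V_t)
```

Any target matrix built this way satisfies the premise exactly; the paper's assumption allows general sparse alignments, of which column selection is the extreme case.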
What would settle it
Generate data exactly under the spectral similarity assumption and noiseless source, apply SMART, and check whether the observed Frobenius error exceeds the derived non-asymptotic upper bound or falls short of the minimax lower bound.
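A minimal version of that check, using oracle projection onto the true source subspaces as a crude stand-in for the SMART estimator (the real estimator and its tuning live in the released package; every name and dimension below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q, r = 200, 20, 15, 3

# Target coefficient matrix whose singular subspaces lie inside
# known (noiseless-source) subspaces U_s, V_s.
U_s = np.linalg.qr(rng.standard_normal((p, r)))[0]
V_s = np.linalg.qr(rng.standard_normal((q, r)))[0]
B = U_s @ np.diag([5.0, 3.0, 1.0]) @ V_s.T

X = rng.standard_normal((n, p))
Y = X @ B + 0.5 * rng.standard_normal((n, q))

# Naive per-task least squares vs. projection onto the source subspaces.
B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
B_proj = (U_s @ U_s.T) @ B_ols @ (V_s @ V_s.T)

err_ols = np.linalg.norm(B_ols - B)    # Frobenius errors
err_proj = np.linalg.norm(B_proj - B)

# Since B = P_U B P_V and projections contract the Frobenius norm,
# the spectral side information can only shrink the error here.
assert err_proj <= err_ols
```

Settling the question as stated would additionally require running the actual SMART estimator and comparing the observed error against the paper's non-asymptotic upper bound and minimax lower bound.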
Figures
Original abstract
Multi-task learning is effective for related applications, but its performance can deteriorate when the target sample size is small. Transfer learning can borrow strength from related studies; yet, many existing methods rely on restrictive bounded-difference assumptions between the source and target models. We propose SMART, a spectral transfer method for multi-task linear regression that instead assumes spectral similarity: the target left and right singular subspaces lie within the corresponding source subspaces and are sparsely aligned with the source singular bases. Such an assumption is natural when studies share latent structures and enables transfer beyond the bounded-difference settings. SMART estimates the target coefficient matrix through structured regularization that incorporates spectral information from a source study. Importantly, it requires only a fitted source model rather than the raw source data, making it useful when data sharing is limited. Although the optimization problem is nonconvex, we develop a practical ADMM-based algorithm. We establish general, non-asymptotic error bounds and a minimax lower bound in the noiseless-source regime. Under additional regularity conditions, these results yield near-minimax Frobenius error rates up to logarithmic factors. Simulations confirm improved estimation accuracy and robustness to negative transfer, and analysis of multi-modal single-cell data demonstrates better predictive performance. The Python implementation of SMART, along with the code to reproduce all experiments in this paper, is publicly available at https://github.com/boxinz17/smart.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SMART, a spectral transfer method for multi-task linear regression. It assumes that the target coefficient matrix has left and right singular subspaces contained in those of a source model and that the singular vectors are sparsely aligned with the source bases. The estimator is defined via a non-convex structured regularizer that incorporates fitted source spectral information (without requiring raw source data). An ADMM algorithm is proposed for optimization. The main theoretical contributions are non-asymptotic upper bounds on the Frobenius error of the estimator together with a minimax lower bound in the noiseless-source regime; under additional regularity conditions these yield near-minimax rates up to log factors. Empirical results on simulations and multi-modal single-cell data are reported, with code released publicly.
Significance. If the central claims hold, the work offers a principled relaxation of bounded-difference transfer assumptions to a spectral-containment-plus-sparse-alignment condition that is natural for shared latent structure. The requirement of only a fitted source model is practically valuable under data-sharing constraints. The derivation of both upper and lower bounds, together with the public reproducibility package, strengthens the contribution relative to purely algorithmic transfer methods.
major comments (2)
- [§3, §4] (estimator definition; ADMM): The non-asymptotic Frobenius error bounds and the near-minimax rates are stated for the exact global minimizer of the non-convex objective that encodes the subspace-containment and sparse-alignment penalties. The ADMM procedure is shown only to reach a stationary point (or a global minimizer under unverified strong-convexity conditions). No quantitative bound is supplied on the distance between the ADMM output and the global minimizer, nor on how any such distance propagates into the statistical error. Consequently the claimed rates apply only conditionally on exact optimization, which the supplied algorithm does not guarantee.
- [§3.2] (assumption statement): The key modeling assumption that the target left and right singular subspaces lie inside the corresponding source subspaces is used to derive the structured regularizer and the subsequent error bounds. The paper does not provide a quantitative measure of how much violation of this containment (e.g., small but nonzero projection onto the orthogonal complement) would inflate the error, leaving the robustness of the theoretical guarantees to mild misspecification unaddressed.
minor comments (2)
- [§2] Notation for the source and target singular-value decompositions is introduced without an explicit table of symbols; readers must track several related but distinct matrices (e.g., source vs. target left singular vectors) across sections.
- [§5] The simulation section reports average Frobenius errors but does not include standard errors or the number of Monte Carlo repetitions in the main text (they appear only in the supplement).
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment point by point below, with proposed revisions where appropriate to improve the manuscript.
Point-by-point responses
- Referee: [§3, §4] (estimator definition; ADMM): The non-asymptotic Frobenius error bounds and the near-minimax rates are stated for the exact global minimizer of the non-convex objective that encodes the subspace-containment and sparse-alignment penalties. The ADMM procedure is shown only to reach a stationary point (or a global minimizer under unverified strong-convexity conditions). No quantitative bound is supplied on the distance between the ADMM output and the global minimizer, nor on how any such distance propagates into the statistical error. Consequently the claimed rates apply only conditionally on exact optimization, which the supplied algorithm does not guarantee.
Authors: We agree that the non-asymptotic Frobenius error bounds and near-minimax rates in Section 4 are derived for the global minimizer of the non-convex objective. The ADMM procedure in Section 5 is presented as a practical algorithm that reaches a stationary point, with empirical evidence from simulations that it yields solutions with strong statistical performance. We do not supply a quantitative bound on the distance to the global minimizer or its propagation into statistical error. In the revised manuscript we will add explicit clarification in Sections 3 and 4 distinguishing the theoretical estimator from the ADMM output, and we will expand the discussion of ADMM's empirical convergence behavior. This revision will be partial, as a full non-convex optimization analysis lies beyond the paper's scope. revision: partial
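For readers unfamiliar with the algorithmic template under discussion, scaled ADMM on a simple convex ℓ1-penalized least-squares problem illustrates the iteration structure; the paper's actual splitting handles a nonconvex spectral penalty, so this is a sketch of the pattern only, not the paper's algorithm:

```python
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    """Scaled ADMM for min_x 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z."""
    n = A.shape[1]
    z, u = np.zeros(n), np.zeros(n)
    # The x-update solves the same ridge system every iteration; factor once.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(iters):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))   # x-update (ridge)
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # prox of lam*||.||_1
        u = u + x - z                                        # scaled dual update
    return z

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = admm_lasso(A, b)
```

For a convex problem like this, ADMM reaches the global minimizer; the referee's point is precisely that no comparable guarantee transfers to the nonconvex SMART objective.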
- Referee: [§3.2] (assumption statement): The key modeling assumption that the target left and right singular subspaces lie inside the corresponding source subspaces is used to derive the structured regularizer and the subsequent error bounds. The paper does not provide a quantitative measure of how much violation of this containment (e.g., small but nonzero projection onto the orthogonal complement) would inflate the error, leaving the robustness of the theoretical guarantees to mild misspecification unaddressed.
Authors: The referee correctly notes that exact subspace containment is central to the structured regularizer and the error bounds. We have not included a quantitative sensitivity analysis for small violations of containment. Developing such bounds would require additional perturbation arguments for singular subspaces and is technically involved. In the revision we will expand Section 3.2 to state the assumption more explicitly, discuss its role, and add a qualitative remark on potential sensitivity to mild misspecification. We will also include a small simulation experiment illustrating performance under approximate containment. A full quantitative robustness theory remains outside the current scope and is noted as a direction for future work. revision: partial
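A natural quantitative handle for such a sensitivity analysis is the projection residual of the target basis onto the complement of the source subspace (a sin-theta distance of the Davis-Kahan type). A minimal sketch, with all dimensions illustrative:

```python
import numpy as np

def containment_residual(U_t, U_s):
    """Spectral norm of (I - P_s) U_t; zero exactly when span(U_t) lies in span(U_s)."""
    P_s = U_s @ U_s.T
    return np.linalg.norm(U_t - P_s @ U_t, 2)

rng = np.random.default_rng(3)
U_s = np.linalg.qr(rng.standard_normal((20, 6)))[0]

U_exact = U_s[:, :3]  # exact containment: residual is numerically zero
U_bad = np.linalg.qr(U_exact + 0.05 * rng.standard_normal((20, 3)))[0]  # mild violation

eps = containment_residual(U_bad, U_s)  # small but positive
```

A robustness theory of the kind the referee asks for would then be expected to show error bounds degrading gracefully in this residual.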
Circularity Check
No load-bearing circularity; theoretical bounds derived from stated assumptions without reduction to fitted parameters
full rationale
The paper introduces a new structured regularization penalty that encodes the spectral subspace-containment and sparse-alignment assumptions directly into the objective. The non-asymptotic error bounds and minimax lower bound are stated for the exact minimizer of this objective under the given assumptions and are not shown to be equivalent to any fitted source quantity by construction. No self-citation chain is invoked to justify the core uniqueness or rate results, and the derivation chain remains self-contained once the regularization form and assumptions are accepted. The ADMM solver is presented separately as a practical heuristic without affecting the theoretical claims.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The target left and right singular subspaces lie within the source subspaces and are sparsely aligned with the source singular bases.