pith. machine review for the scientific record.

arxiv: 2604.20161 · v1 · submitted 2026-04-22 · 💻 cs.LG · stat.ME · stat.ML

Recognition: unknown

SMART: A Spectral Transfer Approach to Multi-Task Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 00:34 UTC · model grok-4.3

classification 💻 cs.LG · stat.ME · stat.ML
keywords multi-task learning · transfer learning · spectral similarity · linear regression · structured regularization · singular subspaces · error bounds · ADMM algorithm

The pith

SMART transfers spectral subspaces from a source model to estimate the target coefficient matrix with near-minimax error rates in multi-task linear regression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SMART, a spectral transfer method for multi-task linear regression that assumes the target's left and right singular subspaces lie within the source subspaces and are sparsely aligned with the source singular bases. This spectral similarity lets the method regularize the target estimates using only a fitted source model rather than raw source data. The approach moves beyond bounded-difference assumptions between source and target by exploiting shared latent structures through structured regularization. Non-asymptotic error bounds and a minimax lower bound are derived, which together give near-minimax Frobenius rates up to logarithmic factors under regularity conditions. Simulations and multi-modal single-cell data analysis show gains in accuracy and reduced negative transfer.

Core claim

SMART estimates the target coefficient matrix through structured regularization that incorporates spectral information from a source study. The method assumes that the target left and right singular subspaces lie within the corresponding source subspaces and are sparsely aligned with the source singular bases. It requires only a fitted source model, develops an ADMM algorithm for the resulting nonconvex problem, and establishes general non-asymptotic error bounds together with a minimax lower bound in the noiseless-source regime that yield near-minimax Frobenius error rates up to logarithmic factors under additional regularity conditions.
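
Read literally, the spectral-similarity assumption can be written as follows (a hedged formalization in our own notation; the paper's exact parameterization of U_0, V_0, and the alignment matrix A may differ):

```latex
% Rank-r_0 SVD of the fitted source coefficient matrix (our notation):
\[
  B_{\mathrm{source}} = U_0 \Sigma_0 V_0^{\top}, \qquad
  U_0 \in \mathbb{R}^{p \times r_0}, \quad V_0 \in \mathbb{R}^{q \times r_0}.
\]
% Spectral similarity: the target's singular subspaces are contained in the
% source subspaces, so the target coefficient matrix factors through them,
\[
  \operatorname{col}(B_{\mathrm{target}}) \subseteq \operatorname{span}(U_0), \quad
  \operatorname{row}(B_{\mathrm{target}}) \subseteq \operatorname{span}(V_0)
  \;\Longrightarrow\;
  B_{\mathrm{target}} = U_0 A V_0^{\top},
\]
% with A \in \mathbb{R}^{r_0 \times r_0} sparse: the target singular vectors
% load on only a few source basis directions ("sparse alignment").
```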

What carries the argument

Structured regularization that encodes the source singular subspaces and sparse alignment with the source singular bases inside the target estimation problem.
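
To make that mechanism concrete, the sketch below shows one estimator of this shape in Python, assuming the B = U_0 A V_0^T parameterization above: the target coefficient matrix is constrained to the source subspaces, and an l1 penalty on the alignment matrix A enforces sparse alignment. The penalty choice, the proximal-gradient solver, and all names here are ours (the paper solves its own nonconvex objective with ADMM), so treat this as an illustration of structured spectral regularization rather than SMART itself.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def spectral_transfer_fit(X, Y, U0, V0, lam=0.1, step=None, n_iter=500):
    """Illustrative fit of B = U0 @ A @ V0.T with a sparse alignment matrix A.

    X : (n, p) target design; Y : (n, q) target responses.
    U0 : (p, r0) and V0 : (q, r0) orthonormal singular bases from a fitted source model.
    Minimizes (1 / 2n) * ||Y - X @ U0 @ A @ V0.T||_F^2 + lam * ||A||_1
    by proximal gradient descent (a simplified surrogate, not the paper's ADMM).
    """
    n = X.shape[0]
    r0 = U0.shape[1]
    XU = X @ U0                                   # (n, r0): design in the source column basis
    if step is None:                              # inverse Lipschitz constant of the smooth part
        step = 1.0 / (np.linalg.norm(XU, 2) ** 2 / n + 1e-12)
    A = np.zeros((r0, r0))
    for _ in range(n_iter):
        resid = XU @ A @ V0.T - Y                 # (n, q) residuals
        grad = XU.T @ resid @ V0 / n              # gradient of the smooth loss w.r.t. A
        A = soft_threshold(A - step * grad, step * lam)
    return U0 @ A @ V0.T, A
```

Only U0 and V0 from the fitted source model enter the problem; no raw source observations are needed, which is the data-sharing point the paper emphasizes.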

If this is right

  • Knowledge transfer occurs using only the fitted source model, without access to raw source observations.
  • The method applies in regimes where source and target differ by more than a bounded difference yet share spectral structure.
  • An ADMM procedure solves the nonconvex optimization in practice.
  • Estimation accuracy improves and negative transfer is avoided under the stated spectral conditions.
  • Near-minimax Frobenius rates hold up to logarithmic factors once regularity conditions are met.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same spectral-regularization idea could extend to other matrix-variate estimation tasks that share latent factor structure.
  • Only model parameters need to be shared, which supports transfer in privacy-restricted domains.
  • When multiple sources are available, their singular subspaces might be combined to tighten the regularization further.

Load-bearing premise

The target's left and right singular subspaces lie within the source subspaces and are sparsely aligned with the source singular bases.

What would settle it

Generate data exactly under the spectral similarity assumption and noiseless source, apply SMART, and check whether the observed Frobenius error exceeds the derived non-asymptotic upper bound or falls short of the minimax lower bound.
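
A minimal sketch of that experiment, reusing spectral_transfer_fit from the sketch above (the dimensions, sparsity level, noise scale, and tuning value are illustrative choices of ours; the theoretical bound would have to be transcribed from the paper's theorem, and the released SMART package could be substituted for the illustrative fitter):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, r0, s = 200, 50, 30, 10, 8              # illustrative sizes

# Noiseless-source regime: exact orthonormal source singular bases.
U0, _ = np.linalg.qr(rng.standard_normal((p, r0)))
V0, _ = np.linalg.qr(rng.standard_normal((q, r0)))

# Target obeys spectral similarity exactly: B = U0 @ A @ V0.T with s-sparse A.
A_true = np.zeros((r0, r0))
support = rng.choice(r0 * r0, size=s, replace=False)
A_true.flat[support] = rng.standard_normal(s)
B_true = U0 @ A_true @ V0.T

# Target sample with Gaussian noise.
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.5 * rng.standard_normal((n, q))

B_hat, _ = spectral_transfer_fit(X, Y, U0, V0, lam=0.05)
err = np.linalg.norm(B_hat - B_true, "fro") ** 2
print(f"squared Frobenius error: {err:.4f}")
# Compare err against the paper's non-asymptotic upper bound evaluated at
# (n, p, q, r0, s): constants and rate must be taken from the stated theorem.
```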

Figures

Figures reproduced from arXiv: 2604.20161 by Boxin Zhao, Jinchi Lv, Mladen Kolar.

Figure 1. Simulation results for Model I (sample size n = 200, estimated rank r̂ = 5, source subspace ranks r_u = r_v = 10, source noise level σ_0 = 0.01; all models use r_0 = 10 and r = 5, with 100 repetitions per configuration). view at source ↗
Figure 2. Simulation results for Model II. view at source ↗
Figure 3. Simulation results for Model III. view at source ↗
Figure 4. Cell counts for each annotated cell type in the multi-modal single-cell data. view at source ↗
read the original abstract

Multi-task learning is effective for related applications, but its performance can deteriorate when the target sample size is small. Transfer learning can borrow strength from related studies; yet, many existing methods rely on restrictive bounded-difference assumptions between the source and target models. We propose SMART, a spectral transfer method for multi-task linear regression that instead assumes spectral similarity: the target left and right singular subspaces lie within the corresponding source subspaces and are sparsely aligned with the source singular bases. Such an assumption is natural when studies share latent structures and enables transfer beyond the bounded-difference settings. SMART estimates the target coefficient matrix through structured regularization that incorporates spectral information from a source study. Importantly, it requires only a fitted source model rather than the raw source data, making it useful when data sharing is limited. Although the optimization problem is nonconvex, we develop a practical ADMM-based algorithm. We establish general, non-asymptotic error bounds and a minimax lower bound in the noiseless-source regime. Under additional regularity conditions, these results yield near-minimax Frobenius error rates up to logarithmic factors. Simulations confirm improved estimation accuracy and robustness to negative transfer, and analysis of multi-modal single-cell data demonstrates better predictive performance. The Python implementation of SMART, along with the code to reproduce all experiments in this paper, is publicly available at https://github.com/boxinz17/smart.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SMART, a spectral transfer method for multi-task linear regression. It assumes that the target coefficient matrix has left and right singular subspaces contained in those of a source model and that the singular vectors are sparsely aligned with the source bases. The estimator is defined via a non-convex structured regularizer that incorporates fitted source spectral information (without requiring raw source data). An ADMM algorithm is proposed for optimization. The main theoretical contributions are non-asymptotic upper bounds on the Frobenius error of the estimator together with a minimax lower bound in the noiseless-source regime; under additional regularity conditions these yield near-minimax rates up to log factors. Empirical results on simulations and multi-modal single-cell data are reported, with code released publicly.

Significance. If the central claims hold, the work offers a principled relaxation of bounded-difference transfer assumptions to a spectral-containment-plus-sparse-alignment condition that is natural for shared latent structure. The requirement of only a fitted source model is practically valuable under data-sharing constraints. The derivation of both upper and lower bounds, together with the public reproducibility package, strengthens the contribution relative to purely algorithmic transfer methods.

major comments (2)
  1. [§3, §4] §3 (estimator definition) and §4 (ADMM): The non-asymptotic Frobenius error bounds and the near-minimax rates are stated for the exact global minimizer of the non-convex objective that encodes the subspace-containment and sparse-alignment penalties. The ADMM procedure is shown only to reach a stationary point (or a global minimizer under unverified strong-convexity conditions). No quantitative bound is supplied on the distance between the ADMM output and the global minimizer, nor on how any such distance propagates into the statistical error. Consequently the claimed rates apply only conditionally on exact optimization, which the supplied algorithm does not guarantee.
  2. [§3.2] §3.2 (assumption statement): The key modeling assumption that the target left and right singular subspaces lie inside the corresponding source subspaces is used to derive the structured regularizer and the subsequent error bounds. The paper does not provide a quantitative measure of how much violation of this containment (e.g., small but nonzero projection onto the orthogonal complement) would inflate the error, leaving the robustness of the theoretical guarantees to mild misspecification unaddressed.
minor comments (2)
  1. [§2] Notation for the source and target singular-value decompositions is introduced without an explicit table of symbols; readers must track several related but distinct matrices (e.g., source vs. target left singular vectors) across sections.
  2. [§5] The simulation section reports average Frobenius errors but does not include standard errors or the number of Monte Carlo repetitions in the main text (they appear only in the supplement).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below, with proposed revisions where appropriate to improve the manuscript.

read point-by-point responses
  1. Referee: [§3, §4] §3 (estimator definition) and §4 (ADMM): The non-asymptotic Frobenius error bounds and the near-minimax rates are stated for the exact global minimizer of the non-convex objective that encodes the subspace-containment and sparse-alignment penalties. The ADMM procedure is shown only to reach a stationary point (or a global minimizer under unverified strong-convexity conditions). No quantitative bound is supplied on the distance between the ADMM output and the global minimizer, nor on how any such distance propagates into the statistical error. Consequently the claimed rates apply only conditionally on exact optimization, which the supplied algorithm does not guarantee.

    Authors: We agree that the non-asymptotic Frobenius error bounds and near-minimax rates in Section 4 are derived for the global minimizer of the non-convex objective. The ADMM procedure in Section 5 is presented as a practical algorithm that reaches a stationary point, with empirical evidence from simulations that it yields solutions with strong statistical performance. We do not supply a quantitative bound on the distance to the global minimizer or its propagation into statistical error. In the revised manuscript we will add explicit clarification in Sections 3 and 4 distinguishing the theoretical estimator from the ADMM output, and we will expand the discussion of ADMM's empirical convergence behavior. This revision will be partial, as a full non-convex optimization analysis lies beyond the paper's scope. revision: partial

  2. Referee: [§3.2] §3.2 (assumption statement): The key modeling assumption that the target left and right singular subspaces lie inside the corresponding source subspaces is used to derive the structured regularizer and the subsequent error bounds. The paper does not provide a quantitative measure of how much violation of this containment (e.g., small but nonzero projection onto the orthogonal complement) would inflate the error, leaving the robustness of the theoretical guarantees to mild misspecification unaddressed.

    Authors: The referee correctly notes that exact subspace containment is central to the structured regularizer and the error bounds. We have not included a quantitative sensitivity analysis for small violations of containment. Developing such bounds would require additional perturbation arguments for singular subspaces and is technically involved. In the revision we will expand Section 3.2 to state the assumption more explicitly, discuss its role, and add a qualitative remark on potential sensitivity to mild misspecification. We will also include a small simulation experiment illustrating performance under approximate containment. A full quantitative robustness theory remains outside the current scope and is noted as a direction for future work. revision: partial
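
A small sketch of how such an approximate-containment experiment could quantify the violation (our own diagnostic, not one defined in the paper): measure how much of the target's leading singular subspaces falls outside the source subspaces.

```python
import numpy as np

def containment_violation(B_target, U0, V0, r):
    """Operator-norm mass of the target's leading rank-r singular subspaces that
    lies outside the source subspaces spanned by U0 and V0 (orthonormal columns
    assumed). Returns (left, right); both are 0 under exact containment."""
    U, _, Vt = np.linalg.svd(B_target, full_matrices=False)
    Ur, Vr = U[:, :r], Vt[:r, :].T
    left = np.linalg.norm(Ur - U0 @ (U0.T @ Ur), 2)    # residual after projecting onto span(U0)
    right = np.linalg.norm(Vr - V0 @ (V0.T @ Vr), 2)   # residual after projecting onto span(V0)
    return left, right
```

Reporting estimation error against these two quantities would make the promised sensitivity illustration easy to read.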

Circularity Check

0 steps flagged

No load-bearing circularity; theoretical bounds derived from stated assumptions without reduction to fitted parameters

full rationale

The paper introduces a new structured regularization penalty that encodes the spectral subspace-containment and sparse-alignment assumptions directly into the objective. The non-asymptotic error bounds and minimax lower bound are stated for the exact minimizer of this objective under the given assumptions and are not shown to be equivalent to any fitted source quantity by construction. No self-citation chain is invoked to justify the core uniqueness or rate results, and the derivation chain remains self-contained once the regularization form and assumptions are accepted. The ADMM solver is presented separately as a practical heuristic without affecting the theoretical claims.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends primarily on the spectral similarity assumption as the key modeling choice that enables transfer beyond bounded-difference settings.

axioms (1)
  • domain assumption Target left and right singular subspaces lie within source subspaces and are sparsely aligned with source singular bases
    This is the core modeling assumption stated in the abstract that replaces bounded-difference assumptions and underpins the regularization and error bounds.

pith-pipeline@v0.9.0 · 5545 in / 1251 out tokens · 34374 ms · 2026-05-10T00:34:03.870065+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

190 extracted references · 6 canonical work pages

  1. [1]

    Estimating linear restrictions on regression coefficients for multivariate normal distributions

    Theodore Wilbur Anderson. Estimating linear restrictions on regression coefficients for multivariate normal distributions. The Annals of Mathematical Statistics, pages 327--351, 1951

  2. [2]

    Optimal selection of reduced rank estimators of high-dimensional matrices

    Florentina Bunea, Yiyuan She, and Marten H Wegkamp. Optimal selection of reduced rank estimators of high-dimensional matrices. The Annals of Statistics, pages 1282--1309, 2011

  3. [3]

    Joint variable and rank selection for parsimonious estimation of high-dimensional matrices

    Florentina Bunea, Yiyuan She, and Marten H. Wegkamp. Joint variable and rank selection for parsimonious estimation of high-dimensional matrices. The Annals of Statistics, 40(5): 2359--2388, 2012

  4. [4]

    Sparse PCA: optimal rates and adaptive estimation

    T Tony Cai, Zongming Ma, and Yihong Wu. Sparse PCA: optimal rates and adaptive estimation. The Annals of Statistics, 41(6): 3074--3110, 2013

  5. [5]

    Transfer learning for functional mean estimation: phase transition and adaptive algorithms

    T Tony Cai, Dongwoo Kim, and Hongming Pu. Transfer learning for functional mean estimation: phase transition and adaptive algorithms. The Annals of Statistics, 52(2): 654--678, 2024

  6. [6]

    Multitask learning

    Rich Caruana. Multitask learning. Machine Learning, 28: 41--75, 1997

  7. [7]

    Reduced rank regression via adaptive nuclear norm penalization

    Kun Chen, Hongbo Dong, and Kung-Sik Chan. Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 100(4): 901--920, 2013

  8. [8]

    Sparse reduced-rank regression for simultaneous dimension reduction and variable selection

    Lisha Chen and Jianhua Z Huang. Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. Journal of the American Statistical Association, 107(500): 1533--1545, 2012

  9. [9]

    Robust angle-based transfer learning in high dimensions

    Tian Gu, Yi Han, and Rui Duan. Robust angle-based transfer learning in high dimensions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(3): 723--745, 2025

  10. [10]

    Representational transfer learning for matrix completion

    Yong He, Zeyu Li, Dong Liu, Kangxiang Qin, and Jiahui Xie. Representational transfer learning for matrix completion. arXiv preprint arXiv:2412.06233, 2024

  11. [11]

    Reduced-rank regression for the multivariate linear model

    Alan Julian Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5(2): 248--264, 1975

  12. [12]

    A dirty model for multi-task learning

    Ali Jalali, Sujay Sanghavi, Chao Ruan, and Pradeep Ravikumar. A dirty model for multi-task learning. Advances in Neural Information Processing Systems, 23, 2010

  13. [13]

    Transfer learning for high-dimensional linear regression: prediction, estimation and minimax optimality

    Sai Li, T Tony Cai, and Hongzhe Li. Transfer learning for high-dimensional linear regression: prediction, estimation and minimax optimality. Journal of the Royal Statistical Society Series B, 84(1): 149--173, 2022

  14. [14]

    Estimation and inference for high-dimensional generalized linear models with knowledge transfer

    Sai Li, Linjun Zhang, T Tony Cai, and Hongzhe Li. Estimation and inference for high-dimensional generalized linear models with knowledge transfer. Journal of the American Statistical Association, pages 1--12, 2023

  15. [15]

    A sandbox for prediction and integration of DNA, RNA, and proteins in single cells

    Malte Luecken, Daniel Burkhardt, Robrecht Cannoodt, Christopher Lance, Aditi Agrawal, Hananeh Aliee, Ann Chen, Louise Deconinck, Angela Detweiler, Alejandro Granados, Shelly Huynh, Laura Isacco, Yang Kim, Dominik Klein, Bony De Kumar, Sunil Kuppasani, Heiko Lickert, Aaron McGeever, Joaquin Melgarejo, Honey Mekonen, Maurizio Morri, Michaela Müller, Nor...

  16. [16]

    Adaptive estimation in two-way sparse reduced-rank regression

    Zhuang Ma, Zongming Ma, and Tingni Sun. Adaptive estimation in two-way sparse reduced-rank regression. Statistica Sinica, 30(4): 2179--2201, 2020

  17. [17]

    Reduced rank ridge regression and its kernel extensions

    Ashin Mukherjee and Ji Zhu. Reduced rank ridge regression and its kernel extensions. Statistical Analysis and Data Mining, 4(6): 612--622, 2011

  18. [18]

    On the degrees of freedom of reduced-rank estimators in multivariate regression

    Ashin Mukherjee, Kun Chen, Naisyin Wang, and Ji Zhu. On the degrees of freedom of reduced-rank estimators in multivariate regression. Biometrika, 102(2): 457--477, 2015

  19. [19]

    A survey on transfer learning

    Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10): 1345--1359, 2009

  20. [20]

    Transfer learning under large-scale low-rank regression models

    Seyoung Park, Eun Ryung Lee, Hyunjin Kim, and Hongyu Zhao. Transfer learning under large-scale low-rank regression models. Journal of the American Statistical Association, pages 1--13, 2025

  21. [21]

    Rank-based transfer learning for high-dimensional survival data with application to sepsis data

    Nan Qiao, Haowei Jiang, and Cunjie Lin. Rank-based transfer learning for high-dimensional survival data with application to sepsis data. arXiv preprint arXiv:2504.11270, 2025

  22. [22]

    Sharp concentration inequalities: phase transition and mixing of Orlicz tails with variance

    Y. Shen and J. Lv. Sharp concentration inequalities: phase transition and mixing of Orlicz tails with variance. arXiv preprint arXiv:2603.25934, 2026

  23. [23]

    Nets of Grassmann manifold and orthogonal group

    Stanislaw J Szarek. Nets of Grassmann manifold and orthogonal group. In Proceedings of Research Workshop on Banach Space Theory, volume 169, page 185, 1982

  24. [24]

    Explainable multi-task learning for multi-modality biological data analysis

    Xin Tang, Jiawei Zhang, Yichun He, Xinhe Zhang, Zuwan Lin, Sebastian Partarrieu, Emma Bou Hanna, Zhaolin Ren, Hao Shen, Yuhong Yang, et al. Explainable multi-task learning for multi-modality biological data analysis. Nature Communications, 14(1): 2546, 2023

  25. [25]

    SOFAR: large-scale association network learning

    Yoshimasa Uematsu, Yingying Fan, Kun Chen, Jinchi Lv, and Wei Lin. SOFAR: large-scale association network learning. IEEE Transactions on Information Theory, 65(8): 4924--4939, 2019

  26. [26]

    Innate lymphoid cells: 10 years on

    Eric Vivier, David Artis, Marco Colonna, Andreas Diefenbach, James P Di Santo, Gérard Eberl, Shigeo Koyasu, Richard M Locksley, Andrew NJ McKenzie, Reina E Mebius, et al. Innate lymphoid cells: 10 years on. Cell, 174(5): 1054--1066, 2018

  27. [27]

    High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48

    Martin J Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48. Cambridge University Press, 2019

  28. [28]

    Dimension reduction and coefficient estimation in multivariate linear regression

    Ming Yuan, Ali Ekici, Zhaosong Lu, and Renato Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society Series B, 69(3): 329--346, 2007

  29. [29]

    A survey on multi-task learning

    Yu Zhang and Qiang Yang. A survey on multi-task learning. IEEE Transactions on Knowledge and Data Engineering, 34(12): 5586--5609, 2021

  30. [30]

    Trans-glasso: A transfer learning approach to precision matrix estimation

    Boxin Zhao, Cong Ma, and Mladen Kolar. Trans-glasso: A transfer learning approach to precision matrix estimation. Journal of the American Statistical Association, pages 1--21, 2025

  31. [31]

    Pymanopt: a Python toolbox for optimization on manifolds using automatic differentiation

    James Townsend, Niklas Koep, and Sebastian Weichwald. Pymanopt: a Python toolbox for optimization on manifolds using automatic differentiation. Journal of Machine Learning Research, 17(137): 1--5, 2016

  32. [32]

    High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47

    Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47. Cambridge University Press, 2018

  33. [33]

    Minimax rates of estimation for sparse PCA in high dimensions

    Vincent Vu and Jing Lei. Minimax rates of estimation for sparse PCA in high dimensions. In Artificial Intelligence and Statistics, 2012

  34. [34]

    A useful variant of the Davis--Kahan theorem for statisticians

    Yi Yu, Tengyao Wang, and Richard J Samworth. A useful variant of the Davis--Kahan theorem for statisticians. Biometrika, 102(2): 315--323, 2015
