Multi-task Linear Regression without Eigenvalue Lower Bounds: Adaptivity, Robustness and Safety

Seok-Jin Kim

arXiv: 2605.17126 · v1 · pith:7A2TZDAKnew · submitted 2026-05-16 · stat.ML · cs.LG · stat.ME

Pith reviewed 2026-05-20 14:32 UTC · model grok-4.3

classification stat.ML · cs.LG · stat.ME

keywords multi-task linear regression · robust estimation · outlier tasks · high-dimensional statistics · adaptivity · safety guarantee · regularization

The pith

A matrix-weighted estimator for multi-task linear regression achieves optimal rates under a relative balancedness condition that relaxes per-task eigenvalue lower bounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an estimator for multi-task linear regression when a majority of tasks share similar parameters but some tasks are arbitrary outliers. It replaces the usual requirement that every task's second-moment matrix has a large minimum eigenvalue with a milder relative balancedness condition that only compares each task to the average inlier geometry. Under moderate balancedness the new bounds recover the best known prediction error rates and yield overall mean-squared error that is minimax optimal up to logarithmic factors. The same estimator is shown to be safe: when balancedness fails or tasks are unrelated, it performs no worse than learning each task independently. This combination yields simultaneous adaptivity, robustness, and safety.

Core claim

The estimator based on matrix-weighted norm regularization attains prediction MSE bounds matching earlier rates under substantially weaker spectral assumptions expressed by a relative balancedness constant; the resulting task-overall MSE is minimax optimal up to logarithmic factors. The estimator also satisfies a safety property that it performs no worse than independent task learning when the balancedness constant is large or infinite or when tasks are unrelated.

What carries the argument

Matrix-weighted norm regularization that adapts the penalty to the empirical second-moment matrices of the tasks, together with the relative balancedness constant that compares each task's second moment to the average inlier geometry.
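To make the contrast with an unweighted penalty concrete, here is a minimal sketch, assuming the matrix-weighted norm has the form sqrt(vᵀ M v) with M an empirical second-moment matrix. The paper's exact regularizer is not reproduced on this page, so the function name `weighted_norm`, the sizes, and the choice M = XᵀX/n are illustrative assumptions rather than the author's construction.

```python
import numpy as np

def weighted_norm(v, M):
    """Return sqrt(v^T M v) for a symmetric positive semidefinite M (assumed form)."""
    return float(np.sqrt(max(v @ M @ v, 0.0)))

rng = np.random.default_rng(0)
n, d = 5, 20                      # hypothetical sizes: fewer samples than dimensions
X = rng.standard_normal((n, d))   # hypothetical design matrix for one task
M = X.T @ X / n                   # empirical second-moment matrix, rank at most n

# A direction in the row space of X is visible in the data ...
v_seen = X[0] / np.linalg.norm(X[0])
# ... while a direction in the null space of X is not.
_, _, Vt = np.linalg.svd(X)       # full SVD: the last d - n rows of Vt span null(X)
v_unseen = Vt[-1]

seen = weighted_norm(v_seen, M)
unseen = weighted_norm(v_unseen, M)
print(np.linalg.norm(v_seen), seen)      # unit Euclidean norm, weighted norm clearly positive
print(np.linalg.norm(v_unseen), unseen)  # unit Euclidean norm, weighted norm close to zero
```

Under this assumed form, the penalty charges only for disagreement along directions where a task actually has data, which is one way to read how such a regularizer might sidestep per-task eigenvalue lower bounds.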

If this is right

The method remains robust to a positive fraction of arbitrary outlier tasks while attaining near-optimal rates whenever balancedness holds.
Overall mean-squared error across tasks is minimax optimal up to logarithmic factors in favorable regimes.
The estimator adapts to task similarity without needing strong eigenvalue lower bounds on every individual task.
When tasks are unrelated or the balancedness constant grows large, performance is guaranteed to be no worse than separate single-task learning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practical checks or estimates of the balancedness constant from data could make the method deployable in high-dimensional regimes where per-task eigenvalues vary widely.
The same weighted-regularization idea may apply to other multi-task problems such as classification where strict spectral assumptions are difficult to verify.
Numerical experiments that increase task dissimilarity while tracking whether performance stays at or above the single-task baseline would test the safety claim directly.

Load-bearing premise

The relative balancedness condition holds, so each task's second moment is comparable to the average geometry of the inlier tasks.
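This page does not state the formal definition of the balancedness constant. Purely as an illustration, the sketch below computes a hypothetical proxy: the smallest B such that every task's second moment S_j satisfies S_j ⪯ B·S̄, where S̄ is the average of the supplied matrices. The function name `balancedness_proxy`, the direction of the comparison, and the use of a pseudo-inverse square root are all assumptions, and the paper's constant may be defined differently.

```python
import numpy as np

def balancedness_proxy(second_moments, tol=1e-10):
    """Hypothetical proxy: max_j lambda_max(Sbar^{-1/2} S_j Sbar^{-1/2})."""
    S_bar = np.mean(second_moments, axis=0)
    evals, evecs = np.linalg.eigh(S_bar)
    keep = evals > tol * evals.max()
    # Pseudo-inverse square root of the average, restricted to its range.
    inv_sqrt = (evecs[:, keep] / np.sqrt(evals[keep])) @ evecs[:, keep].T
    ratios = [np.linalg.eigvalsh(inv_sqrt @ S @ inv_sqrt).max()
              for S in second_moments]
    return float(max(ratios))

# Identical geometries: every task looks like the average.
same = [np.eye(4) for _ in range(4)]
# Rank-one tasks, each concentrated on a different coordinate.
spiked = [np.outer(e, e) for e in np.eye(4)]

b_same = balancedness_proxy(same)
b_spiked = balancedness_proxy(spiked)
print(b_same)    # about 1
print(b_spiked)  # about 4, the number of tasks
```

The two toy inputs only show that the proxy grows as task geometries diverge from their average; they say nothing about the thresholds the paper calls moderate.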

What would settle it

A high-dimensional dataset with moderate balancedness in which the estimator's prediction MSE exceeds the claimed minimax rate by more than logarithmic factors would falsify the optimality result.

Figures

Figures reproduced from arXiv: 2605.17126 by Seok-Jin Kim.

**Figure 1.** Synthetic sweep over the inlier radius δ. Rows show all-task, related-task, and outlier-task MSE. Accompanying text from the paper (truncated at the source): "Sweep of outlier fraction ε. We next vary ε ∈ {0.05, 0.1, 0.2, 0.3, 0.4}, again under B̄ = 1. This directly isolates contamination while preserving favorable covariance alignment. Our method remains best on all-task MSE throughout the sweep and preserves a large advantage on related tasks even as the fraction …"

**Figure 2.** Synthetic sweep over the outlier fraction.

**Figure 3.** Synthetic sweep over the eigendecay exponent.

**Figure 4.** Large-B̄ synthetic stress test based on common-operator-norm spiked covariances. Rows show all-task, related-task, and outlier-task MSE. Accompanying text from the paper (truncated at the source): "Rank-deficient stress. This same sweep also probes the rank-deficient regime. The calibrated floor η is zero at the B̄ = 5 endpoint, so the inlier covariance matrices are singular rank-one spiked covariances; at the remaining sweep values the floor is small, giving near-s…"

Original abstract

We study the multi-task linear regression problem in the presence of contaminated tasks. We address the setting where the unknown parameters of a majority of tasks are close in the $\ell_2$-norm, while a fraction of tasks are arbitrary outliers. Existing theoretical frameworks for this problem rely heavily on the assumption that the empirical second moment of each task has a minimum eigenvalue bounded away from zero (order $\Omega(1)$). Crucially, this assumption fails in many high-dimensional scenarios, rendering prior guarantees vacuous. To overcome this limitation, we propose an estimator based on matrix-weighted norm regularization. We also introduce a relative balancedness condition, quantified by a balancedness constant, that compares each task's second moment with the average inlier geometry and relaxes the need for taskwise second-moment lower bounds. In favorable regimes with moderate balancedness, our prediction MSE bounds match the rate of Duan and Wang (2023) under substantially weaker spectral assumptions; the resulting task-overall MSE is minimax optimal up to logarithmic factors. Furthermore, we demonstrate that our estimator enjoys a safety guarantee: when the relevant balancedness constant is large or infinite, or when tasks are unrelated, the method performs no worse than independent task learning. Consequently, our methodology achieves simultaneous adaptivity to task similarity, robustness to outliers, and safety outside favorable transfer regimes.
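The abstract's point that eigenvalue lower bounds fail in high dimensions can be checked numerically: with n samples in d dimensions, the empirical second moment XᵀX/n has rank at most n, so its minimum eigenvalue is zero whenever n < d. The sketch below illustrates this with Gaussian designs; the sizes and the Gaussian choice are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50                                   # hypothetical ambient dimension
results = {}
for n in (10, 50, 500):                  # hypothetical per-task sample sizes
    X = rng.standard_normal((n, d))
    S = X.T @ X / n                      # empirical second-moment matrix
    lam_min = np.linalg.eigvalsh(S)[0]   # eigvalsh returns eigenvalues in ascending order
    results[n] = lam_min
    print(n, np.linalg.matrix_rank(X), lam_min)
# With n = 10 < d the matrix is singular, so lam_min is zero up to rounding;
# with n = 500 it is bounded away from zero.
```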

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

This paper replaces per-task eigenvalue lower bounds with a relative balancedness condition plus matrix-weighted regularization to recover prior multi-task rates under weaker assumptions while adding a safety fallback.

The main takeaway: this paper replaces the standard per-task eigenvalue lower bound with a relative balancedness condition on second moments, and uses matrix-weighted regularization to achieve comparable rates under milder conditions in multi-task linear regression with contaminated tasks. Its strongest addition is a safety property ensuring the estimator is no worse than separate per-task ridge regressions when tasks are unrelated or balancedness fails, which makes the method more robust in settings where transfer might not help.

The claims about matching the Duan and Wang (2023) rates up to logarithmic factors and achieving minimax optimality seem consistent with the abstract and the stress-test notes. The derivations are described as self-contained, which is a plus, and they avoid circularity by treating balancedness as an assumption rather than a fitted quantity.

On the soft side, the benefits kick in only when balancedness is moderate, so when task geometries vary widely the gain may be small. It would be good to know whether the logarithmic factors are necessary or can be improved. Also, since this review was based partly on the abstract, confirming the full error analysis in the theorems would strengthen it, although the stress test suggests no major gaps.

This work is for researchers focused on robust and adaptive multi-task learning in high dimensions. Someone studying theoretical guarantees for transfer with outliers would get useful ideas from the balancedness concept and the weighted regularizer. Overall, the paper shows clear thinking on relaxing assumptions without losing the core benefits. I recommend putting it through peer review to get detailed feedback on the proofs and potential extensions.

Referee Report

2 major / 2 minor

Summary. The manuscript studies multi-task linear regression with a majority of tasks having similar parameters in $\ell_2$-norm and a fraction of arbitrary outliers. It introduces a matrix-weighted norm regularizer together with a relative balancedness condition (quantified by a balancedness constant comparing each task's second-moment matrix to the average inlier geometry) that replaces the standard per-task eigenvalue lower bound of order $\Omega(1)$. Under moderate balancedness the paper claims that prediction MSE recovers the rate of Duan and Wang (2023) under substantially weaker spectral assumptions, that the task-overall MSE is minimax optimal up to logarithmic factors, and that the estimator satisfies a safety guarantee: when the balancedness constant is large or tasks are unrelated, the method performs no worse than separate ridge regressions on each task.

Significance. If the stated bounds hold, the work meaningfully extends the applicability of multi-task transfer results to high-dimensional regimes where individual task covariances can be ill-conditioned or singular. The combination of adaptivity to task similarity, robustness to outliers, and an explicit safety fallback is practically valuable because it removes the risk that joint estimation degrades performance relative to independent learning. The relaxation of per-task spectral assumptions is a clear technical advance over prior frameworks that become vacuous under the same conditions.

major comments (2)

[§4.2, Theorem 4.1] The prediction-MSE upper bound is asserted to match the Duan-Wang rate up to logs under the relative balancedness condition, yet the explicit dependence of the leading constant on the balancedness parameter B is left implicit; without this dependence the claim that the rate is recovered 'under substantially weaker assumptions' cannot be verified from the main statement alone.
[§5.1] Safety guarantee: the argument that the estimator reduces to independent ridge regression when the balancedness constant tends to infinity relies on the matrix-weighted penalty becoming separable, but the proof sketch does not explicitly compute the limit of the regularizer or confirm that the resulting estimator coincides with the per-task ridge solutions used in the comparison.

minor comments (2)

[Abstract] The definition of 'task-overall MSE' appears only after the abstract; a one-sentence clarification in the abstract or introduction would improve immediate readability.
[§3] Notation for the matrix-weighted norm regularizer (Eq. (3)) uses an implicit dependence on the empirical second-moment matrices; an explicit display of how the weight matrix is constructed from the balancedness constant would aid verification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. The two major comments identify opportunities to strengthen clarity in the presentation of our bounds and safety analysis. We address each point below and will incorporate the requested clarifications in the revised manuscript.

Referee: [§4.2, Theorem 4.1] The prediction-MSE upper bound is asserted to match the Duan-Wang rate up to logs under the relative balancedness condition, yet the explicit dependence of the leading constant on the balancedness parameter B is left implicit; without this dependence the claim that the rate is recovered 'under substantially weaker assumptions' cannot be verified from the main statement alone.

Authors: We agree that an explicit statement of the dependence on the balancedness parameter B would make the comparison with Duan and Wang (2023) fully transparent. In the proof of Theorem 4.1 the leading constant scales linearly with B (or polylog(B) under the moderate-balancedness regime we consider), so that when B is bounded by a constant the rate matches the earlier result up to logarithmic factors while relaxing the per-task eigenvalue lower bound. To address the comment we will revise the statement of Theorem 4.1 (and the accompanying remark) to display this dependence explicitly. This change is a clarification rather than a correction of the underlying bound.
revision: yes

Referee: [§5.1] Safety guarantee: the argument that the estimator reduces to independent ridge regression when the balancedness constant tends to infinity relies on the matrix-weighted penalty becoming separable, but the proof sketch does not explicitly compute the limit of the regularizer or confirm that the resulting estimator coincides with the per-task ridge solutions used in the comparison.

Authors: We thank the referee for noting this gap in the exposition. As the balancedness constant tends to infinity the weighting matrix in the regularizer converges to a block-diagonal form that decouples the tasks completely; the joint optimization therefore separates into independent ridge-regression problems whose solutions are exactly those used in the safety comparison. We will expand the argument in §5.1 to include the explicit limit calculation and the verification that the resulting estimator matches the per-task ridge solutions. This is a straightforward but useful elaboration of the existing proof sketch.
revision: yes
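The separation step described in this response rests on a standard linear-algebra fact: a quadratic joint objective whose coupling matrix is block-diagonal splits into independent per-task problems. The sketch below checks this numerically under simplifying assumptions: a squared ridge penalty with a hypothetical level `lam`, Gaussian data, and hypothetical sizes. It does not compute the limit of the paper's regularizer, which is not given on this page.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, d, lam = 3, 30, 4, 0.5      # hypothetical task count, sample size, dimension, ridge level
Xs = [rng.standard_normal((n, d)) for _ in range(m)]
ys = [rng.standard_normal(n) for _ in range(m)]

# Per-task ridge: argmin_t (1/n)||y - X t||^2 + lam ||t||^2
per_task = [np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
            for X, y in zip(Xs, ys)]

# Joint problem over the stacked parameter, with a block-diagonal system matrix.
H = np.zeros((m * d, m * d))
g = np.zeros(m * d)
for j, (X, y) in enumerate(zip(Xs, ys)):
    blk = slice(j * d, (j + 1) * d)
    H[blk, blk] = X.T @ X / n + lam * np.eye(d)
    g[blk] = X.T @ y / n
joint = np.linalg.solve(H, g).reshape(m, d)

max_gap = max(np.abs(joint[j] - per_task[j]).max() for j in range(m))
print(max_gap)  # tiny: the joint solution coincides with the per-task ridge solutions
```

Any off-diagonal block in `H` would couple the tasks and break this equality, which is why the explicit limit calculation requested by the referee matters.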

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces the relative balancedness condition explicitly as an assumption that compares each task's second-moment matrix to the average inlier geometry, thereby relaxing per-task eigenvalue lower bounds without defining it in terms of the estimator's output. The matrix-weighted norm regularizer is constructed to adapt its penalty strength according to this stated condition, and the safety guarantee is framed as a fallback ensuring the estimator is at least as good as separate ridge regressions when balancedness is large or tasks are unrelated. The MSE bounds are derived directly from these assumptions and recover the Duan-Wang rate under weaker spectral conditions, with no steps reducing predictions to fitted parameters by construction or relying on load-bearing self-citations. All central derivations remain self-contained against the stated assumptions and external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the new relative balancedness condition and the assumption that a majority of tasks are inliers with close parameters; these are not derived from prior literature but postulated for the setting.

axioms (1)

domain assumption: A majority of tasks have unknown parameters close in the $\ell_2$-norm, while a fraction are arbitrary outliers.
Defines the contaminated multi-task setting in the abstract.
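For readers who want to experiment with this setting, the following is a minimal synthetic generator, assuming Gaussian designs and noise. The names `make_contaminated_tasks`, `eps` (outlier fraction), and `delta` (inlier radius) echo the symbols in the figure captions, but the construction, including the scale used for outlier parameters, is a hypothetical sketch and not the paper's simulation design.

```python
import numpy as np

def make_contaminated_tasks(m=20, n=50, d=10, eps=0.2, delta=0.1, noise=1.0, seed=0):
    """Generate m linear-regression tasks; a fraction eps have unrelated parameters."""
    rng = np.random.default_rng(seed)
    center = rng.standard_normal(d)
    n_out = int(np.floor(eps * m))
    is_outlier = np.zeros(m, dtype=bool)
    is_outlier[rng.choice(m, size=n_out, replace=False)] = True
    thetas = np.empty((m, d))
    for j in range(m):
        if is_outlier[j]:
            # Arbitrary parameter, unrelated to the shared center (scale is an assumption).
            thetas[j] = 10.0 * rng.standard_normal(d)
        else:
            # Inlier: within l2 distance delta of the shared center.
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            thetas[j] = center + delta * rng.uniform() * u
    Xs = rng.standard_normal((m, n, d))
    ys = np.einsum("jnd,jd->jn", Xs, thetas) + noise * rng.standard_normal((m, n))
    return Xs, ys, thetas, center, is_outlier

Xs, ys, thetas, center, is_outlier = make_contaminated_tasks()
inlier_dist = np.linalg.norm(thetas[~is_outlier] - center, axis=1)
print(is_outlier.sum(), inlier_dist.max())  # number of outlier tasks, largest inlier distance
```

Because every inlier lies within `delta` of the center, any two inlier parameters differ by at most `2 * delta` in the l2-norm.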

invented entities (1)

relative balancedness constant (no independent evidence)
purpose: Quantifies comparison of each task's second moment to average inlier geometry to relax per-task eigenvalue bounds
Newly introduced to enable the weaker spectral assumptions and safety property.

pith-pipeline@v0.9.0 · 5771 in / 1378 out tokens · 55564 ms · 2026-05-20T14:32:46.226371+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We propose an estimator based on matrix-weighted norm regularization... relative balancedness condition, quantified by a balancedness constant, that compares each task's second moment with the average inlier geometry"

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Safety Guarantee: Regardless of the balancedness constant B... Ein_j(θ̂_j) ≲ q² d/n ζ"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Impact of Process Competition on Energy Consumption: Analysis and Modeling
cs.DC · 2026-02 · unverdicted · novelty 4.0

Experiments indicate a process's energy consumption under CPU competition changes from linear to root function as the number of host cores increases.

Reference graph

Works this paper leans on

127 extracted references · 127 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1]

Advances in neural information processing systems , volume=

Robust PCA via outlier pursuit , author=. Advances in neural information processing systems , volume=

work page
[2]

Journal of Machine Learning Research , volume=

Learning Multiple Tasks with Kernel Methods , author=. Journal of Machine Learning Research , volume=

work page
[3]

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining , pages=

Regularized multi--task learning , author=. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining , pages=

work page
[4]

Machine learning , volume=

Multitask learning , author=. Machine learning , volume=. 1997 , publisher=

work page 1997
[5]

Journal of the American Statistical Association , volume=

Optimal multitask linear regression and contextual bandits under sparse heterogeneity , author=. Journal of the American Statistical Association , volume=. 2025 , publisher=

work page 2025
[6]

arXiv preprint arXiv:2507.07941 , year=

Late Fusion Multi-task Learning for Semiparametric Inference with Nuisance Parameters , author=. arXiv preprint arXiv:2507.07941 , year=

work page arXiv
[7]

The Annals of Statistics , volume=

Adaptive and robust multi-task learning , author=. The Annals of Statistics , volume=. 2023 , publisher=

work page 2023
[8]

21th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) , pages=

A Public Domain Dataset for Human Activity Recognition using Smartphones , author=. 21th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) , pages=. 2013 , organization=

work page 2013
[9]

Advances in Neural Information Processing Systems , volume=

Minimax lower bounds for transfer learning with linear and one-hidden layer neural networks , author=. Advances in Neural Information Processing Systems , volume=

work page
[10]

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining , pages=

Integrating low-rank and group-sparse structures for robust multi-task learning , author=. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining , pages=

work page
[11]

Advances in neural information processing systems , volume=

On the theory of transfer learning: The importance of task diversity , author=. Advances in neural information processing systems , volume=

work page
[12]

Journal of Machine Learning Research , volume=

Learning from similar linear representations: Adaptivity, minimaxity, and robustness , author=. Journal of Machine Learning Research , volume=

work page
[13]

A unified framework for semiparametrically efficient semi-supervised learning.arXiv preprint arXiv:2502.17741,

A Unified Framework for Semiparametrically Efficient Semi-Supervised Learning , author=. arXiv preprint arXiv:2502.17741 , year=

work page arXiv
[14]

International Conference on Machine Learning , pages=

Near-optimal representation learning for linear bandits and linear rl , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[15]

International Conference on Machine Learning , pages=

Multi-task representation learning for pure exploration in linear bandits , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023
[16]

International Conference on Artificial Intelligence and Statistics , pages=

Multitask bandit learning through heterogeneous feedback aggregation , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

work page 2021
[17]

IEEE transactions on knowledge and data engineering , volume=

A survey on multi-task learning , author=. IEEE transactions on knowledge and data engineering , volume=. 2021 , publisher=

work page 2021
[18]

Journal of the American Statistical Association , volume=

Individual data protected integrative regression analysis of high-dimensional heterogeneous data , author=. Journal of the American Statistical Association , volume=. 2022 , publisher=

work page 2022
[19]

Journal of Machine Learning Research , volume=

Fused Lasso Approach in Regression Coefficients Clustering--Learning Parameter Heterogeneity in Data Integration , author=. Journal of Machine Learning Research , volume=

work page
[20]

2024 , publisher=

Learning theory from first principles , author=. 2024 , publisher=

work page 2024
[21]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Transfer learning for high-dimensional linear regression: Prediction, estimation and minimax optimality , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2022 , publisher=

work page 2022
[22]

NIPS2014 Workshop on Transfer and Multi-Task Learning: Theory Meets Practice , year=

Multi-task linear bandits , author=. NIPS2014 Workshop on Transfer and Multi-Task Learning: Theory Meets Practice , year=

work page
[23]

Adaptive and Multitask Learning Workshop at the ICML

Data enrichment: Multi-task learning in high dimension with theoretical guarantees , author=. Adaptive and Multitask Learning Workshop at the ICML. IMLS, Long Beach, CA , year=

work page
[24]

Management Science , volume=

Multitask learning and bandits via robust statistics , author=. Management Science , volume=. 2025 , doi=

work page 2025
[25]

Support union recovery in high-dimensional multivariate regression , author=

work page
[26]

2019 , series=

High-dimensional statistics: A non-asymptotic viewpoint , author=. 2019 , series=

work page 2019
[27]

arXiv preprint arXiv:2005.00944 , year=

Understanding and improving information transfer in multi-task learning , author=. arXiv preprint arXiv:2005.00944 , year=

work page arXiv 2005
[28]

Electronic Journal of Statistics , volume=

Data Enriched Linear Regression , author=. Electronic Journal of Statistics , volume=. 2015 , doi=

work page 2015
[29]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Predicting multivariate responses in multiple linear regression , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 1997 , publisher=

work page 1997
[30]

The Annals of Statistics , volume=

Randomized sketches for kernels: Fast and optimal nonparametric regression , author=. The Annals of Statistics , volume=. 2017 , doi=

work page 2017
[31]

International conference on machine learning , pages=

Which tasks should be learned together in multi-task learning? , author=. International conference on machine learning , pages=. 2020 , organization=

work page 2020
[32]

National Science Review , volume=

An overview of multi-task learning , author=. National Science Review , volume=. 2018 , publisher=

work page 2018
[33]

arXiv preprint arXiv:2406.18347 , year=

Sub-Gaussian High-Dimensional Covariance Matrix Estimation under Elliptical Factor Model with 2+ \ epsilon \ th Moment , author=. arXiv preprint arXiv:2406.18347 , year=

work page arXiv
[34]

Journal of Functional Analysis , volume=

Random vectors in the isotropic position , author=. Journal of Functional Analysis , volume=. 1999 , publisher=

work page 1999
[35]

2018 , series=

High-dimensional probability: An introduction with applications in data science , author=. 2018 , series=

work page 2018
[36]

Probability Theory and Related Fields , volume=

The lower tail of random quadratic forms with applications to ordinary least squares , author=. Probability Theory and Related Fields , volume=. 2016 , publisher=

work page 2016
[37]

International Conference on Machine Learning , pages=

Semiparametric contextual bandits , author=. International Conference on Machine Learning , pages=. 2018 , organization=

work page 2018
[38]

Journal of the American Statistical Association , volume=

Optimal experimental design for polynomial regression , author=. Journal of the American Statistical Association , volume=. 1971 , publisher=

work page 1971
[39]

arXiv preprint arXiv:2209.15224 , year=

Robust unsupervised multi-task and transfer learning on gaussian mixture models , author=. arXiv preprint arXiv:2209.15224 , year=

work page arXiv
[40]

A gentle introduction to concentration inequalities , author=. Dept. Comput. Sci., Cornell Univ., Tech. Rep , year=

work page
[41]

2021 , publisher=

Statistical methods for handling incomplete data , author=. 2021 , publisher=

work page 2021
[42]

Annals of the Institute of Statistical Mathematics , volume=

Local polynomial regression: Optimal kernels and asymptotic minimax efficiency , author=. Annals of the Institute of Statistical Mathematics , volume=. 1997 , publisher=

work page 1997
[43]

Proceedings of the 38th International Conference on Machine Learning , pages =

Sparsity-Agnostic Lasso Bandit , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

work page 2021
[44]

Advances in neural information processing systems , volume=

Improved algorithms for linear stochastic bandits , author=. Advances in neural information processing systems , volume=

work page
[45]

Management Science , volume=

Mostly exploration-free algorithms for contextual bandits , author=. Management Science , volume=. 2021 , publisher=

work page 2021
[46]

Artificial Intelligence and Statistics , pages=

Linear thompson sampling revisited , author=. Artificial Intelligence and Statistics , pages=. 2017 , organization=

work page 2017
[47]

Advances in neural information processing systems , volume=

A smoothed analysis of the greedy algorithm for the linear contextual bandit problem , author=. Advances in neural information processing systems , volume=

work page
[48]

Journal of Machine Learning Research , year=

Contextual bandits with linear Payoff functions , author=. Journal of Machine Learning Research , year=

work page
[49]

2011 , institution=

User-friendly tail bounds for matrix martingales , author=. 2011 , institution=

work page 2011
[50]

International conference on machine learning , pages=

Thompson sampling for contextual bandits with linear payoffs , author=. International conference on machine learning , pages=. 2013 , organization=

work page 2013
[51]

2006 , publisher=

Extreme value theory: an introduction , author=. 2006 , publisher=

work page 2006
[52]

arXiv preprint arXiv:1704.09011 , year=

Mostly exploration-free algorithms for contextual bandits , author=. arXiv preprint arXiv:1704.09011 , year=

work page arXiv
[53]

Management Science , year=

Mostly exploration-free algorithms for contextual bandits , author=. Management Science , year=

work page
[54]

2017 International Conference on Sampling Theory and Applications (SampTA) , pages=

Sparse linear contextual bandits via relevance vector machines , author=. 2017 International Conference on Sampling Theory and Applications (SampTA) , pages=. 2017 , organization=

work page 2017
[55]

arXiv preprint arXiv:1906.08947 , year=

Randomized Exploration in Generalized Linear Bandits , author=. arXiv preprint arXiv:1906.08947 , year=

work page arXiv 1906
[56]

Artificial Intelligence and Statistics , pages=

Bandit theory meets compressed sensing for high dimensional stochastic linear bandit , author=. Artificial Intelligence and Statistics , pages=

work page
[57]

2013 , publisher=

Concentration inequalities: A nonasymptotic theory of independence , author=. 2013 , publisher=

work page 2013
[58]

Advances in Neural Information Processing Systems , pages=

Doubly-Robust Lasso Bandit , author=. Advances in Neural Information Processing Systems , pages=

work page
[59]

Biometrics , volume=

Doubly robust estimation in missing data and causal inference models , author=. Biometrics , volume=. 2005 , publisher=

work page 2005
[60]

Operations Research , volume=

Online decision making with high-dimensional covariates , author=. Operations Research , volume=. 2020 , publisher=

work page 2020
[61]

Stochastic Systems , volume=

A linear response bandit problem , author=. Stochastic Systems , volume=. 2013 , publisher=

work page 2013
[62]

2011 , publisher=

Statistics for high-dimensional data: methods, theory and applications , author=. 2011 , publisher=

work page 2011
[63]

Journal of Machine Learning Research , volume=

Stochastic methods for l1-regularized loss minimization , author=. Journal of Machine Learning Research , volume=

work page
[64]

Artificial Intelligence and Statistics , pages=

Online-to-confidence-set conversions and application to sparse stochastic bandits , author=. Artificial Intelligence and Statistics , pages=

work page
[65]

Journal of Machine Learning Research , volume=

Sparsity regret bounds for individual sequences in online linear regression , author=. Journal of Machine Learning Research , volume=

work page
[66]

Introduction to the non-asymptotic analysis of random matrices

Introduction to the non-asymptotic analysis of random matrices , author=. arXiv preprint arXiv:1011.3027 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[67]

Proceedings of the 19th international conference on World wide web , pages=

A contextual-bandit approach to personalized news article recommendation , author=. Proceedings of the 19th international conference on World wide web , pages=. 2010 , organization=

work page 2010
[68]

Advances in neural information processing systems , pages=

An empirical evaluation of thompson sampling , author=. Advances in neural information processing systems , pages=

work page
[69]

MNL-Bandit: A Dynamic Learning Approach to Assortment Selection. arXiv preprint arXiv:1706.03880.
[70]

Provably Optimal Algorithms for Generalized Linear Contextual Bandits. International Conference on Machine Learning.
[71]

Tail bounds for sums of geometric and exponential variables. arXiv preprint arXiv:1709.08157.
[72]

Finite-time analysis of the multiarmed bandit problem. Machine learning, 2002.
[73]

Stochastic linear optimization under bandit feedback. Proceedings of the 21st Annual Conference on Learning Theory.
[74]

Bandit Algorithms. 2019.
[75]

Parametric bandits: The generalized linear case. Advances in Neural Information Processing Systems.
[76]

Optimal dynamic assortment planning with demand learning. Manufacturing & Service Operations Management, 2013.
[77]

Tight regret bounds for stochastic combinatorial semi-bandits. Artificial Intelligence and Statistics.
[78]

Thompson Sampling for the MNL-Bandit. Conference on Learning Theory.
[79]

Individual choice behavior: A theoretical analysis. 2012.
[80]

Restricted eigenvalue properties for correlated Gaussian designs. Journal of Machine Learning Research.
