Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks

Dravyansh Sharma; Maria-Florina Balcan; Saumya Goyal

arxiv: 2507.05084 · v2 · submitted 2025-07-07 · 💻 cs.LG · stat.ML

Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks

Maria-Florina Balcan , Saumya Goyal , Dravyansh Sharma This is my paper

Pith reviewed 2026-05-19 06:18 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords generalization boundsmulti-task learninglinear regressionregularization tuningsub-Gaussian distributionsdistribution-dependent boundshigh-dimensional datavalidation loss

0 comments

The pith

Under common distributional assumptions, tuning regularization hyperparameters in multi-task linear regression yields generalization bounds that do not worsen with feature dimension.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops generalization bounds for the validation loss when tuning L1 and L2 regularization coefficients across multiple related tasks in linear regression. These bounds depend on the data distribution and improve as the distribution becomes nicer, such as when instances are i.i.d. sub-Gaussian. This contrasts with prior uniform bounds that necessarily degrade as the feature dimension d increases. The results apply to ridge, lasso, and elastic net, and extend to a generalized ridge regression that incorporates an estimate of the mean. A sympathetic reader would care because this allows for effective hyperparameter tuning in high-dimensional settings without the bounds blowing up.

Core claim

The central claim is that distribution-dependent generalization bounds for tuning the L1 and L2 coefficients in multi-task linear regression do not get worse with increasing feature dimension d when instances within each task are i.i.d. draws from sub-Gaussian or similar distributions, and these bounds are much sharper than those from prior work for very large d. The bounds are obtained for the validation loss and also extended to a generalized form of ridge regression using mean estimates of the ground truth.

What carries the argument

Distribution-dependent generalization bounds on the validation loss for hyperparameter tuning, derived using concentration properties of sub-Gaussian random variables across tasks.

If this is right

The generalization error bounds stay controlled even for arbitrarily large feature dimensions.
Performance guarantees are tighter than uniform bounds when data is well-behaved.
Results cover standard regularization methods including ridge, lasso, and elastic net.
Tighter bounds are possible for generalized ridge regression by estimating the mean of the target distribution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach could enable scaling multi-task learning to very high-dimensional problems in practice without bound degradation.
Similar distribution-dependent analyses might apply to hyperparameter tuning in other models beyond linear regression.
Practitioners could leverage these bounds for more reliable regularization choices in high-d regimes with structured data.

Load-bearing premise

The instances within each task must be i.i.d. draws from sub-Gaussian or analogous well-studied distribution classes.

What would settle it

Empirical observation that the generalization error of the tuned parameters increases with feature dimension d on data satisfying the i.i.d. sub-Gaussian assumption would disprove the bounds.

read the original abstract

Modern regression problems often involve high-dimensional data and a careful tuning of the regularization hyperparameters is crucial to avoid overly complex models that may overfit the training data while guaranteeing desirable properties like effective variable selection. We study the recently introduced direction of tuning regularization hyperparameters in linear regression across multiple related tasks. We obtain distribution-dependent bounds on the generalization error for the validation loss when tuning the L1 and L2 coefficients, including ridge, lasso and the elastic net. In contrast, prior work develops bounds that apply uniformly to all distributions, but such bounds necessarily degrade with feature dimension, d. While these bounds are shown to be tight for worst-case distributions, our bounds improve with the "niceness" of the data distribution. Concretely, we show that under additional assumptions that instances within each task are i.i.d. draws from broad well-studied classes of distributions including sub-Gaussians, our generalization bounds do not get worse with increasing d, and are much sharper than prior work for very large d. We also extend our results to a generalization of ridge regression, where we achieve tighter bounds that take into account an estimate of the mean of the ground truth distribution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper derives distribution-dependent generalization bounds for multi-task linear regression hyperparameter tuning that stay independent of dimension under sub-Gaussian assumptions.

read the letter

The main takeaway is that under the assumption that each task's samples are i.i.d. from sub-Gaussian distributions, the generalization bounds for tuning L1/L2 regularization parameters across tasks do not degrade with feature dimension d and improve on the uniform bounds from earlier work for large d. They also extend the approach to a mean-estimated ridge variant with tighter results there. The multi-task formulation and the mean-estimate extension look like the genuinely new pieces relative to the cited single-task literature. The paper does a clean job of showing how the distributional niceness removes the usual d-dependent factors that appear in worst-case analyses. The low circularity burden is a plus, since the bounds rely on the stated assumptions rather than quantities fitted from the target data itself. The citation pattern seems reasonable and does not skip obvious priors. The soft spots are modest but real. Everything hinges on the per-task i.i.d. sub-Gaussian condition, which is standard yet restricts applicability to cleaner data regimes; the abstract gives no proof sketches or tightness checks, so it is hard to judge how loose the constants end up being in practice. Without seeing the derivations it is also unclear whether the bounds remain useful once you account for the hidden factors that often appear in these arguments. This paper is aimed at people working on high-dimensional generalization theory and multi-task regression. A reader who cares about replacing uniform convergence with distribution-dependent results would get direct value from it. It deserves a serious referee because the core claim addresses a concrete limitation in prior uniform bounds and the technical direction is coherent on its own terms. I would send it to peer review, expecting the authors to supply the missing proof details and perhaps a small simulation to show the bounds are not vacuous.

Referee Report

0 major / 2 minor

Summary. The manuscript derives distribution-dependent generalization bounds for the validation loss when tuning L1 and L2 regularization hyperparameters (including ridge, lasso, and elastic net) in a multi-task linear regression setting. Under the explicit assumption that instances within each task are i.i.d. draws from sub-Gaussian or similar well-studied distribution classes, the bounds remain independent of feature dimension d and are sharper than prior uniform bounds for large d. The results are extended to a generalized ridge regression that incorporates an estimate of the mean of the ground truth distribution.

Significance. If the central derivations hold, the work offers a meaningful advance by replacing worst-case uniform bounds (which necessarily degrade with d) with sharper, distribution-dependent guarantees that exploit common niceness properties of real data. This is particularly relevant for practical high-dimensional multi-task regression, where hyperparameter tuning must balance overfitting and variable selection without overly pessimistic dimension dependence.

minor comments (2)

[Introduction / Problem Setup] The abstract and introduction refer to 'broad well-studied classes of distributions including sub-Gaussians' without listing the precise classes for which the theorems are proved; this should be stated explicitly in the problem setup section.
[Preliminaries] In the multi-task notation, the symbols for the number of tasks T and samples per task n are introduced without a dedicated table or clear cross-reference to the single-task case; this creates minor ambiguity when comparing to prior single-task bounds.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation for minor revision. The referee's summary accurately reflects the paper's focus on deriving distribution-dependent generalization bounds for tuning L1/L2 regularization in multi-task linear regression, which remain independent of dimension d under sub-Gaussian i.i.d. assumptions and improve upon uniform bounds for large d.

Circularity Check

0 steps flagged

No significant circularity; bounds derived from explicit distributional assumptions

full rationale

The paper derives distribution-dependent generalization bounds for tuning L1/L2 regularization in multi-task linear regression. The central argument invokes the explicit assumption that instances within each task are i.i.d. draws from sub-Gaussian or similar well-studied distribution classes; this directly yields bounds independent of dimension d that improve over uniform worst-case bounds for large d. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or renamed input. The derivation remains self-contained with independent content supplied by the stated distributional niceness and standard concentration tools.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard statistical assumptions about data distributions rather than new free parameters or invented entities.

axioms (1)

domain assumption Instances within each task are i.i.d. draws from sub-Gaussian or similar well-studied distribution classes
Invoked in the abstract as the additional assumption enabling bounds that do not worsen with increasing d.

pith-pipeline@v0.9.0 · 5741 in / 1228 out tokens · 74555 ms · 2026-05-19T06:18:02.655096+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We obtain distribution-dependent bounds on the generalization error for the validation loss when tuning the L1 and L2 coefficients... under additional assumptions that instances within each task are i.i.d. draws from ... sub-Gaussians, our generalization bounds do not get worse with increasing d
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 3.1 ... lv(λERM) − lv(λ∗) ≤ 2M L ΛT_D / √T E[∥xv∥] + ...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages

[1]

Adversarial learning guarantees for linear hypotheses and neural networks

Pranjal Awasthi, Natalie Frank, and Mehryar Mohri. Adversarial learning guarantees for linear hypotheses and neural networks. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 431--441. PMLR, 13--18 Jul 2020

work page 2020
[2]

Data-Driven Algorithm Design (book chapter)

Maria-Florina Balcan. Data-Driven Algorithm Design (book chapter). In Beyond Worst-Case Analysis of Algorithms, Tim Roughgarden (Ed). Cambridge University Press , 2020

work page 2020
[3]

Data driven semi-supervised learning

Maria-Florina Balcan and Dravyansh Sharma. Data driven semi-supervised learning. Advances in Neural Information Processing Systems (NeurIPS), 34: 0 14782--14794, 2021

work page 2021
[4]

Learning accurate and interpretable decision trees

Maria-Florina Balcan and Dravyansh Sharma. Learning accurate and interpretable decision trees. In Uncertainty in Artificial Intelligence, pages 288--307. PMLR, 2024

work page 2024
[5]

Learning-theoretic foundations of algorithm configuration for combinatorial partitioning problems

Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, and Colin White. Learning-theoretic foundations of algorithm configuration for combinatorial partitioning problems. In Conference on Learning Theory, pages 213--274. PMLR, 2017

work page 2017
[6]

Dispersion for data-driven algorithm design, online learning, and private optimization

Maria-Florina Balcan, Travis Dick, and Ellen Vitercik. Dispersion for data-driven algorithm design, online learning, and private optimization. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 603--614. IEEE, 2018 a

work page 2018
[7]

Data-driven clustering via parameterized L loyd's families

Maria-Florina Balcan, Travis Dick, and Colin White. Data-driven clustering via parameterized L loyd's families. Advances in Neural Information Processing Systems (NeurIPS), 31, 2018 b

work page 2018
[8]

A general theory of sample complexity for multi-item profit maximization

Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. A general theory of sample complexity for multi-item profit maximization. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 173--174, 2018 c

work page 2018
[9]

Learning to link

Maria-Florina Balcan, Travis Dick, and Manuel Lang. Learning to link. In International Conference on Learning Representation (ICLR), 2020 a

work page 2020
[10]

Learning piecewise L ipschitz functions in changing environments

Maria-Florina Balcan, Travis Dick, and Dravyansh Sharma. Learning piecewise L ipschitz functions in changing environments. In International Conference on Artificial Intelligence and Statistics, pages 3567--3577. PMLR, 2020 b

work page 2020
[11]

How much data is sufficient to learn high-performing algorithms? generalization guarantees for data-driven algorithm design

Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? generalization guarantees for data-driven algorithm design. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 919--932, 2021 a

work page 2021
[12]

Learning-to-learn non-convex piecewise- L ipschitz functions

Maria-Florina Balcan, Mikhail Khodak, Dravyansh Sharma, and Ameet Talwalkar. Learning-to-learn non-convex piecewise- L ipschitz functions. Advances in Neural Information Processing Systems, 34: 0 15056--15069, 2021 b

work page 2021
[13]

Provably tuning the ElasticNet across instances

Maria-Florina Balcan, Misha Khodak, Dravyansh Sharma, and Ameet Talwalkar. Provably tuning the ElasticNet across instances. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 27769--27782. Curran Associates, Inc., 2022 a

work page 2022
[14]

Structural analysis of branch-and-cut and the learnability of G omory mixed integer cuts

Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, and Ellen Vitercik. Structural analysis of branch-and-cut and the learnability of G omory mixed integer cuts. Advances in Neural Information Processing Systems (NeurIPS), 35: 0 33890--33903, 2022 b

work page 2022
[15]

An analysis of robustness of non- L ipschitz networks

Maria-Florina Balcan, Avrim Blum, Dravyansh Sharma, and Hongyang Zhang. An analysis of robustness of non- L ipschitz networks. Journal of Machine Learning Research, 24 0 (98): 0 1--43, 2023 a

work page 2023
[16]

New bounds for hyperparameter tuning of regression problems across instances

Maria-Florina Balcan, Anh Nguyen, and Dravyansh Sharma. New bounds for hyperparameter tuning of regression problems across instances. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 80066--80078. Curran Associates, Inc., 2023 b

work page 2023
[17]

Generalization guarantees for multi-item profit maximization: Pricing, auctions, and randomized mechanisms

Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Generalization guarantees for multi-item profit maximization: Pricing, auctions, and randomized mechanisms. Operations Research, 73 0 (2): 0 648--663, 2023 c

work page 2023
[18]

Learning to branch: Generalization guarantees and limits of data-independent discretization

Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch: Generalization guarantees and limits of data-independent discretization. Journal of the ACM (JACM), 2024 a

work page 2024
[19]

Accelerating ERM for data-driven algorithm design using output-sensitive techniques

Maria-Florina Balcan, Christopher Seiler, and Dravyansh Sharma. Accelerating ERM for data-driven algorithm design using output-sensitive techniques. Advances in Neural Information Processing Systems, 37: 0 72648--72687, 2024 b

work page 2024
[20]

Algorithm configuration for structured P faffian settings

Maria-Florina Balcan, Anh Tuan Nguyen, and Dravyansh Sharma. Algorithm configuration for structured P faffian settings. Transactions of Machine Learning Research (TMLR), 2025 a

work page 2025
[21]

Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function

Maria-Florina Balcan, Anh Tuan Nguyen, and Dravyansh Sharma. Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function. arXiv preprint arXiv:2501.13734, 2025 b

work page arXiv 2025
[22]

Local R ademacher complexities

Peter Bartlett, Olivier Bousquet, and Shahar Mendelson. Local R ademacher complexities. The Annals of Statistics, 33 0 (4), August 2005. ISSN 0090-5364. doi:10.1214/009053605000000282

work page doi:10.1214/009053605000000282 2005
[23]

Generalization bounds for data-driven numerical linear algebra

Peter Bartlett, Piotr Indyk, and Tal Wagner. Generalization bounds for data-driven numerical linear algebra. In Conference on Learning Theory (COLT), pages 2013--2040. PMLR, 2022

work page 2013
[24]

Learning complexity of simulated annealing

Avrim Blum, Chen Dan, and Saeed Seddighin. Learning complexity of simulated annealing. In International Conference on Artificial Intelligence and Statistics (AISTATS) , pages 1540--1548. PMLR, 2021

work page 2021
[25]

Learning cut generating functions for integer programming

Hongyu Cheng and Amitabh Basu. Learning cut generating functions for integer programming. Advances in Neural Information Processing Systems, 37: 0 61455--61480, 2024

work page 2024
[26]

Tuning algorithmic and architectural hyperparameters in graph-based semi-supervised learning with provable guarantees

Ally Yalei Du, Eric Huang, and Dravyansh Sharma. Tuning algorithmic and architectural hyperparameters in graph-based semi-supervised learning with provable guarantees. In The 41st Conference on Uncertainty in Artificial Intelligence, 2025

work page 2025
[27]

Two Modeling Strategies for Empirical Bayes Estimation

Bradley Efron. Two Modeling Strategies for Empirical Bayes Estimation . Statistical Science, 29 0 (2): 0 285 -- 301, 2014. doi:10.1214/13-STS455

work page doi:10.1214/13-sts455 2014
[28]

Recovery of exact sparse representations in the presence of bounded noise

Jean-Jacques Fuchs. Recovery of exact sparse representations in the presence of bounded noise. IEEE Transactions on Information Theory, 51 0 (10): 0 3601--3608, 2005

work page 2005
[29]

A simulation study of some ridge estimators

Diane Galarneau Gibbons. A simulation study of some ridge estimators. Journal of the American Statistical Association, 76 0 (373): 0 131--139, 1981

work page 1981
[30]

Generalized cross-validation as a method for choosing a good ridge parameter

Gene H Golub, Michael Heath, and Grace Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21 0 (2): 0 215--223, 1979

work page 1979
[31]

The elements of statistical learning: data mining, inference, and prediction, volume 2

Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009

work page 2009
[32]

On the R ademacher series

Pawe Hitczenko and Stanis aw Kwapie \' n . On the R ademacher series. In J rgen Hoffmann-J rgensen, James Kuelbs, and Michael B. Marcus, editors, Probability in Banach Spaces, 9, pages 31--36, Boston, MA, 1994. Birkh \"a user Boston. ISBN 978-1-4612-0253-0

work page 1994
[33]

Ridge regression: applications to nonorthogonal problems

Arthur E Hoerl and Robert W Kennard. Ridge regression: applications to nonorthogonal problems. Technometrics, 12 0 (1): 0 69--82, 1970

work page 1970
[34]

Teh, and Yihong Wu

Soham Jana, Yury Polyanskiy, Anzo Z. Teh, and Yihong Wu. Empirical Bayes via ERM and Rademacher complexities: the P oisson model. In Gergely Neu and Lorenzo Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 5199--5235. PMLR, 12--15 Jul 2023

work page 2023
[35]

Sample complexity of posted pricing for a single item

Billy Jin, Thomas Kesselheim, Will Ma, and Sahil Singla. Sample complexity of posted pricing for a single item. In Advances in Neural Information Processing Systems, 2024

work page 2024
[36]

Learning to relax: Setting solver parameters across a sequence of linear system instances

Mikhail Khodak, Edmond Chow, Maria-Florina Balcan, and Ameet Talwalkar. Learning to relax: Setting solver parameters across a sequence of linear system instances. The Twelfth International Conference on Learning Representations (ICLR), 2024

work page 2024
[37]

A flexible empirical B ayes approach to multiple linear regression and connections with penalized regression

Youngseok Kim, Wei Wang, Peter Carbonetto, and Matthew Stephens. A flexible empirical B ayes approach to multiple linear regression and connections with penalized regression. Journal of Machine Learning Research, 25 0 (185): 0 1--59, 2024

work page 2024
[38]

Concentration in unbounded metric spaces and algorithmic stability

Aryeh Kontorovich. Concentration in unbounded metric spaces and algorithmic stability. In Eric P. Xing and Tony Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 28--36, Bejing, China, 22--24 Jun 2014. PMLR

work page 2014
[39]

Learning with square loss: Localization through offset rademacher complexity

Tengyuan Liang, Alexander Rakhlin, and Karthik Sridharan. Learning with square loss: Localization through offset rademacher complexity. In Peter Grünwald, Elad Hazan, and Satyen Kale, editors, Proceedings of The 28th Conference on Learning Theory, volume 40 of Proceedings of Machine Learning Research, pages 1260--1285, Paris, France, 03--06 Jul 2015. PMLR

work page 2015
[40]

Complexity analysis of the lasso regularization path

Julien Mairal and Bin Yu. Complexity analysis of the lasso regularization path. International Conference on Machine Learning, 2012

work page 2012
[41]

The benefit of multitask representation learning

Andreas Maurer, Massimiliano Pontil, and Bernardino Romera-Paredes. The benefit of multitask representation learning. Journal of Machine Learning Research, 17 0 (81): 0 1--32, 2016

work page 2016
[42]

Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices

Jaouad Mourtada. Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices . The Annals of Statistics, 50 0 (4): 0 2157 -- 2178, 2022. doi:10.1214/22-AOS2181

work page doi:10.1214/22-aos2181 2022
[43]

The B ayesian lasso

Trevor Park and George Casella. The B ayesian lasso. Journal of the American Statistical Association, 103 0 (482): 0 681--686, 2008. doi:10.1198/016214508000000337

work page doi:10.1198/016214508000000337 2008
[44]

Excess risk bounds for multitask learning with trace norm regularization

Massimiliano Pontil and Andreas Maurer. Excess risk bounds for multitask learning with trace norm regularization. In Shai Shalev-Shwartz and Ingo Steinwart, editors, Proceedings of the 26th Annual Conference on Learning Theory, volume 30 of Proceedings of Machine Learning Research, pages 55--76, Princeton, NJ, USA, 12--14 Jun 2013. PMLR

work page 2013
[45]

Samuel Robbins, Herbert E. (ed. Kotz and Norman L.) Johnson. An Empirical B ayes Approach to Statistics , pages 388--394. Springer New York, New York, NY, 1992. doi:10.1007/978-1-4612-0919-5_26

work page doi:10.1007/978-1-4612-0919-5_26 1992
[46]

More PAC-B ayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity

Borja Rodr \'i guez-G \'a lvez, Ragnar Thobaben, and Mikael Skoglund. More PAC-B ayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity. Journal of Machine Learning Research, 25 0 (110): 0 1--43, 2024

work page 2024
[47]

Generalization bound and learning methods for data-driven projections in linear programming

Shinsaku Sakaue and Taihei Oki. Generalization bound and learning methods for data-driven projections in linear programming. Advances in Neural Information Processing Systems, 37: 0 12825--12846, 2024

work page 2024
[48]

Understanding Machine Learning: From Theory to Algorithms

Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, USA, 2014. ISBN 1107057132

work page 2014
[49]

No internal regret with non-convex loss functions

Dravyansh Sharma. No internal regret with non-convex loss functions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 14919--14927, 2024

work page 2024
[50]

Efficiently learning the graph for semi-supervised learning

Dravyansh Sharma and Maxwell Jones. Efficiently learning the graph for semi-supervised learning. In Uncertainty in Artificial Intelligence, pages 1900--1910. PMLR, 2023

work page 1900
[51]

Offline-to-online hyperparameter transfer for stochastic bandits

Dravyansh Sharma and Arun Suggala. Offline-to-online hyperparameter transfer for stochastic bandits. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 20362--20370, 2025

work page 2025
[52]

Empirical B ayes estimation: When does g -modeling beat f -modeling in theory (and in practice)?, 2024

Yandi Shen and Yihong Wu. Empirical B ayes estimation: When does g -modeling beat f -modeling in theory (and in practice)?, 2024

work page 2024
[53]

The lasso problem and uniqueness

Ryan J Tibshirani. The lasso problem and uniqueness. Electronic Journal of statistics, 7: 0 1456--1490, 2013

work page 2013
[54]

Solutions of ill-posed problems

Andrey Nikolayevich Tikhonov. Solutions of ill-posed problems. VH Winston and Sons, 1977

work page 1977
[55]

van Wieringen

Wessel N. van Wieringen. Lecture notes on ridge regression. arXiv preprint arXiv:2410.10572, 2023

work page arXiv 2023
[56]

High-dimensional probability: An introduction with applications in data science, volume 47

Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018

work page 2018
[57]

All of Statistics

Larry Wasserman. All of Statistics. Springer, New York, 2010

work page 2010
[58]

The convergence rates of empirical B ayes estimation in a multiple linear regression model

Laisheng Wei and Shunpu Zhang. The convergence rates of empirical B ayes estimation in a multiple linear regression model. Annals of the Institute of Statistical Mathematics, 47 0 (1): 0 81--97, January 1995. ISSN 0020-3157. doi:10.1007/BF00773413

work page doi:10.1007/bf00773413 1995
[59]

Lower bounds on the smallest eigenvalue of a sample covariance matrix

Pavel Yaskov. Lower bounds on the smallest eigenvalue of a sample covariance matrix. Electronic Communications in Probability, 19 0 (none): 0 1 -- 10, 2014. doi:10.1214/ECP.v19-3807

work page doi:10.1214/ecp.v19-3807 2014
[60]

The superiority of empirical B ayes estimator of parameters in linear model

Weiping Zhang, Laisheng Wei, and Yaning Yang. The superiority of empirical B ayes estimator of parameters in linear model. Statistics & Probability Letters, 72 0 (1): 0 43--50, 2005. ISSN 0167-7152. doi:https://doi.org/10.1016/j.spl.2004.12.001

work page doi:10.1016/j.spl.2004.12.001 2005

[1] [1]

Adversarial learning guarantees for linear hypotheses and neural networks

Pranjal Awasthi, Natalie Frank, and Mehryar Mohri. Adversarial learning guarantees for linear hypotheses and neural networks. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 431--441. PMLR, 13--18 Jul 2020

work page 2020

[2] [2]

Data-Driven Algorithm Design (book chapter)

Maria-Florina Balcan. Data-Driven Algorithm Design (book chapter). In Beyond Worst-Case Analysis of Algorithms, Tim Roughgarden (Ed). Cambridge University Press , 2020

work page 2020

[3] [3]

Data driven semi-supervised learning

Maria-Florina Balcan and Dravyansh Sharma. Data driven semi-supervised learning. Advances in Neural Information Processing Systems (NeurIPS), 34: 0 14782--14794, 2021

work page 2021

[4] [4]

Learning accurate and interpretable decision trees

Maria-Florina Balcan and Dravyansh Sharma. Learning accurate and interpretable decision trees. In Uncertainty in Artificial Intelligence, pages 288--307. PMLR, 2024

work page 2024

[5] [5]

Learning-theoretic foundations of algorithm configuration for combinatorial partitioning problems

Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, and Colin White. Learning-theoretic foundations of algorithm configuration for combinatorial partitioning problems. In Conference on Learning Theory, pages 213--274. PMLR, 2017

work page 2017

[6] [6]

Dispersion for data-driven algorithm design, online learning, and private optimization

Maria-Florina Balcan, Travis Dick, and Ellen Vitercik. Dispersion for data-driven algorithm design, online learning, and private optimization. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 603--614. IEEE, 2018 a

work page 2018

[7] [7]

Data-driven clustering via parameterized L loyd's families

Maria-Florina Balcan, Travis Dick, and Colin White. Data-driven clustering via parameterized L loyd's families. Advances in Neural Information Processing Systems (NeurIPS), 31, 2018 b

work page 2018

[8] [8]

A general theory of sample complexity for multi-item profit maximization

Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. A general theory of sample complexity for multi-item profit maximization. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 173--174, 2018 c

work page 2018

[9] [9]

Learning to link

Maria-Florina Balcan, Travis Dick, and Manuel Lang. Learning to link. In International Conference on Learning Representation (ICLR), 2020 a

work page 2020

[10] [10]

Learning piecewise L ipschitz functions in changing environments

Maria-Florina Balcan, Travis Dick, and Dravyansh Sharma. Learning piecewise L ipschitz functions in changing environments. In International Conference on Artificial Intelligence and Statistics, pages 3567--3577. PMLR, 2020 b

work page 2020

[11] [11]

How much data is sufficient to learn high-performing algorithms? generalization guarantees for data-driven algorithm design

Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? generalization guarantees for data-driven algorithm design. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 919--932, 2021 a

work page 2021

[12] [12]

Learning-to-learn non-convex piecewise- L ipschitz functions

Maria-Florina Balcan, Mikhail Khodak, Dravyansh Sharma, and Ameet Talwalkar. Learning-to-learn non-convex piecewise- L ipschitz functions. Advances in Neural Information Processing Systems, 34: 0 15056--15069, 2021 b

work page 2021

[13] [13]

Provably tuning the ElasticNet across instances

Maria-Florina Balcan, Misha Khodak, Dravyansh Sharma, and Ameet Talwalkar. Provably tuning the ElasticNet across instances. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 27769--27782. Curran Associates, Inc., 2022 a

work page 2022

[14] [14]

Structural analysis of branch-and-cut and the learnability of G omory mixed integer cuts

Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, and Ellen Vitercik. Structural analysis of branch-and-cut and the learnability of G omory mixed integer cuts. Advances in Neural Information Processing Systems (NeurIPS), 35: 0 33890--33903, 2022 b

work page 2022

[15] [15]

An analysis of robustness of non- L ipschitz networks

Maria-Florina Balcan, Avrim Blum, Dravyansh Sharma, and Hongyang Zhang. An analysis of robustness of non- L ipschitz networks. Journal of Machine Learning Research, 24 0 (98): 0 1--43, 2023 a

work page 2023

[16] [16]

New bounds for hyperparameter tuning of regression problems across instances

Maria-Florina Balcan, Anh Nguyen, and Dravyansh Sharma. New bounds for hyperparameter tuning of regression problems across instances. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 80066--80078. Curran Associates, Inc., 2023 b

work page 2023

[17] [17]

Generalization guarantees for multi-item profit maximization: Pricing, auctions, and randomized mechanisms

Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Generalization guarantees for multi-item profit maximization: Pricing, auctions, and randomized mechanisms. Operations Research, 73 0 (2): 0 648--663, 2023 c

work page 2023

[18] [18]

Learning to branch: Generalization guarantees and limits of data-independent discretization

Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch: Generalization guarantees and limits of data-independent discretization. Journal of the ACM (JACM), 2024 a

work page 2024

[19] [19]

Accelerating ERM for data-driven algorithm design using output-sensitive techniques

Maria-Florina Balcan, Christopher Seiler, and Dravyansh Sharma. Accelerating ERM for data-driven algorithm design using output-sensitive techniques. Advances in Neural Information Processing Systems, 37: 0 72648--72687, 2024 b

work page 2024

[20] [20]

Algorithm configuration for structured P faffian settings

Maria-Florina Balcan, Anh Tuan Nguyen, and Dravyansh Sharma. Algorithm configuration for structured P faffian settings. Transactions of Machine Learning Research (TMLR), 2025 a

work page 2025

[21] [21]

Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function

Maria-Florina Balcan, Anh Tuan Nguyen, and Dravyansh Sharma. Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function. arXiv preprint arXiv:2501.13734, 2025 b

work page arXiv 2025

[22] [22]

Local R ademacher complexities

Peter Bartlett, Olivier Bousquet, and Shahar Mendelson. Local R ademacher complexities. The Annals of Statistics, 33 0 (4), August 2005. ISSN 0090-5364. doi:10.1214/009053605000000282

work page doi:10.1214/009053605000000282 2005

[23] [23]

Generalization bounds for data-driven numerical linear algebra

Peter Bartlett, Piotr Indyk, and Tal Wagner. Generalization bounds for data-driven numerical linear algebra. In Conference on Learning Theory (COLT), pages 2013--2040. PMLR, 2022

work page 2013

[24] [24]

Learning complexity of simulated annealing

Avrim Blum, Chen Dan, and Saeed Seddighin. Learning complexity of simulated annealing. In International Conference on Artificial Intelligence and Statistics (AISTATS) , pages 1540--1548. PMLR, 2021

work page 2021

[25] [25]

Learning cut generating functions for integer programming

Hongyu Cheng and Amitabh Basu. Learning cut generating functions for integer programming. Advances in Neural Information Processing Systems, 37: 0 61455--61480, 2024

work page 2024

[26] [26]

Tuning algorithmic and architectural hyperparameters in graph-based semi-supervised learning with provable guarantees

Ally Yalei Du, Eric Huang, and Dravyansh Sharma. Tuning algorithmic and architectural hyperparameters in graph-based semi-supervised learning with provable guarantees. In The 41st Conference on Uncertainty in Artificial Intelligence, 2025

work page 2025

[27] [27]

Two Modeling Strategies for Empirical Bayes Estimation

Bradley Efron. Two Modeling Strategies for Empirical Bayes Estimation . Statistical Science, 29 0 (2): 0 285 -- 301, 2014. doi:10.1214/13-STS455

work page doi:10.1214/13-sts455 2014

[28] [28]

Recovery of exact sparse representations in the presence of bounded noise

Jean-Jacques Fuchs. Recovery of exact sparse representations in the presence of bounded noise. IEEE Transactions on Information Theory, 51 0 (10): 0 3601--3608, 2005

work page 2005

[29] [29]

A simulation study of some ridge estimators

Diane Galarneau Gibbons. A simulation study of some ridge estimators. Journal of the American Statistical Association, 76 0 (373): 0 131--139, 1981

work page 1981

[30] [30]

Generalized cross-validation as a method for choosing a good ridge parameter

Gene H Golub, Michael Heath, and Grace Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21 0 (2): 0 215--223, 1979

work page 1979

[31] [31]

The elements of statistical learning: data mining, inference, and prediction, volume 2

Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009

work page 2009

[32] [32]

On the R ademacher series

Pawe Hitczenko and Stanis aw Kwapie \' n . On the R ademacher series. In J rgen Hoffmann-J rgensen, James Kuelbs, and Michael B. Marcus, editors, Probability in Banach Spaces, 9, pages 31--36, Boston, MA, 1994. Birkh \"a user Boston. ISBN 978-1-4612-0253-0

work page 1994

[33] [33]

Ridge regression: applications to nonorthogonal problems

Arthur E Hoerl and Robert W Kennard. Ridge regression: applications to nonorthogonal problems. Technometrics, 12 0 (1): 0 69--82, 1970

work page 1970

[34] [34]

Teh, and Yihong Wu

Soham Jana, Yury Polyanskiy, Anzo Z. Teh, and Yihong Wu. Empirical Bayes via ERM and Rademacher complexities: the P oisson model. In Gergely Neu and Lorenzo Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 5199--5235. PMLR, 12--15 Jul 2023

work page 2023

[35] [35]

Sample complexity of posted pricing for a single item

Billy Jin, Thomas Kesselheim, Will Ma, and Sahil Singla. Sample complexity of posted pricing for a single item. In Advances in Neural Information Processing Systems, 2024

work page 2024

[36] [36]

Learning to relax: Setting solver parameters across a sequence of linear system instances

Mikhail Khodak, Edmond Chow, Maria-Florina Balcan, and Ameet Talwalkar. Learning to relax: Setting solver parameters across a sequence of linear system instances. The Twelfth International Conference on Learning Representations (ICLR), 2024

work page 2024

[37] [37]

A flexible empirical B ayes approach to multiple linear regression and connections with penalized regression

Youngseok Kim, Wei Wang, Peter Carbonetto, and Matthew Stephens. A flexible empirical B ayes approach to multiple linear regression and connections with penalized regression. Journal of Machine Learning Research, 25 0 (185): 0 1--59, 2024

work page 2024

[38] [38]

Concentration in unbounded metric spaces and algorithmic stability

Aryeh Kontorovich. Concentration in unbounded metric spaces and algorithmic stability. In Eric P. Xing and Tony Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 28--36, Bejing, China, 22--24 Jun 2014. PMLR

work page 2014

[39] [39]

Learning with square loss: Localization through offset rademacher complexity

Tengyuan Liang, Alexander Rakhlin, and Karthik Sridharan. Learning with square loss: Localization through offset rademacher complexity. In Peter Grünwald, Elad Hazan, and Satyen Kale, editors, Proceedings of The 28th Conference on Learning Theory, volume 40 of Proceedings of Machine Learning Research, pages 1260--1285, Paris, France, 03--06 Jul 2015. PMLR

work page 2015

[40] [40]

Complexity analysis of the lasso regularization path

Julien Mairal and Bin Yu. Complexity analysis of the lasso regularization path. International Conference on Machine Learning, 2012

work page 2012

[41] [41]

The benefit of multitask representation learning

Andreas Maurer, Massimiliano Pontil, and Bernardino Romera-Paredes. The benefit of multitask representation learning. Journal of Machine Learning Research, 17 0 (81): 0 1--32, 2016

work page 2016

[42] [42]

Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices

Jaouad Mourtada. Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices . The Annals of Statistics, 50 0 (4): 0 2157 -- 2178, 2022. doi:10.1214/22-AOS2181

work page doi:10.1214/22-aos2181 2022

[43] [43]

The B ayesian lasso

Trevor Park and George Casella. The B ayesian lasso. Journal of the American Statistical Association, 103 0 (482): 0 681--686, 2008. doi:10.1198/016214508000000337

work page doi:10.1198/016214508000000337 2008

[44] [44]

Excess risk bounds for multitask learning with trace norm regularization

Massimiliano Pontil and Andreas Maurer. Excess risk bounds for multitask learning with trace norm regularization. In Shai Shalev-Shwartz and Ingo Steinwart, editors, Proceedings of the 26th Annual Conference on Learning Theory, volume 30 of Proceedings of Machine Learning Research, pages 55--76, Princeton, NJ, USA, 12--14 Jun 2013. PMLR

work page 2013

[45] [45]

Samuel Robbins, Herbert E. (ed. Kotz and Norman L.) Johnson. An Empirical B ayes Approach to Statistics , pages 388--394. Springer New York, New York, NY, 1992. doi:10.1007/978-1-4612-0919-5_26

work page doi:10.1007/978-1-4612-0919-5_26 1992

[46] [46]

More PAC-B ayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity

Borja Rodr \'i guez-G \'a lvez, Ragnar Thobaben, and Mikael Skoglund. More PAC-B ayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity. Journal of Machine Learning Research, 25 0 (110): 0 1--43, 2024

work page 2024

[47] [47]

Generalization bound and learning methods for data-driven projections in linear programming

Shinsaku Sakaue and Taihei Oki. Generalization bound and learning methods for data-driven projections in linear programming. Advances in Neural Information Processing Systems, 37: 0 12825--12846, 2024

work page 2024

[48] [48]

Understanding Machine Learning: From Theory to Algorithms

Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, USA, 2014. ISBN 1107057132

work page 2014

[49] [49]

No internal regret with non-convex loss functions

Dravyansh Sharma. No internal regret with non-convex loss functions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 14919--14927, 2024

work page 2024

[50] [50]

Efficiently learning the graph for semi-supervised learning

Dravyansh Sharma and Maxwell Jones. Efficiently learning the graph for semi-supervised learning. In Uncertainty in Artificial Intelligence, pages 1900--1910. PMLR, 2023

work page 1900

[51] [51]

Offline-to-online hyperparameter transfer for stochastic bandits

Dravyansh Sharma and Arun Suggala. Offline-to-online hyperparameter transfer for stochastic bandits. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 20362--20370, 2025

work page 2025

[52] [52]

Empirical B ayes estimation: When does g -modeling beat f -modeling in theory (and in practice)?, 2024

Yandi Shen and Yihong Wu. Empirical B ayes estimation: When does g -modeling beat f -modeling in theory (and in practice)?, 2024

work page 2024

[53] [53]

The lasso problem and uniqueness

Ryan J Tibshirani. The lasso problem and uniqueness. Electronic Journal of statistics, 7: 0 1456--1490, 2013

work page 2013

[54] [54]

Solutions of ill-posed problems

Andrey Nikolayevich Tikhonov. Solutions of ill-posed problems. VH Winston and Sons, 1977

work page 1977

[55] [55]

van Wieringen

Wessel N. van Wieringen. Lecture notes on ridge regression. arXiv preprint arXiv:2410.10572, 2023

work page arXiv 2023

[56] [56]

High-dimensional probability: An introduction with applications in data science, volume 47

Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018

work page 2018

[57] [57]

All of Statistics

Larry Wasserman. All of Statistics. Springer, New York, 2010

work page 2010

[58] [58]

The convergence rates of empirical B ayes estimation in a multiple linear regression model

Laisheng Wei and Shunpu Zhang. The convergence rates of empirical B ayes estimation in a multiple linear regression model. Annals of the Institute of Statistical Mathematics, 47 0 (1): 0 81--97, January 1995. ISSN 0020-3157. doi:10.1007/BF00773413

work page doi:10.1007/bf00773413 1995

[59] [59]

Lower bounds on the smallest eigenvalue of a sample covariance matrix

Pavel Yaskov. Lower bounds on the smallest eigenvalue of a sample covariance matrix. Electronic Communications in Probability, 19 0 (none): 0 1 -- 10, 2014. doi:10.1214/ECP.v19-3807

work page doi:10.1214/ecp.v19-3807 2014

[60] [60]

The superiority of empirical B ayes estimator of parameters in linear model

Weiping Zhang, Laisheng Wei, and Yaning Yang. The superiority of empirical B ayes estimator of parameters in linear model. Statistics & Probability Letters, 72 0 (1): 0 43--50, 2005. ISSN 0167-7152. doi:https://doi.org/10.1016/j.spl.2004.12.001

work page doi:10.1016/j.spl.2004.12.001 2005