ℓ₂-Boosting exhibits benign overfitting with logarithmic excess variance decay Θ(σ²/log(p/n)) under isotropic noise due to ℓ₁ bias, and a subdifferential early stopping rule recovers minimax-optimal ℓ₁ rates.
Canonical reference
Title resolution pending
Canonical reference. 83% of citing Pith papers cite this work as background.
citation-role summary
citation-polarity summary
representative citing papers
The spectral weak-recovery threshold for linearized AMP in the multi-view spiked Wigner model is SNR(λ,B)=1, where SNR is the largest eigenvalue of Diag(√λ)(B⊙B)Diag(√λ), and this coincides with the information-theoretic threshold for a broad class of spike priors.
Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.
Nonconvex projected gradient descent for noisy inductive matrix completion achieves linear convergence and order-optimal error at sample complexity scaling with side-information dimension a instead of ambient dimension n.
TARCO corrects measurement-error-induced correlated contamination in tree-aggregated compositional regression via bias-corrected estimating equations, tree-aware PSD stabilization, and sparse regularization, with finite-sample bounds and sign consistency.
The paper establishes the first tilde O(epsilon^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
Optimistic bilevel optimization with manifold lower-level minimizers is differentiable if the optimistic selection is unique, yielding a pseudoinverse hyper-gradient and a convergent HG-MS algorithm whose rate depends on intrinsic manifold dimension.
ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.
A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.
A stagewise greedy algorithm for semiparametric contextual dynamic pricing achieves regret T to the max of 1/2 and 3 over (2 beta plus 1) for linear m, with a matching lower bound proving optimality.
A Neyman-orthogonal estimator paired with Lasso nuisance estimation achieves root-T asymptotic normality for BLP demand parameters under high-dimensional controls and approximate sparsity.
Bayesian softmax-gated mixture-of-experts models achieve posterior contraction for density estimation and parameter recovery using Voronoi losses, plus two strategies for choosing the number of experts.
RAIC unifies uniform recovery of structured signals from nonlinear observations via PGD, yielding error rates comparable to nonuniform guarantees up to log factors in sparse and 1-bit settings.
A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.
Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.
Causal effects are identifiable for categorical unobserved confounders via mixture learning and tensor decomposition, yielding consistent estimators with non-asymptotic guarantees.
Near-linear time algorithm for robust regression under Gaussian covariates achieves O(sqrt(ε κ)) error with Õ(d/ε⁴) samples when ε κ ≲ 1, plus SQ and low-degree lower bounds.
A learning algorithm achieves tight Õ(√T) regret for profit maximization in bilateral trade against smooth adversaries, matching stochastic rates via continuity and algorithmic chaining.
Novel non-asymptotic uniform error bounds are derived for kernel regression under broad classes of non-Gaussian noise distributions that include correlated cases.
dFlowGRPO is a new rate-aware RL method for discrete flow models that outperforms prior GRPO approaches on image generation and matches continuous flow models while supporting broad probability paths.
Safety certification of dynamical systems is reformulated as direct classification via kernel embeddings on trajectories, bypassing recursive DP to avoid error compounding and support non-Markovian dynamics.
MoFI-FLR recovers active covariates and identifies their true functional forms (simple or complex) in high-dimensional functional linear regressions.
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
A semi-supervised kernel two-sample test integrates unlabeled covariate data to achieve asymptotic normality under the null, higher power than standard kernel tests, and consistency against fixed and local alternatives.
citing papers explorer
-
When Does $\ell_2$-Boosting Overfit Benignly? High-Dimensional Risk Asymptotics and the $\ell_1$ Implicit Bias
ℓ₂-Boosting exhibits benign overfitting with logarithmic excess variance decay Θ(σ²/log(p/n)) under isotropic noise due to ℓ₁ bias, and a subdifferential early stopping rule recovers minimax-optimal ℓ₁ rates.
-
Sharp Spectral Thresholds for Multi-View Spiked Wigner Models
The spectral weak-recovery threshold for linearized AMP in the multi-view spiked Wigner model is SNR(λ,B)=1, where SNR is the largest eigenvalue of Diag(√λ)(B⊙B)Diag(√λ), and this coincides with the information-theoretic threshold for a broad class of spike priors.
-
Pointwise Generalization in Deep Neural Networks
Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.
-
Sample efficient inductive matrix completion with noise and inexact side information
Nonconvex projected gradient descent for noisy inductive matrix completion achieves linear convergence and order-optimal error at sample complexity scaling with side-information dimension a instead of ambient dimension n.
-
Tree-aggregated regression for compositional data with measurement errors
TARCO corrects measurement-error-induced correlated contamination in tree-aggregated compositional regression via bias-corrected estimating equations, tree-aware PSD stabilization, and sparse regularization, with finite-sample bounds and sign consistency.
-
Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
The paper establishes the first tilde O(epsilon^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
-
Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets
Optimistic bilevel optimization with manifold lower-level minimizers is differentiable if the optimistic selection is unique, yielding a pseudoinverse hyper-gradient and a convergent HG-MS algorithm whose rate depends on intrinsic manifold dimension.
-
Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions
ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.
-
Constrained Contextual Bandits with Adversarial Contexts
A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.
-
Optimal Semiparametric Dynamic Pricing with Feature Diversity
A stagewise greedy algorithm for semiparametric contextual dynamic pricing achieves regret T to the max of 1/2 and 3 over (2 beta plus 1) for linear m, with a matching lower bound proving optimality.
-
Estimation of BLP models with high-dimensional controls
A Neyman-orthogonal estimator paired with Lasso nuisance estimation achieves root-T asymptotic normality for BLP demand parameters under high-dimensional controls and approximate sparsity.
-
On Bayesian Softmax-Gated Mixture-of-Experts Models
Bayesian softmax-gated mixture-of-experts models achieve posterior contraction for density estimation and parameter recovery using Voronoi losses, plus two strategies for choosing the number of experts.
-
Robust Uniform Recovery of Structured Signals from Nonlinear Observations
RAIC unifies uniform recovery of structured signals from nonlinear observations via PGD, yielding error rates comparable to nonuniform guarantees up to log factors in sparse and 1-bit settings.
-
Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction
A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.
-
Factor Augmented High-Dimensional SGD
Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.
-
Causal Inference with Categorical Unobserved Confounder via Mixture Learning
Causal effects are identifiable for categorical unobserved confounders via mixture learning and tensor decomposition, yielding consistent estimators with non-asymptotic guarantees.
-
On efficient robust regression with subquadratic samples
Near-linear time algorithm for robust regression under Gaussian covariates achieves O(sqrt(ε κ)) error with Õ(d/ε⁴) samples when ε κ ≲ 1, plus SQ and low-degree lower bounds.
-
Profit Maximization in Bilateral Trade against a Smooth Adversary
A learning algorithm achieves tight Õ(√T) regret for profit maximization in bilateral trade against smooth adversaries, matching stochastic rates via continuity and algorithmic chaining.
-
On Uniform Error Bounds for Kernel Regression under Non-Gaussian Noise
Novel non-asymptotic uniform error bounds are derived for kernel regression under broad classes of non-Gaussian noise distributions that include correlated cases.
-
dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models
dFlowGRPO is a new rate-aware RL method for discrete flow models that outperforms prior GRPO approaches on image generation and matches continuous flow models while supporting broad probability paths.
-
Safety Certification is Classification
Safety certification of dynamical systems is reformulated as direct classification via kernel embeddings on trajectories, bypassing recursive DP to avoid error compounding and support non-Markovian dynamics.
-
Model Form Identification in High-Dimensional Functional Linear Regressions
MoFI-FLR recovers active covariates and identifies their true functional forms (simple or complex) in high-dimensional functional linear regressions.
-
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
-
A Semi-Supervised Kernel Two-Sample Test
A semi-supervised kernel two-sample test integrates unlabeled covariate data to achieve asymptotic normality under the null, higher power than standard kernel tests, and consistency against fixed and local alternatives.
-
Stable Localized Conformal Prediction via Transduction
StCP leverages transfer learning to stabilize the size of conformal prediction sets without additional target labels.
-
Transfer Learning for Degree-Corrected Mixed Membership Network Models
Transfer learning from informative source networks improves target DCMM estimation accuracy by enlarging the eigenvalue gap of the connection probability matrix, with algorithms to avoid negative transfer.
-
Distributional Off-Policy Evaluation with Deep Quantile Process Regression
DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.
-
PAC-Bayes Bounds for Gibbs Posteriors via Singular Learning Theory
PAC-Bayes bounds for Gibbs posteriors are obtained via singular learning theory, producing explicit and tighter posterior-averaged risk bounds that adapt to data structure in overparameterized models.
-
Mad Props: Parallelism in Markov Chain Monte Carlo Through the Lens of the Infinite Proposal Limit
Theoretical analysis of multiproposal MCMC in the infinite proposal limit using involutive theory yields new methods and inter-method relationships.
-
Layer by Layer: Uncovering Hidden Representations in Language Models
Intermediate layers in LLMs consistently provide stronger features than final layers across tasks and architectures, as quantified by a new framework of information-theoretic, geometric, and invariance metrics.
-
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.
-
Pricing, Matching, and Bundling: an Equilibrium Analysis of Online Platforms
Pricing, matching, and bundling act as complementary levers that platforms can adjust to balance their own profitability against overall market welfare in equilibrium.
- A Unified Theory of Conditional Coverage in Conformal Prediction with Applications
- Statistical Consistency and Generalization of Contrastive Representation Learning