archive
Every paper Pith has read.
2684 papers in stat.ML · page 4
-
Riesz basis yields closed-form ANOVA for dependent inputs
Generalized Functional ANOVA in Closed-Form: A Unified View of Additive Explanations
-
Volterra signature computed in quadratic or better time
Computational aspects of the Volterra Signature
-
Quantile error in heavy-tailed projections splits into three parts
On Stability and Decomposition of Sample Quantiles under Heavy-Tailed Distributions
-
Heavy-tailed quantiles split into direction shift
On Stability and Decomposition of Sample Quantiles under Heavy-Tailed Distributions
-
Generalized posteriors fix overconfidence in misspecified network models
Bayesian Latent Space Models for Graphs Are Misspecified: Toward Robust Inference via Generalized Posteriors
-
RAE v2 reaches SOTA gFID 1.06 in 80 epochs on ImageNet
Improved Baselines with Representation Autoencoders
-
Attention learns PCA eigenvectors from Gaussian data
Attention-based PCA
-
Dictionary of spectral operators approximates dynamical systems manifold
Geometric Dictionary Learning of Dynamical Systems with Optimal Transport
-
Learned noising speeds discrete diffusion sampling
Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
-
Ridge regularization distorts feature-learning networks at vanishing strength
Canonical Regularisation of Wide Feature-Learning Neural Networks
4 Piths -
Ringmaster LMO recovers optimal async time complexity for LMO
Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method
1 Pith -
Symmetry-respecting updates beat AdamW in LLM pretraining
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
-
DDPMs reach optimal Wasserstein bounds in any dimension
Wasserstein bounds for denoising diffusion probabilistic models via the Föllmer process
-
Föllmer process sets DDPM sampler parameters naturally
A note on connections between the Föllmer process and the denoising diffusion probabilistic model
-
Frequency extraction recovers hidden generalization at 80% noise
Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise
-
Neural nets learn densities from empirical characteristic functions
A data-driven Fourier-mixture neural-network method for density estimation
-
Deep ensembles with recalibrated Gaussian negative log-likelihood loss deliver stronger…
Uncertainty Reliability Under Domain Shift: An Investigation for Data-Driven Blood Pressure Estimation in Photoplethysmography
-
Cost-sensitive regression scales decision-focused learning
Scalable Decision-Focused Learning through Cost-Sensitive Regression
-
Newton method on Wasserstein space escapes saddles to global minima
From Saddle Points Toward Global Minima: A Newton-Type Method on Wasserstein Space
-
Mirrored unlearning boosts data attribution in diffusion models
Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew
-
C-SymmPI achieves near-conditional coverage for symmetric structured data
Conditional Predictive Inference for General Structured Data with Group Symmetries
-
Girsanov weights enable unbiased resampling for diffusion models
Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures
-
f-divergence drifts share a universal velocity form
A Unified Framework for Data-Free One-Step Sampling via Wasserstein Gradient Flows
-
s-step self-distillation optimizes shrinkage for s-spike covariances
Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models
-
Two GD steps yield floor(α2/(0.5-α1)) learned directions
Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent
-
Two GD steps produce multiple outliers in linear-width weights
Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent
-
Test compares two categorical Gini correlations for predictor importance
Comparing Two Categorical Gini Correlations with Applications to Classification Problems
-
New measure exactly bounds full swap regret and tests from small samples
Testable and Actionable Calibration for Full Swap Regret
-
Statistical analysis designs better quantizers for deep nets
StatQAT: Statistical Quantizer Optimization for Deep Networks
-
Large gradient step yields target-spiked features and adaptive kernel
How does feature learning reshape the function space?
-
Online method gives coverage bounds for panel data forecasts
Online Conformal Prediction for Non-Exchangeable Panel Data
-
Averaged Q-learning iterates converge to Gaussian at n^{-1/4} rate
On Gaussian approximation for entropy-regularized Q-learning with function approximation
-
Gradient flow reaches global minima for infinite-depth transformers
Training Infinitely Deep and Wide Transformers
4 Piths -
Kernel optimization controls FDR across structured hypotheses
Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels
-
Physical properties guide Bayesian selection of spectral peaks
Integrating Bayesian Spectral Deconvolution and Expert Scientific Reasoning for Robust Peak Estimation
-
Bregman framework gives U-calibration for Tsallis losses
Calibeating for general proper losses: A Bregman divergence approach
-
Regret-optimal algorithms for position-aware MNL bandits
Learning in Position-Aware Multinomial Logit Bandits: From Multiplicative to General Position Effects
-
Adjoint equations remove S-dependence from discrete diffusion convergence
Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space
-
Noisy matrix completion cuts samples to side info dimension
Sample-efficient inductive matrix completion with noise and inexact side-information
-
Nonlinear heads escape collapse by generating negative curvature
The Geometry of Projection Heads: Conditioning, Invariance, and Collapse
-
SGD on diagonal linear networks converges exponentially to zero risk
High-dimensional Limit of SGD for Diagonal Linear Networks
-
Spectral sparsification keeps MTP2 graphs accurate while making them sparse
Learning Gaussian Graphical Models under Total Positivity via Spectral Graph Sparsification
-
Multi-task estimator achieves optimal rates with weaker assumptions
Multi-task Linear Regression without Eigenvalue Lower Bounds: Adaptivity, Robustness, and Safety
-
Fairness layer guarantees output parity in neural networks
Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning
-
New network learns SPDE solutions and their uncertainty from noisy data
Diffusion-Based Stochastic Operator Networks for Uncertainty Quantification in Stochastic Partial Differential Equations
-
Anchored transport forecasts next distributions from causal context
CAST: Causal Anchored Simplex Transport for Distribution-Valued Time Series
-
SGD needs over N^3 log^2 N steps for phase-only classification
A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights
-
Intensity model generates hypergraphs with fidelity and novelty
HYVINT: Intensity-Driven Hypergraph Generation with Variational Representations
-
Stable-blanket predictors match or beat causal parents after interventions
Prediction-Intervention Games and Invariant Sets