archive
Every paper Pith has read.
2684 papers in stat.ML · page 2
-
MMD-balls as credal sets bound worst-case risk in test-time adaptation
MMD-Balls as Credal Sets: A PAC-Bayesian Framework for Epistemic Uncertainty in Test-Time Adaptation
-
Only full-domain utilities make OCE risk measures PAC-learnable in RL
On the Sample Complexity of Discounted Reinforcement Learning with Optimized Certainty Equivalents
-
Support-aware method certifies ad reserve policies from logs
Support-aware offline policy selection for advertising marketplaces
-
Representation Gap is governed by task intrinsic dimension
Representation Gap: Explaining the Unreasonable Effectiveness of Neural Networks from a Geometric Perspective
-
Dropout creates two scaling-law classes by activation type
Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos
-
Conformal sets identify root-cause stream with finite-sample coverage
Distribution-free root cause analysis
-
Amortized noise sampling cuts diffusion teacher variance 10x
Variance Reduction for Expectations with Diffusion Teachers
-
Amortized resampling yields 2-3x compute gains for diffusion teachers
Variance Reduction for Expectations with Diffusion Teachers
-
Embedding learning rate boost replicates muP transfer
Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate
-
Per-cell dispersion cuts tail forecast error 12.5 percent
Neural Negative Binomial Regression for Weekly Seismicity Forecasting: Per-Cell Dispersion Estimation and Tail Risk Assessment
-
Models converge without recovering main latent factors
Memorisation, convergence and generalisation in generative models
-
Transport maps to PDE measures are Hölder continuous
On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures
-
L2 over Wasserstein gives random measures Riemannian geometry
$L^2$ over Wasserstein: Statistical Analysis for Optimal Transport
-
Debiasing fixes bias in bilevel hypergradients
Semiparametric Efficient Bilevel Gradient Estimation
-
Large learning rates alter transformer attractors to cycles and chaos
Large-Step Training Dynamics of a Two-Factor Linear Transformer Model
-
Wasserstein bounds set tuning rules for annealed Langevin in SBI
Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference
-
Decomposition recovers shared LoRA subspace across clients
Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment
-
Adaptive batch scaling unlocks large-batch RL
Scalable Reinforcement Learning via Adaptive Batch Scaling
-
Gradient similarities unify measures of model complexity
A Rigorous, Tractable Measure of Model Complexity
-
Projection algorithm reduces constraint violations to O(log T)
Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction
-
Expectation consistency suffices for calibration under covariate shift
Expectation Consistency Loss: Rethink Confidence Calibration under Covariate Shift
-
Vector quantization builds local calibration maps for multiclass models
Divide et Calibra: Multiclass Local Calibration via Vector Quantization
-
Diffusion link lets GPs condition on text or physics
Conditioning Gaussian Processes on Almost Anything
-
Local boundary finds valid adjustment sets for causal effects
Local Covariate Selection for Average Causal Effect Estimation without Pretreatment and Causal Sufficiency Assumptions
-
SA error tails range from sub-Gaussian to near-Pareto with Markov noise
Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise
-
Frequency regularization lifts attack transfer to closed MLLMs
Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs
-
LOSCAR-SGD overlaps local steps with sparse delayed updates
LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging
-
Bias correction cuts pretraining loss in AdamW and similar optimizers
Correcting Stochastic Update Bias in Preconditioned Language Model Optimizers
-
Conformal tests bound false discoveries for every possible threshold
Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference
-
Decision path flips raise random forest accuracy
Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification
-
Decision-path flips yield unbiased per-sample weights for random forests
Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification
-
Agreement screening yields clearer text features at full accuracy
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
-
Localization method builds Transformers from local kernels
The General Theory of Localization Methods
-
CDF inversion fixes uneven Pareto front sampling
SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front
-
Unlearning by shifting erased points to retained semantic neighbors
Approximate Machine Unlearning through Manifold Representation Forgetting Guided by Self Mode Connectivity
-
Adaptive kernels and LOOCV improve RBF KAN models
Adaptive RBF-KAN: A Comparative Evaluation of Dynamic Shape Parameters in Kolmogorov-Arnold Networks
-
Overlapping nuclear norms recover subgroup low-rank geometry
Group-Aware Matrix Estimation and Latent Subspace Recovery
-
Bandits learn smooth graph payoffs scaling only with effective dimension
Spectral bandits for smooth graph functions with applications in recommender systems
-
Learn image-space generators matching latent-process marginals
Latent Process Generator Matching
-
Transfer learning reaches O(m^(-(α+1)/d)) rate for d>3
Sample Complexity of Transfer Learning: An Optimal Transport Approach
-
Geometric axioms explain neural network mechanisms
Axiomatizing Neural Networks via Pursuit of Subspaces
-
Neurons encode exact Maxwell solutions for fast sparse field reconstruction
Fast Reconstruction of Exact Maxwell Dynamics from Sparse Data
-
Min-gate fuses diffusion models to catch all four OOD shifts
Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection
-
Classifier uncertainty narrows conformal intervals by 39% for confident cases
CASCADE Conformal Prediction: Uncertainty-Adaptive Prediction Intervals for Two-Stage Clinical Decision Support
-
Contradiction graph decides VC dimension threshold for any m
Contradiction Graphs Determine VC Dimension
5 Piths
-
Negative random effects group shows 400x larger causal effects
Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management
-
Scoring functions recover causal graphs with latent variables
Score-Based Causal Discovery of Latent Variable Causal Models
-
Symmetrized cross-entropy produces unique convex multi-class unhinged loss
Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy Labels
-
Importance sampling corrects ILA to recover true posteriors
Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models
-
Post-hoc calibration sharpens GP lower tails for optimization
Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization