archive
Every paper Pith has read. Search by title, abstract, or pith.
2684 papers in stat.ML · page 12
-
Low precision triggers slingshot loss spikes via feature inflation
Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes
-
Kernel embeddings define Gaussian mixtures for Hilbert space data
Gaussian mixture models in Hilbert spaces via kernel methods
-
TabCF turns tabular models into fast control function estimators
TabCF: Distributional Control Function Estimation with Tabular Foundation Models
-
Repeated splits fix winner's curse in LLM adaptive benchmarks
Towards Reliable LLM Evaluation: Correcting the Winner's Curse in Adaptive Benchmarking
-
This paper proves that for many kernels
Sharper Guarantees for Misspecified Kernelized Bandit Optimization
-
Derivatives define causal fairness for continuous attributes
Tuning Derivatives for Causal Fairness in Machine Learning
-
CITE certifies target answers as LLM response modes with anytime-valid guarantees
CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency
-
Kernel copula embeddings detect causal dependence shifts
Detecting Changes in Causal Dependence with Kernels and Copulas
-
Ratio-based losses track relative errors via y over f(x)
Ratio-based Loss Functions
-
Kernel gradient flows match minimax uniform rates
Optimal Confidence Band for Kernel Gradient Flow Estimator
-
Transformers execute RL policy updates from context alone
Transformers Provably Implement In-Context Reinforcement Learning with Policy Improvement
-
Fourier features scale nonlinear causal discovery to mixed data
Fourier Feature Methods for Nonlinear Causal Discovery: FFML Scoring, TRFF Scoring, and FFCI Testing in Mixed Data
-
Fourier features scale GP scoring and CI tests for nonlinear causal discovery
Fourier Feature Methods for Nonlinear Causal Discovery: FFML Scoring, TRFF Scoring, and FFCI Testing in Mixed Data
-
Convex hulls give O(d/N) error for positive kernel quadrature
Convex-Geometric Error Bounds for Positive-Weight Kernel Quadrature
-
KAN spline removal degrades time-series forecasts
Temporal Functional Circuits: From Spline Plots to Faithful Explanations in KAN Forecasting
-
Early spectra forecast token efficiency in LLM training
Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization
-
vMF spherical flows generate categorical data from posterior alone
Spherical Flows for Sampling Categorical Data
-
Spherical vMF flow reduces categorical sampling to scalar ODE
Spherical Flows for Sampling Categorical Data
-
Residual from spectral analysis flags grokking transitions early
Distributional Spectral Diagnostics for Localizing Grokking Transitions
-
This paper develops a new algorithm for setting prices dynamically when customer demand…
Optimal Contextual Pricing under Agnostic Non-Lipschitz Demand
-
Neural score from backward PDE defines posterior SDE for sparse smoothing
Variational Smoothing and Inference for SDEs from Sparse Data with Dynamic Neural Flows
-
Pretrained transformer solves PU classification in one forward pass
In-Context Positive-Unlabeled Learning
-
Relaxed Cholesky method scales causal discovery to 10k variables
Relaxed Sparsest-Permutation Formulation for Causal Discovery at Scale
-
Symmetry-aware nets learn non-stationary GP kernels scalably
Permutation-preserving Functions and Neural Vecchia Covariance Kernels
-
Diffusion priors sharpen rain maps from microwave links
Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors
-
Pathwise gradients optimize non-myopic feature acquisition
Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients
-
Linear attribution methods share one canonical form
GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation
-
Benign regularizer exposes hidden local convexity in nonconvex matrix estimation
Convexity in Disguise: A Theoretical Framework for Nonconvex Low-Rank Matrix Estimation
-
Gradient matching recovers hidden penalties in neural net training
Estimating Implicit Regularization in Deep Learning
-
Direct estimator gives finite-sample bounds for Schrödinger bridge drifts
Direct Estimation of Schrödinger Bridge Time-Series Drifts: Finite-Sample, Asymptotic, and Adaptive Guarantees
-
Adaptive elastic nets fix feature starvation in sparse autoencoders
Feature Starvation as Geometric Instability in Sparse Autoencoders
-
Linear memory reaches n capacity with listwise but only n/log n with top-1
Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval
-
Cumulant approximations estimate wide MLP outputs with fewer FLOPs
Estimating the expected output of wide random MLPs more efficiently than sampling
-
Wide random MLPs yield expected outputs without sampling
Estimating the expected output of wide random MLPs more efficiently than sampling
-
Drifting models match Wasserstein flow fixed points on KL
On the Wasserstein Gradient Flow Interpretation of Drifting Models
-
GMD targets fixed points of Wasserstein gradient flows
On the Wasserstein Gradient Flow Interpretation of Drifting Models
-
Distributional regret bounds unify bandits and episodic RL
Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning
-
Bayesian view selection cuts scans needed for task-specific 3D models
A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry
-
Decomposing coefficients by graph nodes yields stable doubly sparse regression
Proximal Projection for Doubly Sparse Regularized Models
-
High-dimensional statistics connects to optimization and random matrices
High-Dimensional Statistics: Reflections on Progress and Open Problems
-
Diffusion on incidence matrices generates better hypergraphs
Hypergraph Generation via Structured Stochastic Diffusion
-
MDL method finds spatial regions and their time series drivers
Scalable inference of spatial regions and temporal signatures from time series
-
Adaptivity advantages shift under ReLU realizability
Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning
-
Batch normalization refines local affine partitions during training
Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks
-
BN recenters hyperplanes to refine local partitions in networks
Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks
-
Directional energy in drift subspace bounds frozen predictor risk
Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift
-
Causal GRN methods beat correlations only in clean data
When Does Gene Regulatory Network Inference Break? A Controlled Diagnostic Study of Causal and Correlational Methods on Single-Cell Data
-
Regime score flips Bayesian optimization winners across budgets
Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization
-
Symmetric attention diagnostics miss flow direction
Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics