archive
Every paper Pith has read. Search by title, abstract, or pith.
2684 papers in stat.ML · page 1
-
SHK flow perturbations give dimension-free DP bounds
On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy
-
Damped looping of transformer blocks lifts accuracy on frozen models
Training-Free Looped Transformers
-
Muon dynamics dissipate Hamiltonian energy monotonically
Move on Muon: A Hamiltonian probability gradient flow perspective of Muon optimizer
-
The paper derives entrywise error bounds for spectral ranking in the Bradley-Terry-Luce…
Entrywise Error Bounds for Spectral Ranking with Semi-Random Adversaries
-
Derivative bound yields linear sampling for regularized classification
Optimal Dimension-Free Sampling for Regularized Classification
-
Preference feedback yields sublinear regret in kernel MDPs
Learning Kernel-Based MDPs from Episodic Preferential Feedback
-
Dirichlet model inside MC Dropout improves uncertainty calibration
Dirichlet-Based Monte Carlo Dropout for Uncertainty Estimation in Neural Networks
-
Sparse activations split scaling laws into two exponents
Asymmetric Scaling Laws from Sparse Features
-
Joint noise and DAG estimation handles varying variances
Concomitant DAG Learning: On the Roles of Noise Adaptivity, Sparsity, and Non-negativity
-
Adaptive allocation matches oracle rate for multi-judge LLM scoring
Instance-Optimal Estimation with Multiple LLM Judges on a Budget
-
Next-token prediction works only if text prefixes suffice for latent context
When Is Next-Token Prediction Useful? Marginalization, Ergodicity, Mixture Identifiability, Local Sufficiency, RAG, Tools, and Programming
-
Joint training avoids error inheritance from weak privileged data
Coupled Training with Privileged Information and Unlabeled Data
-
Symmetric noise lifts AlpacaEval scores from 65% to 69% in fine-tuning
Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning
-
Limit space makes any-size input models universal
Any-Dimensional Invariant Universality
-
Lifted operators turn hybrid models into convex kernel mixtures
Convex Hybrid Modeling: An Operator-Based Approach
-
Gradient descent recovers true similarity metric from triplets
Operationalizing Individual Fairness via Gradient Descent and Bradley-Terry Models
-
Gen-ROTDA adapts bike-sharing demand models across years by anchoring on few target labels
Robust OT-Guided Generative Residual Domain Adaptation for Bike-Sharing Demand Prediction under Temporal Domain Shift
-
LLM Sparsity Prior lets spike-and-slab models ignore bad LLM weights
LLM Sparsity Prior for Robust Feature Selection
-
Mass-orthogonality penalty yields consistent mode shapes from sparse data
Mode-Shape Expansion Using Physics-Constrained Gaussian Process Regression
-
KAN estimator converges independent of covariate dimension
KAPLAN: Kolmogorov-Arnold Prognostic Learnable Activation Networks for Survival Analysis
-
One config matches tuned AdamW across 1-8x horizons on LLMs
Anytime Training with Schedule-Free Spectral Optimization
-
Hawkes process lifts late alignment in news text simulations
HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation
-
Bayesian models match frequentist SHD classification with better uncertainty
Uncertainty-aware classification and triage of structural heart disease using electrocardiography and echocardiography metrics
-
Diffusion denoising score matching keeps bounds stable as modes separate
Diffusion-based Denoising Beats Vanilla Score Matching in Parameter Estimation: A Theoretical Explanation
-
Entropy regularization needs non-degenerate information forces to work
Human-Centered Learning Mechanics: A Dynamical Framework for Entropy-Regulated Representation Learning
-
Kernel density gradients yield conservative drifting at rate N^{-1/(d+4)}
Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models
-
Diffusion model generates continuous survival times from censored data
SDPM: Survival Diffusion Probabilistic Model for Continuous-Time Survival Analysis
-
Leave-one-out predictor fixes uniform diffusion mismatch
Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation
-
Plug-in losses approximate EDL objectives with decaying error
Plug-in Losses for Evidential Deep Learning: A Simplified Framework for Uncertainty Estimation that Includes the Softmax Classifier
-
Proxy method sets new accuracy standard for Shapley interactions
Proxy-Based Approximation of Shapley and Banzhaf Interactions
-
ProxySHAP lowers error in Shapley interaction estimates
Proxy-Based Approximation of Shapley and Banzhaf Interactions
-
Multi-task operator learning matches single-task rates
Multiple Neural Operators Achieve Near-Optimal Rates for Multi-Task Learning
-
Hyperfitting expands final LLM layer to promote rare tokens
Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion
-
Martingale kernel tests replace permutations with normal quantiles
A Martingale Kernel Independence Test
-
Value functions create straight paths for generative transport
Generative Modeling by Value-Driven Transport
-
Algorithms achieve optimal bidding rates despite feedback shilling
Do Not Trust The Auctioneer: Learning to Bid in Feedback-Manipulated Auctions
-
Description length post-selection lifts GP regression accuracy
Guiding Multi-Objective Genetic Programming with Description Length Improves Symbolic Regression Solutions
-
Selective neuron fusion trades ensemble accuracy for lower cost
Partial Fusion of Neural Networks: Efficient Tradeoffs Between Ensembles and Weight Aggregation
-
Regular graphs make ASE and LSE subspaces identical
The ASE-LSE Disagreement Landscape: An End-to-End Characterisation of Extremes and Structural Drivers
-
GPU batches cut optimal sparse GLM search time by 10-100 times
From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs
-
Betting wealth bound yields empirical Bernstein LIL
From Betting to Empirical Bernstein LIL
-
Physics-informed model recovers aerodynamic loads from noisy bridge data
Aerodynamic force reconstruction using physics-informed Gaussian processes
-
Finite networks track mean-field limit uniformly in time
Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks
-
Optimal mean estimators must have sensitivity Omega(eta + sqrt(eta d/n))
Robust Statistical Estimators with Bounded Empirical Sensitivity
-
Equal-variance structural VARs identified only up to orthogonal transforms and scale
Causal Discovery in Structural VAR Models Under Equal Noise Variance
-
Symbolic search recovers exact discrete distribution formulas
Symbolic Density Estimation for Discrete Distributions
-
Truncation makes neural likelihood work for long state sequences
Truncated Neural Likelihood Estimation for Simulation-Based Inference in State-Space Models
-
KL divergence to GPs splits into three costs for neural processes
Three Costs of Amortizing Gaussian Process Inference with Neural Processes
-
Estimator gives valid vaccine effectiveness from TND data with gaps
Targeted maximum likelihood estimation of vaccine effectiveness and immune correlates in test-negative design studies with missing data