archive
Every paper Pith has read.
2684 papers in stat.ML · page 9
-
Fast sketching accelerates power method for low-rank approximations
Accelerating Power Method with Fast Sketching for Stronger Low-Rank Approximation
-
Hybrid booster adds linear terms to trees for macro forecasts
LGB+: A Macroeconomic Forecasting Road Test
-
Normalizing flows recover fast equilibrium from slow data alone
Learning stochastic multiscale models through normalizing flows
-
Risk-adjusted metrics favor professional forecasters
Quantifying the Risk-Return Tradeoff in Forecasting
-
Metropolis-Hastings steps fix discretization bias in diffusion correctors
Metropolis-Adjusted Diffusion Models
-
Matching bounds set exact mu threshold for submatrix detection
Minimax optimal submatrix detection: Sharp non-asymptotic rates
-
Power-law model splits Muon and SignSGD into three phases
Phases of Muon: When Muon Eclipses SignSGD
-
History-space neural operator halves rollout error for memory PDEs
HS-FNO: History-Space Fourier Neural Operator for Non-Markovian Partial Differential Equations
-
Empirical Bayes shrinkage completes 1-bit matrices with balanced accuracy and calibration
Empirical Bayes 1-bit matrix completion
-
Dataset pairs 1700 vision-model embeddings with training metadata
SEMASIA: A Large-Scale Dataset of Semantically Structured Latent Representations
-
Bridge functions identify path-specific effects with hidden confounders
Proximal Path-Specific Inference
-
Mean-field SVGD converges in L2 at explicit polynomial rates
Quantitative Local Convergence of Mean-Field Stein Variational Gradient Flow
-
Single-index bandits admit optimal regret of order T^(2/3)
Optimal Regret for Single Index Bandits
-
Inputs recovered to match any target output distribution
Inverse Design for Conditional Distribution Matching
-
Gravity decoder lifts link prediction in directed graphs
GravityGraphSAGE: Link Prediction in Directed Attributed Graphs
-
Feature selection tolerates noise and weak symmetry
Universal Feature Selection with Noisy Observations and Weak Symmetry Conditions
-
Shared parametric value function scales RL measurement to large tasks
Reinforcement Learning Measurement Model
-
Permutation routing across model copies improves generalization
Improving Generalization by Permutation Routing Across Model Copies
-
Optimal coefficient stabilizes MeanFlow training
On Variance Reduction in Learning Mean Flows
-
Geometry-aware VAE outperforms baselines on skeleton trajectories
An Elastic Shape Variational Autoencoder for Skeleton Pose Trajectories
-
Forward KL regularization yields first fast rates for offline contextual bandits
Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
-
Unsigned CATE estimates power randomization tests without splitting data
Fit CATE Once: Model-Assisted Randomization Tests Without Sample Splitting
-
Sub-network Laplace approximations underestimate predictive variance
Optimality of Sub-network Laplace Approximations: New Results and Methods
-
Muon fails to converge on convex Lipschitz functions
Muon Does Not Converge on Convex Lipschitz Functions
-
Nine-step guideline corrects bias in ML on health surveys
Survey-aware Machine Learning: A Guideline for Valid Population Health Inference based on Scoping Review
-
Popular Wikipedia pages show weaker periodicity than rare ones
TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification
-
Soft penalties make stochastic paths match observed marginals
Learning Generative Dynamics with Soft Law Constraints: A McKean-Vlasov FBSDE Approach
-
Co-distillation lifts small-model math accuracy by 6 points over GRPO
CoDistill-GRPO: A Co-Distillation Recipe for Efficient Group Relative Policy Optimization
-
Variance reduction improves time complexity in parallel stochastic optimization
Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction
-
Noiseless inverse optimization admits tight O(d/T) generalization bounds
Tight Generalization Bounds for Noiseless Inverse Optimization
-
Local LMO matches PGD rates without bounded sets or curvature
Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle
-
Two encoder blocks suffice for optimal Transformer approximation
Learning Theory of Transformers: Local-to-Global Approximation via Softmax Partition of Unity
-
GPU solver speeds up entropic optimal transport calculations
cuRegOT: A GPU-Accelerated Solver for Entropic-Regularized Optimal Transport
-
Canonical diffusion isolates mode barriers from samples and scores
Measuring and Decomposing Mode Separation via the Canonical Diffusion
-
Spectral analysis tracks shape and color defects in unregistered 4D point clouds
Simultaneous Monitoring of Shape and Surface Color via 4D Point Clouds: A Registration-free Approach
-
Estimators achieve minimax optimal rates for unbalanced transport-growth pairs
Minimax Optimal Estimation of Transport-Growth Pairs in Unbalanced Optimal Transport
-
Core-halo split removes bias in decentralized fixed-point solving
Core-Halo Decomposition: Decentralizing Large-Scale Fixed-Point Problems
-
Bayesian PINNs contract to PDE solutions at near-minimax rates
Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs
-
Normalizing flows sharpen conformal prediction regions in multiple dimensions
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation
-
Multi-component ICA splits into decoupled and competition phases
Learnability and Competition in High-Dimensional Multi-Component ICA
-
WLM learns second-order population dynamics from snapshots
A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots
-
Sliced inner-product GW distance aligns high-dimensional data scalably
Sliced Inner Product Gromov-Wasserstein Distances
-
Sinkhorn divergence tests full distributional treatment effects
Sinkhorn Treatment Effects: A Causal Optimal Transport Measure
-
Sinks equal hard attention switches at lower cost than diagonals
Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention