archive
Every paper Pith has read. Search by title, abstract, or pith.
2684 papers in stat.ML · page 8
-
Stable barcodes track how dependency clusters evolve in dynamic Bayesian networks
A Stable Distance Persistence Homology for Dynamic Bayesian Network Clustering
-
Thompson sampling learns unknown networks while optimizing treatments
Adaptive Policy Learning Under Unknown Network Interference
-
Random spectra match Muon on GPT-2 training
Muon is Not That Special: Random or Inverted Spectra Work Just as Well
-
Kernel makes rotated 3D anisotropy explicit in Gaussian processes
Interpretable Machine Learning for Spatial Science: A Lie-Algebraic Kernel for Rotationally Anisotropic Gaussian Processes
-
CutMix training induces local attention in early ViT layers
Inducing Spatial Locality in Vision Transformers through the Training Protocol
-
Predictive resampling yields exact Bayesian posteriors
Variational predictive resampling
-
VPR with mean-field predictives matches exact posteriors
Variational predictive resampling
-
Synthesize likelihoods to meet accuracy bounds with minimal prior deviation
Sensor Design for Accuracy-Bounded Estimation via Maximum-Entropy Likelihood Synthesis
-
Neural tilting of Lévy measures enables jump-preserving SDE inference
Variational Inference for Lévy Process-Driven SDEs via Neural Tilting
-
k-step policy gradients escape myopic traps in restricted MDPs
Revisiting Policy Gradients for Restricted Policy Classes: Escaping Myopic Local Optima with $k$-step Policy Gradients
-
Transformer states converge uniformly to ODEs at rate O(1/L + 1/(L^{1/3} sqrt(H)))
Uniform Scaling Limits in AdamW-Trained Transformers
-
Reasoning helps LLM judges only on hard tasks
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
-
Linear networks store facts up to p log p = d²/2
Factual recall in linear associative memories: sharp asymptotics and mechanistic insights
-
Finite VC dimension enables finite-sample tests for distribution trade-offs
When Are Trade-Off Functions Testable from Finite Samples?
-
Tail extrapolation approximates best-of-N gradients from m much smaller than N
What should post-training optimize? A test-time scaling law perspective
-
LASSO matches homogeneous threshold for mixed-quality sparse data
Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
-
Natural policy gradient equals smoothed policy iteration
Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework
-
LLM personas match human survey distributions on stable questions
When Can Digital Personas Reliably Approximate Human Survey Findings?
-
Divide-and-conquer causal discovery extends to latent variables
A Recursive Decomposition Framework for Causal Structure Learning in the Presence of Latent Variables
-
Amortized networks speed up causal sensitivity bounds by orders of magnitude
Amortizing Causal Sensitivity Analysis via Prior Data-Fitted Networks
-
Bayesian linear solvers are special cases of affine PIMs
Affine Tracing: A New Paradigm for Probabilistic Linear Solvers
-
Confidence weights fuse modalities for long-tailed recognition
Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data
-
Bound certifies any learned controller for unknown linear systems
A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems
-
PAC-Bayes bound guarantees controller performance on unknown systems
A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems
-
Semi-simulated tests pick different winners than real data for treatment effects
Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation
-
Covariate-dependent level links low-fidelity quantiles to high-fidelity ones
Multi-Fidelity Quantile Regression
-
Sharp jumps in feature overlap set optimal neural scaling laws
Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks
-
Mass lift certifies regret in guided diffusion optimization
Regret Analysis of Guided Diffusion for Black-Box Optimization over Structured Inputs
-
Low-fidelity data yields kernels for high-fidelity PDE solving
Multifidelity Gaussian process regression for solving nonlinear partial differential equations
-
Unified taxonomy clarifies ML uncertainty for physics
Uncertainty in Physics and AI: Taxonomy, Quantification, and Validation
-
Expert losses cut MoE training time for time series
Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration
-
Test error in augmented random features depends only on data and augmentation moments
Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation
-
Anchored TS safely reduces regret using shifted offline data
Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift
-
Median anchoring cuts regret in online bandits with shifted offline data
Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift
-
Neural feature maps scale exact GP inference
Scalable Gaussian process inference via neural feature maps
-
Focal sets plus fuzzy logic tame uncertainty in hierarchical image labels
A Neurosymbolic Approach with Epistemic Deep Learning for Hierarchical Image Classification
-
Deeper Picard iterations cut truncation error without unbounded estimation error
Generalization Error Bounds for Picard-Type Operator Learning in Nonlinear Parabolic PDEs
-
GAN method estimates full causal distributions with minimax optimality
Extended Wasserstein-GAN Approach to Causal Distribution Learning: Density-Free Estimation and Minimax Optimality
-
Scaling rules transfer hyperparameters from small to large DenseAMs
Hyperparameter Transfer for Dense Associative Memories
-
Cyclic LiNG models coarsen to identifiable low-dimensional DAGs
Coarsening Linear Non-Gaussian Causal Models with Cycles
-
Subsampled CLT turns PFN predictions into valid Thompson samples
PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks
-
Kaplan-Meier estimators give unbiased ARL and ADD for finite sequences
Accurate Evaluation of Quickest Changepoint Detectors via Non-parametric Survival Analysis
-
Generative models show an innovation window before memorizing data
The two clocks and the innovation window: When and how generative models learn rules
-
Wasserstein projection gives optimal private sampling
Differentially Private Sampling from Distributions via Wasserstein Projection
-
Federated LLMs keep explicit consistency and coverage under bandwidth budgets
Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage
-
Order-gap measure gives stopping rule for adaptive learning
Consolidation-Expansion Operator Mechanics: A Unified Framework for Adaptive Learning
-
Order-gap tracks distance to settled state in learning systems
Consolidation-Expansion Operator Mechanics: A Unified Framework for Adaptive Learning
-
Multicalibration corrected without clean labels using contamination matrices
Unified Approach for Weakly Supervised Multicalibration
-
Rectified AI laws cut bias in Bayesian priors from limited data
Supercharging Bayesian Inference with Reliable AI-Informed Priors
-
Kernel regression error bounds cover non-Gaussian noise
On Uniform Error Bounds for Kernel Regression under Non-Gaussian Noise