archive
Every paper Pith has read.
14903 papers in cs.LG · page 1
-
Shannon capacity produces U-shaped LLM scaling curves
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
-
Tune dense once, transfer to any MoE configuration
Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models
-
Token selection speeds geometry transformers over 85 percent
Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
-
CHRONOS unifies index decay, pricing and privacy in data markets
CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces
-
SHK flow perturbations give dimension-free DP bounds
On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy
-
Damped looping of transformer blocks lifts accuracy on frozen models
Training-Free Looped Transformers
-
Muon dynamics dissipate Hamiltonian energy monotonically
Move on Muon: A Hamiltonian probability gradient flow perspective of Muon optimizer
-
Foundation models support zero-shot causal image reasoning
Leveraging Foundation Models for Causal Generative Modeling
-
Weak teachers boost larger LLMs via loss mixing
Strong Teacher Not Needed? On Distillation in LLM Pretraining
-
The paper derives entrywise error bounds for spectral ranking in the Bradley-Terry-Luce…
Entrywise Error Bounds for Spectral Ranking with Semi-Random Adversaries
-
Post-training, not pre-training data, creates LLM geopolitical bias
It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt
-
Word co-occurrence creates hierarchical geometry in embeddings
Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence
-
Dual-Brain pairs LLM with ML engine to automate O-RAN AI apps
Advanced AI Service Provisioning in O-RAN through LLM Engine Integration
-
Debiased mining converts OOD detection to Monte-Carlo sampling
Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models
-
AI weather models move like particles down a learned free-energy slope
The physics of AI weather models
-
Inspector agent raises LLM constitutive models to 100% physical validity
LLM-driven design of physics-constrained constitutive models: two agents are better than one
-
Seed-and-expand retrieval raises recall on knowledge graphs with small candidate sets
SeedER: Seed-and-Expand Retrieval from Knowledge Graphs
-
Attention I/O cost falls to near-linear in n for most regimes
Approaching I/O-optimality for Approximate Attention
-
ContrastAD detects anomalies by contrasting drifting time series graphs
Contrast to Detect: Dynamic Graph Contrastive Regularization for Unsupervised Anomaly Detection in Multivariate Time Series
-
Derivative bound yields linear sampling for regularized classification
Optimal Dimension-Free Sampling for Regularized Classification
-
Language models reconstruct flow fields from under 10% data
Operator Learning for Reconstructing Flow Fields from Sparse Measurements: a Language Model Approach
-
Stability landscapes learned from network topology
Learning Dynamic Stability Landscapes in Synchronization Networks
-
Graph forecasts predict controller workload better than volume counts
Graph-based Complexity Forecasts in UK En Route Airspace Using Relevant Aircraft Interactions
-
Activation optimization improves randomized nets for operator approximation
Optimization of randomized neural networks for transfer operator approximation
-
Max-product search finds top relevant GNN walks in polynomial time
Relevant Walk Search for Explaining Graph Neural Networks
-
Smartwatches detect drunk driving at 0.88 AUROC
Detecting Drunk Driving Using Off-the-Shelf Smartwatches
-
Adaptive search fixes blind spots in high-res image perception for LLMs
CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception
-
Preference feedback yields sublinear regret in kernel MDPs
Learning Kernel-Based MDPs from Episodic Preferential Feedback
-
Compatible output heads let students learn from noise
Learning Through Noise: Why Subliminal Learning Works and When It Fails
-
RL search finds more Tamarin proofs with shorter trees
Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin
-
Dirichlet model inside MC Dropout improves uncertainty calibration
Dirichlet-Based Monte Carlo Dropout for Uncertainty Estimation in Neural Networks
-
CopFITi makes copulas consistent for irregular time series
Valid and Expressive Copulas for Irregular Multivariate Time Series
-
Rigging benchmarks via training data is NP-hard
How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness
-
Temporal gaps weaken Android malware model defenses
Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection
-
Latent space lets diffusion language models sample faster with better quality
DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling
-
Hysteretic attention reaches Turing completeness in constant depth
Preisach Attention: A Hysteretic Model of Sequential Memory
-
Two-phase curriculum reaches 99.02% accuracy on name matching
Structure-Guided Entity Resolution: Fine-Tuning LLMs for Robust Name Matching in Complex Linguistic Contexts
-
Meta-learning yields model performance scores on unlabeled data
Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning
-
Sparse activations split scaling laws into two exponents
Asymmetric Scaling Laws from Sparse Features
-
125 samples suffice for ANN inverse kinematics accuracy
How Many Training Samples Are Needed for the Inverse Kinematics Solutions by Artificial Neural Networks
-
Agents fail quantitative goals without progress tracking
Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents
-
Three-phase recipe keeps 98% precision in 190M retrieval models
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval
-
Latent policy gradients forecast RL goal generalization
Understanding Goal Generalisation in Sequential Reinforcement Learning
-
MARS scales ranks by performance gap sizes
MARS: Magnitude-Aware Rank Statistics
-
Low dimension suffices for near-max retrieval margins
Is Dimensionality a Barrier for Retrieval Models?
-
One network pass trains an agent on every goal at once
Goal-Conditioned Agents that Learn Everything All at Once
-
Duplicating ambiguous points reveals hidden neighborhoods in projections
When One Point Is Not Enough: Addressing Ambiguous Instances in Dimensionality Reduction by Splitting
-
New sampler cuts RL training time for flow models by up to 53%
Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models
-
Energy conservation lets neural models recover hidden dynamics
Learning partially observed systems with neural Hamiltonian ordinary differential equations
-
Velocity consistency shapes embeddings for top time series anomaly detection
VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection