archive

Every paper Pith has read. Search by title, abstract, or pith.

14903 papers in cs.LG · page 1

cs.LG 2026-05-22 reviewed

Shannon capacity produces U-shaped LLM scaling curves
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Xu Ouyang +7
cs.LG 2026-05-22 reviewed

Tune dense once, transfer to any MoE configuration
Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models

Hongwu Peng +5
cs.CV 2026-05-22 reviewed

Token selection speeds geometry transformers over 85 percent
Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

Shuhong Zheng +5
cs.DB 2026-05-22 reviewed

CHRONOS unifies index decay, pricing and privacy in data markets
CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

Joydeep Chandra
stat.ML 2026-05-22 reviewed

SHK flow perturbations give dimension-free DP bounds
On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy

Aratrika Mustafi +1
cs.LG 2026-05-22 reviewed

Damped looping of transformer blocks lifts accuracy on frozen models
Training-Free Looped Transformers

Lizhang Chen +4
stat.ML 2026-05-22 reviewed

Muon dynamics dissipate Hamiltonian energy monotonically
Move on Muon: A Hamiltonian probability gradient flow perspective of Muon optimizer

Aratrika Mustafi +2
cs.LG 2026-05-22 reviewed

Foundation models support zero-shot causal image reasoning
Leveraging Foundation Models for Causal Generative Modeling

Aneesh Komanduri +1
cs.LG 2026-05-22 reviewed

Weak teachers boost larger LLMs via loss mixing
Strong Teacher Not Needed? On Distillation in LLM Pretraining

Taiming Lu +1
cs.LG 2026-05-22 reviewed

The paper derives entrywise error bounds for spectral ranking in the Bradley-Terry-Luce…
Entrywise Error Bounds for Spectral Ranking with Semi-Random Adversaries

Dongmin Lee +2
cs.LG 2026-05-22 reviewed

Post-training, not pre-training data, creates LLM geopolitical bias
It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

Stuart Bladon +1
cs.CL 2026-05-22 reviewed

Word co-occurrence creates hierarchical geometry in embeddings
Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence

Andres Nava +1
eess.SY 2026-05-22 reviewed

Dual-Brain pairs LLM with ML engine to automate O-RAN AI apps
Advanced AI Service Provisioning in O-RAN through LLM Engine Integration

Seyed Bagher Hashemi Natanzi +3
cs.LG 2026-05-22 reviewed

Debiased mining converts OOD detection to Monte-Carlo sampling
Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models

Bo Peng +3
physics.ao-ph 2026-05-22 reviewed

AI weather models move like particles down a learned free-energy slope
The physics of AI weather models

George Craig +3
cs.LG 2026-05-22 reviewed

Inspector agent raises LLM constitutive models to 100% physical validity
LLM-driven design of physics-constrained constitutive models: two agents are better than one

Marius Tacke +6
cs.LG 2026-05-22 reviewed

Seed-and-expand retrieval raises recall on knowledge graphs with small candidate sets
SeedER: Seed-and-Expand Retrieval from Knowledge Graphs

Hamed Shirzad +4
cs.LG 2026-05-22 reviewed

Attention I/O cost falls to near-linear in n for most regimes
Approaching I/O-optimality for Approximate Attention

Pál András Papp +2
cs.LG 2026-05-22 reviewed

ContrastAD detects anomalies by contrasting drifting time series graphs
Contrast to Detect: Dynamic Graph Contrastive Regularization for Unsupervised Anomaly Detection in Multivariate Time Series

Yunhua Pei +3
cs.LG 2026-05-22 reviewed

Derivative bound yields linear sampling for regularized classification
Optimal Dimension-Free Sampling for Regularized Classification

Meysam Alishahi +3
cs.CE 2026-05-22 reviewed

Language models reconstruct flow fields from under 10% data
Operator Learning for Reconstructing Flow Fields from Sparse Measurements: a Language Model Approach

Qian Zhang +1
cs.LG 2026-05-22 reviewed

Stability landscapes learned from network topology
Learning Dynamic Stability Landscapes in Synchronization Networks

Christian Nauck +3
cs.LG 2026-05-22 reviewed

Graph forecasts predict controller workload better than volume counts
Graph-based Complexity Forecasts in UK En Route Airspace Using Relevant Aircraft Interactions

Edward Henderson +2
cs.LG 2026-05-22 reviewed

Activation optimization improves randomized nets for operator approximation
Optimization of randomized neural networks for transfer operator approximation

Mohammad Tabish +1
cs.LG 2026-05-22 reviewed

Max-product search finds top relevant GNN walks in polynomial time
Relevant Walk Search for Explaining Graph Neural Networks

Ping Xiong +5
cs.HC 2026-05-22 reviewed

Smartwatches detect drunk driving at 0.88 AUROC
Detecting Drunk Driving Using Off-the-Shelf Smartwatches

Robin Deuber +11
cs.CV 2026-05-22 reviewed

Adaptive search fixes blind spots in high-res image perception for LLMs
CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

Liupeng Li +6
stat.ML 2026-05-22 reviewed

Preference feedback yields sublinear regret in kernel MDPs
Learning Kernel-Based MDPs from Episodic Preferential Feedback

Nikola Pavlovic +2
cs.LG 2026-05-22 reviewed

Compatible output heads let students learn from noise
Learning Through Noise: Why Subliminal Learning Works and When It Fails

Vincent C. Brockers +4
cs.CR 2026-05-22 reviewed

RL search finds more Tamarin proofs with shorter trees
Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

Matthias Cosler +4
stat.ML 2026-05-22 reviewed

Dirichlet model inside MC Dropout improves uncertainty calibration
Dirichlet-Based Monte Carlo Dropout for Uncertainty Estimation in Neural Networks

Rouaa Hoblos (FEMTO-ST) +3
cs.LG 2026-05-22 reviewed

CopFITi makes copulas consistent for irregular time series
Valid and Expressive Copulas for Irregular Multivariate Time Series

Christian Klötergens +3
cs.LG 2026-05-22 reviewed

Rigging benchmarks via training data is NP-hard
How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness

Polina Gordienko +3
cs.CR 2026-05-22 reviewed

Temporal gaps weaken Android malware model defenses
Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

Ahmed Sabbah +4
cs.LG 2026-05-22 reviewed

Latent space lets diffusion language models sample faster with better quality
DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling

Jean-Marie Lemercier +5
cs.LG 2026-05-22 reviewed

Hysteretic attention reaches Turing completeness in constant depth
Preisach Attention: A Hysteretic Model of Sequential Memory

Piotr Frydrych
cs.CL 2026-05-22 reviewed

Two-phase curriculum reaches 99.02% accuracy on name matching
Structure-Guided Entity Resolution: Fine-Tuning LLMs for Robust Name Matching in Complex Linguistic Contexts

Shivam Chourasia +2
cs.LG 2026-05-22 reviewed

Meta-learning yields model performance scores on unlabeled data
Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning

Trinh Pham +4
stat.ML 2026-05-22 reviewed

Sparse activations split scaling laws into two exponents
Asymmetric Scaling Laws from Sparse Features

John Sous +1
cs.RO 2026-05-22 reviewed

125 samples suffice for ANN inverse kinematics accuracy
How Many Training Samples Are Needed for the Inverse Kinematics Solutions by Artificial Neural Networks

Dong-Won Lim
cs.LG 2026-05-22 reviewed

Agents fail quantitative goals without progress tracking
Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents

Yuandao Cai +4
cs.IR 2026-05-22 reviewed

Three-phase recipe keeps 98% precision in 190M retrieval models
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

Vipul Gupta +6
cs.LG 2026-05-22 reviewed

Latent policy gradients forecast RL goal generalization
Understanding Goal Generalisation in Sequential Reinforcement Learning

Jason Ross Brown +1
cs.LG 2026-05-22 reviewed

MARS scales ranks by performance gap sizes
MARS: Magnitude-Aware Rank Statistics

Muhammad Rajabinasab +2
cs.LG 2026-05-22 reviewed

Low dimension suffices for near-max retrieval margins
Is Dimensionality a Barrier for Retrieval Models?

Kiril Bangachev +3

4 Piths
cs.LG 2026-05-22 reviewed

One network pass trains an agent on every goal at once
Goal-Conditioned Agents that Learn Everything All at Once

Michael Matthews +7
cs.LG 2026-05-22 reviewed

Duplicating ambiguous points reveals hidden neighborhoods in projections
When One Point Is Not Enough: Addressing Ambiguous Instances in Dimensionality Reduction by Splitting

Diede P.M. van der Hoorn +2
cs.LG 2026-05-22 reviewed

New sampler cuts RL training time for flow models by up to 53%
Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

Jade Zou +9
cs.LG 2026-05-22 reviewed

Energy conservation lets neural models recover hidden dynamics
Learning partially observed systems with neural Hamiltonian ordinary differential equations

Sunniva Meltzer +2
cs.LG 2026-05-22 reviewed

Velocity consistency shapes embeddings for top time series anomaly detection
VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection

Alberto D. Cencillo +3