pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

14903 papers in cs.LG · page 1

  1. cs.LG 2026-05-22 reviewed
    Shannon capacity produces U-shaped LLM scaling curves

    LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

    Xu Ouyang +7

  2. cs.LG 2026-05-22 reviewed
    Tune dense once, transfer to any MoE configuration

    Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models

    Hongwu Peng +5

  3. cs.CV 2026-05-22 reviewed
    Token selection speeds geometry transformers over 85 percent

    Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

    Shuhong Zheng +5

  4. cs.DB 2026-05-22 reviewed
    CHRONOS unifies index decay, pricing and privacy in data markets

    CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

    Joydeep Chandra

  5. stat.ML 2026-05-22 reviewed
    SHK flow perturbations give dimension-free DP bounds

    On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy

    Aratrika Mustafi +1

  6. cs.LG 2026-05-22 reviewed
    Damped looping of transformer blocks lifts accuracy on frozen models

    Training-Free Looped Transformers

    Lizhang Chen +4

  7. stat.ML 2026-05-22 reviewed
    Muon dynamics dissipate Hamiltonian energy monotonically

    Move on Muon: A Hamiltonian probability gradient flow perspective of Muon optimizer

    Aratrika Mustafi +2

  8. cs.LG 2026-05-22 reviewed
    Foundation models support zero-shot causal image reasoning

    Leveraging Foundation Models for Causal Generative Modeling

    Aneesh Komanduri +1

  9. cs.LG 2026-05-22 reviewed
    Weak teachers boost larger LLMs via loss mixing

    Strong Teacher Not Needed? On Distillation in LLM Pretraining

    Taiming Lu +1

  10. cs.LG 2026-05-22 reviewed
    The paper derives entrywise error bounds for spectral ranking in the Bradley-Terry-Luce…

    Entrywise Error Bounds for Spectral Ranking with Semi-Random Adversaries

    Dongmin Lee +2

  11. cs.LG 2026-05-22 reviewed
    Post-training, not pre-training data, creates LLM geopolitical bias

    It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

    Stuart Bladon +1

  12. cs.CL 2026-05-22 reviewed
    Word co-occurrence creates hierarchical geometry in embeddings

    Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence

    Andres Nava +1

  13. eess.SY 2026-05-22 reviewed
    Dual-Brain pairs LLM with ML engine to automate O-RAN AI apps

    Advanced AI Service Provisioning in O-RAN through LLM Engine Integration

    Seyed Bagher Hashemi Natanzi +3

  14. cs.LG 2026-05-22 reviewed
    Debiased mining converts OOD detection to Monte-Carlo sampling

    Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models

    Bo Peng +3

  15. physics.ao-ph 2026-05-22 reviewed
    AI weather models move like particles down a learned free-energy slope

    The physics of AI weather models

    George Craig +3

  16. cs.LG 2026-05-22 reviewed
    Inspector agent raises LLM constitutive models to 100% physical validity

    LLM-driven design of physics-constrained constitutive models: two agents are better than one

    Marius Tacke +6

  17. cs.LG 2026-05-22 reviewed
    Seed-and-expand retrieval raises recall on knowledge graphs with small candidate sets

    SeedER: Seed-and-Expand Retrieval from Knowledge Graphs

    Hamed Shirzad +4

  18. cs.LG 2026-05-22 reviewed
    Attention I/O cost falls to near-linear in n for most regimes

    Approaching I/O-optimality for Approximate Attention

    Pál András Papp +2

  19. cs.LG 2026-05-22 reviewed
    ContrastAD detects anomalies by contrasting drifting time series graphs

    Contrast to Detect: Dynamic Graph Contrastive Regularization for Unsupervised Anomaly Detection in Multivariate Time Series

    Yunhua Pei +3

  20. cs.LG 2026-05-22 reviewed
    Derivative bound yields linear sampling for regularized classification

    Optimal Dimension-Free Sampling for Regularized Classification

    Meysam Alishahi +3

  21. cs.CE 2026-05-22 reviewed
    Language models reconstruct flow fields from under 10% data

    Operator Learning for Reconstructing Flow Fields from Sparse Measurements: a Language Model Approach

    Qian Zhang +1

  22. cs.LG 2026-05-22 reviewed
    Stability landscapes learned from network topology

    Learning Dynamic Stability Landscapes in Synchronization Networks

    Christian Nauck +3

  23. cs.LG 2026-05-22 reviewed
    Graph forecasts predict controller workload better than volume counts

    Graph-based Complexity Forecasts in UK En Route Airspace Using Relevant Aircraft Interactions

    Edward Henderson +2

  24. cs.LG 2026-05-22 reviewed
    Activation optimization improves randomized nets for operator approximation

    Optimization of randomized neural networks for transfer operator approximation

    Mohammad Tabish +1

  25. cs.LG 2026-05-22 reviewed
    Max-product search finds top relevant GNN walks in polynomial time

    Relevant Walk Search for Explaining Graph Neural Networks

    Ping Xiong +5

  26. cs.HC 2026-05-22 reviewed
    Smartwatches detect drunk driving at 0.88 AUROC

    Detecting Drunk Driving Using Off-the-Shelf Smartwatches

    Robin Deuber +11

  27. cs.CV 2026-05-22 reviewed
    Adaptive search fixes blind spots in high-res image perception for LLMs

    CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

    Liupeng Li +6

  28. stat.ML 2026-05-22 reviewed
    Preference feedback yields sublinear regret in kernel MDPs

    Learning Kernel-Based MDPs from Episodic Preferential Feedback

    Nikola Pavlovic +2

  29. cs.LG 2026-05-22 reviewed
    Compatible output heads let students learn from noise

    Learning Through Noise: Why Subliminal Learning Works and When It Fails

    Vincent C. Brockers +4

  30. cs.CR 2026-05-22 reviewed
    RL search finds more Tamarin proofs with shorter trees

    Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

    Matthias Cosler +4

  31. stat.ML 2026-05-22 reviewed
    Dirichlet model inside MC Dropout improves uncertainty calibration

    Dirichlet-Based Monte Carlo Dropout for Uncertainty Estimation in Neural Networks

    Rouaa Hoblos (FEMTO-ST) +3

  32. cs.LG 2026-05-22 reviewed
    CopFITi makes copulas consistent for irregular time series

    Valid and Expressive Copulas for Irregular Multivariate Time Series

    Christian Klötergens +3

  33. cs.LG 2026-05-22 reviewed
    Rigging benchmarks via training data is NP-hard

    How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness

    Polina Gordienko +3

  34. cs.CR 2026-05-22 reviewed
    Temporal gaps weaken Android malware model defenses

    Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

    Ahmed Sabbah +4

  35. cs.LG 2026-05-22 reviewed
    Latent space lets diffusion language models sample faster with better quality

    DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling

    Jean-Marie Lemercier +5

  36. cs.LG 2026-05-22 reviewed
    Hysteretic attention reaches Turing completeness in constant depth

    Preisach Attention: A Hysteretic Model of Sequential Memory

    Piotr Frydrych

  37. cs.CL 2026-05-22 reviewed
    Two-phase curriculum reaches 99.02% accuracy on name matching

    Structure-Guided Entity Resolution: Fine-Tuning LLMs for Robust Name Matching in Complex Linguistic Contexts

    Shivam Chourasia +2

  38. cs.LG 2026-05-22 reviewed
    Meta-learning yields model performance scores on unlabeled data

    Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning

    Trinh Pham +4

  39. stat.ML 2026-05-22 reviewed
    Sparse activations split scaling laws into two exponents

    Asymmetric Scaling Laws from Sparse Features

    John Sous +1

  40. cs.RO 2026-05-22 reviewed
    125 samples suffice for ANN inverse kinematics accuracy

    How Many Training Samples Are Needed for the Inverse Kinematics Solutions by Artificial Neural Networks

    Dong-Won Lim

  41. cs.LG 2026-05-22 reviewed
    Agents fail quantitative goals without progress tracking

    Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents

    Yuandao Cai +4

  42. cs.IR 2026-05-22 reviewed
    Three-phase recipe keeps 98% precision in 190M retrieval models

    HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

    Vipul Gupta +6

  43. cs.LG 2026-05-22 reviewed
    Latent policy gradients forecast RL goal generalization

    Understanding Goal Generalisation in Sequential Reinforcement Learning

    Jason Ross Brown +1

  44. cs.LG 2026-05-22 reviewed
    MARS scales ranks by performance gap sizes

    MARS: Magnitude-Aware Rank Statistics

    Muhammad Rajabinasab +2

  45. cs.LG 2026-05-22 reviewed
    Low dimension suffices for near-max retrieval margins

    Is Dimensionality a Barrier for Retrieval Models?

    Kiril Bangachev +3

    4 Piths

  46. cs.LG 2026-05-22 reviewed
    One network pass trains an agent on every goal at once

    Goal-Conditioned Agents that Learn Everything All at Once

    Michael Matthews +7

  47. cs.LG 2026-05-22 reviewed
    Duplicating ambiguous points reveals hidden neighborhoods in projections

    When One Point Is Not Enough: Addressing Ambiguous Instances in Dimensionality Reduction by Splitting

    Diede P.M. van der Hoorn +2

  48. cs.LG 2026-05-22 reviewed
    New sampler cuts RL training time for flow models by up to 53%

    Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

    Jade Zou +9

  49. cs.LG 2026-05-22 reviewed
    Energy conservation lets neural models recover hidden dynamics

    Learning partially observed systems with neural Hamiltonian ordinary differential equations

    Sunniva Meltzer +2

  50. cs.LG 2026-05-22 reviewed
    Velocity consistency shapes embeddings for top time series anomaly detection

    VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection

    Alberto D. Cencillo +3