archive
Every paper Pith has read.
14903 papers in cs.LG · page 10
-
Stimulus symmetries produce distinct RSMs for equivalent codes
Stimulus symmetries can confound representational similarity analyses
-
Clients pick own models to cut federated comms 44x and raise accuracy
Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search
-
Regularization curbs prompt overfitting for better LLM generalization
TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization
-
CRAFT projects updates to resolve conflicts in federated learning
CRAFT: Conflict-Resolved Aggregation for Federated Training
-
Bernoulli metrics distinguish memorized from generalizing networks
A New Framework to Analyse the Distributional Robustness of Deep Neural Networks
-
Simulator predicts LLM serving latency with 6% error
Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
-
RL cuts pedestrian waits 79% via better crosswalks and signals
DeCoR: Design and Control Co-Optimization for Urban Streets Using Reinforcement Learning
-
Inductive logic turns neural circuit findings into transferable theories
From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
-
Contrasting patients with controls isolates disease subgroups
Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls
-
Semantic route cuts mental health prediction error across datasets
TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health
-
Large learning rates alter transformer attractors to cycles and chaos
Large-Step Training Dynamics of a Two-Factor Linear Transformer Model
-
Tabular models use distinct similarity readouts despite matching accuracy
A Mechanistic Study of Tabular Foundation Models
-
One-step generative policies add multimodal actions to mirror descent RL
Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
-
One-step MeanFlow policies beat Gaussian baselines in RL
Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
-
PCA loss matches supervised accuracy in unsupervised feature selection
Objective-Induced Bias and Search Dynamics in Multiobjective Unsupervised Feature Selection
-
LLM agents design MCU neural nets in hours instead of days
AutoMCU: Feasibility-First MCU Neural Network Customization via LLM-based Multi-Agent Systems
-
Moderate warm-up lets offline DPO surpass online RL on math reasoning
How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR
-
Dual-level experts reach 78% global accuracy in federated learning
FedCoE: Bridging Generalization and Personalization via Federated Coordinated Dual-level MoEs
-
Pricing learns demand from one revenue and resets for shifts
Nonparametric Learning and Earning with One-Point Feedback under Nonstationarity
-
Chain of thought splits into benefit and cost with stability bounds
On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective
-
Wasserstein bounds set tuning rules for annealed Langevin in SBI
Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference
-
Fluid-inspired velocity field fixes oversmoothing in deep GNNs
Graph Navier Stokes Networks
-
Contrast sub-blocks in windows to learn time series features
Divide and Contrast: Learning Robust Temporal Features without Augmentation
-
Strategy-map DAG keeps self-evolving agents from repeating old routines
APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents
-
10% heads on 10% data deliver 8.3 pp gain with 7x speedup in LLM alignment
From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment
-
Octahedral triplet quantizer trims KV cache bits
OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization
-
Preference tuning cuts RL policy failures by over 60%
PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment
-
Decomposition recovers shared LoRA subspace across clients
Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment
-
QED makes RL policies 100 times more consistent across runs
Behavior-Consistent Deep Reinforcement Learning
-
QED cuts cross-run divergence in RL by two orders of magnitude
Behavior-Consistent Deep Reinforcement Learning
-
Quantum RL matches classical on chemical flowsheet design
Enhanced Reinforcement Learning-based Process Synthesis via Quantum Computing
-
YANN-RL cuts training time for chemical process control
Reinforcement Learning-based Control via Y-wise Affine Neural Networks: Comparative Case Studies for Chemical Processes
-
RL fine-tuning lifts code generation pass@1 by 19% on MBPP
Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards
-
Adaptive batch scaling unlocks large-batch RL
Scalable Reinforcement Learning via Adaptive Batch Scaling
-
ChunkFT fits full fine-tuning of 8B models in 14GB GPU memory
ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning
-
Gradient similarities unify measures of model complexity
A Rigorous, Tractable Measure of Model Complexity
-
Multi-slot ad matching lifts revenue per user nearly 29 percent
Beyond Single Slot: Joint Optimization for Multi-Slot Guaranteed Display Advertising
-
Quantum circuit generator cuts mismatch in synthetic fraud data
Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection
-
Backward data generation lets compact model beat Mathematica on first integrals
Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning
-
YOLOv11 detects military targets in synthetic thermal and night drone images
Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums
-
Fine-tuned LLM reaches 0.866 F1 on Spanish psychiatric ICD coding
Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models
-
SMoA outperforms LoRA in low-budget fine-tuning
SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning
-
Model separates animal, natural, and human sounds in field recordings
CoarseSoundNet: Building a reliable model for ecological soundscape analysis
-
Cognitive-physical RL adds foresight to safer driving policies
Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving
-
CoPhy RL framework reaches SOTA on NAVSIM with BEV foresight
Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving
-
Fine-tuning erases reasoning traces while answers stay correct
Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning
-
Virtual samples cut advantage collapse in GRPO by over half
Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation
-
Linear utility improves DPO for diffusion and flow image models
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models
-
FLECA defends decentralized EV learning from attacks
Automated Byzantine-Resilient Clustered Decentralized Federated Learning for Battery Intelligence in Connected EVs