archive
Every paper Pith has read.
14903 papers in cs.LG · page 3
-
Prefix prompts let frozen LLMs condition flows for multi-modal forecasts
PaP-NF: Probabilistic Long-Term Time Series Forecasting via Prefix-as-Prompt Reprogramming and Normalizing Flows
-
Kernel agents top out at 0.94x production baselines
FastKernels: Benchmarking GPU Kernel Generation in Production
-
Homography mapping yields linear bounds for camera motion verification
Lipschitz Optimization for Formal Verification of Homographies
-
Region quotas stop wipe-out of reasoning blocks in KV caches
Adaptive Mass-Segmented KV Compression for Long-Context Reasoning
-
Small labeled set plus pseudo-labels prunes datasets effectively
Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling
-
Pretrained graph model improves low-data OPF accuracy
Scalable Heterogeneous Graph Foundation Models for Data-Driven Optimal Power Flow in Smart Grids
-
RankElastor stabilizes rank trajectories for scaled recommenders
Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation
-
r-value scores shrink conformal sets by excluding unstable candidates
Empirical Bayes Conformal Prediction for Vision and Language Models
-
GPI finds good policies with cost independent of state space size
Pure Exploration for a Good Policy in Reinforcement Learning with Bandit Feedback
-
Optimizing prompt embeddings boosts in-context learning
Self-Improving In-Context Learning
-
Symmetric noise lifts AlpacaEval scores from 65% to 69% in fine-tuning
Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning
-
LLMs drop up to 88 points when tasks move to context middle
Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks
-
10 poisoned examples hijack targeted LLM tasks at 70%+ success
PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs
-
ActInv recovers inputs from LLM split-inference activations
What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference
-
Limit space makes any-size input models universal
Any-Dimensional Invariant Universality
-
Infra-Bayesian RL records lower worst-case regret than classical agents
Infra-Bayesian Reinforcement Learning Agents Outperform Classical RL For Worst-Case Robustness
-
Gradient descent recovers true similarity metric from triplets
Operationalizing Individual Fairness via Gradient Descent and Bradley-Terry Models
-
Channel relevance steers contrastive samples for time series anomaly detection
CALAD: Channel-Aware contrastive Learning for multivariate time series Anomaly Detection
-
RL selects Clifford states that boost VQA energy accuracy 3x on average
Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning
-
Taylor-mode AD powering yields exact nested copula likelihoods
Archimedean Copula Inference via Taylor-Mode AD
-
Rayleigh quotient fixes rare switching under privacy noise
When Determinants Are Not Enough: Private Rare Switching
-
Verified prompts plus longitudinal context raise lesion tracking Dice by 4.5 points
Exploiting Longitudinal Context in Clinician-Verified Interactive Lesion Tracking
-
Gen-ROTDA adapts bike-sharing demand models across years by anchoring on few target labels
Robust OT-Guided Generative Residual Domain Adaptation for Bike-Sharing Demand Prediction under Temporal Domain Shift
-
LLM Sparsity Prior lets spike-and-slab models ignore bad LLM weights
LLM Sparsity Prior for Robust Feature Selection
-
Certified bounds eliminate overflows in encrypted neural nets
Encrypted Neural Networks without Overflows
-
Jacobian penalty on latent dynamics raises sample efficiency in DreamerV3
Dreaming Smoothly and Sample Efficiently with Gradient Penalized Latent Dynamics
-
Depth biases networks toward low-rank softmax codes
The Implicit Bias of Depth: From Neural Collapse to Softmax Codes
-
KAN estimator converges independent of covariate dimension
KAPLAN: Kolmogorov-Arnold Prognostic Learnable Activation Networks for Survival Analysis
-
5% FP16 blocks recover 89% of FP4-to-FP16 attention quality gap
ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention
-
Attribution contract resolves ambiguity in generative model explanations
The Attribution Contract: Feature Attribution for Generative Language Models
-
Global LP ranks every MoE expert to cut memory at low bits
GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs
-
Orbax speeds JAX checkpoint saves up to 3.5x over PyTorch
Orbax: Distributed Checkpointing with JAX
-
Dithering defends vision models against adversarial attacks
Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering
-
Vertex weights let mmWave data drive accurate SMPL body fits
Millimeter-wave Imaging for Anthropometric Body Measurement
-
One config matches tuned AdamW across 1-8x horizons on LLMs
Anytime Training with Schedule-Free Spectral Optimization
-
Controller routes LLM requests to best mode for 2x speedup
ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU
-
Recognition of evaluations depends on model-benchmark pairs
Decomposing and Measuring Evaluation Awareness
-
Compositionality rises then falls in LLM self-training
Model Collapse as Cultural Evolution
-
Motion data alone rivals video models trained on 10000x more examples
The TIME Machine: On The Power of Motion for Efficient Perception
-
Sparse query gradients steer LLM paths and feedback levels
Steered Generation via Gradient-Based Optimization on Sparse Query Features
-
LLMs learn what not to say via frequency competition
Do Language Models Know What Not to Say? Causal Evidence for Statistical Preemption in LLMs
-
Open datasets and software released for thermal-fluid AI
Open Multimodal Datasets and Open-Source Software for Data-Driven Modeling of Multiphase Transport and Thermal Systems
-
Intermediate layers hold more task info than final layers
Uncovering the Latent Potential of Deep Intermediate Representations
-
RADAR forecasts transfer by comparing representation trajectories
RADAR: Relative Angular Divergence Across Representations
-
Latent states let transformer adapt to time-series contexts without quadratic cost
World Machine: Towards Generative World Modeling for Time-Series
-
Transformers have fixed accuracy limits set by layers and width
The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems
-
Small models evolve their own agents via two-timescale updates
PACE: Two-Timescale Self-Evolution for Small Language Model Agents
-
Lipschitz intermediaries enable approximate calibration of discrete properties
Smoothed Elicitation Complexity for Approximate $\Gamma$-calibration of Discrete Classification Tasks
-
LLM evolutionary optimizer boosts Bitcoin trading in backtests
MadEvolve: Evolutionary Optimization of Trading Systems with Large Language Models
-
Active sensing serves task control
Active Sensing Subserves Task-Level Control