Low dimension suffices for near-max retrieval margins
Is Dimensionality a Barrier for Retrieval Models?
Dimension O(m^{-2} log n) nearly matches the infinite-dimension margin for any relevance matrix A.
Machine Learning
Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
Disentanglement Beyond Generative Models with Riemannian ICA
The construction drops ICA's global generative requirement while recovering sources consistently across manifold representations.
Lost in Tokenization: Fundamental Trade-offs in Graph Tokenization for Transformers
Random-walk maps lose information permanently while spectral maps preserve it but hinder local tasks, creating provable depth gaps between 2
When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks
Proportional-regime analysis shows attack success peaks then falls while clean performance improves with training trigger strength.
EntmaxKV: Support-Aware Decoding for Entmax Attention
When selected pages capture the entmax support, sparse decoding matches the full version exactly and error vanishes with the dropped mass.
Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs
CSA maintains per-round selective risk bounds under predictable updates without pooling across deployments.
Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity
Three probe conditions suffice for identification and replace ambient d sqrt(T) regret with intrinsic r sqrt(T) plus detection cost.
Counterfactual Explanations Under Concept Drift
A sampling-based repair keeps explanations actionable in evolving models without full regeneration.
Function graph transformers universally approximate operators between function spaces
Lifting functions to measures supported on their graphs allows standard attention and MLP layers to learn nonlinear operators while staying
BoLT: A Benchmark to Democratize Black-box Optimization Research for Expensive LLM Tasks
Benchmark supplies accessible models that capture multi-fidelity noisy LLM problems so researchers can compare methods without massive runs.
The bound holds for any number of adaptive rounds and any reuse of client samples under total clientwise zCDP.
Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method
Extending delay-thresholding to LMO momentum handles heterogeneous workers and matches best known bounds in smooth cases.
Universal Approximation of Nonlinear Operators and Their Derivatives
Proves the first universal approximation theorems (UATs) extending Hornik's 1991 results to Banach spaces, using Bastiani differentiability and weighted Sobolev norms.
Exemplar Partitioning for Mechanistic Interpretability
Voronoi cells around observed points yield interpretable regions that control refusal in Gemma-2 and match SAE performance at 1000x lower
Exemplar Partitioning for Mechanistic Interpretability
By using observed activation exemplars to define Voronoi regions, the method achieves 0.881 AUROC on concept detection with far less build
HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts
Greedy selection of critical edges and triangles from the simplicial Laplacian of merge barriers leads on hybrid compression while balancing
A Unified Geometric Framework for Weighted Contrastive Learning
The weighting scheme sets target distances that decide whether embeddings form simplices, collapse, or become geometrically impossible.
Scale-Sensitive Shattering: Learnability and Evaluability at Optimal Scale
Exact equivalence for bounded real-valued classes closes scale gaps in PAC learning and gives sharp IPM evaluability thresholds.
Support-Conditioned Flow Matching Is Kernel Smoothing
Under Gaussian transport the velocity field equals a Nadaraya-Watson smoother whose bandwidth shrinks from global to nearest-neighbor.
Local Inverse Geometry Can Be Amortized
D-IPG amortizes local inverse geometry for nonlinear PDE problems and keeps high success while slashing solve time.
One offline-trained conditional flow matching map reuses an analytic latent trajectory across unlimited agents and targets while bounding,
Large Language Models Lack Temporal Awareness of Medical Knowledge
Accuracy on current advice fades gradually over years while recall of past versions lags far behind, and search tools do not fully correct.
Certified Robustness under Heterogeneous Perturbations via Hybrid Randomized Smoothing
A single closed-form radius now covers simultaneous attacks on discrete tokens and continuous pixels in multimodal models.
Reanalysis of 722 annotated features shows the average string reused 3.07 times and resolves only 70 percent of feature identity.
WriteSAE: Sparse Autoencoders for Recurrent State
Atoms match rank-1 k v writes, substitute successfully in 92 percent of cases, and deliver closed-form logit predictions plus sustained 3x-l
WriteSAE: Sparse Autoencoders for Recurrent State
WriteSAE matches the rank-1 write shape and installs target continuations at 3x lift in Mamba and RWKV models.
WriteSAE: Sparse Autoencoders for Recurrent State
WriteSAE reshapes features to match rank-1 updates and achieves high substitution success while enabling targeted lifts in generation.
WriteSAE: Sparse Autoencoders for Recurrent State
They yield closer token distributions than deletion and enable steering by injecting chosen directions into the cache.
Inference-Time Machine Unlearning via Gated Activation Redirection
Input-dependent rotations in activations match gradient baselines while preserving utility and working on quantized models.
Inference-Time Machine Unlearning via Gated Activation Redirection
GUARD-IT matches gradient methods on unlearning benchmarks while preserving utility and working on quantized LLMs.
Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
Choosing the best 4-bit grid from a small set for each weight group cuts quantization error compared with fixed grids in both training and 4
The method outperforms benchmarks in Delhi and Mumbai while staying stable during spikes and seasons and adding uncertainty estimates.