Softmax Transformers implement in-context RL through equivalence to weighted softmax TD updates, with error decay under contraction and parameters as global minimizers of pretraining loss.
8 Pith papers cite this work.
years: 2026 (8)
representative citing papers
Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
High-order generator regression from multi-step trajectories yields a second-order accurate estimator for finite-horizon continuous-time policy evaluation that outperforms the Bellman baseline in calibration studies and benchmarks.
SeqLoRA applies bilevel optimization to sequential LoRA adaptation for continual multi-concept text-to-image generation with theoretical bounds on forgetting and interference.
Proves sharp operator-norm concentration and expectation bounds for sample cross-covariances of sub-Gaussian and Gaussian vectors, governed by effective ranks of the marginal covariances.
Scaling pretrained representations improves label-free OOD detection on frozen backbones, causing performance gaps between global and local detectors to vanish across vision and language tasks.
A frozen average of the last two cycles matches or exceeds eight shape-learning alternatives on 97 GIFT-Eval configurations for periodic time series forecasting.
citing papers explorer
- Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
Softmax Transformers implement in-context RL through equivalence to weighted softmax TD updates, with error decay under contraction and parameters as global minimizers of pretraining loss.
- Variance-aware Reward Modeling with Anchor Guidance
Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
- Beyond Bellman: High-Order Generator Regression for Continuous-Time Policy Evaluation
High-order generator regression from multi-step trajectories yields a second-order accurate estimator for finite-horizon continuous-time policy evaluation that outperforms the Bellman baseline in calibration studies and benchmarks.
- SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation
SeqLoRA applies bilevel optimization to sequential LoRA adaptation for continual multi-concept text-to-image generation with theoretical bounds on forgetting and interference.
- Concentration Inequalities for Sample Cross-Covariances
Proves sharp operator-norm concentration and expectation bounds for sample cross-covariances of sub-Gaussian and Gaussian vectors, governed by effective ranks of the marginal covariances.
- Scaling Pretrained Representations Enables Label-Free Out-of-Distribution Detection Without Fine-Tuning
Scaling pretrained representations improves label-free OOD detection on frozen backbones, causing performance gaps between global and local detectors to vanish across vision and language tasks.
- Don't Learn the Shape: Forecasting Periodic Time Series by Rank-1 Decomposition
A frozen average of the last two cycles matches or exceeds eight shape-learning alternatives on 97 GIFT-Eval configurations for periodic time series forecasting.
- Covariance Structure and Coordinate Heterogeneity Govern Binary Quantization of Contrastive Embeddings