Softmax Transformers implement in-context RL through equivalence to weighted softmax TD updates, with error decay under contraction and parameters as global minimizers of pretraining loss.
hub Canonical reference
arXiv preprint arXiv:2405.17247 , year=
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
roles
background 5polarities
background 5representative citing papers
With specific linear Transformer parameters, CoT generation equals iterative TD updates, yielding geometric error decay with CoT length until a context-length statistical floor, and those parameters globally minimize the pretraining loss.
ArchSIBench is a new benchmark dataset and evaluation suite that measures vision-language models on architectural spatial intelligence across 17 subtasks, showing most models lag human baselines especially in transformation and configuration.
RouteHijack is a routing-aware jailbreak that identifies safety-critical experts via activation contrast and optimizes suffixes to suppress them, reaching 69.3% average attack success rate on seven MoE LLMs with strong transfer to variants and VLMs.
TF-SMOT composes pretrained vision-language models into a training-free pipeline that reaches state-of-the-art tracking and improved summary quality on the BenSMOT benchmark.
OrganicHAR discovers 4-8 activity categories per user from sensor signals, achieves 79% accuracy on coarse activities with ambient sensors alone and cuts VLM queries by 90% by triggering video analysis only at detected pattern moments.
MegaScale-Omni delivers 1.27x-7.57x higher throughput for dynamic multimodal LLM training by decoupling encoder and LLM parallelism, using unified colocation, and applying adaptive workload balancing.
MoR lets clients train local reward models on private preferences and uses a learned Mixture-of-Rewards with GRPO on the server to align a shared base VLM without exchanging parameters, architectures, or raw data.
Proposes the Modality Translation Protocol with metrics ToS, CoS, FoS and SSC to quantify visual knowledge bottlenecks in VLMs, plus a Divergence Law hypothesis that scaling language models may increase the penalty.
MM-Telco creates multimodal benchmarks for telecom and demonstrates that fine-tuned LLMs and VLMs achieve significant performance gains on domain-specific tasks.
SemanticOpt fine-tunes LLMs on structured Bayesian optimization trajectories augmented with natural-language context to jointly use numerical and semantic evidence for black-box optimization.
The paper benchmarks sycophancy in medical VLMs using hierarchical VQA templates and proposes VIPER to filter non-evidence social cues, reducing sycophancy while preserving interpretability.
Introduces FoodNExTDB dataset and EWR metric to benchmark VLMs for food recognition, showing closed-source models achieve over 90% EWR on single-product images but struggle with fine-grained distinctions.
TFM-Tokenizer learns a vocabulary of time-frequency motifs from single-channel EEG via a dual-path masked architecture and encodes signals into discrete tokens, reporting up to 11% Cohen's Kappa gains on benchmarks and 14% on ear-EEG sleep staging.
A formalized Minimal Cognitive Grid ranks computational models of analogy and metaphor by alignment with cognitive theories using Functional/Structural Ratio, Generality, and Performance Match dimensions.
Detection-guided prompting raises small VLM hazard F1 from 34.5% to 50.6% and BERTScore from 0.61 to 0.82 on construction images with only 2.5 ms added latency.
The paper introduces a safety framework for datasets in autonomous driving that uses the AI Data Flywheel and lifecycle processes to identify hazards and ensure compliance with ISO/PAS 8800.
A literature survey that organizes diffusion model alignment methods along five axes (feedback source, reward form, optimization mechanism, distribution shift handling, and explicit safety constraints) and identifies open challenges for reliable deployment.
citing papers explorer
-
Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
Softmax Transformers implement in-context RL through equivalence to weighted softmax TD updates, with error decay under contraction and parameters as global minimizers of pretraining loss.
-
Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
With specific linear Transformer parameters, CoT generation equals iterative TD updates, yielding geometric error decay with CoT length until a context-length statistical floor, and those parameters globally minimize the pretraining loss.
-
ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models
ArchSIBench is a new benchmark dataset and evaluation suite that measures vision-language models on architectural spatial intelligence across 17 subtasks, showing most models lag human baselines especially in transformation and configuration.
-
RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs
RouteHijack is a routing-aware jailbreak that identifies safety-critical experts via activation contrast and optimizes suffixes to suppress them, reaching 69.3% average attack success rate on seven MoE LLMs with strong transfer to variants and VLMs.
-
Training-Free Semantic Multi-Object Tracking with Vision-Language Models
TF-SMOT composes pretrained vision-language models into a training-free pipeline that reaches state-of-the-art tracking and improved summary quality on the BenSMOT benchmark.
-
OrganicHAR: Towards Activity Discovery in Organic Settings for Privacy Preserving Sensors Using Efficient Video Analysis
OrganicHAR discovers 4-8 activity categories per user from sensor signals, achieves 79% accuracy on coarse activities with ambient sensors alone and cuts VLM queries by 90% by triggering video analysis only at detected pattern moments.
-
MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production
MegaScale-Omni delivers 1.27x-7.57x higher throughput for dynamic multimodal LLM training by decoupling encoder and LLM parallelism, using unified colocation, and applying adaptive workload balancing.
-
Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models
MoR lets clients train local reward models on private preferences and uses a learned Mixture-of-Rewards with GRPO on the server to align a shared base VLM without exchanging parameters, architectures, or raw data.
-
The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm
Proposes the Modality Translation Protocol with metrics ToS, CoS, FoS and SSC to quantify visual knowledge bottlenecks in VLMs, plus a Divergence Law hypothesis that scaling language models may increase the penalty.
-
MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications
MM-Telco creates multimodal benchmarks for telecom and demonstrates that fine-tuned LLMs and VLMs achieve significant performance gains on domain-specific tasks.
-
SemanticOpt: Towards LLM-Based Semantic Black-Box Optimization
SemanticOpt fine-tunes LLMs on structured Bayesian optimization trajectories augmented with natural-language context to jointly use numerical and semantic evidence for black-box optimization.
-
Benchmarking and Mitigating Sycophancy in Medical Vision Language Models
The paper benchmarks sycophancy in medical VLMs using hierarchical VQA templates and proposes VIPER to filter non-evidence social cues, reducing sycophancy while preserving interpretability.
-
Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
Introduces FoodNExTDB dataset and EWR metric to benchmark VLMs for food recognition, showing closed-source models achieve over 90% EWR on single-product images but struggle with fine-grained distinctions.
-
Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
TFM-Tokenizer learns a vocabulary of time-frequency motifs from single-channel EEG via a dual-path masked architecture and encodes signals into discrete tokens, reporting up to 11% Cohen's Kappa gains on benchmarks and 14% on ear-EEG sleep staging.
-
Structural Ranking of the Cognitive Plausibility of Computational Models of Analogy and Metaphors with the Minimal Cognitive Grid
A formalized Minimal Cognitive Grid ranks computational models of analogy and metaphor by alignment with cognitive theories using Functional/Structural Ratio, Generality, and Performance Match dimensions.
-
Integration of Object Detection and Small VLMs for Construction Safety Hazard Identification
Detection-guided prompting raises small VLM hazard F1 from 34.5% to 50.6% and BERTScore from 0.61 to 0.82 on construction images with only 2.5 ms added latency.
-
Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance
The paper introduces a safety framework for datasets in autonomous driving that uses the AI Data Flywheel and lifecycle processes to identify hazards and ensure compliance with ISO/PAS 8800.
-
Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey
A literature survey that organizes diffusion model alignment methods along five axes (feedback source, reward form, optimization mechanism, distribution shift handling, and explicit safety constraints) and identifies open challenges for reliable deployment.
- Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
- Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation