arXiv preprint arXiv:2405.17247

Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C · 2024 · arXiv 2405.17247

Canonical reference. 100% of citing Pith papers cite this work as background.

20 Pith papers citing it

Background 100% of classified citations

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning

cs.LG · 2026-05-08 · unverdicted · novelty 8.0 · 2 refs

Softmax Transformers implement in-context RL through equivalence to weighted softmax TD updates, with error decay under contraction and parameters as global minimizers of pretraining loss.

Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

With specific linear Transformer parameters, CoT generation equals iterative TD updates, yielding geometric error decay with CoT length until a context-length statistical floor, and those parameters globally minimize the pretraining loss.

ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

ArchSIBench is a new benchmark dataset and evaluation suite that measures vision-language models on architectural spatial intelligence across 17 subtasks, showing most models lag human baselines especially in transformation and configuration.

RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

RouteHijack is a routing-aware jailbreak that identifies safety-critical experts via activation contrast and optimizes suffixes to suppress them, reaching 69.3% average attack success rate on seven MoE LLMs with strong transfer to variants and VLMs.

Training-Free Semantic Multi-Object Tracking with Vision-Language Models

cs.CV · 2026-04-15 · conditional · novelty 7.0

TF-SMOT composes pretrained vision-language models into a training-free pipeline that reaches state-of-the-art tracking and improved summary quality on the BenSMOT benchmark.

OrganicHAR: Towards Activity Discovery in Organic Settings for Privacy Preserving Sensors Using Efficient Video Analysis

cs.HC · 2026-05-18 · unverdicted · novelty 6.0

OrganicHAR discovers 4-8 activity categories per user from sensor signals, achieves 79% accuracy on coarse activities with ambient sensors alone and cuts VLM queries by 90% by triggering video analysis only at detected pattern moments.

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production

cs.DC · 2026-05-09 · unverdicted · novelty 6.0

MegaScale-Omni delivers 1.27x-7.57x higher throughput for dynamic multimodal LLM training by decoupling encoder and LLM parallelism, using unified colocation, and applying adaptive workload balancing.

Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

MoR lets clients train local reward models on private preferences and uses a learned Mixture-of-Rewards with GRPO on the server to align a shared base VLM without exchanging parameters, architectures, or raw data.

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm

cs.CV · 2026-04-22 · unverdicted · novelty 6.0 · 2 refs

Proposes the Modality Translation Protocol with metrics ToS, CoS, FoS and SSC to quantify visual knowledge bottlenecks in VLMs, plus a Divergence Law hypothesis that scaling language models may increase the penalty.

MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications

cs.AI · 2025-11-17 · unverdicted · novelty 6.0

MM-Telco creates multimodal benchmarks for telecom and demonstrates that fine-tuned LLMs and VLMs achieve significant performance gains on domain-specific tasks.

SemanticOpt: Towards LLM-Based Semantic Black-Box Optimization

cs.LG · 2025-10-29 · unverdicted · novelty 6.0

SemanticOpt fine-tunes LLMs on structured Bayesian optimization trajectories augmented with natural-language context to jointly use numerical and semantic evidence for black-box optimization.

Benchmarking and Mitigating Sycophancy in Medical Vision Language Models

cs.CV · 2025-09-26 · unverdicted · novelty 6.0 · 2 refs

The paper benchmarks sycophancy in medical VLMs using hierarchical VQA templates and proposes VIPER to filter non-evidence social cues, reducing sycophancy while preserving interpretability.

Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition

cs.CV · 2025-04-09 · unverdicted · novelty 6.0

Introduces FoodNExTDB dataset and EWR metric to benchmark VLMs for food recognition, showing closed-source models achieve over 90% EWR on single-product images but struggle with fine-grained distinctions.

Tokenizing Single-Channel EEG with Time-Frequency Motif Learning

cs.LG · 2025-02-22 · unverdicted · novelty 6.0

TFM-Tokenizer learns a vocabulary of time-frequency motifs from single-channel EEG via a dual-path masked architecture and encodes signals into discrete tokens, reporting up to 11% Cohen's Kappa gains on benchmarks and 14% on ear-EEG sleep staging.

Structural Ranking of the Cognitive Plausibility of Computational Models of Analogy and Metaphors with the Minimal Cognitive Grid

cs.AI · 2026-05-02 · unverdicted · novelty 5.0

A formalized Minimal Cognitive Grid ranks computational models of analogy and metaphor by alignment with cognitive theories using Functional/Structural Ratio, Generality, and Performance Match dimensions.

Integration of Object Detection and Small VLMs for Construction Safety Hazard Identification

cs.CV · 2026-04-06 · unverdicted · novelty 4.0

Detection-guided prompting raises small VLM hazard F1 from 34.5% to 50.6% and BERTScore from 0.61 to 0.82 on construction images with only 2.5 ms added latency.

Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance

cs.AI · 2025-11-11 · unverdicted · novelty 4.0

The paper introduces a safety framework for datasets in autonomous driving that uses the AI Data Flywheel and lifecycle processes to identify hazards and ensure compliance with ISO/PAS 8800.

Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey

cs.CV · 2025-05-23 · accept · novelty 4.0

A literature survey that organizes diffusion model alignment methods along five axes (feedback source, reward form, optimization mechanism, distribution shift handling, and explicit safety constraints) and identifies open challenges for reliable deployment.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

math.OC · 2026-05-18

Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation

cs.CV · 2026-05-02

citing papers explorer

Showing 20 of 20 citing papers.

Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
cs.LG · 2026-05-08 · unverdicted · none · ref 220 · 2 links
Softmax Transformers implement in-context RL through equivalence to weighted softmax TD updates, with error decay under contraction and parameters as global minimizers of pretraining loss.

Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
cs.LG · 2026-05-08 · unverdicted · none · ref 191
With specific linear Transformer parameters, CoT generation equals iterative TD updates, yielding geometric error decay with CoT length until a context-length statistical floor, and those parameters globally minimize the pretraining loss.

ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models
cs.CV · 2026-05-20 · unverdicted · none · ref 19
ArchSIBench is a new benchmark dataset and evaluation suite that measures vision-language models on architectural spatial intelligence across 17 subtasks, showing most models lag human baselines especially in transformation and configuration.

RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs
cs.LG · 2026-05-01 · unverdicted · none · ref 4
RouteHijack is a routing-aware jailbreak that identifies safety-critical experts via activation contrast and optimizes suffixes to suppress them, reaching 69.3% average attack success rate on seven MoE LLMs with strong transfer to variants and VLMs.

Training-Free Semantic Multi-Object Tracking with Vision-Language Models
cs.CV · 2026-04-15 · conditional · none · ref 6
TF-SMOT composes pretrained vision-language models into a training-free pipeline that reaches state-of-the-art tracking and improved summary quality on the BenSMOT benchmark.

OrganicHAR: Towards Activity Discovery in Organic Settings for Privacy Preserving Sensors Using Efficient Video Analysis
cs.HC · 2026-05-18 · unverdicted · none · ref 15
OrganicHAR discovers 4-8 activity categories per user from sensor signals, achieves 79% accuracy on coarse activities with ambient sensors alone and cuts VLM queries by 90% by triggering video analysis only at detected pattern moments.

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production
cs.DC · 2026-05-09 · unverdicted · none · ref 7
MegaScale-Omni delivers 1.27x-7.57x higher throughput for dynamic multimodal LLM training by decoupling encoder and LLM parallelism, using unified colocation, and applying adaptive workload balancing.

Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models
cs.AI · 2026-05-05 · unverdicted · none · ref 44
MoR lets clients train local reward models on private preferences and uses a learned Mixture-of-Rewards with GRPO on the server to align a shared base VLM without exchanging parameters, architectures, or raw data.

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm
cs.CV · 2026-04-22 · unverdicted · none · ref 2 · 2 links
Proposes the Modality Translation Protocol with metrics ToS, CoS, FoS and SSC to quantify visual knowledge bottlenecks in VLMs, plus a Divergence Law hypothesis that scaling language models may increase the penalty.

MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications
cs.AI · 2025-11-17 · unverdicted · none · ref 6
MM-Telco creates multimodal benchmarks for telecom and demonstrates that fine-tuned LLMs and VLMs achieve significant performance gains on domain-specific tasks.

SemanticOpt: Towards LLM-Based Semantic Black-Box Optimization
cs.LG · 2025-10-29 · unverdicted · none · ref 1
SemanticOpt fine-tunes LLMs on structured Bayesian optimization trajectories augmented with natural-language context to jointly use numerical and semantic evidence for black-box optimization.

Benchmarking and Mitigating Sycophancy in Medical Vision Language Models
cs.CV · 2025-09-26 · unverdicted · none · ref 4 · 2 links
The paper benchmarks sycophancy in medical VLMs using hierarchical VQA templates and proposes VIPER to filter non-evidence social cues, reducing sycophancy while preserving interpretability.

Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
cs.CV · 2025-04-09 · unverdicted · none · ref 5
Introduces FoodNExTDB dataset and EWR metric to benchmark VLMs for food recognition, showing closed-source models achieve over 90% EWR on single-product images but struggle with fine-grained distinctions.

Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
cs.LG · 2025-02-22 · unverdicted · none · ref 1
TFM-Tokenizer learns a vocabulary of time-frequency motifs from single-channel EEG via a dual-path masked architecture and encodes signals into discrete tokens, reporting up to 11% Cohen's Kappa gains on benchmarks and 14% on ear-EEG sleep staging.

Structural Ranking of the Cognitive Plausibility of Computational Models of Analogy and Metaphors with the Minimal Cognitive Grid
cs.AI · 2026-05-02 · unverdicted · none · ref 49
A formalized Minimal Cognitive Grid ranks computational models of analogy and metaphor by alignment with cognitive theories using Functional/Structural Ratio, Generality, and Performance Match dimensions.

Integration of Object Detection and Small VLMs for Construction Safety Hazard Identification
cs.CV · 2026-04-06 · unverdicted · none · ref 15
Detection-guided prompting raises small VLM hazard F1 from 34.5% to 50.6% and BERTScore from 0.61 to 0.82 on construction images with only 2.5 ms added latency.

Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance
cs.AI · 2025-11-11 · unverdicted · none · ref 33
The paper introduces a safety framework for datasets in autonomous driving that uses the AI Data Flywheel and lifecycle processes to identify hazards and ensure compliance with ISO/PAS 8800.

Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey
cs.CV · 2025-05-23 · accept · none · ref 5
A literature survey that organizes diffusion model alignment methods along five axes (feedback source, reward form, optimization mechanism, distribution shift handling, and explicit safety constraints) and identifies open challenges for reliable deployment.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
math.OC · 2026-05-18 · unreviewed · ref 18

Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation
cs.CV · 2026-05-02 · unreviewed · ref 6
