Fasionad: Fast and slow fusion thinking systems for human-like autonomous driving with adaptive feedback

Multi-modal fusion transformer for end-to-end autonomous driving · 2024 · arXiv 2411.18013

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

BLUE: Toward Better Language Use in Efficient Vision-Language-Action Models for Autonomous Driving

cs.CV · 2026-06-07 · unverdicted · novelty 6.0

BLUE trains a lightweight gate on frozen VLA hidden states to selectively activate language generation only when beneficial, achieving SOTA results with 2.54x inference speedup on driving benchmarks.

SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening

cs.CV · 2026-05-17 · unverdicted · novelty 5.0

SafeLens presents a fast-and-slow video guardrail framework that filters the SafeWatch dataset to 2.4% and adds Chain-of-Thought traces to achieve state-of-the-art moderation performance at reduced inference cost.

XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments

cs.CV · 2026-04-20 · unverdicted · novelty 4.0

XEmbodied is a foundation model that integrates 3D geometric and physical signals into VLMs using a 3D Adapter and Efficient Image-Embodied Adapter, plus progressive curriculum and RL post-training, to improve spatial reasoning and embodied performance on 18 benchmarks.

citing papers explorer

BLUE: Toward Better Language Use in Efficient Vision-Language-Action Models for Autonomous Driving

cs.CV · 2026-06-07 · unverdicted · none · ref 6

SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening

cs.CV · 2026-05-17 · unverdicted · none · ref 43

XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments

cs.CV · 2026-04-20 · unverdicted · none · ref 80
