Salm 2: An extremely lightweight saliency mamba model for real-time cognitive awareness of driver attention
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
DiffAttn formulates driver visual attention prediction as a conditional diffusion-denoising task with Swin Transformer encoding, multi-scale fusion, and LLM semantic reasoning, achieving SoTA results on four datasets.
citing papers explorer
- From Scene to Object: Text-Guided Dual-Gaze Prediction
  DualGaze-VLM uses text guidance and a new object-level dataset G-W3DA to predict driver attention, beating prior models by up to 17.8% in similarity metrics and passing human visual Turing tests at 88%.
- DiffAttn: Diffusion-Based Drivers' Visual Attention Prediction with LLM-Enhanced Semantic Reasoning
  DiffAttn formulates driver visual attention prediction as a conditional diffusion-denoising task with Swin Transformer encoding, multi-scale fusion, and LLM semantic reasoning, achieving SoTA results on four datasets.