Deep residual learning for image recognition
4 Pith papers cite this work.
representative citing papers
DCT-based initialization and frequency truncation for self-attention improve accuracy and reduce overhead in Vision Transformers on standard benchmarks.
FedIDM filters abnormal updates in federated learning by creating condensed data through distribution matching and rejecting updates that deviate or cause high loss on that data.
citing papers explorer
- LISA: Language-guided Interference-aware Spatial-Frequency Attention for Driver Gaze Estimation
  LISA achieves state-of-the-art driver gaze estimation by integrating frequency-domain priors with language-guided spatial attention and disentangling gaze features from interference via CLIP and orthogonal regularization.
- Discrete Cosine Transform Based Decorrelated Attention for Vision Transformers
  DCT-based initialization and frequency truncation for self-attention improve accuracy and reduce overhead in Vision Transformers on standard benchmarks.
- FedIDM: Achieving Fast and Stable Convergence in Byzantine Federated Learning through Iterative Distribution Matching
  FedIDM filters abnormal updates in federated learning by creating condensed data through distribution matching and rejecting updates that deviate or cause high loss on that data.
- GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning