A multi-stream ensemble using DINOv2 and CLIP backbones trained with extreme degradations achieves stable deepfake detection and fourth place in the NTIRE 2026 challenge.
What makes training multi-modal classification networks hard? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12695–12705, 2020.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles