Modern vision foundation models plus a tunable attention pooling classifier head deliver state-of-the-art detection of AI-generated and inpainted images, outperforming CLIP by over 12 percent accuracy.
Megalith-10m: A dataset of public domain photographs. https://huggingface.co/datasets/madebyollin/megalith-10m, 2024
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: unverdicted (1)
Representative citing papers:
TAP into the Patch Tokens: Leveraging Vision Foundation Model Features for AI-Generated Image Detection