CanViT is the first task- and policy-agnostic active-vision foundation model (AVFM), pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026: 3
roles
background: 1
polarities
background: 1
representative citing papers
Backpropagated gradients from vision models predict higher visual cortex signals but diverge from brain hierarchies in spatial and temporal organization.
A-ROM delivers competitive MedMNIST performance via pretrained ViT metric spaces, a concept dictionary, and kNN without backpropagation or fine-tuning, framed as interpretable few-shot learning under the Platonic Representation Hypothesis.
citing papers explorer
- CanViT: Toward Active-Vision Foundation Models
  CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.
- Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images
  Backpropagated gradients from vision models predict higher visual cortex signals but diverge from brain hierarchies in spatial and temporal organization.
- Toward Aristotelian Medical Representations: Backpropagation-Free Layer-wise Analysis for Interpretable Generalized Metric Learning on MedMNIST
  A-ROM delivers competitive MedMNIST performance via pretrained ViT metric spaces, a concept dictionary, and kNN without backpropagation or fine-tuning, framed as interpretable few-shot learning under the Platonic Representation Hypothesis.