Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick · 2022

2 Pith papers cite this work.

representative citing papers

AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting

cs.CV · 2025-06-01 · conditional · novelty 6.0

AuralSAM2 fuses audio-visual features via a pyramid-based AuralFuser module and audio-guided contrastive loss to improve promptable segmentation accuracy in SAM2 with minimal efficiency impact.

Primus: Enforcing Attention Usage for 3D Medical Image Segmentation

cs.CV · 2025-03-03 · unverdicted · novelty 6.0

Primus and PrimusV2 are Transformer-centric models that match or exceed nnU-Net and top CNNs on nine 3D medical segmentation datasets by enforcing attention usage.

citing papers explorer

AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting

cs.CV · 2025-06-01 · conditional · none · ref 15

Primus: Enforcing Attention Usage for 3D Medical Image Segmentation

cs.CV · 2025-03-03 · unverdicted · none · ref 24