Emerging properties in self-supervised vision transformers

2021

9 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary

dataset: 1 · method: 1

citation-polarity summary

use dataset: 1 · use method: 1

representative citing papers

Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis

cs.CV · 2026-05-01 · unverdicted · novelty 7.0

CT-Lite combines Feature Attention Style Transfer (FAST) and Structured Factorized Projections (SFP) with contrastive learning to reach AUROC within 5-7% of uncompressed baselines on compressed CT volumes across three datasets while using far fewer parameters.

VFM$^{4}$SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection

cs.CV · 2026-04-23 · unverdicted · novelty 7.0 · 2 refs

VFM4SDG is a dual-prior framework that distills cross-domain stable relations from VFMs into DETR encoders and injects semantic-contextual priors into decoder queries to reduce missed detections in single-domain generalized object detection.

ToLL: Topological Layout Learning with Asymmetric Cross-View Structural Distillation for 3D Scene Graph Generation Pretraining

cs.CV · 2026-03-30 · unverdicted · novelty 7.0

ToLL pretrains 3D scene graph generators via anchor-conditioned topological layout recovery and asymmetric structural distillation to learn predicate constraints rather than geometric interpolation shortcuts.

Contrastive-SDXL: Annotation-Preserving Night-Time Augmentation for Pedestrian Detection

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Contrastive-SDXL augments daytime images into realistic night-time versions using SDXL-Turbo with LoRA and multi-level DINOv2 contrastive losses, yielding 6-7% lower miss rate on pedestrian detection versus daytime-only training.

From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

The work creates identity-consistent synthetic makeup data via ConsistentBeauty and adapts models to real images using reinforcement learning in RealBeauty, achieving better identity preservation and real-world performance than prior methods.

AttriBE: Quantifying Attribute Expressivity in Body Embeddings for Recognition and Identification

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

Transformer-based ReID embeddings encode BMI most strongly, followed by pitch, gender, and yaw. Pose information peaks in middle layers while BMI encoding increases with depth, and cross-spectral settings shift reliance toward structural cues.

LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

cs.CV · 2026-04-05 · unverdicted · novelty 6.0

LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.

SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos

cs.CV · 2026-02-05 · unverdicted · novelty 6.0

SurgMotion outperforms prior methods on 17 surgical video benchmarks by shifting pretraining to latent motion prediction with motion-guided masking, affinity distillation, and diversity regularization on a 15M-sample dataset.

Stylistic-STORM (ST-STORM): Perceiving the Semantic Nature of Appearance

cs.CV · 2026-04-17 · unverdicted · novelty 5.0

ST-STORM introduces a dual-branch SSL framework that disentangles semantic content from stylistic appearance using gated latent streams, JEPA for content invariance, and adversarial constraints for style capture.

citing papers explorer


Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis · cs.CV · 2026-05-01 · unverdicted · none · ref 38
VFM$^{4}$SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection · cs.CV · 2026-04-23 · unverdicted · none · ref 37 · 2 links
ToLL: Topological Layout Learning with Asymmetric Cross-View Structural Distillation for 3D Scene Graph Generation Pretraining · cs.CV · 2026-03-30 · unverdicted · none · ref 22
Contrastive-SDXL: Annotation-Preserving Night-Time Augmentation for Pedestrian Detection · cs.CV · 2026-05-13 · unverdicted · none · ref 21
From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data · cs.CV · 2026-05-08 · unverdicted · none · ref 48
AttriBE: Quantifying Attribute Expressivity in Body Embeddings for Recognition and Identification · cs.CV · 2026-04-29 · unverdicted · none · ref 60
LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection · cs.CV · 2026-04-05 · unverdicted · none · ref 97
SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos · cs.CV · 2026-02-05 · unverdicted · none · ref 18
Stylistic-STORM (ST-STORM): Perceiving the Semantic Nature of Appearance · cs.CV · 2026-04-17 · unverdicted · none · ref 46
