All are worth words: A vit backbone for diffusion models

Fan Bao, Shen Nie, Kaiwen Xue, Yue Cao, Chongxuan Li, Hang Su, Jun Zhu · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

FlashClear: Ultra-Fast Image Content Removal via Efficient Step Distillation and Feature Caching

cs.CV · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

FlashClear delivers up to 122x faster object removal than prior diffusion models via adversarial step distillation and asymmetric attention caching while preserving visual quality.

Show-o2: Improved Native Unified Multimodal Models

cs.CV · 2025-06-18 · unverdicted · novelty 4.0

Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.

citing papers explorer

Showing 2 of 2 citing papers.

FlashClear: Ultra-Fast Image Content Removal via Efficient Step Distillation and Feature Caching cs.CV · 2026-05-09 · unverdicted · none · ref 2 · 2 links
FlashClear delivers up to 122x faster object removal than prior diffusion models via adversarial step distillation and asymmetric attention caching while preserving visual quality.
Show-o2: Improved Native Unified Multimodal Models cs.CV · 2025-06-18 · unverdicted · none · ref 9
Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.

All are worth words: A vit backbone for diffusion models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer