DiTo shifts token reduction in DiTs to output token similarity, reusing prior-step matches across timesteps with PMR scheduling and frequency-aware penalties to raise PSNR at given speedups.
Title resolution pending
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 8years
2026 8verdicts
UNVERDICTED 8representative citing papers
UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.
OmniCamera disentangles video content and camera motion for multi-task generation with arbitrary camera control via the OmniCAM hybrid dataset and Dual-level Curriculum Co-Training.
HairOrbit leverages video generation priors and a neural orientation extractor to achieve state-of-the-art strand-level 3D hair reconstruction from single-view portraits in visible and invisible regions.
StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriented guidance, and visual prompting.
SmartPhotoCrafter performs automatic photographic image editing by coupling an Image Critic module that identifies deficiencies with a Photographic Artist module that generates edits, trained via multi-stage pretraining, reasoning supervision, and reinforcement learning.
UniCSG adds staged semantic disentanglement and frequency-aware reconstruction to DiT diffusion models to improve content preservation and style fidelity in both text- and reference-guided generation.
A training-free method with time-dependent attention gating and trajectory pruning enhances object-background balance in diffusion-based image synthesis.
citing papers explorer
-
Rethinking Token Reduction for Diffusion Models via Output-Similarity-Awareness
DiTo shifts token reduction in DiTs to output token similarity, reusing prior-step matches across timesteps with PMR scheduling and frequency-aware penalties to raise PSNR at given speedups.
-
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.
-
OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control
OmniCamera disentangles video content and camera motion for multi-task generation with arbitrary camera control via the OmniCAM hybrid dataset and Dual-level Curriculum Co-Training.
-
HairOrbit: Multi-view Aware 3D Hair Modeling from Single Portraits
HairOrbit leverages video generation priors and a neural orientation extractor to achieve state-of-the-art strand-level 3D hair reconstruction from single-view portraits in visible and invisible regions.
-
StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation
StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriented guidance, and visual prompting.
-
SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
SmartPhotoCrafter performs automatic photographic image editing by coupling an Image Critic module that identifies deficiencies with a Photographic Artist module that generates edits, trained via multi-stage pretraining, reasoning supervision, and reinforcement learning.
-
UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement
UniCSG adds staged semantic disentanglement and frequency-aware reconstruction to DiT diffusion models to improve content preservation and style fidelity in both text- and reference-guided generation.
-
Training-Free Object-Background Compositional T2I via Dynamic Spatial Guidance and Multi-Path Pruning
A training-free method with time-dependent attention gating and trajectory pruning enhances object-background balance in diffusion-based image synthesis.