ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.
hub
arXiv preprint arXiv:2310.01506 (2023)
18 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
fields
cs.CV 18representative citing papers
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.
Masked Logit Nudging aligns visual autoregressive model logits with source token maps under target prompts inside cross-attention masks, delivering top image editing results on PIE benchmarks and strong reconstructions on COCO and OpenImages while running faster than diffusion approaches.
RewardFlow unifies differentiable rewards including a new VQA-based one and uses a prompt-aware adaptive policy with Langevin dynamics to achieve state-of-the-art image editing and compositional generation.
LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.
LivingSwap is the first video reference-guided face swapping model that uses keyframe conditioning and temporal stitching to preserve source video realism with high fidelity across long sequences.
DRFS is a new inversion-free editing technique for rectified flow models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.
ICEdit achieves state-of-the-art instructional image editing in Diffusion Transformers via in-context generation, requiring only 0.1% of prior training data and 1% trainable parameters.
VAGS adapts the CFG scale at each ODE step using velocity alignment signals to raise structural fidelity in editing and sample quality in generation over fixed-scale baselines.
LimeCross enables text-guided editing of individual layers in composite images by conditioning on cross-layer context via bi-stream attention while preserving layer integrity and introducing the LayerEditBench benchmark.
Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.
FlashEdit delivers real-time localized text-guided image editing under 0.2 seconds via cycle-consistent one-step inversion, background shield, and sparsified spatial cross-attention, achieving over 150x speedup on PIE-Bench.
EditVerse unifies image and video editing and generation in one transformer model via unified token sequences and in-context learning, trained jointly on curated video editing data plus image/video corpora and evaluated on a new instruction-based benchmark.
CPAM proposes a context-preserving adaptive manipulation method for zero-shot real image editing in diffusion models via preservation adaptation and localized extraction modules, outperforming prior techniques on a new IMBA benchmark.
Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.
Near-reversible Runge-Kutta diffusion ODE solvers with vector-field smoothing improve stability and edit fidelity for large changes in text-guided image editing compared to exactly reversible alternatives.
citing papers explorer
-
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.