{"total":18,"items":[{"citing_arxiv_id":"2605.21190","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Semantic Granularity Navigation in Image Editing","primary_cat":"cs.CV","submitted_at":"2026-05-20T13:53:13+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15661","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation","primary_cat":"cs.CV","submitted_at":"2026-05-15T06:32:49+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VAGS adapts the CFG scale at each ODE step using velocity alignment signals to raise structural fidelity in editing and sample quality in generation over fixed-scale baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16399","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Stable and Near-Reversible Diffusion ODE Solvers for Image Editing","primary_cat":"cs.CV","submitted_at":"2026-05-12T18:34:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Near-reversible Runge-Kutta diffusion ODE solvers with vector-field smoothing improve stability and edit fidelity for large changes in text-guided image editing compared to exactly reversible alternatives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10319","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LimeCross: Context-Conditioned Layered Image Editing with Structural 
Consistency","primary_cat":"cs.CV","submitted_at":"2026-05-11T10:18:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LimeCross enables text-guided editing of individual layers in composite images by conditioning on cross-layer context via bi-stream attention while preserving layer integrity and introducing the LayerEditBench benchmark.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"Training-free editors can broadly be categorized into inversion-based and inversion-free approaches. Inversion-based methods first map the source image into the latent or noise space of a pretrained model and then reconstruct an edited result under a target prompt, improving locality and semantic control via attention [15,32,42,47,55] or guidance manipulation [21,31,41,51,53]. In contrast, inversion-free methods directly modify the sampling trajectory without explicit inversion. These include consistency-based sampling [54] and velocity-based formulations such as FlowEdit [24], with subsequent refinements for stability and content preservation [23,38,57,61].
Motivated by practical layered workflows, recent works have begun to move"},{"citing_arxiv_id":"2605.02417","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing","primary_cat":"cs.CV","submitted_at":"2026-05-04T10:09:18+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25128","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent","primary_cat":"cs.CV","submitted_at":"2026-04-28T02:05:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23536","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models","primary_cat":"cs.CV","submitted_at":"2026-04-26T05:16:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"This error accumulates over time, pushing latents away 
from the marginal distribution and causing numerical drift, structural artifacts, and color oversaturation [19, 35]. Can we leverage the semantic benefits of zigzag exploration without incurring computational overhead and off-manifold drift? Inspired by recent advances in direct ODE inversion [14], which recycle latent noise to mitigate inversion errors during image editing, we address this question through the principle of Exact Noise Reuse. We propose Implicit Z-Sampling, which, instead of evaluating the network at out-of-distribution intermediate states, deliberately recycles the unconditional noise evaluated at the exact on-manifold anchor."},{"citing_arxiv_id":"2604.20258","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing","primary_cat":"cs.CV","submitted_at":"2026-04-22T07:08:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14591","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models","primary_cat":"cs.CV","submitted_at":"2026-04-16T03:47:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Masked Logit Nudging aligns visual autoregressive model logits with source token maps under target prompts inside cross-attention masks, delivering top image editing results on PIE benchmarks and strong reconstructions on COCO and OpenImages while running faster than diffusion
approaches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08536","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RewardFlow: Generate Images by Optimizing What You Reward","primary_cat":"cs.CV","submitted_at":"2026-04-09T17:59:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RewardFlow unifies differentiable rewards including a new VQA-based one and uses a prompt-aware adaptive policy with Langevin dynamics to achieve state-of-the-art image editing and compositional generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.21045","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction","primary_cat":"cs.CV","submitted_at":"2026-03-22T03:52:38+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.07951","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality","primary_cat":"cs.CV","submitted_at":"2025-12-08T19:00:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LivingSwap is the first video reference-guided face swapping model that uses keyframe conditioning and temporal stitching to preserve source video realism with high 
fidelity across long sequences.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.22244","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing","primary_cat":"cs.CV","submitted_at":"2025-09-26T11:59:30+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FlashEdit delivers real-time localized text-guided image editing under 0.2 seconds via cycle-consistent one-step inversion, background shield, and sparsified spatial cross-attention, achieving over 150x speedup on PIE-Bench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.20360","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning","primary_cat":"cs.CV","submitted_at":"2025-09-24T17:59:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EditVerse unifies image and video editing and generation in one transformer model via unified token sequences and in-context learning, trained jointly on curated video editing data plus image/video corpora and evaluated on a new instruction-based benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.05342","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Delta Rectified Flow Sampling for Text-to-Image Editing","primary_cat":"cs.CV","submitted_at":"2025-09-01T21:51:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DRFS is a new inversion-free editing technique for rectified flow 
models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.18438","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing","primary_cat":"cs.CV","submitted_at":"2025-06-23T09:19:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CPAM proposes a context-preserving adaptive manipulation method for zero-shot real image editing in diffusion models via preservation adaptation and localized extraction modules, outperforming prior techniques on a new IMBA benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.20690","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer","primary_cat":"cs.CV","submitted_at":"2025-04-29T12:14:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ICEdit achieves state-of-the-art instructional image editing in Diffusion Transformers via in-context generation, requiring only 0.1% of prior training data and 1% trainable parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.14159","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Grounded SAM: Assembling Open-World Models for Diverse Visual 
Tasks","primary_cat":"cs.CV","submitted_at":"2024-01-25T13:12:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Tag2Text: Guiding Vision-Language Model via Image Tagging, 2023. 2, 4 [19] Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, Weihong Lin, Lei Sun, Chao Zhang, and Han Hu. DETRs with Hybrid Matching. arXiv preprint arXiv:2207.13080, 2022. 2 [20] Qing Jiang, Feng Li, Tianhe Ren, Shilong Liu, Zhaoyang Zeng, Kent Yu, and Lei Zhang. T-Rex: Counting by Visual Prompting, 2023. 2 [21] Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Direct inversion: Boosting diffusion-based editing with 3 lines of code. arXiv preprint arXiv:2310.01506, 2023. 2 [22] Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, and Qiang Xu. HumanSD: A native skeleton-guided diffusion model for human image generation. 2023. 2 [23] Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli"}],"limit":50,"offset":0}