neuralCAD-Edit benchmark shows even the best foundation model (GPT 5.2) scores 53% lower than human CAD experts in acceptance trials for multimodal-instructed 3D model edits.
hub Canonical reference
Diffedit: Diffusion-based semantic image editing with mask guidance
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
roles
background 6polarities
background 6representative citing papers
LD-Pruning applies latent discrepancy to prune tokens and adaptively skip unconditional branches in VAR models for up to 2.35x faster inference with preserved quality.
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
GeoEdit constructs local tangent frames from small perturbations to initial noise, enabling Jacobian-free on-manifold edits in diffusion models via alternating tangent steps and diffusion projections.
UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.
DLEBench is the first benchmark for small-scale object editing in instruction-based image editing models, using 1889 samples, seven instruction types, and a dual-mode evaluation protocol to reveal performance gaps in 10 tested models.
RDSplat is the first 3D Gaussian Splatting watermarking method that maintains 0.701 bit accuracy against both 2D and 3D diffusion editing by embedding only in low-frequency primitives selected via FAPS.
Factored Classifier-Free Guidance enables per-attribute control in classifier-free guidance for diffusion models to produce more sound counterfactuals.
UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.
GeoEdit introduces a Lift-Manipulate-Render-Denoise pipeline with dual-branch denoising and variance-homogeneous injection for 3D-consistent object editing in single photos.
MirrorPPR extracts retouching operations from exemplar pairs via a dedicated extractor and transfers them to query images through a LoRA-adapted Diffusion Transformer, enabled by a new 47-million-pair dataset and self-augmentation for alignment.
SpatialFlow-GRPO adds region-level reward feedback and spatial alignment to Flow-GRPO-style RL for image editing, reporting gains on GEdit-Bench, ImgEdit-Bench, and a new MultiEditBench.
Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.
Empirical studies reveal instruction-level generalization as the main bottleneck in scribble-guided editing; three strategies (curriculum, multi-task mosaicking, edit-focused loss) achieve SOTA on VIBE benchmark.
ACE-LoRA introduces adaptive orthogonal decoupling and rank-invariant compression for continual image editing in diffusion models, plus the CIE-Bench benchmark.
PhysEdit introduces adaptive reasoning depth and spatial masking to make image editing faster and more instruction-aligned without retraining the base model.
Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.
Fine-tuning text-to-video models on sparse low-quality synthetic data for physical camera controls outperforms fine-tuning on photorealistic data.
FlashEdit delivers real-time localized text-guided image editing under 0.2 seconds via cycle-consistent one-step inversion, background shield, and sparsified spatial cross-attention, achieving over 150x speedup on PIE-Bench.
UniCodec uses LLM-driven semantic disentanglement at the encoder and diffusion-based compositional generation at the decoder to enable one codec for both human perception and machine vision tasks without task-specific retraining.
An ensemble of stage-specialized text-to-image diffusion models improves prompt alignment over single shared-parameter models while preserving visual quality and inference speed.
IdentiFace is a multi-modal iterative diffusion framework that generates identifiable suspect faces with improved identity retrieval for law enforcement applications.
DiffMagicFace uses concurrent fine-tuned text and image diffusion models plus a rendered multi-view dataset to achieve identity-consistent text-conditioned editing of real facial videos.
A score-based method is introduced to guide optimization in geometric view diffusion models toward correct viewpoints, improving convergence and sample efficiency over naive multistart strategies.
citing papers explorer
-
A Systematic Study of Behavioral Cloning for Scientific Data Annotation
Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.