Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions

Aleksander Holynski; Alexei A. Efros; Angjoo Kanazawa; Ayaan Haque; Matthew Tancik

arxiv: 2303.12789 · v2 · pith:XYWDAIQSnew · submitted 2023-03-22 · 💻 cs.CV · cs.GR

Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

keywords: edit, method, scene, scenes, able, editing, images, nerf

Abstract

We propose a method for editing NeRF scenes with text instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. We demonstrate that our method can edit large-scale, real-world scenes and accomplishes more realistic, targeted edits than prior work.
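The core loop described in the abstract (edit the training images with an image-conditioned diffusion model while continuing to optimize the scene) can be illustrated with a toy, dependency-free sketch. It is not the authors' implementation. Assumptions: the "scene" is a single shared vector that every camera renders identically; `toy_instruct_edit` is a hypothetical stand-in for InstructPix2Pix (not its API) that applies a fixed "brighten" instruction with per-call noise; and the names `render`, `iterative_dataset_update`, `edit_every`, and `lr` are illustrative. The point it shows is that edited images gradually replace the training set, and because one shared representation is fit to all views, per-image edits that disagree are averaged into a single consistent result.

```python
import random
from typing import List, Tuple

Image = List[float]  # toy "image": a flat list of pixel intensities


def render(scene: Image, view: int) -> Image:
    """Toy renderer (assumption): every camera sees the same shared scene state.

    A real NeRF would volume-render the rays of camera `view`; here the view
    index is ignored, so multi-view consistency is built into the representation.
    """
    return list(scene)


def toy_instruct_edit(current: Image, original: Image, rng: random.Random) -> Image:
    """Hypothetical stand-in for InstructPix2Pix, not its real API.

    The "instruction" brightens the original capture by 0.5 (clipped at 1.0).
    The output stays anchored halfway to the current render, and each call adds
    independent noise, mimicking 2D edits that disagree across views.
    """
    return [
        0.5 * c + 0.5 * min(1.0, o + 0.5) + rng.gauss(0.0, 0.02)
        for c, o in zip(current, original)
    ]


def iterative_dataset_update(
    originals: List[Image],
    iterations: int = 2000,
    edit_every: int = 10,
    lr: float = 0.05,
    seed: int = 0,
) -> Tuple[Image, List[Image]]:
    rng = random.Random(seed)
    n_views, n_pix = len(originals), len(originals[0])
    # Start from a scene "reconstructed" from the unedited captures (per-pixel mean).
    scene = [sum(img[i] for img in originals) / n_views for i in range(n_pix)]
    dataset = [list(img) for img in originals]  # training images, replaced over time
    next_view = 0
    for step in range(iterations):
        if step % edit_every == 0:
            # Render one viewpoint, edit that render (conditioned on the original
            # capture), and overwrite the matching training image with the result.
            rendered = render(scene, next_view)
            dataset[next_view] = toy_instruct_edit(rendered, originals[next_view], rng)
            next_view = (next_view + 1) % n_views
        # One gradient step on a random training view (squared error; the
        # constant factor of the gradient is folded into `lr`).
        view = rng.randrange(n_views)
        pred = render(scene, view)
        scene = [s - lr * (p - t) for s, p, t in zip(scene, pred, dataset[view])]
    return scene, dataset


base = [0.2, 0.4, 0.3]
scene, dataset = iterative_dataset_update([list(base) for _ in range(4)])
print([round(v, 2) for v in scene])
```

With four identical captures of `base`, the optimized scene should end up near the brightened target `[0.7, 0.9, 0.8]`, up to the injected noise. In this toy, agreement across views comes from fitting one shared representation, not from the editor, which on its own returns a slightly different result for every call.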

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
cs.CL 2023-09 unverdicted novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Information-Regularized Constrained Inversion for Stable Avatar Editing from Sparse Supervision
cs.CV 2026-04 unverdicted novelty 7.0

A conditioning-guided constrained inversion method restricts avatar edits to a low-dimensional part-specific subspace and uses an information matrix spectrum from pipeline linearization to predict and ensure stability...

StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
cs.RO 2026-05 unverdicted novelty 6.0

StereoPolicy fuses stereo image pairs via a Stereo Transformer on pretrained 2D encoders to boost robotic manipulation policies, showing gains over monocular, RGB-D, point cloud, and multi-view methods in simulations ...

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
cs.CV 2023-08 unverdicted novelty 6.0

DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.

Native3D: End-to-End 3D Scene Generation via Unified Mesh-Texture Modeling and Semantic Alignment
cs.CV 2026-06 unverdicted novelty 5.0

Native3D introduces a direct 3D scene generation method using unified mesh-texture representation and 3D REPA Loss for semantic alignment, claimed to outperform prior 2D-dependent approaches.

DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing
cs.CV 2026-05 unverdicted novelty 5.0

DreamEdit3D learns separate token embeddings for segmented object components via two-phase multi-view optimization to enable text-guided 3D editing with consistent image generation and mesh reconstruction.

StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
cs.RO 2026-05 unverdicted novelty 4.0

StereoPolicy fuses left-right image features via cross-attention to deliver consistent gains over RGB, RGB-D, point cloud, and multi-view baselines in simulation and real-robot manipulation tasks.