Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
We propose a method for editing NeRF scenes with text instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. We demonstrate that our proposed method is able to edit large-scale, real-world scenes, and is able to accomplish more realistic, targeted edits than prior work.
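The loop the abstract describes (edit the training images with a 2D editor while continuing to optimize the 3D scene on them) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not the authors' code: the "scene" is a short list of values, each "camera" sees it scaled by a per-view gain, the "instruction" is a fixed brightness offset, and `toy_edit` replaces InstructPix2Pix. The schedule (replace one training image every `edit_every` steps) and all parameter names are assumptions for illustration.

```python
import random


def render(scene, gain):
    """Toy 'renderer' (hypothetical): a camera sees the shared scene scaled by a per-view gain."""
    return [gain * s for s in scene]


def toy_edit(current_render, original, delta, strength=0.5, noise=0.02, rng=random):
    """Hypothetical stand-in for InstructPix2Pix.

    The 'instruction' is a brightness offset `delta` applied to the original capture.
    The edit starts from the current render and moves part of the way toward that
    target, plus small per-view noise to mimic view-inconsistent 2D edits.
    """
    return [r + strength * ((o + delta) - r) + rng.uniform(-noise, noise)
            for r, o in zip(current_render, original)]


def iterative_dataset_update(scene, gains, originals, delta,
                             steps=2000, edit_every=10, lr=0.05, seed=0):
    """Alternate scene optimization with periodic replacement of one training image
    by an edit of the current render (toy version of the loop in the abstract)."""
    rng = random.Random(seed)
    dataset = [list(img) for img in originals]  # begin with the captured images
    scene = list(scene)
    for step in range(steps):
        if step % edit_every == 0:
            # Assumed schedule: swap one training image for an edit of its current render.
            u = rng.randrange(len(gains))
            dataset[u] = toy_edit(render(scene, gains[u]), originals[u], delta, rng=rng)
        # One SGD step on the squared error of a randomly chosen view.
        v = rng.randrange(len(gains))
        pred = render(scene, gains[v])
        scene = [s - lr * 2 * gains[v] * (p - y)
                 for s, p, y in zip(scene, pred, dataset[v])]
    return scene


if __name__ == "__main__":
    true_scene = [0.2, 0.4, 0.6]
    gains = [0.8, 1.0, 1.2]
    originals = [render(true_scene, g) for g in gains]
    print(iterative_dataset_update(true_scene, gains, originals, delta=0.3))
```

In this toy, each individual edit is noisy and view-dependent, but because edits are conditioned on the current render and the scene is fit to all views at once, the shared scene settles on a single consistent shift (about +0.29 per channel here) rather than on any one view's edit.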
Forward citations
Cited by 7 Pith papers
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
  Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
- Information-Regularized Constrained Inversion for Stable Avatar Editing from Sparse Supervision
  A conditioning-guided constrained inversion method restricts avatar edits to a low-dimensional part-specific subspace and uses an information matrix spectrum from pipeline linearization to predict and ensure stability...
- StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
  StereoPolicy fuses stereo image pairs via a Stereo Transformer on pretrained 2D encoders to boost robotic manipulation policies, showing gains over monocular, RGB-D, point cloud, and multi-view methods in simulations ...
- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
  DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.
- Native3D: End-to-End 3D Scene Generation via Unified Mesh-Texture Modeling and Semantic Alignment
  Native3D introduces a direct 3D scene generation method using unified mesh-texture representation and 3D REPA Loss for semantic alignment, claimed to outperform prior 2D-dependent approaches.
- DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing
  DreamEdit3D learns separate token embeddings for segmented object components via two-phase multi-view optimization to enable text-guided 3D editing with consistent image generation and mesh reconstruction.
- StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
  StereoPolicy fuses left-right image features via cross-attention to deliver consistent gains over RGB, RGB-D, point cloud, and multi-view baselines in simulation and real-robot manipulation tasks.