Srinivasan, Matthew Tancik, Jonathan T
Canonical reference. 80% of citing Pith papers cite this work as background.
citing papers explorer
- On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models
  Image-to-3D models successfully generate harmful geometries in most cases with under 0.3% caught by commercial filters; existing safeguards are weak but a stacked defense cuts harmful outputs to under 1% at 11% false-positive cost.
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
  Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
- GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
  GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.
- GS-Surrogate: Deformable Gaussian Splatting for Parameter Space Exploration of Ensemble Simulations
  GS-Surrogate creates a canonical Gaussian field that is sequentially deformed by simulation parameters to enable real-time, controllable 3D exploration of ensemble data while separating simulation variations from visualization adjustments.
- StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
  StereoPolicy fuses stereo image pairs via a Stereo Transformer on pretrained 2D encoders to boost robotic manipulation policies, showing gains over monocular, RGB-D, point cloud, and multi-view methods in simulations and real-robot tests.
- You Only Gaussian Once: Controllable 3D Gaussian Splatting for Ultra-Densely Sampled Scenes
  YOGO reformulates stochastic 3D Gaussian Splatting into a deterministic budget-aware system and supplies an ultra-dense dataset to enforce physical fidelity over viewpoint interpolation.
- EmbodiedHead: Real-Time Listening and Speaking Avatar for Conversational Agents
  EmbodiedHead introduces a Rectified-Flow Diffusion Transformer with differentiable renderer and single-stream listening-speaking conditioning to achieve real-time high-fidelity conversational avatars.
- Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
  Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
- AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
  AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout error by roughly five times versus prior DL-ROMs and ViTs on advection-diffusion-re
- TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization
  TrianguLang achieves state-of-the-art feed-forward text-guided 3D localization and segmentation by using predicted geometry to gate cross-view semantic correspondences without ground-truth poses.
- Shap-E: Generating Conditional 3D Implicit Functions
  Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.
- SpatialPrompt: XR-Based Spatial Intent Expression as Executable Constraints for AI Generative 3D Design
  SpatialPrompt turns spatial sketches and voice prompts into executable constraints for controllable AI 3D generation in XR, enabling iterative collaborative creation with color-coded contributions.
- Splats under Pressure: Exploring Performance-Energy Trade-offs in Real-Time 3D Gaussian Splatting under Constrained GPU Budgets
  Emulation of constrained GPUs reveals performance-energy trade-offs for real-time 3D Gaussian Splatting on edge devices.