Srinivasan, Matthew Tancik, Jonathan T
Canonical reference. 80% of citing Pith papers cite this work as background.
citing papers explorer
-
On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models
Image-to-3D models successfully generate harmful geometries in most cases, with under 0.3% caught by commercial filters; existing safeguards are weak, but a stacked defense cuts harmful outputs to under 1% at an 11% false-positive cost.
-
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
-
RAM: Reachability Across Morphologies
RAM is a morphology-conditioned implicit neural representation trained on 3e10 forward-kinematics samples that serves as a fast, differentiable surrogate for pose reachability and generalizes to unseen morphologies while accounting for self-collisions.
-
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.
-
GS-Surrogate: Deformable Gaussian Splatting for Parameter Space Exploration of Ensemble Simulations
GS-Surrogate creates a canonical Gaussian field that is sequentially deformed by simulation parameters to enable real-time, controllable 3D exploration of ensemble data while separating simulation variations from visualization adjustments.
-
Interpretable Neural Marked Statistics for Cosmological Inference
A neural marking scheme trained with contrastive learning tightens constraints on σ8 by 2.9× and Ωm by 1.8× over classical marks at k_max=0.2 h/Mpc while breaking their degeneracy at the Fisher level.
-
Automatic, Debiased, and Invariant Counterfactual Generation under General Interventions
ADIGen generates counterfactuals under general interventions via Riesz regression, causal invariance, and orthogonal learning, with excess-risk bounds featuring product-bias remainder and invariant risk across environments.
-
You Only Gaussian Once: Controllable 3D Gaussian Splatting for Ultra-Densely Sampled Scenes
YOGO reformulates stochastic 3D Gaussian Splatting into a deterministic budget-aware system and supplies an ultra-dense dataset to enforce physical fidelity over viewpoint interpolation.
-
EmbodiedHead: Real-Time Listening and Speaking Avatar for Conversational Agents
EmbodiedHead introduces a Rectified-Flow Diffusion Transformer with differentiable renderer and single-stream listening-speaking conditioning to achieve real-time high-fidelity conversational avatars.
-
Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
-
AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout error by roughly five times versus prior DL-ROMs and ViTs on advection-diffusion-re
-
TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization
TrianguLang achieves state-of-the-art feed-forward text-guided 3D localization and segmentation by using predicted geometry to gate cross-view semantic correspondences without ground-truth poses.
-
Shap-E: Generating Conditional 3D Implicit Functions
Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.
-
SparseStreet: Sparse Gaussian Splatting for Real-Time Street Scene Simulation
SparseStreet applies node-based learnable pruning followed by static background compression to 3D Gaussian Splatting, reporting up to 80% reduction in primitives with minimal quality loss on Waymo and nuScenes street scene data.
-
WebSpline: Structure-Informed Splines for Real-Time 3D Gaussians from Monocular Videos
WebSpline uses learnable cubic Hermite splines guided by a Structural Proxy Graph to deliver state-of-the-art quality dynamic 3D Gaussian rendering from monocular videos at over 10x the speed of prior methods on iPhone and NVIDIA benchmarks.
-
SpatialPrompt: XR-Based Spatial Intent Expression as Executable Constraints for AI Generative 3D Design
SpatialPrompt turns spatial sketches and voice prompts into executable constraints for controllable AI 3D generation in XR, enabling iterative collaborative creation with color-coded contributions.
-
Dual-Stream EEG Decoding for 3D Visual Perception
A dual-stream EEG decoder separates identity and orientation to support 3D reconstruction from neural signals via circular regression and conditioned diffusion.
-
PHAF-Personalized Hand Avatars in a Flash
A method to generate personalized hand avatars from two views in a fraction of the time of optimization-based approaches.
-
StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
StereoPolicy fuses left-right image features via cross-attention to deliver consistent gains over RGB, RGB-D, point cloud, and multi-view baselines in simulation and real-robot manipulation tasks.
-
Splats under Pressure: Exploring Performance-Energy Trade-offs in Real-Time 3D Gaussian Splatting under Constrained GPU Budgets
Emulation of constrained GPUs reveals performance-energy trade-offs for real-time 3D Gaussian Splatting on edge devices.