arXiv preprint arXiv:2505.05469

Generating Physically Stable · 2025 · arXiv 2505.05469

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

SEIG uses staged VLM prompting to output executable Blender programs that reconstruct editable 3D scenes from single images, showing improved fidelity over non-staged baselines.

Learning to Build Shapes by Extrusion

cs.GR · 2026-01-30 · unverdicted · novelty 7.0

Text Encoded Extrusions (TEE) lets LLMs generate and edit manifold 3D meshes by learning sequences of face extrusions from decomposed quadrilateral meshes.

Voxify3D: Pixel Art Meets Volumetric Rendering

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.

CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models

cs.CV · 2026-01-29 · unverdicted · novelty 5.0

CG-MLLM is a multimodal LLM using a Mixture-of-Transformer architecture with separate TokenAR and BlockAR components integrated with a pre-trained vision-language backbone and 3D VAE to enable 3D captioning and high-fidelity generation.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models cs.CV · 2026-06-01 · unverdicted · none · ref 18
CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models cs.CV · 2026-01-29 · unverdicted · none · ref 18
