Michelangelo: Conditional 3D Shape Generation Based on Shape-Image-Text Aligned Latent Representation

Zibo Zhao et al.

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models

cs.CV · 2026-01-29 · unverdicted · novelty 5.0

CG-MLLM is a multimodal LLM using a Mixture-of-Transformer architecture with separate TokenAR and BlockAR components integrated with a pre-trained vision-language backbone and 3D VAE to enable 3D captioning and high-fidelity generation.

citing papers explorer

Showing 1 of 1 citing paper.

CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models

cs.CV · 2026-01-29 · unverdicted · none · ref 39
