hub

Step1X-3D: Towards high-fidelity and controllable generation of textured 3D assets

Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, Weiwei Cai, Shihao Wu, Jiarui Liu, Zihao Wang, et al · 2025 · arXiv 2505.07747

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation

cs.CV · 2026-04-13 · unverdicted · novelty 8.0

The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Introduces the first passive source attribution benchmark for 22 generative 3D models and a Transformer achieving 97.22% accuracy under full supervision and 77.17% with 1% training data.

3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

3D-Fixer performs in-place 3D asset completion from single-view partial point clouds via coarse-to-fine generation with ORFA conditioning, plus a new ARSG-110K dataset, to achieve higher geometric accuracy than MIDI and Gen3DSR while keeping diffusion efficiency.

Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

A training-free Spatio-Temporal Attention Chain framework accelerates 4D mesh generation 13x, improves quality, scales to 16x longer videos, and supports downstream tracking and camera estimation.

Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

Sculpt4D generates temporally coherent 4D shapes by integrating a block sparse attention mechanism with time-decaying mask into a pretrained 3D diffusion transformer, achieving SOTA results with 56% less computation.

Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

BVE framework enables text-guided 3D editing beyond voxel limits by combining self-constructed data, lightweight semantic injection, and annotation-free masking to preserve local invariance.

LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

cs.CV · 2026-04-06 · conditional · novelty 6.0

LSRM scales transformer context windows with native sparse attention and geometric routing to deliver high-fidelity feed-forward 3D reconstruction and inverse rendering that approaches dense optimization quality.

Native and Compact Structured Latents for 3D Generation

cs.CV · 2025-12-16 · unverdicted · novelty 6.0

Introduces O-Voxel omni-voxel representation and Sparse Compression VAE for structured native 3D latents, enabling efficient training of large flow-matching models that produce higher-quality geometry and materials than prior methods.

EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

EVA01 introduces a Mixture-of-Transformers model that natively adds 3D mesh understanding, generation, and multi-turn editing to MLLMs by decoupling understanding and generation experts with shared global self-attention.

WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

WorldAct activates monolithic 3D worlds into interactive scenes via multimodal agent-guided decomposition, geometrically aligned mesh reconstruction, and 3D inpainting.

DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents

cs.AI · 2026-05-03 · unverdicted · novelty 5.0

DataEvolver introduces a reusable framework with generation-time self-correction and validation-time self-expansion loops that improves visual datasets, shown to outperform baselines on an object-rotation task.

Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

Hitem3D 2.0 combines multi-view image synthesis with native 3D texture projection to improve completeness, cross-view consistency, and geometry alignment over prior methods.

CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models

cs.CV · 2026-01-29 · unverdicted · novelty 5.0

CG-MLLM is a multimodal LLM using a Mixture-of-Transformer architecture with separate TokenAR and BlockAR components integrated with a pre-trained vision-language backbone and 3D VAE to enable 3D captioning and high-fidelity generation.

Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

cs.CV · 2025-06-18 · unverdicted · novelty 3.0

Hunyuan3D 2.1 is a two-part system with DiT for shape generation and Paint for texture synthesis that produces high-fidelity 3D assets with PBR materials.

citing papers explorer

Showing 15 of 15 citing papers.

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation cs.CV · 2026-04-13 · unverdicted · none · ref 15
The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.
VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation cs.CV · 2026-05-22 · unverdicted · none · ref 19
VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.
Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models cs.CV · 2026-05-18 · unverdicted · none · ref 13
Introduces the first passive source attribution benchmark for 22 generative 3D models and a Transformer achieving 97.22% accuracy under full supervision and 77.17% with 1% training data.
3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image cs.CV · 2026-04-06 · unverdicted · none · ref 28
3D-Fixer performs in-place 3D asset completion from single-view partial point clouds via coarse-to-fine generation with ORFA conditioning, plus a new ARSG-110K dataset, to achieve higher geometric accuracy than MIDI and Gen3DSR while keeping diffusion efficiency.
Fast 4D Mesh Generation by Spatio-Temporal Attention Chains cs.CV · 2026-05-19 · unverdicted · none · ref 39
A training-free Spatio-Temporal Attention Chain framework accelerates 4D mesh generation 13x, improves quality, scales to 16x longer videos, and supports downstream tracking and camera estimation.
Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers cs.CV · 2026-04-23 · unverdicted · none · ref 18
Sculpt4D generates temporally coherent 4D shapes by integrating a block sparse attention mechanism with time-decaying mask into a pretrained 3D diffusion transformer, achieving SOTA results with 56% less computation.
Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data cs.CV · 2026-04-15 · unverdicted · none · ref 39
BVE framework enables text-guided 3D editing beyond voxel limits by combining self-constructed data, lightweight semantic injection, and annotation-free masking to preserve local invariance.
LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows cs.CV · 2026-04-06 · conditional · none · ref 35
LSRM scales transformer context windows with native sparse attention and geometric routing to deliver high-fidelity feed-forward 3D reconstruction and inverse rendering that approaches dense optimization quality.
Native and Compact Structured Latents for 3D Generation cs.CV · 2025-12-16 · unverdicted · none · ref 32
Introduces O-Voxel omni-voxel representation and Sparse Compression VAE for structured native 3D latents, enabling efficient training of large flow-matching models that produce higher-quality geometry and materials than prior methods.
EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers cs.CV · 2026-05-16 · unverdicted · none · ref 30
EVA01 introduces a Mixture-of-Transformers model that natively adds 3D mesh understanding, generation, and multi-turn editing to MLLMs by decoupling understanding and generation experts with shared global self-attention.
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes cs.CV · 2026-05-15 · unverdicted · none · ref 26
WorldAct activates monolithic 3D worlds into interactive scenes via multimodal agent-guided decomposition, geometrically aligned mesh reconstruction, and 3D inpainting.
DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents cs.AI · 2026-05-03 · unverdicted · none · ref 18
DataEvolver introduces a reusable framework with generation-time self-correction and validation-time self-expansion loops that improves visual datasets, shown to outperform baselines on an object-rotation task.
Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation cs.CV · 2026-04-10 · unverdicted · none · ref 21
Hitem3D 2.0 combines multi-view image synthesis with native 3D texture projection to improve completeness, cross-view consistency, and geometry alignment over prior methods.
CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models cs.CV · 2026-01-29 · unverdicted · none · ref 43
CG-MLLM is a multimodal LLM using a Mixture-of-Transformer architecture with separate TokenAR and BlockAR components integrated with a pre-trained vision-language backbone and 3D VAE to enable 3D captioning and high-fidelity generation.
Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material cs.CV · 2025-06-18 · unverdicted · none · ref 26
Hunyuan3D 2.1 is a two-part system with DiT for shape generation and Paint for texture synthesis that produces high-fidelity 3D assets with PBR materials.

Step1X-3D: Towards high-fidelity and controllable generation of textured 3D assets

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer