Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Zibo Zhao, Zeqiang Lai, Qingxiang Lin, Yunfei Zhao, Haolin Liu, Shuhui Yang, Yifei Feng, Mingxin Yang, Sheng Zhang, Xianghui Yang, et al · 2025 · cs.CV · arXiv 2501.12202

Mixed citation behavior. Most common role is background (67%).

56 Pith papers citing it

Background 67% of classified citations

abstract

We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint. The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio -- a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets. It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, including both open-source and closed-source models, in geometry detail, condition alignment, texture quality, and more. Hunyuan3D 2.0 is publicly released in order to fill the gaps in the open-source 3D community for large-scale foundation generative models. The code and pre-trained weights of our models are available at: https://github.com/Tencent/Hunyuan3D-2

citation-role summary

background 9 · baseline 1 · method 1 · other 1

citation-polarity summary

background 8 · unclear 2 · baseline 1 · use method 1

representative citing papers

WarpHammer: Densifying Scene Warps with 3D Object Priors for Extreme View Synthesis

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

WarpHammer densifies scene warps with 3D object priors from generative models and fuses pose-unknown auxiliary views via multi-view geometry to enable stable extreme novel view synthesis.

UnfoldArt: Zero-Shot Recovery of Full Articulated 3D Objects from Text or Image

cs.CV · 2026-06-29 · unverdicted · novelty 7.0 · 2 refs

UnfoldArt uses a two-round structured debate between high-level semantic agents and low-level parameter agents, grounded in generated video, to infer articulation and reconstruct full articulated 3D objects including occluded geometry from text or image inputs.

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.

CelloCut: Constructive Watertight Remeshing via Tetrahedral Cell Cuts

cs.GR · 2026-05-18 · unverdicted · novelty 7.0

CelloCut formulates watertight remeshing as binary labeling on a Delaunay tetrahedral partition solved by graph-cut minimization with one-sided constraints to guarantee volumetrically consistent solids.

QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning

cs.GR · 2026-05-16 · unverdicted · novelty 7.0

QuadLink generates anisotropic quad-dominant meshes from point clouds via anchor prediction, centroid-conditioned linking, and quad-first assembly, supporting hybrid n-gon topology.

InverseDraping: Recovering Sewing Patterns from 3D Garment Surfaces via BoxMesh Bridging

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

A two-stage autoregressive framework centered on BoxMesh recovers parametric sewing patterns from 3D garment surfaces, claiming state-of-the-art results on benchmarks and generalization to real scans and single-view images.

PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation

cs.CV · 2026-02-04 · unverdicted · novelty 7.0

PerpetualWonder introduces a closed-loop generative simulator with a unified physical-visual representation for long-horizon action-conditioned 4D scene generation from one image.

ATATA: One Algorithm to Align Them All

cs.CV · 2026-01-16 · unverdicted · novelty 7.0

ATATA enables fast joint inference of structurally aligned pairs using Rectified Flow models via segment transport, improving state-of-the-art for image and video generation while matching 3D quality at much higher speed.

LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents

cs.CV · 2025-12-19 · unverdicted · novelty 7.0

LangDriveCTRL decomposes driving videos into 3D scene graphs and uses an agentic pipeline with specialized multi-modal agents to perform language-controlled object and behavior edits, achieving nearly 2x higher instruction alignment than prior state-of-the-art methods.

PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models

cs.CV · 2025-05-28 · unverdicted · novelty 7.0

PacTure uses view packing and next-scale autoregressive prediction to generate consistent multi-view PBR textures faster than prior sequential or cross-attention methods.

OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

OmniFit uses a conditional transformer decoder to predict dense body landmarks from multi-modal inputs for scale-agnostic SMPL-X fitting, outperforming prior methods by 57-81% and reaching millimeter accuracy on CAPE and 4D-DRESS benchmarks.

Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.

Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

A video generation approach conditions a base model with multi-scale 3D latent features and a cross-attention adapter to produce geometrically realistic and consistent orbital videos from one image.

Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

A 3D-grounded autoencoder and diffusion transformer allow direct generation of 3D scenes in an implicit latent space using a fixed 1K-token representation for arbitrary views and resolutions.

GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

GenLCA enables scalable training of a 3D diffusion model for photorealistic, animatable full-body avatars by tokenizing large-scale real-world videos with a pretrained reconstructor and applying visibility-aware diffusion training to handle partial observations.

PointSplat: Compact Gaussian Splatting via Human-Centric Prediction

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

PointSplat infers compact Gaussian splats directly in 3D space from input point sets via ray casting and Point-Image Transformer to reduce inter-view redundancy and improve novel-view quality for humans.

Mesh BDF: Barycentric Dominance Field for 3D Native Mesh Generation

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

Barycentric Dominance Field converts discrete mesh connectivity into a continuous surface signal that diffusion models can use directly for higher-quality native 3D mesh generation.

DualBrep: A Dual-Field Continuous Representation for B-rep Modelling

cs.GR · 2026-06-30 · unverdicted · novelty 6.0

DualBrep encodes B-rep models as dual scalar fields (SDF geometry + UDF topology) compressed into a shared latent space for flow-matching generation and neural B-rep extraction.

HandMade: Spatial Prompting for Generative 3D Creation with Part-Labeled VR Sketches

cs.HC · 2026-06-26 · unverdicted · novelty 6.0

HandMade converts segmented VR strokes into multi-view part guidance and structured prompts so generative 3D models better preserve user-specified spatial scaffolds than text-only or sketch baselines.

HiFiVe: High-Fidelity Vehicle Generation Leveraging Auto-Regressive 2D Generative Priors

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

HiFiVe is a training-free framework using an auto-regressive texture refinement pipeline with depth-based warping, multi-view fusion, and symmetry to enhance both texture and geometry fidelity in vehicle generation from 2D priors.

FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

FoundObj uses foundation-model priors as RL rewards to discover multi-class 3D objects from point clouds without scene-level labels.

Helix4D: Complex 4D Mesh Generation

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

Helix4D generates high-quality dynamic 4D meshes from videos by extending Trellis2 with sliding-window cross-frame attention anchored on the first frame and a repurposed 4D temporal encoding.

Fishbone: From One 3D Asset to a Million Controllable Edits

cs.CV · 2026-05-24 · unverdicted · novelty 6.0

Fishbone introduces a unified rib-spine representation computed via adaptive heat method, iso-contour ribs, and geometry-aware spine that enables real-time parametric deformation, reduced-space simulation, and animation on general meshes.

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

cs.CV · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

Stream3D is a training-free method that maintains a fixed-size evidential memory of past frames to convert frozen view-conditioned 3D generators into consistent streaming generators.

citing papers explorer

Showing 5 of 5 citing papers after filters.

LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents

cs.CV · 2025-12-19 · unverdicted · none · ref 61 · internal anchor

LangDriveCTRL decomposes driving videos into 3D scene graphs and uses an agentic pipeline with specialized multi-modal agents to perform language-controlled object and behavior edits, achieving nearly 2x higher instruction alignment than prior state-of-the-art methods.

PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models

cs.CV · 2025-05-28 · unverdicted · none · ref 6 · internal anchor

PacTure uses view packing and next-scale autoregressive prediction to generate consistent multi-view PBR textures faster than prior sequential or cross-attention methods.

DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation

cs.CV · 2025-09-09 · unverdicted · none · ref 3 · internal anchor

LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.

Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details

cs.CV · 2025-06-19 · unverdicted · none · ref 23 · internal anchor

Hunyuan3D 2.5's LATTICE model with 10B parameters generates detailed 3D shapes from images and uses multi-view PBR for textures, outperforming prior methods in fidelity and mesh quality.

Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

cs.CV · 2025-06-18 · unverdicted · none · ref 36 · internal anchor

Hunyuan3D 2.1 is a two-part system with DiT for shape generation and Paint for texture synthesis that produces high-fidelity 3D assets with PBR materials.
