VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator

Hyojun Go, Dominik Narnhofer, Goutam Bhat, Prune Truong, Federico Tombari, Konrad Schindler · 2025 · arXiv 2510.13454

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Brain3D: EEG-to-3D Decoding of Visual Representations via Multimodal Reasoning

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

A multimodal pipeline decodes EEG into 3D meshes via EEG-to-image, MLLM reasoning, diffusion, and single-image-to-3D conversion, reporting 85.4% 10-way accuracy and 0.648 CLIPScore.

UniRecGen: Unifying Multi-View 3D Reconstruction and Generation

cs.CV · 2026-04-01 · unverdicted · novelty 6.0

UniRecGen unifies reconstruction and generation via shared canonical space and disentangled cooperative learning to produce complete, consistent 3D models from sparse views.

Splatent: Splatting Diffusion Latents for Novel View Synthesis

cs.CV · 2025-12-10 · conditional · novelty 6.0

Splatent recovers fine details for latent-space 3D Gaussian Splatting by applying multi-view attention in 2D rather than reconstructing in 3D space.

citing papers explorer

Brain3D: EEG-to-3D Decoding of Visual Representations via Multimodal Reasoning · cs.CV · 2026-04-09 · unverdicted · none · ref 9
UniRecGen: Unifying Multi-View 3D Reconstruction and Generation · cs.CV · 2026-04-01 · unverdicted · none · ref 18
Splatent: Splatting Diffusion Latents for Novel View Synthesis · cs.CV · 2025-12-10 · conditional · none · ref 12
