pith. sign in

arxiv: 2502.06819 · v2 · pith:WLVMOEIRnew · submitted 2025-02-05 · 💻 cs.LG · cs.GR

AccioScene: Compositional 3D Scene Generation via Graph Diffusion and Interaction-driven Critics

classification 💻 cs.LG cs.GR
keywords scenegraphobjectscenestextbettercoherentdiffusion
0
0 comments X
read the original abstract

This paper presents a framework for generating 3D indoor scenes from text prompts. Existing methods often formulate scene synthesis as an object layout prediction problem conditioned on a single input modality, such as a text description, room shape, or scene graph. This design can lead to object collisions and limited functional plausibility, reducing its practical applicability. To address these limitations, we introduce a multi-stage pipeline that better reflects practical scene creation scenarios. Given a text prompt describing partial scene content, our method first uses graph diffusion to produce a contextually coherent scene graph and then predicts a realistic object layout. In addition, we incorporate lightweight human-object interaction priors to encourage human-centric and functional arrangements, with explicit spatial constraints to reduce interpenetration. Our approach generates coherent 3D scenes with viable layouts that better support human interaction. Experiments on the 3D-FRONT dataset demonstrate that our method achieves competitive or state-of-the-art performance compared with existing approaches, while improving the physical plausibility of generated scenes.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.