
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes


1 Pith paper citing it
abstract

Recent 3D world modeling systems based on generative scene synthesis, such as Marble, can create coherent and explorable 3D environments, yet their outputs are typically static, monolithic assets that offer limited editability and little support for physical interaction. This restricts their use in immersive content creation and embodied simulation, where generated worlds must be actively modified and manipulated. To tackle this challenge, we present WorldAct, a framework that converts static generated 3D worlds into editable and interaction-ready scenes. WorldAct uses a multimodal agent to guide scene decomposition, identify actionable objects, reconstruct geometrically aligned object-level meshes for interaction, and restore the residual background via 3D inpainting. The resulting scenes support object-level editing, collision-aware manipulation, and embodied task execution while preserving global scene coherence. Experiments show that WorldAct enables richer interaction scenarios than the original generated scenes, suggesting a practical path toward editable and interactive 3D world models.
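The abstract does not describe WorldAct's internal scene representation. As a purely illustrative sketch, assume each decomposed object is approximated by an axis-aligned bounding box (AABB), flagged as actionable or not, and that the inpainted background is an opaque handle. The snippet below shows what "collision-aware manipulation" over such an object-centric scene could mean in the simplest case. All names (`Scene`, `SceneObject`, `AABB`, `try_move`) are hypothetical, and the AABB overlap test is a generic stand-in, not the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]


@dataclass
class AABB:
    """Hypothetical axis-aligned box standing in for an object's reconstructed mesh."""
    lo: Vec3
    hi: Vec3

    def translated(self, delta: Vec3) -> AABB:
        # Shift both corners by the same offset.
        return AABB(
            tuple(a + d for a, d in zip(self.lo, delta)),
            tuple(a + d for a, d in zip(self.hi, delta)),
        )

    def overlaps(self, other: AABB) -> bool:
        # Boxes intersect only if their extents overlap on every axis;
        # boxes that merely touch on a face do not count as colliding.
        return all(
            self.lo[i] < other.hi[i] and other.lo[i] < self.hi[i] for i in range(3)
        )


@dataclass
class SceneObject:
    name: str
    box: AABB
    actionable: bool  # assumption: only actionable objects may be moved


@dataclass
class Scene:
    background: str  # hypothetical handle to the inpainted residual background
    objects: dict[str, SceneObject] = field(default_factory=dict)

    def add(self, obj: SceneObject) -> None:
        self.objects[obj.name] = obj

    def try_move(self, name: str, delta: Vec3) -> bool:
        """Translate an actionable object unless the move would cause a collision."""
        obj = self.objects[name]
        if not obj.actionable:
            return False
        candidate = obj.box.translated(delta)
        for other in self.objects.values():
            if other is not obj and candidate.overlaps(other.box):
                return False  # reject the move; the original pose is kept
        obj.box = candidate
        return True


if __name__ == "__main__":
    scene = Scene(background="residual_bg")
    scene.add(SceneObject("chair", AABB((0, 0, 0), (1, 1, 1)), actionable=True))
    scene.add(SceneObject("table", AABB((2, 0, 0), (3, 1, 1)), actionable=True))
    scene.add(SceneObject("wall", AABB((-2, 0, 0), (-1, 3, 3)), actionable=False))
    print(scene.try_move("chair", (1.5, 0, 0)))  # False: would intersect the table
    print(scene.try_move("chair", (0.5, 0, 0)))  # True: free space
    print(scene.try_move("wall", (0, 0, 1)))     # False: not actionable
```

Keeping objects as separate entries over a fixed background is what makes per-object edits and collision checks possible at all; a single monolithic mesh offers no such handles. A real system would test against mesh geometry or a physics engine rather than boxes.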


representative citing papers

SAM3D-Phys: Towards Multi-Object Interactive Simulation in Real World

cs.CV · 2026-05-28 · unverdicted · novelty 4.0 · ref 11

SAM3D-Phys recovers complete simulatable object geometries from incomplete real-world scene reconstructions by combining SAM3D generative priors with physics-constrained spatial optimization and mask-guided appearance distillation.