TabletopGen: Tabletop Scene Generation and Interactive Simulation for Robotic Manipulation

Hongxuan Ma; Hu Su; Licheng Yang; Liu Liu; Wei Sui; Wei Zou; Yonghao He; Yuxin Guo; Ziqian Wang

arxiv: 2512.01204 · v4 · pith:54V2UTPVnew · submitted 2025-12-01 · 💻 cs.CV

Ziqian Wang, Yonghao He, Licheng Yang, Wei Zou, Hongxuan Ma, Liu Liu, Wei Sui, Yuxin Guo, Hu Su

keywords: manipulation, data, scene, robotic, tabletopgen, generation, simulation, tabletop

Simulation provides a low-cost, scalable pathway to large-scale robotic manipulation data collection. However, existing 3D scene generation methods can rarely be applied directly to manipulation data synthesis, as their generated scenes often lack instance-level interactivity and physical plausibility. Focusing on tabletop manipulation, we propose TabletopGen, a training-free and automated tabletop scene generation and interactive simulation engine. Starting from text or a single image, we first obtain independent 3D object models via generative instance extraction. Second, we introduce a novel pose and scale alignment approach that recovers a collision-free scene layout using a Differentiable Rotation Optimizer and a Top-View Spatial Alignment mechanism. Finally, we assemble the generated scene in a physics simulator with collision geometry, yielding a stable, interactable environment for synthesizing multimodal manipulation data. Extensive experiments and user studies demonstrate that TabletopGen achieves state-of-the-art performance in visual fidelity, layout accuracy, and physical plausibility. Furthermore, we validate the executability of the collected trajectories on a real robotic arm via zero-shot real-to-sim-to-real policy transfer, indicating that TabletopGen can serve as a reliable data engine for robotic manipulation data synthesis.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

One Video, One World: Turning Monocular Video into Physical 4D Scenes
cs.CV 2026-06 unverdicted novelty 8.0

OVOW reconstructs instance-level, simulation-ready 4D mesh scenes from monocular video via a four-stage training-free pipeline and introduces a new benchmark for structured Video-to-4D evaluation.

REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image
cs.CV 2026-05 unverdicted novelty 7.0

REST3D reconstructs physically stable 3D scenes from single images via agentic scene-tree understanding and physics-constrained optimization.

3D Generation for Embodied AI and Robotic Simulation: A Survey
cs.RO 2026-04 accept novelty 7.0

3D generation for embodied AI is shifting from visual realism toward interaction readiness, organized into data generation, simulation environments, and sim-to-real bridging roles.

Perceive-then-Plan: Layout-as-Policy for Monocular 3D Scene Layout Estimation
cs.CV 2026-05 unverdicted novelty 6.0

Introduces Layout-as-Policy (LaP) to turn 3D layout estimation into an iterative policy-learning refinement process for better physical coherence.

STABLE: Simulation-Ready Tabletop Layout Generation via a Semantics-Physics Dual System
cs.CV 2026-05 unverdicted novelty 6.0

STABLE generates simulation-ready tabletop scenes by alternating a semantic LLM reasoner for task-aligned coarse layouts with a physics corrector for physical plausibility using progressive scene expansion.

V-CAGE: Vision-Closed-Loop Agentic Generation Engine for Robotic Manipulation
cs.RO 2026-04 unverdicted novelty 6.0

V-CAGE automates the creation of scalable, high-quality robotic manipulation datasets through context-aware scene construction, closed-loop visual verification, and perceptually-driven compression.

WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes
cs.CV 2026-05 unverdicted novelty 5.0

WorldAct activates monolithic 3D worlds into interactive scenes via multimodal agent-guided decomposition, geometrically aligned mesh reconstruction, and 3D inpainting.

3D Generation for Embodied AI and Robotic Simulation: A Survey
cs.RO 2026-04 unverdicted novelty 3.0

The survey organizes 3D generation for embodied AI into data generators for assets, simulation environments for interaction, and sim-to-real bridges, noting a shift toward interaction readiness and listing bottlenecks...

3D Generation for Embodied AI and Robotic Simulation: A Survey
cs.RO 2026-04 unverdicted novelty 2.0

The paper surveys 3D generation techniques for embodied AI and robotics, categorizing them into data generation, simulation environments, and sim-to-real bridging while identifying bottlenecks in physical validity and...