{"total":11,"items":[{"citing_arxiv_id":"2605.23888","ref_index":65,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction","primary_cat":"cs.CV","submitted_at":"2026-05-22T17:49:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18451","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis","primary_cat":"cs.CV","submitted_at":"2026-05-18T14:18:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Code-as-Room is an MLLM-based agentic pipeline that parses top-down images into multi-stage Blender code synthesis with cross-stage memory to generate functional 3D rooms.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17102","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement","primary_cat":"cs.GR","submitted_at":"2026-05-16T18:10:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VoxScene is a new anchor-conditioned voxel diffusion model that synthesizes collision-free 3D indoor scene arrangements via discrete volumetric occupancies and uses the grids for asset 
retrieval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16797","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EgoKit: Towards Unified Low-Cost Egocentric Data Collection with Heterogeneous Devices","primary_cat":"cs.CV","submitted_at":"2026-05-16T03:59:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EgoKit is a new toolkit and accessory set that unifies egocentric video collection with wrist views across heterogeneous consumer devices using a consistent interface and log format.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09423","ref_index":90,"ref_count":4,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning","primary_cat":"cs.AI","submitted_at":"2026-05-10T08:51:50+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.","context_count":2,"top_context_role":"method","top_context_polarity":"use_method","context_text":"we directly query the agent pose and target location and check success based on distance to the target. Gymnasium Compilation. A generated environment then exports as a standard Gymnasium environment, with env.reset() and env.step(action) returning RGB-D observations, agent pose, and reward (top of Figure 1 (Left)). 
Because the contract is the standard one, any off-the-shelf RL algorithm (e.g., PPO [64]) or training-free LLM policy (e.g., ReAct [90]) plugs in without modification, making each generated scene a first-class training substrate for embodied agent learning. 2.2 Co-Evolution: An Adaptive Curriculum Mechanism So far the generator runs open-loop: it produces environments without knowing how the embodied agent fares in them. Co-evolution closes this loop and turns environment generation from a one-shot"},{"citing_arxiv_id":"2604.26509","ref_index":145,"ref_count":3,"confidence":0.9,"is_internal_anchor":false,"paper_title":"3D Generation for Embodied AI and Robotic Simulation: A Survey","primary_cat":"cs.RO","submitted_at":"2026-04-29T10:17:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"The paper surveys 3D generation techniques for embodied AI and robotics, categorizing them into data generation, simulation environments, and sim-to-real bridging while identifying bottlenecks in physical validity and transfer.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"27 MarketGen [140] arXiv '25 Ta Proc+Agent Isaac Unreal ❸ 28 MesaTask [141] NeurIPS '25 Ta LLM Isaac ❸ 29 TabletopGen [142] arXiv '25 Ta ⊷ /f⌢ntVLM+Diff Isaac ❸ 30 PAT3D [143] ICLR '26 /f⌢nt Diff+Phys LibulPC ❸ 31 PhyScensis [144] ICLR '26 /f⌢nt/balance-scaleLLM+Phys Genesis ❸ 32 SAGE [145] arXiv '26 /f⌢nt Agent Isaac ❸ 33 SceneSmith [146] arXiv '26 /f⌢nt Agent Drake MJ Isaac ❸ 34 Scenethesis [147] ICLR '26 /f⌢nt Agent Blender ❸ 35 3D-Generalist [148] 3DV '26 /f⌢nt ⊷ VLM Blender ❸ embodied readiness requires combining these signals, as no single modality simultaneously addresses grounding 
qual-"},{"citing_arxiv_id":"2604.19907","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SceneOrchestra: Efficient Agentic 3D Scene Synthesis via Full Tool-Call Trajectory Generation","primary_cat":"cs.CV","submitted_at":"2026-04-21T18:33:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SceneOrchestra trains an orchestrator to generate full tool-call trajectories for 3D scene synthesis and uses a discriminator during training to select high-quality plans, yielding state-of-the-art results with lower runtime.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13800","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development","primary_cat":"cs.RO","submitted_at":"2026-04-15T12:36:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13035","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis","primary_cat":"cs.CV","submitted_at":"2026-04-14T17:59:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SceneCritic is a symbolic, ontology-grounded evaluator for floor-plan layouts that identifies specific semantic, orientation, and geometric violations and aligns better with human judgments than VLM-based 
scorers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"(b) text-only LLMs can outperform VLMs on semantic layout quality, and (c) image-based VLM refinement is the most effective critic modality for semantic and orientation correction. 1. Introduction The generation of 3D indoor environments has become central to a range of applications, from training embodied agents that must navigate and manipulate objects in realistic spaces [37, 20, 12, 30], to virtual reality and robotics simulation. Recent work has demonstrated that Large Language Models (LLMs) and Vision-Language Models (VLMs) can serve as powerful priors for this task, leveraging their world knowledge to produce object layouts that are both diverse and semantically meaningful [14, 44, 4, 32]. These approaches generate scenes through explicit structured representations (e."},{"citing_arxiv_id":"2604.09036","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"V-CAGE: Vision-Closed-Loop Agentic Generation Engine for Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2026-04-10T06:56:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"V-CAGE automates the creation of scalable, high-quality robotic manipulation datasets through context-aware scene construction, closed-loop visual verification, and perceptually-driven compression.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.07105","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Genie Sim PanoRecon: Fast Immersive Scene Generation from Single-View Panorama","primary_cat":"cs.RO","submitted_at":"2026-04-08T13:57:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A feed-forward Gaussian-splatting system reconstructs 
photo-realistic 3D scenes from single-view panoramas in seconds via cube-map decomposition and depth-aware fusion for robotic simulation use.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}