arXiv preprint arXiv:2505.02836 (2025)

· 2025 · arXiv 2505.02836

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 2 other 1

citation-polarity summary

background 2 unclear 1

representative citing papers

ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

ArchSIBench is a new benchmark dataset and evaluation suite that measures vision-language models on architectural spatial intelligence across 17 subtasks, showing most models lag human baselines especially in transformation and configuration.

SceneOrchestra: Efficient Agentic 3D Scene Synthesis via Full Tool-Call Trajectory Generation

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

SceneOrchestra trains an orchestrator to generate full tool-call trajectories for 3D scene synthesis and uses a discriminator during training to select high-quality plans, yielding state-of-the-art results with lower runtime.

Co-generation of Layout and Shape from Text via Autoregressive 3D Diffusion

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

3D-ARD+ unifies autoregressive token prediction with diffusion-based 3D latent generation to co-produce indoor scene layouts and object geometries that follow complex text-specified spatial and semantic constraints.

Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning

cs.CV · 2026-01-16 · conditional · novelty 7.0

VIGA introduces a training-free interleaved multimodal reasoning loop that improves vision-as-inverse-graphics accuracy over one-shot baselines on BlenderGym, SlideBench, and new BlenderBench.

Repurposing 3D Generative Model for Autoregressive Layout Generation

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

LaviGen turns 3D generative models into an autoregressive layout generator that models geometric and physical constraints, delivering 19% higher physical plausibility and 65% faster inference on the LayoutVLM benchmark.

SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

SceneCritic is a symbolic, ontology-grounded evaluator for floor-plan layouts that identifies specific semantic, orientation, and geometric violations and aligns better with human judgments than VLM-based scorers.

Rein3D: Reinforced 3D Indoor Scene Generation with Panoramic Video Diffusion Models

cs.CV · 2026-04-12 · unverdicted · novelty 6.0

Rein3D generates photorealistic, globally consistent 3D indoor scenes by using a restore-and-refine process where radial panoramic videos are restored via diffusion models and then used to update a 3D Gaussian field.

Real2Sim via Active Perception with Behavior Trees Automatically Generated by VLMs

cs.RO · 2026-01-13 · unverdicted · novelty 6.0

An intent-driven Real2Sim framework uses VLMs for semantic task decomposition to identify missing physical parameters and generates reactive behavior trees to acquire them via contact-rich robotic interactions on a Franka Panda arm.

EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development

cs.RO · 2026-04-15 · unverdicted · novelty 5.0

EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.

citing papers explorer

Showing 9 of 9 citing papers.

ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models cs.CV · 2026-05-20 · unverdicted · none · ref 15
ArchSIBench is a new benchmark dataset and evaluation suite that measures vision-language models on architectural spatial intelligence across 17 subtasks, showing most models lag human baselines especially in transformation and configuration.
SceneOrchestra: Efficient Agentic 3D Scene Synthesis via Full Tool-Call Trajectory Generation cs.CV · 2026-04-21 · unverdicted · none · ref 21
SceneOrchestra trains an orchestrator to generate full tool-call trajectories for 3D scene synthesis and uses a discriminator during training to select high-quality plans, yielding state-of-the-art results with lower runtime.
Co-generation of Layout and Shape from Text via Autoregressive 3D Diffusion cs.CV · 2026-04-17 · unverdicted · none · ref 12
3D-ARD+ unifies autoregressive token prediction with diffusion-based 3D latent generation to co-produce indoor scene layouts and object geometries that follow complex text-specified spatial and semantic constraints.
Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning cs.CV · 2026-01-16 · conditional · none · ref 30
VIGA introduces a training-free interleaved multimodal reasoning loop that improves vision-as-inverse-graphics accuracy over one-shot baselines on BlenderGym, SlideBench, and new BlenderBench.
Repurposing 3D Generative Model for Autoregressive Layout Generation cs.CV · 2026-04-17 · unverdicted · none · ref 52
LaviGen turns 3D generative models into an autoregressive layout generator that models geometric and physical constraints, delivering 19% higher physical plausibility and 65% faster inference on the LayoutVLM benchmark.
SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis cs.CV · 2026-04-14 · unverdicted · none · ref 24
SceneCritic is a symbolic, ontology-grounded evaluator for floor-plan layouts that identifies specific semantic, orientation, and geometric violations and aligns better with human judgments than VLM-based scorers.
Rein3D: Reinforced 3D Indoor Scene Generation with Panoramic Video Diffusion Models cs.CV · 2026-04-12 · unverdicted · none · ref 23
Rein3D generates photorealistic, globally consistent 3D indoor scenes by using a restore-and-refine process where radial panoramic videos are restored via diffusion models and then used to update a 3D Gaussian field.
Real2Sim via Active Perception with Behavior Trees Automatically Generated by VLMs cs.RO · 2026-01-13 · unverdicted · none · ref 13
An intent-driven Real2Sim framework uses VLMs for semantic task decomposition to identify missing physical parameters and generates reactive behavior trees to acquire them via contact-rich robotic interactions on a Franka Panda arm.
EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development cs.RO · 2026-04-15 · unverdicted · none · ref 47
EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.

arXiv preprint arXiv:2505.02836 (2025)

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer