Tree of thoughts: Deliberate problem solving with large language models
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Verdicts: UNVERDICTED (2)
Citing papers
- HOG-Layout: Hierarchical 3D Scene Generation, Optimization and Editing via Vision-Language Models
HOG-Layout enables text-driven hierarchical 3D scene generation, optimization, and real-time editing using LLMs, VLMs, RAG for semantic consistency, and an optimization module for physical plausibility.
- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
R1-Onevision turns images into structured text for multimodal reasoning, trains on a custom dataset with RL, and claims SOTA results on an educational benchmark.