Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
GraphThinker reduces temporal hallucinations in video reasoning by constructing event-based scene graphs and applying visual attention rewards in reinforcement finetuning.
citing papers explorer
- Agentic Video Generation: From Text to Executable Event Graphs via Tool-Constrained LLM Planning
An LLM agentic system builds executable GEST specifications via a hierarchical Director-Scene Builder architecture with Relation Subagents, then runs them in a 3D engine, outperforming neural models on physical validity and semantic alignment in human and jury evaluations.
- GraphThinker: Reinforcing Temporally Grounded Video Reasoning with Event Graph Thinking
GraphThinker reduces temporal hallucinations in video reasoning by constructing event-based scene graphs and applying visual attention rewards in reinforcement finetuning.