pith. machine review for the scientific record.

arxiv: 2604.27895 · v1 · submitted 2026-04-30 · 💻 cs.AI

Recognition: unknown

Graph World Models: Concepts, Taxonomy, and Future Directions

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 05:00 UTC · model grok-4.3

classification 💻 cs.AI
keywords graph world models · world models · relational inductive biases · graph neural networks · AI agents · structured representations · taxonomy · environment modeling

The pith

Graph world models decompose environments into entity nodes and interaction edges to overcome noise sensitivity and weak reasoning in classical tensor-based approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines and unifies a class of world models that represent environments as graphs rather than flat tensors, allowing agents to learn structured predictions and plans. It argues that this decomposition into nodes for objects and edges for relations injects useful priors that classical models lack. The authors organize existing work into a taxonomy of three relational inductive biases: spatial for topology, physical for dynamics, and logical for causality. They review representative models in each category, compare their design choices, and outline open problems such as adapting graphs over time and creating dedicated benchmarks. This framing treats graph world models as an emerging paradigm rather than scattered techniques.

Core claim

Graph world models formalize the use of graphs to decompose environments into entity nodes and interactive edges, thereby modeling virtual environments in a structured space. The paper unifies these works under one concept and proposes a taxonomy driven by the specific relational inductive biases each model injects: spatial biases for topological abstraction, physical biases for dynamic simulation, and logical biases for causal and semantic reasoning. For each category the authors summarize key design principles and representative models while identifying shared limitations of flat tensor world models such as noise sensitivity, error accumulation, and weak reasoning.
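The decomposition the claim describes can be miniaturized in a few lines. The sketch below is an editorial illustration, not code from the paper: an environment state as entity nodes with feature vectors plus interaction edges, and one hand-rolled message-passing prediction step in which each node moves toward the mean of its in-neighbors. All names (`GraphState`, `step`) are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class GraphState:
    nodes: dict   # node id -> feature vector (e.g. position, velocity)
    edges: list   # (src, dst) pairs encoding which entities interact

def step(state: GraphState) -> GraphState:
    """One prediction step: each node aggregates messages from its
    in-neighbors and updates its features (here: a halfway pull
    toward the mean incoming message)."""
    messages = {nid: [] for nid in state.nodes}
    for src, dst in state.edges:
        messages[dst].append(state.nodes[src])
    new_nodes = {}
    for nid, feats in state.nodes.items():
        if messages[nid]:
            means = [sum(m[i] for m in messages[nid]) / len(messages[nid])
                     for i in range(len(feats))]
            new_nodes[nid] = [0.5 * f + 0.5 * m for f, m in zip(feats, means)]
        else:
            new_nodes[nid] = list(feats)  # isolated node: unchanged
    return GraphState(new_nodes, state.edges)

state = GraphState(nodes={"a": [0.0, 0.0], "b": [2.0, 2.0]},
                   edges=[("a", "b")])
print(step(state).nodes["b"])  # "b" pulled toward "a": [1.0, 1.0]
```

The point of the toy is the structural prior itself: only entities connected by an edge exchange information, which is exactly the relational inductive bias a flat tensor model has to learn from scratch.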

What carries the argument

The taxonomy of relational inductive biases (RIB) that divides graph world models into spatial, physical, and logical categories, each supplying a distinct structural prior to the agent.

If this is right

  • Spatial RIB models can abstract topological structure to support efficient navigation and prediction.
  • Physical RIB models enable more accurate simulation of object dynamics and interactions.
  • Logical RIB models improve causal inference and long-horizon planning in agents.
  • Future GWMs must incorporate dynamic graph adaptation and probabilistic relational dynamics.
  • Dedicated benchmarks and metrics are required to evaluate structured world models separately from flat ones.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Combining graph world models with language-based planners could produce agents that both simulate physics and follow high-level instructions.
  • Real-world robotics systems might adopt physical-RIB models to reduce compounding errors during long manipulation sequences.
  • The taxonomy suggests a path toward multi-granularity models that switch between fine and coarse graph representations depending on task demands.
  • Standardized evaluation suites focused on graph adaptation and relational uncertainty would accelerate progress beyond current ad-hoc testing.

Load-bearing premise

Decomposing environments into entity nodes and interactive edges via graphs will systematically reduce noise sensitivity, error accumulation, and weak reasoning compared with flat tensor world models.

What would settle it

A controlled benchmark in which graph-based world models show no measurable reduction in error accumulation or improvement in reasoning accuracy relative to flat tensor baselines when environments contain realistic noise or partial observability.
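The quantity such a benchmark would compare is error accumulation over rollout horizon. A toy sketch of the measurement (my construction, not the paper's): compound a slightly biased one-step model from the initial state and record how its error against the true trajectory grows with each step.

```python
def rollout_error(true_traj, model, horizon):
    """Compound a one-step model from the initial state and report
    absolute error at each step against the true trajectory."""
    state = true_traj[0]
    errors = []
    for t in range(1, horizon + 1):
        state = model(state)
        errors.append(abs(state - true_traj[t]))
    return errors

# Ground truth: x_{t+1} = 0.9 * x_t; the model's decay rate is biased.
true_traj = [1.0]
for _ in range(5):
    true_traj.append(0.9 * true_traj[-1])

def biased(x):
    return 0.95 * x

errs = rollout_error(true_traj, biased, 5)
assert errs == sorted(errs)  # error grows monotonically with horizon
```

In a real benchmark the same rollout-error curve would be measured for a graph-based and a flat tensor model under matched noise and partial observability; the load-bearing premise predicts the graph model's curve grows more slowly.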

Figures

Figures reproduced from arXiv: 2604.27895 by Bei Yu, Jiawei Liu, Mingjun Wang, Senqiao Yang, Yu Wang.

Figure 1: Comparison of the classical world models (WMs) and …
Figure 2: Taxonomy and representative papers on GWMs. Each work is placed according to its dominant relational inductive bias, while some …
Figure 3: Illustration of graph as connector. (Left) …
Figure 5: Illustration of graph as reasoner. (Left) …
Original abstract

As one of the mainstream models of artificial intelligence, world models allow agents to learn the representation of the environment for efficient prediction and planning. However, classical world models based on flat tensors face several key problems, including noise sensitivity, error accumulation and weak reasoning. To address these limitations, many recent studies use graph structure to decompose the environment into entity nodes and interactive edges, and model virtual environments in a structured space. This paper systematically formalizes and unifies these emerging graph-based works under the concept of graph world models (GWMs). To the best of our knowledge, GWMs have not yet been explicitly defined and surveyed as a unified research paradigm. Furthermore, we propose a taxonomy based on relational inductive biases (RIB), categorizing GWMs by the specific structural priors they inject: (1) spatial RIB for topological abstraction; (2) physical RIB for dynamic simulation; and (3) logical RIB for causal and semantic reasoning. For each model category, we outline the key design principles, summarize representative models, and conduct comparative analyses. We further discuss open challenges and future directions, including dynamic graph adaptation, probabilistic relational dynamics, multi-granularity inductive biases, and the need for dedicated benchmarks and evaluation metrics for GWMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces Graph World Models (GWMs) as a unified paradigm for world models that decompose environments into entity nodes and interactive edges using graph structures, in contrast to classical flat tensor-based world models that suffer from noise sensitivity, error accumulation, and weak reasoning. It claims to provide the first explicit definition and survey of this paradigm, proposes a taxonomy based on relational inductive biases (spatial for topological abstraction, physical for dynamic simulation, and logical for causal/semantic reasoning), reviews representative models and conducts comparative analyses for each category, and outlines open challenges and future directions such as dynamic graph adaptation, probabilistic relational dynamics, multi-granularity biases, and dedicated benchmarks.

Significance. If the taxonomy proves comprehensive and the classifications accurate, this work could establish a common framework for an emerging area at the intersection of world models, graph neural networks, and structured reasoning in AI, helping researchers compare approaches and identify gaps. The explicit unification and discussion of future directions (including the need for new evaluation metrics) represent a useful organizational contribution, though the paper's impact will depend on how well it captures the breadth of existing literature without significant omissions.

major comments (3)
  1. [Abstract and Introduction] The central claim that 'GWMs have not yet been explicitly defined and surveyed as a unified research paradigm' is load-bearing for the paper's novelty but is supported only by the phrase 'to the best of our knowledge', without a dedicated related-work subsection that systematically distinguishes this survey from prior reviews on graph-based RL, structured world models, or relational inductive biases in AI.
  2. [Taxonomy section (presumably Section 3)] The three RIB categories are presented as distinct, but the manuscript does not specify a primary classification rule for models that exhibit multiple biases simultaneously (e.g., a model combining spatial topology with physical dynamics); this ambiguity risks inconsistent application of the taxonomy and weakens its utility for unifying the literature.
  3. [Comparative analyses (presumably Section 4)] The comparative discussion of representative models across categories is qualitative and lacks any summary table, standardized metrics, or explicit criteria for comparison, making it difficult to evaluate whether the taxonomy reveals systematic differences in how each RIB addresses the motivating limitations (noise sensitivity, error accumulation, weak reasoning).
minor comments (3)
  1. [Introduction or Background] The manuscript would benefit from an explicit preliminary section defining graph notation (nodes, edges, adjacency) and how it maps to environment components, to improve accessibility for readers unfamiliar with GNNs.
  2. [Future directions] Future directions subsection on 'dedicated benchmarks and evaluation metrics' lists the need but does not propose even high-level examples of what such metrics might measure (e.g., relational prediction accuracy or graph edit distance under noise), leaving the discussion somewhat open-ended.
  3. [Figures] If figures are included to illustrate the taxonomy or model architectures, ensure they are accompanied by captions that explicitly link visual elements to the three RIB categories for clarity.
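The second minor comment asks for concrete examples of GWM metrics. One plausible instance (an editorial sketch, not something the paper proposes): relational prediction accuracy scored as F1 over predicted versus ground-truth edge sets; the function name `edge_f1` and the tuple encoding of edges are assumptions for the example.

```python
def edge_f1(predicted: set, actual: set) -> float:
    """F1 score over edge sets: a simple relational prediction metric."""
    if not predicted or not actual:
        return 0.0
    tp = len(predicted & actual)
    precision = tp / len(predicted)
    recall = tp / len(actual)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = {("a", "b"), ("b", "c")}
true = {("a", "b"), ("c", "d")}
print(round(edge_f1(pred, true), 2))  # 1 of 2 predicted, 1 of 2 actual -> 0.5
```

Measuring how this score degrades as observation noise increases would operationalize the referee's "relational prediction accuracy under noise" suggestion.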

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. Each major comment has been carefully considered, and we outline our responses and planned revisions below. We believe these changes will strengthen the clarity, rigor, and utility of the paper.

Point-by-point responses
  1. Referee: [Abstract and Introduction] The central claim that 'GWMs have not yet been explicitly defined and surveyed as a unified research paradigm' is load-bearing for the paper's novelty but is supported only by the phrase 'to the best of our knowledge', without a dedicated related-work subsection that systematically distinguishes this survey from prior reviews on graph-based RL, structured world models, or relational inductive biases in AI.

    Authors: We agree that the novelty claim would be more robust with explicit differentiation from prior surveys. Although the manuscript contains a Related Work section, it lacks a dedicated subsection for systematic comparison. In the revised manuscript, we will add a new subsection within Related Work that explicitly contrasts our survey with existing reviews on graph-based RL, structured world models, and relational inductive biases. This will provide a clearer justification for presenting GWMs as a unified paradigm. revision: yes

  2. Referee: [Taxonomy section (presumably Section 3)] The three RIB categories are presented as distinct, but the manuscript does not specify a primary classification rule for models that exhibit multiple biases simultaneously (e.g., a model combining spatial topology with physical dynamics); this ambiguity risks inconsistent application of the taxonomy and weakens its utility for unifying the literature.

    Authors: We acknowledge the need for clearer guidance on hybrid models. To address this, we will introduce an explicit primary classification rule based on the dominant relational inductive bias emphasized in each model's core design and primary contribution. We will also add a dedicated discussion of multi-bias models, including examples and classification guidance, to ensure consistent and unambiguous application of the taxonomy. revision: yes

  3. Referee: [Comparative analyses (presumably Section 4)] The comparative discussion of representative models across categories is qualitative and lacks any summary table, standardized metrics, or explicit criteria for comparison, making it difficult to evaluate whether the taxonomy reveals systematic differences in how each RIB addresses the motivating limitations (noise sensitivity, error accumulation, weak reasoning).

    Authors: We recognize that the current comparisons are qualitative and would benefit from greater structure. In the revision, we will add a summary table comparing representative models across the three RIB categories. The table will use explicit criteria tied to the motivating limitations (noise sensitivity, error accumulation, and weak reasoning), along with key design principles. While fully standardized quantitative metrics are difficult to apply uniformly across heterogeneous models and experimental setups, the table and accompanying criteria will make systematic differences more transparent and evaluable. revision: partial
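The primary classification rule promised in response 2 amounts to an argmax over how strongly a model emphasizes each bias. A minimal sketch of that rule, with entirely hypothetical names and scores:

```python
BIASES = ("spatial", "physical", "logical")

def classify(scores: dict) -> str:
    """Return the dominant relational inductive bias; ties fall back to
    BIASES order, standing in for the authors' secondary judgment."""
    return max(BIASES, key=lambda b: scores.get(b, 0))

# A hybrid model mixing topology with strong dynamics emphasis:
print(classify({"spatial": 0.2, "physical": 0.7, "logical": 0.4}))  # physical
```

Whatever form the authors' rule takes, making it this explicit is what lets two readers classify the same hybrid model identically.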

Circularity Check

0 steps flagged

No significant circularity; survey paper with no derivations or self-referential reductions

Full rationale

This is a literature survey that introduces the term 'graph world models' and organizes prior work under a taxonomy of relational inductive biases. The central claim is that GWMs have not yet been explicitly defined and surveyed as a unified paradigm, which is an external observation about the literature rather than a derivation from internal equations, fitted parameters, or self-citations. No load-bearing steps reduce by construction to the paper's own inputs; all motivations and references point to external cited works. The paper contains no equations, theorems, or experimental results that could exhibit self-definition, fitted-input predictions, or uniqueness imported from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that graph decompositions inherently supply useful relational inductive biases that flat tensors lack; no free parameters or new invented entities are introduced.

axioms (1)
  • domain assumption: Graph structures decompose environments into entity nodes and interactive edges that improve modeling over flat tensors.
    Directly stated in the abstract as the motivation for graph world models.

pith-pipeline@v0.9.0 · 5525 in / 1256 out tokens · 63581 ms · 2026-05-07T05:00:44.640525+00:00 · methodology

