Predictive Spatio-Temporal Scene Graphs for Semi-Static Scenes
Pith reviewed 2026-05-09 20:18 UTC · model grok-4.3
The pith
Scene graphs equipped with Bayesian filters on their edges can predict future states of semi-static environments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that their PredictiveGraphs structure, which places Perpetua* Bayesian filters on the edges of a 3D scene graph, enables tempo-spatio-semantic reasoning. This allows the system to predict the future configuration of the scene by modeling how object relationships evolve in a structured manner over repeated observations.
What carries the argument
Perpetua* filters, which are Bayesian reasoners placed on scene graph edges to encode and predict changes in spatio-semantic relationships between objects.
If this is right
- Robots gain the ability to anticipate object positions in cyclic daily routines, such as a mug's movement through a kitchen.
- Prediction accuracy holds up in real-world settings with bi-hourly changes observed over three weeks.
- The method shows robustness to distributional shifts in the environment's state.
- Overall performance exceeds that of baseline approaches in both simulated and physical dynamic navigation tasks.
Where Pith is reading between the lines
- This approach might reduce the need for robots to frequently rebuild their environmental maps in stable but changing locations.
- Applying similar filters to other sensor data could extend predictions to more variables like lighting or temperature patterns.
- Future tests could measure if these predictions improve the success rate of long-horizon robot tasks in homes.
Load-bearing premise
The environment's changes are structured and semi-static in ways that can be modeled effectively by Bayesian filters on the relationships in a scene graph.
What would settle it
Running the system in an environment where object movements are random and lack any detectable pattern, then checking if it still outperforms static scene graph baselines in state prediction.
Figures
read the original abstract
We have seen tremendous recent progress in our ability to build "spatio-semantic" representations that enable robots to perform complex reasoning across geometry and semantics. However, the vast majority of these methods lack any ability to perform reasoning across time. This is a desirable property in situations where a robot repeatedly observes an environment where instances may change in between observations, but in a structured way. Consider as an example a home environment where the location of a mug typically moves from the cupboard to a countertop to the sink and then back to the cupboard on a daily basis. We should be able to learn this cyclic behavior and use it to predict the state of the mug in the future. In this work, we propose a method that is able to perform this type of tempo-spatio-semantic reasoning. Underpinning the method is a filter, Perpetua$^*$, that performs Bayesian reasoning on the states of the environment that are observed over time. This filter is integrated within a 3D scene graph structure that we call PredictiveGraphs, where nodes represent objects and edges function as Perpetua$^*$ filters encoding spatio-semantic relationships. We validate the method in both simulation and real-world dynamic navigation tasks, where our real world experiments consist of an environment that is undergoing semi-static changes at a bi-hourly frequency over a period of three weeks. In both settings, we demonstrate that our method outperforms baselines in predicting future environment states, even in the presence of distributional shifts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PredictiveGraphs, a 3D scene-graph representation for semi-static scenes in which nodes represent objects and edges host Perpetua* Bayesian filters that track and predict the temporal evolution of spatio-semantic relations. The central claim is that this architecture enables accurate future-state prediction in environments with structured, recurring changes (e.g., cyclic mug movements) and that the method outperforms baselines in both simulation and a three-week real-world deployment with bi-hourly observations, even under distributional shift.
Significance. If the empirical superiority is reproducible and the filter derivations are sound, the work would fill a recognized gap between static spatio-semantic scene graphs and long-horizon robotic reasoning, offering a practical way to embed structured temporal models directly into graph edges.
major comments (2)
- Abstract: the claim that the method 'outperforms baselines in predicting future environment states' is presented without any quantitative metrics, error bars, or description of the evaluation protocol for the Perpetua* filter; this absence prevents assessment of whether the reported gains are load-bearing for the central contribution.
- The weakest assumption—that semi-static changes are sufficiently structured to be captured by independent Bayesian filters on scene-graph edges—is stated but not tested against alternative temporal models (e.g., recurrent or attention-based predictors) that might violate the independence implicit in per-edge filtering.
minor comments (2)
- The asterisk in 'Perpetua*' is never explained; a footnote or sentence clarifying whether it denotes a variant, a trademark, or a placeholder would improve readability.
- No reference is supplied for the baseline methods used in the real-world experiments; adding citations in the evaluation section would allow readers to judge the strength of the comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our empirical results and the scope of our modeling assumptions. We address each major comment below and indicate the revisions we will make.
read point-by-point responses
-
Referee: Abstract: the claim that the method 'outperforms baselines in predicting future environment states' is presented without any quantitative metrics, error bars, or description of the evaluation protocol for the Perpetua* filter; this absence prevents assessment of whether the reported gains are load-bearing for the central contribution.
Authors: We agree that the abstract would be strengthened by including concrete quantitative support. The full manuscript reports mean prediction errors with standard deviations (across 10 simulation runs and the three-week real-world deployment), the bi-hourly observation protocol, and the exact baselines used; these appear in Sections 4.2, 4.3, and 5. We will revise the abstract to state the key quantitative gains (e.g., average error reduction) and reference the evaluation protocol while remaining within length limits. revision: yes
-
Referee: The weakest assumption—that semi-static changes are sufficiently structured to be captured by independent Bayesian filters on scene-graph edges—is stated but not tested against alternative temporal models (e.g., recurrent or attention-based predictors) that might violate the independence implicit in per-edge filtering.
Authors: We acknowledge that the paper does not empirically compare the independent per-edge filters against models that relax independence (e.g., recurrent predictors or attention over multiple edges). The design choice prioritizes modularity, per-relation interpretability, and efficient online updates, which are central to the contribution. To address the concern we will add a dedicated paragraph in the discussion section analyzing the independence assumption and include a limited ablation that replaces the set of independent filters with a single LSTM predictor operating on aggregated edge features, evaluated on the same simulation and real-world datasets. revision: yes
Circularity Check
No significant circularity detected in derivation or claims
full rationale
The provided abstract and context describe an architectural proposal (Perpetua* Bayesian filters on PredictiveGraphs scene-graph edges) for modeling semi-static temporal changes, followed by empirical validation in simulation and a three-week real-world experiment with distributional-shift testing. No equations, parameter-fitting steps, or derivations appear that could reduce by construction to inputs. No load-bearing self-citations, uniqueness theorems, or ansatzes are referenced in the given text. The central claim (outperformance on future-state prediction) rests on external experimental benchmarks rather than tautological redefinitions or fitted-input renamings, making the work self-contained against the stated premises.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Object locations in the target environments change in a structured, semi-static manner that can be modeled by Bayesian updates over repeated observations.
invented entities (2)
-
Perpetua* filter
no independent evidence
-
PredictiveGraphs
no independent evidence
Reference graph
Works this paper leans on
-
[1]
A new look at the statistical model identification
Hirotugu Akaike. A new look at the statistical model identification. InSelected Papers of Hirotugu Akaike, pages 215–222. Springer, 1974
work page 1974
-
[2]
Probper-lilo: Probabilistic persistency modeling for life-long mapping.RA-L, 11(3):2530–2537, 2026
Waqas Ali, Yixi Cai, Patric Jensfelt, and Thien-Minh Nguyen. Probper-lilo: Probabilistic persistency modeling for life-long mapping.RA-L, 11(3):2530–2537, 2026
work page 2026
-
[3]
Remembr: Building and reasoning over long-horizon spatio-temporal memory for robot naviga- tion
Abrar Anwar, John Welsh, Joydeep Biswas, Soha Pouya, and Yan Chang. Remembr: Building and reasoning over long-horizon spatio-temporal memory for robot naviga- tion. InICRA, pages 2838–2845. IEEE, 2025
work page 2025
-
[4]
Tjark Behrens, Ren ´e Zurbr ¨ugg, Marc Pollefeys, Zuria Bauer, and Hermann Blum. Lost & found: Tracking changes from egocentric observations in 3d dynamic scene graphs.RA-L, 10(4):3739–3746, 2025
work page 2025
-
[5]
Dynamic maps for long- term operation of mobile service robots
Peter Biber and Tom Duckett. Dynamic maps for long- term operation of mobile service robots. InRSS, 2005
work page 2005
-
[6]
Benjamin Bogenberger, Oliver Harrison, Orrin Da- hanaggamaarachchi, Lukas Brunke, Jingxing Qian, Siqi Zhou, and Angela P Schoellig. Where did i leave my glasses? open-vocabulary semantic exploration in real- world semi-static environments.PrePrint, 2025
work page 2025
-
[7]
Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, Jos ´e Neira, Ian Reid, and John J Leonard. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age.T-RO, 32(6):1309–1332, 2016
work page 2016
-
[8]
Sam 3: Segment anything with concepts
Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoub- hik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. Sam 3: Segment anything with concepts. PrePrint, 2025
work page 2025
-
[9]
From Localization and Mapping to Spatial Intelligence
Luca Carlone, Ayoung Kim, Timothy Barfoot, Daniel Cremers, and Frank Dellaert, editors.SLAM Handbook. From Localization and Mapping to Spatial Intelligence. Cambridge University Press, 2025
work page 2025
-
[10]
Harrison Chase. Langchain, 2022. URL https://github. com/langchain-ai/langchain. Software
work page 2022
-
[11]
Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gem- ini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.PrePrint, 2025
work page 2025
-
[12]
Procthor: Large-scale embodied ai using procedural gen- eration.NeurIPS, 35:5982–5994, 2022
Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Kiana Ehsani, Jordi Salvador, Winson Han, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. Procthor: Large-scale embodied ai using procedural gen- eration.NeurIPS, 35:5982–5994, 2022
work page 2022
-
[13]
Tianchen Deng, Hongle Xie, Jingchuan Wang, and Wei- dong Chen. Long-term visual simultaneous localization and mapping: Using a bayesian persistence filter-based global map prediction.RAM, 30(1):36–49, 2023
work page 2023
-
[14]
Enter the mind palace: Reasoning and planning for long-term active embodied question answering
Muhammad Fadhil Ginting, Dong-Ki Kim, Xiangyun Meng, Andrzej Marek Reinke, Bandi Jai Krishna, Navid Kayhani, Oriana Peltzer, David Fan, Amirreza Shaban, Sung-Kyun Kim, Mykel Kochenderfer, Ali akbar Agha- mohammadi, and Shayegan Omidshafiei. Enter the mind palace: Reasoning and planning for long-term active embodied question answering. InCoRL, 2025
work page 2025
-
[15]
Long- term human trajectory prediction using 3d dynamic scene graphs.RA-L, 2024
Nicolas Gorlo, Lukas Schmid, and Luca Carlone. Long- term human trajectory prediction using 3d dynamic scene graphs.RA-L, 2024
work page 2024
-
[16]
De- scribe anything anywhere at any moment.Computer Vision and Pattern Recognition (CVPR), 2026
Nicolas Gorlo, Lukas Schmid, and Luca Carlone. De- scribe anything anywhere at any moment.Computer Vision and Pattern Recognition (CVPR), 2026
work page 2026
-
[17]
Conceptgraphs: Open-vocabulary 3d scene graphs for perception and planning
Qiao Gu, Ali Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, et al. Conceptgraphs: Open-vocabulary 3d scene graphs for perception and planning. InICRA, pages 5021–5028. IEEE, 2024
work page 2024
-
[18]
Dynamic hilbert maps: Real-time occupancy predictions in changing environments
Vitor Guizilini, Ransalu Senanayake, and Fabio Ramos. Dynamic hilbert maps: Real-time occupancy predictions in changing environments. InICRA, pages 4091–4097. IEEE, 2019
work page 2019
-
[19]
Predictive and adaptive maps for long-term visual navigation in changing environments
Lucie Halodov ´a, Eliˇska Dvoˇrr´akov´a, Filip Majer, Tom ´aˇs Vintr, Oscar Martinez Mozos, Feras Dayoub, and Tom ´aˇs Krajn´ık. Predictive and adaptive maps for long-term visual navigation in changing environments. InIROS, pages 7033–7039, 2019
work page 2019
-
[20]
Jiajun Jiang, Yiming Zhu, Zirui Wu, and Jie Song. Dualmap: Online open-vocabulary semantic mapping for natural language navigation in dynamic changing scenes. RA-L, 10(12):12612–12619, 2025
work page 2025
-
[21]
Ai2-thor: An interactive 3d environment for visual ai.arXiv, 2017
Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli Van- derBilt, Luca Weihs, Alvaro Herrasti, Daniel Gordon, Yuke Zhu, Abhinav Gupta, and Ali Farhadi. Ai2-thor: An interactive 3d environment for visual ai.arXiv, 2017
work page 2017
-
[22]
Tom ´aˇs Krajn´ık, Jaime P. Fentanes, Jo ˜ao M. Santos, and Tom Duckett. Fremen: Frequency map enhancement for long-term mobile robot autonomy in changing environ- ments.T-RO, 33(4):964–977, 2017
work page 2017
-
[23]
Modeling dynamic environments with scene graph memory
Andrey Kurenkov, Michael Lingelbach, Tanmay Agar- wal, Emily Jin, Chengshu Li, Ruohan Zhang, Li Fei- Fei, Jiajun Wu, Silvio Savarese, and Roberto Martın- Martın. Modeling dynamic environments with scene graph memory. InInternational Conference on Machine Learning, pages 17976–17993. PMLR, 2023
work page 2023
-
[24]
Mathieu Labb ´e and Franc ¸ois Michaud. Rtab-map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation.Journal of field robotics, 36(2):416–446, 2019
work page 2019
-
[25]
Retrieval-augmented generation for knowledge- intensive nlp tasks.NeurIPS, 33:9459–9474, 2020
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K¨uttler, Mike Lewis, Wen-tau Yih, Tim Rockt ¨aschel, et al. Retrieval-augmented generation for knowledge- intensive nlp tasks.NeurIPS, 33:9459–9474, 2020
work page 2020
-
[26]
Dynamem: Online dynamic spatio-semantic memory for open world mobile manipulation
Peiqi Liu, Zhanqiu Guo, Mohit Warke, Soumith Chintala, Chris Paxton, Nur Muhammad Mahi Shafiullah, and Lerrel Pinto. Dynamem: Online dynamic spatio-semantic memory for open world mobile manipulation. InICRA, pages 13346–13355. IEEE, 2025
work page 2025
-
[27]
3d vsg: Long- term semantic scene change prediction through 3d vari- able scene graphs
Samuel Looper, Javier Rodriguez-Puigvert, Roland Sieg- wart, Cesar Cadena, and Lukas Schmid. 3d vsg: Long- term semantic scene change prediction through 3d vari- able scene graphs. InICRA, pages 8179–8186, 2023
work page 2023
-
[28]
3dgs-cd: 3d gaussian splatting-based change detection for physical object rearrangement.RA-L, 2025
Ziqi Lu, Jianbo Ye, and John Leonard. 3dgs-cd: 3d gaussian splatting-based change detection for physical object rearrangement.RA-L, 2025
work page 2025
-
[29]
Clio: Real-time task- driven open-set 3d scene graphs.RA-L, 2024
Dominic Maggio, Yun Chang, Nathan Hughes, Matthew Trang, Dan Griffith, Carlyn Dougherty, Eric Cristofalo, Lukas Schmid, and Luca Carlone. Clio: Real-time task- driven open-set 3d scene graphs.RA-L, 2024
work page 2024
-
[30]
Murphy.Probabilistic Machine Learning: An introduction
Kevin P. Murphy.Probabilistic Machine Learning: An introduction. MIT Press, 2022
work page 2022
-
[31]
Fernando Nobre, Christoffer Heckman, Paul Ozog, Ryan W. Wolcott, and Jeffrey M. Walls. Online proba- bilistic change detection in feature-based maps. InICRA, pages 3661–3668, 2018
work page 2018
-
[32]
Proactive robot assistance via spatio-temporal object modeling.PrePrint, 2022
Maithili Patel and Sonia Chernova. Proactive robot assistance via spatio-temporal object modeling.PrePrint, 2022
work page 2022
-
[33]
Waslander, and Angela Schoellig
Jingxing Qian, Veronica Chatrath, James Servos, Aaron Mavrinac, Wolfram Burgard, Steven L. Waslander, and Angela Schoellig. Pov-slam: Probabilistic object-level variational slam. InRSS, 2023
work page 2023
-
[34]
Learning transferable visual models from natural lan- guage supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, et al. Learning transferable visual models from natural lan- guage supervision. InInternational conference on ma- chine learning, pages 8748–8763, 2021
work page 2021
-
[35]
Structured interfaces for automated reasoning with 3d scene graphs.PrePrint, 2025
Aaron Ray, Jacob Arkin, Harel Biggie, Chuchu Fan, Luca Carlone, and Nicholas Roy. Structured interfaces for automated reasoning with 3d scene graphs.PrePrint, 2025
work page 2025
- [36]
-
[37]
Lista: Geometric object-based change detection in clut- tered environments
Joseph Rowell, Lintong Zhang, and Maurice Fallon. Lista: Geometric object-based change detection in clut- tered environments. InICRA, pages 3632–3638, 2024
work page 2024
-
[38]
Perpetua: Multi-hypothesis persistence modeling for semi-static environments.IROS, 2025
Miguel Saavedra-Ruiz, Samer Nashed, Charlie Gauthier, and Liam Paull. Perpetua: Multi-hypothesis persistence modeling for semi-static environments.IROS, 2025
work page 2025
-
[39]
Habitat: A platform for embodied ai research
Manolis Savva, Abhishek Kadian, et al. Habitat: A platform for embodied ai research. InICCV, pages 9339– 9347, 2019
work page 2019
-
[40]
Lukas Schmid, Jeffrey Delmerico, Johannes L Sch¨onberger, Juan Nieto, Marc Pollefeys, Roland Siegwart, and Cesar Cadena. Panoptic multi-tsdfs: a flexible representation for online multi-resolution volumetric mapping and long-term dynamic scene consistency. InICRA, pages 8018–8024. IEEE, 2022
work page 2022
-
[41]
Khronos: A unified approach for spatio-temporal metric-semantic slam in dynamic environments
Lukas Schmid, Marcus Abate, Yun Chang, and Luca Car- lone. Khronos: A unified approach for spatio-temporal metric-semantic slam in dynamic environments. InRSS, 2024
work page 2024
-
[42]
Mobile robot mapping and localization in non-static environments
Cyrill Stachniss and Wolfram Burgard. Mobile robot mapping and localization in non-static environments. In AAAI, pages 1324–1329, 2005
work page 2005
-
[43]
Yujie Tang, Meiling Wang, Yinan Deng, Zibo Zheng, Jingchuan Deng, Sibo Zuo, and Yufeng Yue. Openin: Open-vocabulary instance-oriented navigation in dy- namic domestic environments.RA-L, 10(9):9256–9263, 2025
work page 2025
-
[44]
Rio: 3d object instance re-localization in changing indoor environments
Johanna Wald, Armen Avetisyan, Nassir Navab, Federico Tombari, and Matthias Nießner. Rio: 3d object instance re-localization in changing indoor environments. In ICCV, pages 7658–7667, 2019
work page 2019
-
[45]
Dynamic scene generation for embodied navigation benchmark
Chenxu Wang, dong wang, Xinghang Li, Dunzheng Wang, and Huaping Liu. Dynamic scene generation for embodied navigation benchmark. InRSS Workshop: Data Generation for Robotics, 2024
work page 2024
-
[46]
Yanbo Wang, Yaxian Fan, Jingchuan Wang, and Weidong Chen. Long-term navigation for autonomous robots based on spatio-temporal map prediction.Robotics and Autonomous Systems, 179:104724, 2024. ISSN 0921- 8890
work page 2024
-
[47]
Hierarchical open-vocabulary 3d scene graphs for language-grounded robot navigation
Abdelrhman Werby, Chenguang Huang, Martin B ¨uchner, Abhinav Valada, and Wolfram Burgard. Hierarchical open-vocabulary 3d scene graphs for language-grounded robot navigation. InVLMNM Workshop at ICRA, 2024
work page 2024
-
[48]
Embodied-rag: General non-parametric embodied mem- ory for retrieval and generation.PrePrint, 2024
Quanting Xie, So Yeon Min, Pengliang Ji, et al. Embodied-rag: General non-parametric embodied mem- ory for retrieval and generation.PrePrint, 2024
work page 2024
-
[49]
Mingfeng Yuan, Hao Zhang, Mahan Mohammadi, Run- hao Li, Jinjun Shan, and Steven L. Waslander. Star: Scal- able task-conditioned retrieval for long-horizon multi- modal robot memory.IEEE Robotics and Automation Letters, 11(5):5994–6001, 2026. doi: 10.1109/LRA.2026. 3677723
-
[50]
Gaussian mapping for evolving scenes.PrePrint, 2025
Vladimir Yugay, Thies Kersten, Luca Carlone, Theo Gevers, Martin R Oswald, and Lukas Schmid. Gaussian mapping for evolving scenes.PrePrint, 2025
work page 2025
-
[51]
Liyuan Zhu, Shengyu Huang, Konrad Schindler, and Iro Armeni. Living scenes: Multi-object relocalization and reconstruction in changing 3d environments. InCVPR, pages 28014–28024, 2024. APPENDIXA SUPPLEMENTARYDETAILS: PERPETUA ∗ A.1 Perpetua ∗ Algorithm In Algorithm 1, we present the update and prediction routines for a single receptacle in Perpetua ∗, omi...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.