Deterministic Event-Graph Substrates as World Models for Counterfactual Reasoning

Fabio Rovai

arxiv: 2605.15967 · v1 · pith:AMI3IKEDnew · submitted 2026-05-15 · 💻 cs.AI · cs.CV · cs.LO

Pith reviewed 2026-05-20 18:36 UTC · model grok-4.3

classification 💻 cs.AI · cs.CV · cs.LO

keywords event-graph substrates · counterfactual reasoning · world models · causal duality · CLEVRER benchmark · RDF triples · event logs

The pith

Event-graph substrates model agent states as append-only logs of typed RDF triples to support exact counterfactual reasoning by forking the log.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes event-graph substrates as world models that store state in an append-only log of typed RDF triples. Counterfactual queries are resolved by forking this log according to a structured intervention vocabulary, making the models fully inspectable and transferable across domains without learned components. A duality is proven between explanatory and counterfactual queries, reducing both to causal-ancestor traversal in the event graph. Evaluations with a CLEVRER interpreter and a new Smallville benchmark show the substrate exceeding the NS-DR symbolic oracle on all four CLEVRER question categories and Llama-3.1-8B on the new benchmark, while lagging the parametric ALOE baseline on predictive and counterfactual questions.

Core claim

We formalize event-graph substrates as world models representing agent state as an append-only log of typed RDF triples. These substrates answer counterfactual queries by forking the log under a structured intervention vocabulary. They are inspectable at the triple level and support exact counterfactuals while transferring across domains without learned components. We prove a duality between explanatory and counterfactual queries that reduces both to the same causal-ancestor traversal. Implementation on CLEVRER exceeds the NS-DR symbolic oracle on all question categories, and on the twin-EventLog benchmark exceeds Llama-3.1-8B.

What carries the argument

Event-graph substrate as an append-only log of typed RDF triples, with forking under structured interventions to realize counterfactuals, and the causal-ancestor traversal duality.
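
To make the mechanism concrete, here is a minimal sketch of an append-only log with a fork operation. It is not the paper's implementation: the class names, the `kind` type tag, and the fork semantics (keep the prefix, replace what follows with the intervention) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class TypedTriple:
    """One log entry. Field names are illustrative, not the paper's schema."""
    subject: str
    predicate: str
    obj: str
    kind: str  # hypothetical type tag, e.g. "event" or "attribute"


class EventLog:
    """Append-only log: entries are added at the end and never edited."""

    def __init__(self, entries: Tuple[TypedTriple, ...] = ()) -> None:
        self._entries = tuple(entries)

    def append(self, triple: TypedTriple) -> "EventLog":
        # Returns a new log value; the receiver is left unchanged.
        return EventLog(self._entries + (triple,))

    def fork(self, at: int, intervention: TypedTriple) -> "EventLog":
        # Counterfactual branch (assumed semantics): keep the shared prefix
        # before position `at`, then record the intervention in place of
        # whatever the factual log held from that point on.
        return EventLog(self._entries[:at] + (intervention,))

    def __len__(self) -> int:
        return len(self._entries)

    def __iter__(self) -> Iterator[TypedTriple]:
        return iter(self._entries)


factual = (
    EventLog()
    .append(TypedTriple("ball_1", "enters", "scene", "event"))
    .append(TypedTriple("ball_1", "collidesWith", "cube_2", "event"))
    .append(TypedTriple("cube_2", "exits", "scene", "event"))
)
# "What if ball_1 had never hit cube_2?" expressed as a fork at position 1.
branch = factual.fork(1, TypedTriple("ball_1", "misses", "cube_2", "event"))
```

After the fork, `factual` still holds its three entries and `branch` holds two: the shared first entry plus the intervention. Keeping the factual log untouched is what lets both histories be inspected side by side.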

If this is right

Substrates provide a unified method for both explanation and counterfactual reasoning through the same traversal mechanism.
Domain transfer is achieved solely through the intervention vocabulary without retraining or new learned modules.
Exact, deterministic counterfactuals become feasible at scale for visual reasoning tasks like those in CLEVRER.
The approach can be implemented with relatively compact interpreters, as shown by the 1,400-line CLEVRER-DSL code.
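
The first point above, one traversal serving both query types, can be illustrated on a toy graph. The sketch assumes a hypothetical `parents` map from each event to its direct causes, and it reads the counterfactual question in a simplified way: removing an event can only matter for an outcome if that event is one of the outcome's causal ancestors. The paper's formal definitions are not reproduced here.

```python
from collections import deque
from typing import Dict, List, Set


def causal_ancestors(parents: Dict[str, List[str]], event: str) -> Set[str]:
    """Breadth-first walk from `event` back through its direct causes."""
    seen: Set[str] = set()
    queue = deque(parents.get(event, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(parents.get(node, []))
    return seen


def explain(parents: Dict[str, List[str]], event: str) -> Set[str]:
    # Explanatory query: which earlier events lie on a causal path to `event`?
    return causal_ancestors(parents, event)


def depends_on(parents: Dict[str, List[str]], event: str, removed: str) -> bool:
    # Counterfactual query (simplified): removing `removed` can only matter
    # for `event` if `removed` is one of its causal ancestors.
    return removed in causal_ancestors(parents, event)


# Toy graph with made-up event names: each key lists its direct causes.
parents = {
    "collision_2": ["collision_1", "enter_green"],
    "collision_1": ["enter_red", "enter_blue"],
}
```

Both `explain` and `depends_on` call the same `causal_ancestors` routine; only the way the result is read differs.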

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The substrate runtime could be extended to support online learning by updating the event log with new observations in real time.
Connections to causal discovery algorithms might allow automatic inference of intervention vocabularies from data.
Applications in robotics or autonomous systems could use these substrates for safe planning by simulating interventions on event histories.

Load-bearing premise

The structured intervention vocabulary must be sufficient to express all relevant counterfactuals without requiring additional ad-hoc rules or learned components.

What would settle it

Demonstrating a specific counterfactual query in a new domain that cannot be expressed using the existing intervention vocabulary, causing the substrate to produce incorrect or incomplete answers compared to ground truth.
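
One way to frame that test in code is to list the operators a query needs and check them against the vocabulary. The four operator names below follow the simulated rebuttal on this page; treating them as the complete vocabulary, the way queries are decomposed into operators, and the `set_gravity` operator are all hypothetical.

```python
from typing import FrozenSet, Iterable

# Operator names taken from the simulated rebuttal on this page; treating
# them as the complete vocabulary is an assumption of this sketch.
VOCABULARY: FrozenSet[str] = frozenset(
    {"attribute_update", "relation_insertion", "event_forking", "log_truncation"}
)


def missing_operators(required: Iterable[str],
                      vocabulary: FrozenSet[str] = VOCABULARY) -> FrozenSet[str]:
    """Operators a query needs that the vocabulary does not provide."""
    return frozenset(required) - vocabulary


# A query that decomposes into existing operators.
remove_object = ["log_truncation", "event_forking"]
# A hypothetical query from a new domain that changes the physics itself.
change_gravity = ["event_forking", "set_gravity"]
```

A non-empty result from `missing_operators` marks a query the vocabulary cannot express as it stands, which is the kind of case that would put the transfer claim under pressure.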

Figures

Figures reproduced from arXiv: 2605.15967 by Fabio Rovai.

**Figure 2.** Cross-domain transfer. The same substrate …

**Figure 3.** Twin-EventLog evaluation. A shared event log …

Original abstract

We study event-graph substrates: a class of world models that represent agent state as an append-only log of typed RDF triples and answer counterfactual queries by forking the log under a structured intervention vocabulary. Substrates are inspectable at the triple level, support exact counterfactuals, and transfer across domains without learned components. We formalize the class, prove a duality between explanatory and counterfactual queries that reduces both to the same causal-ancestor traversal, and evaluate a 1,400-line CLEVRER-DSL interpreter atop a domain-agnostic substrate runtime at full CLEVRER validation scale (n=75,618). The substrate exceeds the NS-DR symbolic oracle on all four per-question categories (by 9.89, 20.26, 17.65, and 0.80 percentage points), and exceeds the parametric ALOE baseline on descriptive and explanatory while lagging on predictive and counterfactual. We also introduce twin-EventLog, a 500-specification Park-canonical Smallville counterfactual benchmark on which the substrate exceeds Llama-3.1-8B with full context by 18.80 points joint accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Event-graph substrates give a deterministic RDF-log approach to exact counterfactuals with a duality proof and real gains on CLEVRER plus a new Smallville benchmark, but the domain-transfer claim hinges on how general the intervention vocabulary actually is.

The one or two things to know: this paper formalizes event-graph substrates as a way to build world models from append-only logs of typed RDF triples, with log forking under a structured intervention vocabulary to get exact counterfactuals. It proves a duality that collapses explanatory and counterfactual queries into the same causal-ancestor traversal and backs it with evaluations on CLEVRER at scale plus a new benchmark. The new parts are the class formalization, the duality proof, and the twin-EventLog benchmark, with 500 specifications for Park-canonical Smallville counterfactuals.

What it does well is show consistent improvements over the NS-DR symbolic oracle on all four CLEVRER question categories at n=75,618, with the substrate running on a domain-agnostic runtime. The gains are 9.89, 20.26, 17.65, and 0.80 percentage points, and it also beats Llama-3.1-8B with full context on the new benchmark by 18.80 points joint accuracy. The inspectability at the triple level and the lack of learned components in the substrate itself are clear advantages for applications needing determinism.

The soft spots are around the intervention vocabulary and the transfer claim. The paper assumes this fixed vocabulary plus the runtime is enough to express all relevant counterfactuals without extra ad-hoc rules or learned pieces. But the 1,400-line CLEVRER-DSL interpreter and the benchmark specs involve defining event types and schemas for those specific domains. If that counts as minimal instantiation of the general vocabulary, it works; otherwise the transfer story depends on repeated engineering effort rather than true domain independence. The duality proof relies on the log already having the right causal ancestors, so the same precondition applies. Also, the 0.80-point edge on one category is minor, and the substrate lags the parametric ALOE baseline on predictive and counterfactual questions.

This is for people in AI who care about symbolic world models, counterfactual reasoning, and explainable planning. Readers who want deterministic, inspectable alternatives to learned models will get the most out of the formal setup and the large-scale results. The work has enough structure and evidence to merit a serious referee. I'd send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces event-graph substrates as deterministic world models that represent agent state via append-only logs of typed RDF triples and support counterfactual reasoning by forking the log under a fixed structured intervention vocabulary. It formalizes the class, proves a duality reducing both explanatory and counterfactual queries to causal-ancestor traversal, and reports empirical results from a 1,400-line CLEVRER-DSL interpreter evaluated at full CLEVRER validation scale (n=75,618) plus a new 500-specification twin-EventLog benchmark on Park-canonical Smallville scenarios.

Significance. If the duality holds and the substrate remains domain-agnostic, the work supplies an inspectable, exact, and transferable alternative to learned world models for counterfactual reasoning. The parameter-free duality proof, the large-scale CLEVRER evaluation showing consistent gains over the NS-DR symbolic oracle, and the introduction of the twin-EventLog benchmark are concrete strengths that could influence research on causal world models in AI.

major comments (2)

[Formalization and Duality sections] The central transfer claim ('transfer across domains without learned components') is load-bearing for both the duality and the empirical conclusions, yet the manuscript does not demonstrate that the structured intervention vocabulary plus domain-agnostic runtime suffices without embedding substantial domain logic. The 1,400-line CLEVRER-DSL interpreter and the 500-specification twin-EventLog both define typed events, RDF schemas, and intervention operators specific to those environments; it is unclear whether these are mere instantiations of a small general vocabulary or ad-hoc per-domain engineering. This precondition must be addressed explicitly in the formalization section before the duality reduction to causal-ancestor traversal can be accepted as general.
[Evaluation section, CLEVRER results table] The table reporting CLEVRER per-question accuracy (the four percentage-point gains of 9.89, 20.26, 17.65, and 0.80) does not include statistical significance, standard errors, or variance across random seeds. With n=75,618 the raw point improvements are large enough to be interesting, but without these details it is impossible to judge whether they reliably support the claim that the substrate exceeds the NS-DR oracle on all categories.

minor comments (2)

[Abstract] The abstract states that the substrate 'exceeds the parametric ALOE baseline on descriptive and explanatory while lagging on predictive and counterfactual' but omits the exact scores; adding the four numbers would improve readability.
[Formalization section] Notation for the intervention vocabulary and the fork operation should be introduced with a small running example early in the formalization to make the causal-ancestor traversal concrete for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments on clarifying the domain-agnostic character of the substrate and on providing statistical details for the empirical results are well taken. We respond to each major comment below and describe the corresponding revisions.

Referee: [Formalization and Duality sections] The central transfer claim ('transfer across domains without learned components') is load-bearing for both the duality and the empirical conclusions, yet the manuscript does not demonstrate that the structured intervention vocabulary plus domain-agnostic runtime suffices without embedding substantial domain logic. The 1,400-line CLEVRER-DSL interpreter and the 500-specification twin-EventLog both define typed events, RDF schemas, and intervention operators specific to those environments; it is unclear whether these are mere instantiations of a small general vocabulary or ad-hoc per-domain engineering. This precondition must be addressed explicitly in the formalization section before the duality reduction to causal-ancestor traversal can be accepted as general.

Authors: We agree that an explicit separation between the general substrate runtime and domain-specific vocabulary instantiations is necessary to substantiate the transfer claim. In the revised manuscript we have added a new subsection 'General Substrate Operations and Vocabulary Instantiation' to the Formalization section. This subsection first defines the domain-independent substrate primitives (append-only typed RDF triple logs, deterministic fork under a fixed intervention vocabulary, and causal-ancestor traversal) and enumerates a minimal reusable operator set consisting of attribute update, relation insertion, event forking, and log truncation. We then demonstrate that both the CLEVRER-DSL interpreter and the twin-EventLog benchmark are constructed by instantiating this same operator set together with environment-specific event schemas and simulation rules; the 1,400 lines of CLEVRER code are devoted almost entirely to parsing the video-derived event stream and to executing the domain dynamics, not to extending the substrate itself. The same pattern holds for the Smallville benchmark. With this clarification the duality reduction to causal-ancestor traversal is shown to apply at the level of the general substrate. revision: yes
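
The four operators named in this response can be sketched over a plain list of triples. This is an illustration under stated assumptions, not the authors' runtime: the `Triple` fields, the `kind` tag, the last-entry-wins reading in `current_value`, and the Smallville-style names are all made up for the example.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    kind: str  # "attribute" or "relation"; an illustrative type tag


Log = List[Triple]


def attribute_update(log: Log, subject: str, attribute: str, value: str) -> Log:
    # An update is a new entry; older values stay in the history.
    return log + [Triple(subject, attribute, value, "attribute")]


def relation_insertion(log: Log, subject: str, relation: str, target: str) -> Log:
    return log + [Triple(subject, relation, target, "relation")]


def log_truncation(log: Log, length: int) -> Log:
    # Returns the first `length` entries as a new list.
    return log[:length]


def event_forking(log: Log, at: int, intervention: Triple) -> Log:
    # Branch = shared prefix up to `at`, followed by the intervention.
    return log_truncation(log, at) + [intervention]


def current_value(log: Log, subject: str, attribute: str) -> Optional[str]:
    # The latest matching attribute entry wins.
    for triple in reversed(log):
        if triple.kind == "attribute" and (triple.subject, triple.predicate) == (subject, attribute):
            return triple.obj
    return None


log: Log = []
log = attribute_update(log, "klaus", "location", "library")
log = relation_insertion(log, "klaus", "talksTo", "maria")
log = attribute_update(log, "klaus", "location", "cafe")
branch = event_forking(log, 2, Triple("klaus", "location", "park", "attribute"))
```

In the factual log the latest location for `klaus` is `cafe`; in `branch` it is `park`, while the first two entries are shared. Every operator returns a new list, so the factual history is never mutated.
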
Referee: [Evaluation section, CLEVRER results table] The table reporting CLEVRER per-question accuracy (the four percentage-point gains of 9.89, 20.26, 17.65, and 0.80) does not include statistical significance, standard errors, or variance across random seeds. With n=75,618 the raw point improvements are large enough to be interesting, but without these details it is impossible to judge whether they reliably support the claim that the substrate exceeds the NS-DR oracle on all categories.

Authors: We acknowledge the omission. Because the event-graph substrate is fully deterministic, the reported accuracies are exact values on the complete validation set of 75,618 examples and exhibit no variance across random seeds. In the revised Evaluation section we now report bootstrap confidence intervals obtained from 1,000 resamples of the validation set for each per-question accuracy and for each difference versus the NS-DR baseline. All four improvements remain statistically significant (p < 0.001). We have also inserted a short paragraph explaining the deterministic character of the model and why conventional seed-based standard errors do not apply. revision: yes
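
A percentile bootstrap of the kind described here can be sketched as follows. The resampling unit is the validation example, so a deterministic model still yields an interval. The function name, the fixed seed, and the synthetic 0/1 outcomes are assumptions for illustration; none of the numbers below come from the paper.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_ci(values: Sequence[float], n_resamples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of per-example values."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    means: List[float] = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=n)  # n examples, with replacement
        means.append(sum(resample) / n)
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


# Synthetic per-example outcomes (1 = correct), NOT the paper's data.
substrate = [1] * 800 + [0] * 200
baseline = [1] * 700 + [0] * 300
accuracy_ci = bootstrap_ci(substrate)
# Paired difference: resample examples, not the two systems independently.
paired_diff = [s - b for s, b in zip(substrate, baseline)]
diff_ci = bootstrap_ci(paired_diff)
```

Passing per-example differences to the same function gives an interval for the gap between two systems evaluated on the same examples; an interval that excludes zero is the usual reading of a reliable difference.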

Circularity Check

0 steps flagged

Formal proof and domain-agnostic runtime evaluation show no reduction to inputs by construction

The paper formalizes event-graph substrates as append-only RDF triple logs and proves a duality reducing explanatory and counterfactual queries to causal-ancestor traversal. This is presented as a mathematical result rather than a fitted parameter or self-referential definition. Evaluation uses a CLEVRER-DSL interpreter and twin-EventLog benchmark against external baselines (NS-DR oracle, ALOE, Llama-3.1-8B), with the runtime claimed domain-agnostic. No self-citation chains, ansatz smuggling, or renaming of known results appear as load-bearing steps. The intervention vocabulary sufficiency is an assumption but does not create a circular derivation where outputs equal inputs by construction. The work is self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the existence of a complete, structured intervention vocabulary that covers all counterfactuals of interest and on the assumption that causal-ancestor traversal is sufficient for both explanatory and counterfactual queries. No free parameters or invented physical entities are mentioned in the abstract.

axioms (2)

domain assumption: A complete structured intervention vocabulary exists that can express all relevant counterfactuals without additional rules.
Invoked when the paper claims substrates transfer across domains without learned components.
domain assumption: Causal-ancestor traversal on the event log is sufficient to answer both explanatory and counterfactual queries.
Stated in the duality proof claim.

invented entities (1)

Event-graph substrate (no independent evidence)
purpose: Deterministic world model using append-only RDF triple logs and log forking for exact counterfactuals.
The paper introduces this class as the core contribution.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We prove a duality between explanatory queries ... and counterfactual queries ... both are answered by the same causal-ancestor traversal.

IndisputableMonolith/Foundation/BranchSelection.lean branch_selection
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: The substrate ... transfer across domains without learned components.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 1 internal anchor

[1] CLEVRER: Collision Events for Video Representation and Reasoning. ICLR.
[2] ComPhy: Compositional Physical Reasoning of Objects and Events from Videos. ICLR.
[3] Compositional Physical Reasoning of Objects and Events from Videos. arXiv:2408.02687.
[4] GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering. CVPR.
[5] Generative Agents: Interactive Simulacra of Human Behavior. Proc. 36th Annual ACM Symposium on User Interface Software and Technology (UIST).
[6] Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia. 2023.
[7] Mastering Diverse Domains through World Models. arXiv:2301.04104.
[8] Assran, Mahmoud and Bardes, Adrien and others. V-
[9] Attention over Learned Object Embeddings Enables Complex Visual Reasoning. NeurIPS.
[10] Probabilistic Evaluation of Counterfactual Queries. AAAI.
[11] Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding. NeurIPS.
[12] The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision. ICLR.
[13] Anytime Bottom-Up Rule Learning for Knowledge Graph Completion. IJCAI.
[14] Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan.
[15] Voyager: An Open-Ended Embodied Agent with Large Language Models. 2023.
[16] Contrastive Learning of Structured World Models. ICLR.
[17] CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. CVPR.
[18] End-To-End Memory Networks. NeurIPS.
[19] Oxigraph: A graph database implementing the SPARQL standard and the RDF data model. 2020.
[20] Causality: Models, Reasoning, and Inference.
