pith. machine review for the scientific record.

arxiv: 2605.02475 · v2 · submitted 2026-05-04 · 💻 cs.AI · cs.CL

Recognition: 4 Lean theorem links

Shadow-Loom: Causal Reasoning over Graphical World Models of Narratives

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:37 UTC · model grok-4.3

classification: 💻 cs.AI · cs.CL
keywords: narrative modeling · graphical world models · causal reasoning · counterfactual calculus · computational suspense · story comprehension · Pearl's ladder of causation

The pith

Narratives can be represented as versioned graphical world models that support both causal interventions and structural scoring of mystery, dramatic irony, suspense, and surprise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Shadow-Loom as a framework that converts a narrative into a versioned graphical world model. One engine applies causal physics based on Pearl's ladder of causation and counterfactual reasoning over Ancestral Multi-World Networks to identify interventions and consequences within the graph. A second engine scores the same graph structure for its effects on four reader states: mystery, dramatic irony, suspense, and surprise. Large language models are confined to boundary tasks such as extraction and rendering, while all identification, intervention, and reasoning occur through typed code over the graph. This setup is offered as an open-source research artefact to enable precise, auditable analysis of story causes and reader engagement.
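To make the framing concrete, here is a minimal sketch of what a versioned graphical world model with typed nodes could look like. The class names, relation labels, and the choice of Python are assumptions made for this review; they are not Shadow-Loom's actual API.

```python
# Illustrative sketch only: a minimal "versioned graphical world model" with
# typed nodes, versioned states, and causal edges. Names (Node, WorldModel,
# AGENT_OF, ...) are hypothetical, not taken from the released code.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str                                  # e.g. "ENT_butler", "EVENT_theft"
    node_type: str                                # "entity" | "event" | "location" | "object"
    states: dict = field(default_factory=dict)    # version number -> attribute snapshot

@dataclass
class WorldModel:
    nodes: dict = field(default_factory=dict)     # node_id -> Node
    edges: set = field(default_factory=set)       # (src, relation, tgt)
    version: int = 0

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = Node(node_id, node_type, {self.version: dict(attrs)})

    def set_state(self, node_id, **attrs):
        """Record a new attribute snapshot for the current version."""
        node = self.nodes[node_id]
        prev = node.states.get(max(node.states), {})
        node.states[self.version] = {**prev, **attrs}

    def new_version(self):
        """Open a new world version, e.g. after an intervention or a plot beat."""
        self.version += 1

# Toy narrative: the butler steals the necklace; a later version records the consequence.
wm = WorldModel()
wm.add_node("ENT_butler", "entity", location="pantry", holds=None)
wm.add_node("OBJ_necklace", "object", location="study")
wm.add_node("EVENT_theft", "event", time=100)
wm.edges.add(("ENT_butler", "AGENT_OF", "EVENT_theft"))
wm.edges.add(("EVENT_theft", "AFFECTS", "OBJ_necklace"))

wm.new_version()                                  # version 1: after the theft
wm.set_state("ENT_butler", holds="OBJ_necklace")
print(wm.nodes["ENT_butler"].states)              # {0: {...}, 1: {..., 'holds': 'OBJ_necklace'}}
```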

Core claim

Shadow-Loom turns a narrative into a versioned graphical world model and lets causal physics grounded in Pearl's ladder of causation and a counterfactual calculus over Ancestral Multi-World Networks act on it, alongside narrative physics scoring the graph against mystery, dramatic irony, suspense, and surprise. Large language models are used only at the boundary for extraction, rendering, and audit, while identification, intervention, and counterfactual reasoning are carried out in typed code over the graph.

What carries the argument

The versioned graphical world model, which encodes narrative elements and relations as a structure that directly supports causal interventions and scoring of four reader states.

If this is right

  • Causal interventions and counterfactual queries can be executed directly on story elements using typed code over the graph.
  • The same graph structure can be scored for its contribution to mystery, dramatic irony, suspense, and surprise based on explicit structural features.
  • Large language models are restricted to boundary operations, keeping core reasoning outside of opaque model outputs.
  • Versioning of the model allows tracking of changes and multiple possible worlds within a single narrative representation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could support automated checking of narrative consistency during story generation tasks.
  • Similar graph models might extend to interactive formats such as games, where player actions serve as interventions.
  • The dual engines could be combined with existing computational narratology tools to predict reader attention shifts more precisely.

Load-bearing premise

Narratives can be faithfully and losslessly represented as versioned graphical world models whose structure directly supports both causal interventions and the four reader-state scores without requiring additional unstated mappings or human judgment.

What would settle it

A narrative whose extracted versioned graph either omits essential causal links or yields suspense and surprise scores that diverge from independent human reader judgments on the same story.

Figures

Figures reproduced from arXiv: 2605.02475 by David Wilmot.

Figure 1: The Shadow-Loom loop. The LLM is invoked …
Original abstract

Stories hold a reader's attention because they have causes, secrets, and consequences. Shadow-Loom is an experimental open-source framework that turns a narrative into a versioned graphical world model and lets two engines act on it: a causal physics grounded in Pearl's ladder of causation and a recently proposed counterfactual calculus over Ancestral Multi-World Networks; and a narrative physics that scores the same graph against four structural reader-states -- mystery, dramatic irony, suspense, and surprise -- in the tradition of Sternberg's curiosity/suspense/surprise triad, with suspense formalised in the structural-affect line of work on story comprehension and computational suspense. Large language models are used only at the boundary: extraction, rendering, and audit; identification, intervention, and counterfactual reasoning are carried out in typed code over the graph. The system is offered as a research artefact rather than as a benchmarked NLP model; code, fixtures, and pipeline are released open source.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Shadow-Loom, an experimental open-source framework that extracts a narrative into a versioned graphical world model and applies two engines: causal physics based on Pearl's ladder of causation together with a counterfactual calculus over Ancestral Multi-World Networks, and narrative physics that computes four structural reader-state scores (mystery, dramatic irony, suspense, surprise) from the same graph. LLMs are restricted to boundary tasks (extraction, rendering, audit); all identification, intervention, and scoring occur in typed code. The work is positioned as a research artefact with released code and fixtures rather than a benchmarked model.

Significance. If the lossless mapping and mechanical scoring claims hold, the framework would offer a concrete bridge between causal graphical models and computational narratology, enabling reproducible analysis of reader affect via graph structure alone. The open-source release of code, fixtures, and pipeline is a clear strength that supports reproducibility and further experimentation.

major comments (3)
  1. [Abstract] Abstract and overall description: the central claim that the four reader-state scores (mystery, dramatic irony, suspense, surprise) are computed directly and mechanically from the versioned graph structure via typed code is asserted but not demonstrated; no derivation, pseudocode, or worked example shows how suspense or surprise follows from nodes, versioned states, and edges without auxiliary interpretive rules for granularity or knowledge partitions.
  2. [Abstract] The manuscript supplies no empirical results, validation experiments, or quantitative checks that the graph extraction plus scoring rules produce the claimed causal outputs or reader-state values on any concrete narrative; this leaves the soundness of the separation between LLM boundary and typed core untested.
  3. [Abstract] The counterfactual calculus over Ancestral Multi-World Networks is described as recently proposed and central to the causal physics engine, yet the paper provides neither a self-contained definition nor a reference that would allow verification of how interventions and counterfactuals are implemented over the versioned graphs.
minor comments (2)
  1. [Abstract] The abstract references Sternberg's curiosity/suspense/surprise triad and structural-affect work on suspense but does not cite specific prior computational implementations; adding those references would clarify the claimed lineage.
  2. Notation for versioned states and Ancestral Multi-World Networks should be introduced with a small illustrative diagram or table early in the text to aid readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the manuscript would benefit from greater explicitness in demonstrating the core claims. Below we respond point-by-point to the major comments and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] Abstract and overall description: the central claim that the four reader-state scores (mystery, dramatic irony, suspense, surprise) are computed directly and mechanically from the versioned graph structure via typed code is asserted but not demonstrated; no derivation, pseudocode, or worked example shows how suspense or surprise follows from nodes, versioned states, and edges without auxiliary interpretive rules for granularity or knowledge partitions.

    Authors: We agree that the abstract asserts the mechanical nature of the scoring without supplying derivations or examples. The scoring functions are implemented in the released typed code, but the manuscript itself does not contain a self-contained worked example or pseudocode. In the revision we will add a dedicated subsection with pseudocode for the suspense and surprise calculations (derived strictly from path lengths, versioned node states, and explicit knowledge partitions) together with a short narrative worked example that traces the computation step by step from the graph. revision: yes

  2. Referee: [Abstract] The manuscript supplies no empirical results, validation experiments, or quantitative checks that the graph extraction plus scoring rules produce the claimed causal outputs or reader-state values on any concrete narrative; this leaves the soundness of the separation between LLM boundary and typed core untested.

    Authors: The manuscript is explicitly positioned as a research artefact with open-source code and fixtures rather than a benchmarked model, which is why no large-scale empirical evaluation was included. We nevertheless accept that a minimal demonstration would strengthen the claim of a clean separation between LLM boundary tasks and the typed core. We will therefore add a short validation section that applies the full pipeline to two concrete short narratives, reporting the extracted graphs, the causal interventions performed in code, and the resulting reader-state scores. revision: yes

  3. Referee: [Abstract] The counterfactual calculus over Ancestral Multi-World Networks is described as recently proposed and central to the causal physics engine, yet the paper provides neither a self-contained definition nor a reference that would allow verification of how interventions and counterfactuals are implemented over the versioned graphs.

    Authors: The Ancestral Multi-World Networks framework and its counterfactual calculus originate in our prior work. The current manuscript refers to this work but does not reproduce the definitions or the precise mapping onto versioned graphs. We will revise the causal-physics section to include a concise self-contained summary of the relevant definitions, the intervention and counterfactual operators as implemented over versioned nodes and edges, and the full reference. revision: yes
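Response 1 above promises pseudocode deriving suspense and surprise strictly from path lengths, versioned node states, and explicit knowledge partitions. As a reading aid only, the sketch below shows one way purely structural scores of that kind could be computed; the formulas, weights, and function names are this review's assumptions, not the authors' released code.

```python
# Hypothetical sketch of mechanical reader-state scoring from graph structure.
# Assumed inputs: a directed causal edge list, the reader's knowledge partition
# (attributes the reader currently believes), and a revealed state at a plot beat.
from collections import deque

def shortest_path_len(edges, src, tgt):
    """BFS distance over directed causal edges; None if tgt is unreachable."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, []).append(t)
    queue, seen = deque([(src, 0)]), {src}
    while queue:
        node, d = queue.popleft()
        if node == tgt:
            return d
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def suspense(edges, current, threats):
    """Toy suspense: still-possible threatening outcomes score higher the
    shorter the causal path from the current event (inverse path length)."""
    score = 0.0
    for threat in threats:
        d = shortest_path_len(edges, current, threat)
        if d is not None and d > 0:
            score += 1.0 / d
    return score

def surprise(reader_belief, revealed_state):
    """Toy surprise: fraction of revealed attributes that contradict the
    reader's knowledge partition at the moment of the reveal."""
    contested = [k for k, v in revealed_state.items()
                 if k in reader_belief and reader_belief[k] != v]
    return len(contested) / max(len(revealed_state), 1)

causal_edges = [("EVENT_theft", "EVENT_accusation"),
                ("EVENT_accusation", "EVENT_duel"),
                ("EVENT_theft", "EVENT_confession")]
print(suspense(causal_edges, "EVENT_theft", {"EVENT_duel", "EVENT_confession"}))  # 1.5
print(surprise({"thief": "ENT_gardener"}, {"thief": "ENT_butler"}))               # 1.0
```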
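Response 3 concerns the intervention and counterfactual operators over versioned nodes and edges. The sketch below illustrates only the generic do-style pattern (replace one node's mechanism with a constant and recompute downstream values in a new world version); it is a stand-in for the AMWN machinery the authors will summarise, not a reproduction of it.

```python
# Minimal sketch (not the paper's implementation) of an intervention on a graph
# of structural mechanisms: each node's value is a function of its parents, and
# do(node := value) overrides that mechanism before re-evaluation.

def evaluate(mechanisms, parents, interventions=None):
    """Evaluate node values in dependency order; intervened nodes ignore parents."""
    interventions = interventions or {}
    values, pending = {}, list(mechanisms)
    while pending:
        node = pending.pop(0)
        if any(p not in values for p in parents.get(node, [])):
            pending.append(node)          # parents not evaluated yet; retry later
            continue
        if node in interventions:
            values[node] = interventions[node]                        # do(node := value)
        else:
            values[node] = mechanisms[node](*[values[p] for p in parents.get(node, [])])
    return values

# Toy story world: the letter is delivered only if the messenger survives,
# and the duel happens only if the letter is delivered.
parents = {"messenger_alive": [], "letter_delivered": ["messenger_alive"],
           "duel": ["letter_delivered"]}
mechanisms = {"messenger_alive": lambda: True,
              "letter_delivered": lambda alive: alive,
              "duel": lambda delivered: delivered}

factual = evaluate(mechanisms, parents)
counterfactual = evaluate(mechanisms, parents, {"messenger_alive": False})
print(factual["duel"], counterfactual["duel"])    # True False
```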

Circularity Check

0 steps flagged

No circularity: framework applies external causal tools to externally supplied graphs without self-referential reductions

Full rationale

The paper presents Shadow-Loom as an open-source code framework that extracts versioned graphical world models from narratives (via boundary LLMs) and then applies Pearl's ladder of causation plus a counterfactual calculus over Ancestral Multi-World Networks, along with asserted structural scores for mystery, dramatic irony, suspense, and surprise. No equations, fitted parameters, or derivations are exhibited that would make any output equivalent to its inputs by construction. The four reader-state scores are described as following from the graph structure in the tradition of prior story-comprehension work, without any visible reduction to self-definition or fitted inputs. The system is explicitly positioned as typed code acting on supplied graphs rather than a closed derivation, leaving the chain open to checking against external benchmarks and prior independent results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on standard causal-inference background plus one recently proposed network construct; the abstract describes no free parameters, and the one new entity carries no independent evidence.

axioms (1)
  • [standard math] Pearl's ladder of causation (association, intervention, counterfactuals)
    Invoked to ground the causal physics engine; a toy three-rung illustration follows this ledger.
invented entities (1)
  • Ancestral Multi-World Networks [no independent evidence]
    purpose: to support the counterfactual calculus engine
    Described as recently proposed and used for counterfactual reasoning over the graph.
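For readers unfamiliar with the ladder axiom, the toy example below separates the three rungs on a two-cause model of a wet ground. It is generic causal-inference background, not anything specific to Shadow-Loom; all names and probabilities are invented for illustration.

```python
# Toy illustration of Pearl's three rungs on one equation: wet = rain OR sprinkler.
import random
random.seed(0)

def model(rain, sprinkler):
    return rain or sprinkler          # the ground is wet if it rained or the sprinkler ran

samples = [(random.random() < 0.3, random.random() < 0.5) for _ in range(10_000)]

# Rung 1, association: P(wet | sprinkler on), read off observational samples (= 1.0 here).
obs = [model(r, s) for r, s in samples if s]
print("association   P(wet | sprinkler on)      =", sum(obs) / len(obs))

# Rung 2, intervention: P(wet | do(sprinkler off)); the sprinkler is forced off (≈ 0.3).
do_off = [model(r, False) for r, _ in samples]
print("intervention  P(wet | do(sprinkler off)) ≈", sum(do_off) / len(do_off))

# Rung 3, counterfactual: we observed rain=False, sprinkler=True, wet=True.
# Abduction fixes rain=False; action sets sprinkler=False; prediction gives the answer.
print("counterfactual: would the ground still be wet?", model(False, False))   # False
```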

pith-pipeline@v0.9.0 · 5451 in / 1439 out tokens · 81489 ms · 2026-05-08T18:37:41.808964+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 4 canonical work pages · 1 internal anchor

  8. [8]

    Five extraction agents over P build a GlobalRegister of WORLD_, ENT_, LOC_, OBJ_ nodes plus a fuzzy alias table

  9. [9]

For each chunk, a Socratic Who/What/Where/When/Why/How scaffold dispatches three concurrent specialist agents: physics (events, causal/spatial edges), social (relationship edges, channels), consequences (authoritative entity-state deltas)

  10. [10]

Output is normalised, fabula times are remapped to a uniform Δt = 100 spacing, and a programmatic validator + LLM correction loop enforces ID closure and type constraints. A.3 Ego-graph slicing and AMWN sandbox: given a query with focal entities F and time anchor t, the ego-graph extractor returns G_{F,t} = NEIGHBOURS_k(F) ∩ RECONSTRUCT(·, t), the k-hop spatial / c...

  11. [11]

Node shadowing. A counterfactual variable V_T is identified by the pair (V, proj(T, An(V)_{G_T̄})), where the intervention context T is projected onto V's ancestors in the mutilated diagram. Two copies of V coming from different worlds are the same node when their projected contexts agree; this is the AMWN's mechanism for avoiding the exponential k-plet blow-up...

  12. [12]

Synthetic relationship-metric nodes. Relationship metrics (affinity, fear, power_dynamic) live on edges in social_topology, not as first-class nodes, so mutation_social causal edges would otherwise be invisible to d-separation. We promote each (src, tgt, metric) triple into a synthetic node REL::src::tgt::metric with both endpoint entities and the triggering event as parents...

  13. [13]

    next reveal lands threat-side

Channel and utterance routing. Communication channels are first-class diagram nodes. Each utterance event is wired speaker → utterance → channel → addressees, with bidirectional standing capability edges channel ↔ participant. A Rule-3 surgery on a channel therefore cuts both the standing capability and every per-utterance content path — exactly the closure ...
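Anchor [8] above mentions a GlobalRegister of WORLD_, ENT_, LOC_, OBJ_ nodes plus a fuzzy alias table. A minimal sketch of such a register, with alias resolution via standard-library string similarity, might look as follows; the class and method names, and the difflib heuristic, are this review's assumptions rather than the released code.

```python
# Sketch of a typed node register with a fuzzy alias table. The extraction agents
# themselves are LLM-driven; this shows only the typed-code side of the boundary.
import difflib

class GlobalRegister:
    def __init__(self):
        self.nodes = {}       # canonical id -> node type, e.g. "ENT_holmes" -> "ENT"
        self.aliases = {}     # lowercased surface form -> canonical id

    def register(self, canonical_id, node_type, aliases=()):
        self.nodes[canonical_id] = node_type
        for surface in aliases:
            self.aliases[surface.lower()] = canonical_id

    def resolve(self, mention, cutoff=0.8):
        """Map a raw mention to a canonical node id, tolerating spelling drift."""
        mention = mention.lower()
        if mention in self.aliases:
            return self.aliases[mention]
        close = difflib.get_close_matches(mention, self.aliases, n=1, cutoff=cutoff)
        return self.aliases[close[0]] if close else None

reg = GlobalRegister()
reg.register("ENT_holmes", "ENT", aliases=["Sherlock Holmes", "Holmes", "the detective"])
reg.register("LOC_baker_st", "LOC", aliases=["221B Baker Street", "Baker Street"])
print(reg.resolve("Sherlock Homes"))   # 'ENT_holmes' despite the typo
print(reg.resolve("baker street"))     # 'LOC_baker_st'
```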
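Anchor [10] defines the ego-graph slice G_{F,t} = NEIGHBOURS_k(F) ∩ RECONSTRUCT(·, t). A literal reading of that formula can be sketched as a k-hop neighbourhood of the focal entities, restricted to edges whose fabula time precedes the anchor; the edge format and helper names here are assumptions.

```python
# Sketch of ego-graph slicing: keep nodes within k undirected hops of the focal
# set F, over only those edges already "true" at time anchor t.
def ego_slice(edges, focal, k, t):
    """edges: iterable of (src, tgt, time). Returns the sliced edge list."""
    live = [(s, g) for s, g, time in edges if time <= t]      # RECONSTRUCT(·, t)
    adj = {}
    for s, g in live:
        adj.setdefault(s, set()).add(g)
        adj.setdefault(g, set()).add(s)                       # undirected for hop counting
    reachable, frontier = set(focal), set(focal)
    for _ in range(k):                                        # NEIGHBOURS_k(F)
        frontier = {n for f in frontier for n in adj.get(f, ())} - reachable
        reachable |= frontier
    return [(s, g) for s, g in live if s in reachable and g in reachable]

edges = [("ENT_butler", "EVENT_theft", 100),
         ("EVENT_theft", "OBJ_necklace", 100),
         ("EVENT_confession", "ENT_butler", 300)]             # happens after the anchor
print(ego_slice(edges, {"ENT_butler"}, k=2, t=200))
# [('ENT_butler', 'EVENT_theft'), ('EVENT_theft', 'OBJ_necklace')]
```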
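Anchor [11] describes node shadowing: a counterfactual copy of V is identified by the intervention context projected onto V's ancestors in the mutilated graph, and copies from different worlds merge when those projections agree. The sketch below is one plausible reading of that identity rule, not the paper's implementation.

```python
# Sketch of AMWN-style node shadowing: key each counterfactual copy of a node by
# (node, projection of the intervention context onto its ancestors in the
# mutilated graph), so copies whose keys coincide are treated as one node.

def ancestors(parents, node, removed):
    """Ancestors of `node` in the mutilated graph (edges into `removed` are cut)."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        if cur in removed:            # intervened nodes lose their incoming edges
            continue
        for p in parents.get(cur, []):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

def shadow_key(parents, node, intervention):
    """Identity of the counterfactual copy of `node` under `intervention`."""
    anc = ancestors(parents, node, removed=set(intervention)) | {node}
    projected = {v: x for v, x in intervention.items() if v in anc}
    return (node, frozenset(projected.items()))

parents = {"duel": ["letter"], "letter": ["messenger"], "ransom": ["kidnapping"]}
world_a = {"messenger": False}                       # do(messenger := dead)
world_b = {"messenger": False, "kidnapping": True}   # same, plus a second intervention

# 'duel' sees the same projected context in both worlds, so its copies merge;
# 'ransom' is downstream of the kidnapping intervention in world_b only, so its
# copies remain distinct.
print(shadow_key(parents, "duel", world_a) == shadow_key(parents, "duel", world_b))      # True
print(shadow_key(parents, "ransom", world_a) == shadow_key(parents, "ransom", world_b))  # False
```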
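Anchor [12] promotes (src, tgt, metric) edge triples to synthetic REL::src::tgt::metric nodes so relationship metrics become visible to d-separation. The helper below shows the shape of that promotion; the surrounding data layout is assumed.

```python
# Sketch of promoting relationship metrics from social edges to first-class nodes,
# with both endpoint entities and the triggering event wired in as parents.
def promote_relationship_metrics(social_edges):
    """social_edges: dicts with src, tgt, metric, and the triggering event.
    Returns (synthetic nodes, new causal edges into those nodes)."""
    nodes, edges = [], []
    for e in social_edges:
        rel_id = f"REL::{e['src']}::{e['tgt']}::{e['metric']}"
        nodes.append(rel_id)
        edges.append((e["src"], rel_id))       # endpoint entity -> synthetic node
        edges.append((e["tgt"], rel_id))
        edges.append((e["trigger"], rel_id))   # triggering event -> synthetic node
    return nodes, edges

social = [{"src": "ENT_butler", "tgt": "ENT_countess",
           "metric": "fear", "trigger": "EVENT_theft"}]
nodes, causal_edges = promote_relationship_metrics(social)
print(nodes)          # ['REL::ENT_butler::ENT_countess::fear']
print(causal_edges)   # three parent edges into the synthetic node
```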
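Anchor [13] wires utterances speaker → utterance → channel → addressees with standing channel ↔ participant capability edges, so that surgery on a channel severs both capability and content paths. A toy version of that wiring and surgery, with helper names of our own choosing:

```python
# Sketch of channel/utterance routing and a Rule-3-style surgery that deletes a
# channel node, cutting every capability and per-utterance content path through it.
def wire_capability(edges, channel, participants):
    for who in participants:
        edges.add((channel, who))
        edges.add((who, channel))             # standing two-way capability

def wire_utterance(edges, speaker, utterance, channel, addressees):
    edges.add((speaker, utterance))
    edges.add((utterance, channel))
    for who in addressees:
        edges.add((channel, who))

def channel_surgery(edges, channel):
    """Delete every edge touching the channel node (capability and content alike)."""
    return {(s, t) for s, t in edges if channel not in (s, t)}

edges = set()
wire_capability(edges, "CH_letters", ["ENT_juliet", "ENT_romeo"])
wire_utterance(edges, "ENT_juliet", "UTT_warning", "CH_letters", ["ENT_romeo"])

cut = channel_surgery(edges, "CH_letters")
print(any("CH_letters" in e for e in cut))    # False: every path through the channel is severed
print(("ENT_juliet", "UTT_warning") in cut)   # True: the utterance node itself survives
```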