pith. machine review for the scientific record.

arxiv: 2605.02475 · v2 · submitted 2026-05-04 · 💻 cs.AI · cs.CL

Recognition: 4 Lean theorem links

Shadow-Loom: Causal Reasoning over Graphical World Models of Narratives

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:37 UTC · model grok-4.3

classification: 💻 cs.AI · cs.CL
keywords: narrative modeling · graphical world models · causal reasoning · counterfactual calculus · computational suspense · story comprehension · Pearl's ladder of causation

The pith

Narratives can be represented as versioned graphical world models that support both causal interventions and structural scoring of mystery, dramatic irony, suspense, and surprise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Shadow-Loom as a framework that converts a narrative into a versioned graphical world model. One engine applies causal physics based on Pearl's ladder of causation and counterfactual reasoning over Ancestral Multi-World Networks to identify interventions and consequences within the graph. A second engine scores the same graph structure for its effects on four reader states: mystery, dramatic irony, suspense, and surprise. Large language models are confined to boundary tasks such as extraction and rendering, while all identification, intervention, and reasoning occur through typed code over the graph. This setup is offered as an open-source research artefact to enable precise, auditable analysis of story causes and reader engagement.
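To make the framing concrete, here is a minimal sketch of what a versioned graphical world model with typed nodes could look like. The class names, relation labels, and the choice of Python are assumptions made for this review; they are not Shadow-Loom's actual API.

```python
# Illustrative sketch only: a minimal "versioned graphical world model" with
# typed nodes, versioned states, and causal edges. Names (Node, WorldModel,
# AGENT_OF, ...) are hypothetical, not taken from the released code.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str                                  # e.g. "ENT_butler", "EVENT_theft"
    node_type: str                                # "entity" | "event" | "location" | "object"
    states: dict = field(default_factory=dict)    # version number -> attribute snapshot

@dataclass
class WorldModel:
    nodes: dict = field(default_factory=dict)     # node_id -> Node
    edges: set = field(default_factory=set)       # (src, relation, tgt)
    version: int = 0

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = Node(node_id, node_type, {self.version: dict(attrs)})

    def set_state(self, node_id, **attrs):
        """Record a new attribute snapshot for the current version."""
        node = self.nodes[node_id]
        prev = node.states.get(max(node.states), {})
        node.states[self.version] = {**prev, **attrs}

    def new_version(self):
        """Open a new world version, e.g. after an intervention or a plot beat."""
        self.version += 1

# Toy narrative: the butler steals the necklace; a later version records the consequence.
wm = WorldModel()
wm.add_node("ENT_butler", "entity", location="pantry", holds=None)
wm.add_node("OBJ_necklace", "object", location="study")
wm.add_node("EVENT_theft", "event", time=100)
wm.edges.add(("ENT_butler", "AGENT_OF", "EVENT_theft"))
wm.edges.add(("EVENT_theft", "AFFECTS", "OBJ_necklace"))

wm.new_version()                                  # version 1: after the theft
wm.set_state("ENT_butler", holds="OBJ_necklace")
print(wm.nodes["ENT_butler"].states)              # {0: {...}, 1: {..., 'holds': 'OBJ_necklace'}}
```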

Core claim

Shadow-Loom turns a narrative into a versioned graphical world model and lets causal physics grounded in Pearl's ladder of causation and a counterfactual calculus over Ancestral Multi-World Networks act on it, alongside narrative physics scoring the graph against mystery, dramatic irony, suspense, and surprise. Large language models are used only at the boundary for extraction, rendering, and audit, while identification, intervention, and counterfactual reasoning are carried out in typed code over the graph.

What carries the argument

The versioned graphical world model, which encodes narrative elements and relations as a structure that directly supports causal interventions and scoring of four reader states.

If this is right

  • Causal interventions and counterfactual queries can be executed directly on story elements using typed code over the graph.
  • The same graph structure can be scored for its contribution to mystery, dramatic irony, suspense, and surprise based on explicit structural features.
  • Large language models are restricted to boundary operations, keeping core reasoning outside of opaque model outputs.
  • Versioning of the model allows tracking of changes and multiple possible worlds within a single narrative representation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could support automated checking of narrative consistency during story generation tasks.
  • Similar graph models might extend to interactive formats such as games, where player actions serve as interventions.
  • The dual engines could be combined with existing computational narratology tools to predict reader attention shifts more precisely.

Load-bearing premise

Narratives can be faithfully and losslessly represented as versioned graphical world models whose structure directly supports both causal interventions and the four reader-state scores without requiring additional unstated mappings or human judgment.

What would settle it

A narrative whose extracted versioned graph either omits essential causal links or yields suspense and surprise scores that diverge from independent human reader judgments on the same story.

Figures

Figures reproduced from arXiv: 2605.02475 by David Wilmot.

Figure 1: The Shadow-Loom loop. The LLM is invoked …
Original abstract

Stories hold a reader's attention because they have causes, secrets, and consequences. Shadow-Loom is an experimental open-source framework that turns a narrative into a versioned graphical world model and lets two engines act on it: a causal physics grounded in Pearl's ladder of causation and a recently proposed counterfactual calculus over Ancestral Multi-World Networks; and a narrative physics that scores the same graph against four structural reader-states -- mystery, dramatic irony, suspense, and surprise -- in the tradition of Sternberg's curiosity/suspense/surprise triad, with suspense formalised in the structural-affect line of work on story comprehension and computational suspense. Large language models are used only at the boundary: extraction, rendering, and audit; identification, intervention, and counterfactual reasoning are carried out in typed code over the graph. The system is offered as a research artefact rather than as a benchmarked NLP model; code, fixtures, and pipeline are released open source.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Shadow-Loom, an experimental open-source framework that extracts a narrative into a versioned graphical world model and applies two engines: causal physics based on Pearl's ladder of causation together with a counterfactual calculus over Ancestral Multi-World Networks, and narrative physics that computes four structural reader-state scores (mystery, dramatic irony, suspense, surprise) from the same graph. LLMs are restricted to boundary tasks (extraction, rendering, audit); all identification, intervention, and scoring occur in typed code. The work is positioned as a research artefact with released code and fixtures rather than a benchmarked model.

Significance. If the lossless mapping and mechanical scoring claims hold, the framework would offer a concrete bridge between causal graphical models and computational narratology, enabling reproducible analysis of reader affect via graph structure alone. The open-source release of code, fixtures, and pipeline is a clear strength that supports reproducibility and further experimentation.

major comments (3)
  1. [Abstract] Abstract and overall description: the central claim that the four reader-state scores (mystery, dramatic irony, suspense, surprise) are computed directly and mechanically from the versioned graph structure via typed code is asserted but not demonstrated; no derivation, pseudocode, or worked example shows how suspense or surprise follows from nodes, versioned states, and edges without auxiliary interpretive rules for granularity or knowledge partitions.
  2. [Abstract] The manuscript supplies no empirical results, validation experiments, or quantitative checks that the graph extraction plus scoring rules produce the claimed causal outputs or reader-state values on any concrete narrative; this leaves the soundness of the separation between LLM boundary and typed core untested.
  3. [Abstract] The counterfactual calculus over Ancestral Multi-World Networks is described as recently proposed and central to the causal physics engine, yet the paper provides neither a self-contained definition nor a reference that would allow verification of how interventions and counterfactuals are implemented over the versioned graphs.
minor comments (2)
  1. [Abstract] The abstract references Sternberg's curiosity/suspense/surprise triad and structural-affect work on suspense but does not cite specific prior computational implementations; adding those references would clarify the claimed lineage.
  2. Notation for versioned states and Ancestral Multi-World Networks should be introduced with a small illustrative diagram or table early in the text to aid readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the manuscript would benefit from greater explicitness in demonstrating the core claims. Below we respond point-by-point to the major comments and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] Abstract and overall description: the central claim that the four reader-state scores (mystery, dramatic irony, suspense, surprise) are computed directly and mechanically from the versioned graph structure via typed code is asserted but not demonstrated; no derivation, pseudocode, or worked example shows how suspense or surprise follows from nodes, versioned states, and edges without auxiliary interpretive rules for granularity or knowledge partitions.

    Authors: We agree that the abstract asserts the mechanical nature of the scoring without supplying derivations or examples. The scoring functions are implemented in the released typed code, but the manuscript itself does not contain a self-contained worked example or pseudocode. In the revision we will add a dedicated subsection with pseudocode for the suspense and surprise calculations (derived strictly from path lengths, versioned node states, and explicit knowledge partitions) together with a short narrative worked example that traces the computation step by step from the graph. revision: yes

  2. Referee: [Abstract] The manuscript supplies no empirical results, validation experiments, or quantitative checks that the graph extraction plus scoring rules produce the claimed causal outputs or reader-state values on any concrete narrative; this leaves the soundness of the separation between LLM boundary and typed core untested.

    Authors: The manuscript is explicitly positioned as a research artefact with open-source code and fixtures rather than a benchmarked model, which is why no large-scale empirical evaluation was included. We nevertheless accept that a minimal demonstration would strengthen the claim of a clean separation between LLM boundary tasks and the typed core. We will therefore add a short validation section that applies the full pipeline to two concrete short narratives, reporting the extracted graphs, the causal interventions performed in code, and the resulting reader-state scores. revision: yes

  3. Referee: [Abstract] The counterfactual calculus over Ancestral Multi-World Networks is described as recently proposed and central to the causal physics engine, yet the paper provides neither a self-contained definition nor a reference that would allow verification of how interventions and counterfactuals are implemented over the versioned graphs.

    Authors: The Ancestral Multi-World Networks framework and its counterfactual calculus originate in our prior work. The current manuscript refers to this work but does not reproduce the definitions or the precise mapping onto versioned graphs. We will revise the causal-physics section to include a concise self-contained summary of the relevant definitions, the intervention and counterfactual operators as implemented over versioned nodes and edges, and the full reference. revision: yes
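Response 1 above promises pseudocode deriving suspense and surprise strictly from path lengths, versioned node states, and explicit knowledge partitions. As a reading aid only, the sketch below shows one way purely structural scores of that kind could be computed; the formulas, weights, and function names are this review's assumptions, not the authors' released code.

```python
# Hypothetical sketch of mechanical reader-state scoring from graph structure.
# Assumed inputs: a directed causal edge list, the reader's knowledge partition
# (attributes the reader currently believes), and a revealed state at a plot beat.
from collections import deque

def shortest_path_len(edges, src, tgt):
    """BFS distance over directed causal edges; None if tgt is unreachable."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, []).append(t)
    queue, seen = deque([(src, 0)]), {src}
    while queue:
        node, d = queue.popleft()
        if node == tgt:
            return d
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def suspense(edges, current, threats):
    """Toy suspense: still-possible threatening outcomes score higher the
    shorter the causal path from the current event (inverse path length)."""
    score = 0.0
    for threat in threats:
        d = shortest_path_len(edges, current, threat)
        if d is not None and d > 0:
            score += 1.0 / d
    return score

def surprise(reader_belief, revealed_state):
    """Toy surprise: fraction of revealed attributes that contradict the
    reader's knowledge partition at the moment of the reveal."""
    contested = [k for k, v in revealed_state.items()
                 if k in reader_belief and reader_belief[k] != v]
    return len(contested) / max(len(revealed_state), 1)

causal_edges = [("EVENT_theft", "EVENT_accusation"),
                ("EVENT_accusation", "EVENT_duel"),
                ("EVENT_theft", "EVENT_confession")]
print(suspense(causal_edges, "EVENT_theft", {"EVENT_duel", "EVENT_confession"}))  # 1.5
print(surprise({"thief": "ENT_gardener"}, {"thief": "ENT_butler"}))               # 1.0
```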
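Response 3 concerns the intervention and counterfactual operators over versioned nodes and edges. The sketch below illustrates only the generic do-style pattern (replace one node's mechanism with a constant and recompute downstream values in a new world version); it is a stand-in for the AMWN machinery the authors will summarise, not a reproduction of it.

```python
# Minimal sketch (not the paper's implementation) of an intervention on a graph
# of structural mechanisms: each node's value is a function of its parents, and
# do(node := value) overrides that mechanism before re-evaluation.

def evaluate(mechanisms, parents, interventions=None):
    """Evaluate node values in dependency order; intervened nodes ignore parents."""
    interventions = interventions or {}
    values, pending = {}, list(mechanisms)
    while pending:
        node = pending.pop(0)
        if any(p not in values for p in parents.get(node, [])):
            pending.append(node)          # parents not evaluated yet; retry later
            continue
        if node in interventions:
            values[node] = interventions[node]                        # do(node := value)
        else:
            values[node] = mechanisms[node](*[values[p] for p in parents.get(node, [])])
    return values

# Toy story world: the letter is delivered only if the messenger survives,
# and the duel happens only if the letter is delivered.
parents = {"messenger_alive": [], "letter_delivered": ["messenger_alive"],
           "duel": ["letter_delivered"]}
mechanisms = {"messenger_alive": lambda: True,
              "letter_delivered": lambda alive: alive,
              "duel": lambda delivered: delivered}

factual = evaluate(mechanisms, parents)
counterfactual = evaluate(mechanisms, parents, {"messenger_alive": False})
print(factual["duel"], counterfactual["duel"])    # True False
```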

Circularity Check

0 steps flagged

No circularity: framework applies external causal tools to externally supplied graphs without self-referential reductions

Full rationale

The paper presents Shadow-Loom as an open-source code framework that extracts versioned graphical world models from narratives (via boundary LLMs) and then applies Pearl's ladder of causation plus a counterfactual calculus over Ancestral Multi-World Networks, along with asserted structural scores for mystery, dramatic irony, suspense, and surprise. No equations, fitted parameters, or derivations are exhibited that would make any output equivalent to its inputs by construction. The four reader-state scores are described as following from the graph structure in the tradition of prior story-comprehension work, without any visible reduction to self-definition or fitted inputs. The system is explicitly positioned as typed code acting on supplied graphs rather than a closed derivation, leaving the chain open to checking against external benchmarks and prior independent results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on standard causal-inference background plus one recently proposed network construct; the abstract describes no free parameters, and the one new entity carries no independent evidence.

axioms (1)
  • [standard math] Pearl's ladder of causation (association, intervention, counterfactuals)
    Invoked to ground the causal physics engine; a toy three-rung illustration follows this ledger.
invented entities (1)
  • Ancestral Multi-World Networks [no independent evidence]
    purpose: to support the counterfactual calculus engine
    Described as recently proposed and used for counterfactual reasoning over the graph.
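For readers unfamiliar with the ladder axiom, the toy example below separates the three rungs on a two-cause model of a wet ground. It is generic causal-inference background, not anything specific to Shadow-Loom; all names and probabilities are invented for illustration.

```python
# Toy illustration of Pearl's three rungs on one equation: wet = rain OR sprinkler.
import random
random.seed(0)

def model(rain, sprinkler):
    return rain or sprinkler          # the ground is wet if it rained or the sprinkler ran

samples = [(random.random() < 0.3, random.random() < 0.5) for _ in range(10_000)]

# Rung 1, association: P(wet | sprinkler on), read off observational samples (= 1.0 here).
obs = [model(r, s) for r, s in samples if s]
print("association   P(wet | sprinkler on)      =", sum(obs) / len(obs))

# Rung 2, intervention: P(wet | do(sprinkler off)); the sprinkler is forced off (≈ 0.3).
do_off = [model(r, False) for r, _ in samples]
print("intervention  P(wet | do(sprinkler off)) ≈", sum(do_off) / len(do_off))

# Rung 3, counterfactual: we observed rain=False, sprinkler=True, wet=True.
# Abduction fixes rain=False; action sets sprinkler=False; prediction gives the answer.
print("counterfactual: would the ground still be wet?", model(False, False))   # False
```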

pith-pipeline@v0.9.0 · 5451 in / 1439 out tokens · 81489 ms · 2026-05-08T18:37:41.808964+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 4 canonical work pages · 1 internal anchor

  8. [8]

    Five extraction agents over P build a GlobalRegister of WORLD_, ENT_, LOC_, OBJ_ nodes plus a fuzzy alias table

  9. [9]

For each chunk, a Socratic Who/What/Where/When/Why/How scaffold dispatches three concurrent specialist agents: physics (events, causal/spatial edges), social (relationship edges, channels), consequences (authoritative entity-state deltas)

  10. [10]

Output is normalised, fabula times are remapped to a uniform Δt = 100 spacing, and a programmatic validator + LLM correction loop enforces ID closure and type constraints. A.3 Ego-graph slicing and AMWN sandbox: given a query with focal entities F and time anchor t, the ego-graph extractor returns G_{F,t} = NEIGHBOURS_k(F) ∩ RECONSTRUCT(·, t), the k-hop spatial / c...

  11. [11]

Node shadowing. A counterfactual variable V_T is identified by the pair (V, proj(T, An(V)_{G_T̄})), where the intervention context T is projected onto V's ancestors in the mutilated diagram. Two copies of V coming from different worlds are the same node when their projected contexts agree; this is the AMWN's mechanism for avoiding the exponential k-plet blow-up...

  12. [12]

Synthetic relationship-metric nodes. Relationship metrics (affinity, fear, power_dynamic) live on edges in social_topology, not as first-class nodes, so mutation_social causal edges would otherwise be invisible to d-separation. We promote each (src, tgt, metric) triple into a synthetic node REL::src::tgt::metric with both endpoint entities and the triggering event as parents...

  13. [13]

    next reveal lands threat-side

Channel and utterance routing. Communication channels are first-class diagram nodes. Each utterance event is wired speaker → utterance → channel → addressees, with bidirectional standing capability edges channel ↔ participant. A Rule-3 surgery on a channel therefore cuts both the standing capability and every per-utterance content path — exactly the closure ...
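Anchor [8] above mentions a GlobalRegister of WORLD_, ENT_, LOC_, OBJ_ nodes plus a fuzzy alias table. A minimal sketch of such a register, with alias resolution via standard-library string similarity, might look as follows; the class and method names, and the difflib heuristic, are this review's assumptions rather than the released code.

```python
# Sketch of a typed node register with a fuzzy alias table. The extraction agents
# themselves are LLM-driven; this shows only the typed-code side of the boundary.
import difflib

class GlobalRegister:
    def __init__(self):
        self.nodes = {}       # canonical id -> node type, e.g. "ENT_holmes" -> "ENT"
        self.aliases = {}     # lowercased surface form -> canonical id

    def register(self, canonical_id, node_type, aliases=()):
        self.nodes[canonical_id] = node_type
        for surface in aliases:
            self.aliases[surface.lower()] = canonical_id

    def resolve(self, mention, cutoff=0.8):
        """Map a raw mention to a canonical node id, tolerating spelling drift."""
        mention = mention.lower()
        if mention in self.aliases:
            return self.aliases[mention]
        close = difflib.get_close_matches(mention, self.aliases, n=1, cutoff=cutoff)
        return self.aliases[close[0]] if close else None

reg = GlobalRegister()
reg.register("ENT_holmes", "ENT", aliases=["Sherlock Holmes", "Holmes", "the detective"])
reg.register("LOC_baker_st", "LOC", aliases=["221B Baker Street", "Baker Street"])
print(reg.resolve("Sherlock Homes"))   # 'ENT_holmes' despite the typo
print(reg.resolve("baker street"))     # 'LOC_baker_st'
```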
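Anchor [10] defines the ego-graph slice G_{F,t} = NEIGHBOURS_k(F) ∩ RECONSTRUCT(·, t). A literal reading of that formula can be sketched as a k-hop neighbourhood of the focal entities, restricted to edges whose fabula time precedes the anchor; the edge format and helper names here are assumptions.

```python
# Sketch of ego-graph slicing: keep nodes within k undirected hops of the focal
# set F, over only those edges already "true" at time anchor t.
def ego_slice(edges, focal, k, t):
    """edges: iterable of (src, tgt, time). Returns the sliced edge list."""
    live = [(s, g) for s, g, time in edges if time <= t]      # RECONSTRUCT(·, t)
    adj = {}
    for s, g in live:
        adj.setdefault(s, set()).add(g)
        adj.setdefault(g, set()).add(s)                       # undirected for hop counting
    reachable, frontier = set(focal), set(focal)
    for _ in range(k):                                        # NEIGHBOURS_k(F)
        frontier = {n for f in frontier for n in adj.get(f, ())} - reachable
        reachable |= frontier
    return [(s, g) for s, g in live if s in reachable and g in reachable]

edges = [("ENT_butler", "EVENT_theft", 100),
         ("EVENT_theft", "OBJ_necklace", 100),
         ("EVENT_confession", "ENT_butler", 300)]             # happens after the anchor
print(ego_slice(edges, {"ENT_butler"}, k=2, t=200))
# [('ENT_butler', 'EVENT_theft'), ('EVENT_theft', 'OBJ_necklace')]
```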
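Anchor [11] describes node shadowing: a counterfactual copy of V is identified by the intervention context projected onto V's ancestors in the mutilated graph, and copies from different worlds merge when those projections agree. The sketch below is one plausible reading of that identity rule, not the paper's implementation.

```python
# Sketch of AMWN-style node shadowing: key each counterfactual copy of a node by
# (node, projection of the intervention context onto its ancestors in the
# mutilated graph), so copies whose keys coincide are treated as one node.

def ancestors(parents, node, removed):
    """Ancestors of `node` in the mutilated graph (edges into `removed` are cut)."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        if cur in removed:            # intervened nodes lose their incoming edges
            continue
        for p in parents.get(cur, []):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

def shadow_key(parents, node, intervention):
    """Identity of the counterfactual copy of `node` under `intervention`."""
    anc = ancestors(parents, node, removed=set(intervention)) | {node}
    projected = {v: x for v, x in intervention.items() if v in anc}
    return (node, frozenset(projected.items()))

parents = {"duel": ["letter"], "letter": ["messenger"], "ransom": ["kidnapping"]}
world_a = {"messenger": False}                       # do(messenger := dead)
world_b = {"messenger": False, "kidnapping": True}   # same, plus a second intervention

# 'duel' sees the same projected context in both worlds, so its copies merge;
# 'ransom' is downstream of the kidnapping intervention in world_b only, so its
# copies remain distinct.
print(shadow_key(parents, "duel", world_a) == shadow_key(parents, "duel", world_b))      # True
print(shadow_key(parents, "ransom", world_a) == shadow_key(parents, "ransom", world_b))  # False
```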
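Anchor [12] promotes (src, tgt, metric) edge triples to synthetic REL::src::tgt::metric nodes so relationship metrics become visible to d-separation. The helper below shows the shape of that promotion; the surrounding data layout is assumed.

```python
# Sketch of promoting relationship metrics from social edges to first-class nodes,
# with both endpoint entities and the triggering event wired in as parents.
def promote_relationship_metrics(social_edges):
    """social_edges: dicts with src, tgt, metric, and the triggering event.
    Returns (synthetic nodes, new causal edges into those nodes)."""
    nodes, edges = [], []
    for e in social_edges:
        rel_id = f"REL::{e['src']}::{e['tgt']}::{e['metric']}"
        nodes.append(rel_id)
        edges.append((e["src"], rel_id))       # endpoint entity -> synthetic node
        edges.append((e["tgt"], rel_id))
        edges.append((e["trigger"], rel_id))   # triggering event -> synthetic node
    return nodes, edges

social = [{"src": "ENT_butler", "tgt": "ENT_countess",
           "metric": "fear", "trigger": "EVENT_theft"}]
nodes, causal_edges = promote_relationship_metrics(social)
print(nodes)          # ['REL::ENT_butler::ENT_countess::fear']
print(causal_edges)   # three parent edges into the synthetic node
```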
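Anchor [13] wires utterances speaker → utterance → channel → addressees with standing channel ↔ participant capability edges, so that surgery on a channel severs both capability and content paths. A toy version of that wiring and surgery, with helper names of our own choosing:

```python
# Sketch of channel/utterance routing and a Rule-3-style surgery that deletes a
# channel node, cutting every capability and per-utterance content path through it.
def wire_capability(edges, channel, participants):
    for who in participants:
        edges.add((channel, who))
        edges.add((who, channel))             # standing two-way capability

def wire_utterance(edges, speaker, utterance, channel, addressees):
    edges.add((speaker, utterance))
    edges.add((utterance, channel))
    for who in addressees:
        edges.add((channel, who))

def channel_surgery(edges, channel):
    """Delete every edge touching the channel node (capability and content alike)."""
    return {(s, t) for s, t in edges if channel not in (s, t)}

edges = set()
wire_capability(edges, "CH_letters", ["ENT_juliet", "ENT_romeo"])
wire_utterance(edges, "ENT_juliet", "UTT_warning", "CH_letters", ["ENT_romeo"])

cut = channel_surgery(edges, "CH_letters")
print(any("CH_letters" in e for e in cut))    # False: every path through the channel is severed
print(("ENT_juliet", "UTT_warning") in cut)   # True: the utterance node itself survives
```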