arxiv: 2604.21253 · v1 · submitted 2026-04-23 · 💻 cs.CL · cs.AI

Planning Beyond Text: Graph-based Reasoning for Complex Narrative Generation

Hanwen Gu, Chao Guo, Junle Wang, Wenda Xie, Yisheng Lv

Pith reviewed 2026-05-09 22:08 UTC · model grok-4.3

keywords narrative generation · graph-based planning · event graphs · character graphs · LLM reasoning · narrative coherence · story planning · causality enforcement

The pith

Planning narratives on event and character graphs rather than raw text helps LLMs sustain global coherence and causality over long stories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PLOTTER, a framework that shifts narrative planning from sequential text to structural graphs of events and characters. It runs an Evaluate-Plan-Revise cycle that diagnoses and repairs problems in graph topology under logical constraints before any full story text is generated. This targets common failures in LLM narratives such as structural fractures, inconsistent causality, and flat character arcs. A sympathetic reader would care because direct text generation often loses track of overall plot logic across extended outputs, while upfront graph planning aims to lock in the skeleton first.

Core claim

PLOTTER executes the Evaluate-Plan-Revise cycle on the event graph and character graph. By diagnosing and repairing issues of the graph topology under rigorous logical constraints, the model optimizes the causality and narrative skeleton before complete context generation. Experiments demonstrate that PLOTTER significantly outperforms representative baselines across diverse narrative scenarios.

What carries the argument

PLOTTER framework that performs an Evaluate-Plan-Revise cycle on an event graph and a character graph to enforce narrative causality and coherence prior to text generation.
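
As a rough illustration of the cycle described above, the following Python sketch runs an evaluate-then-revise loop over a toy event graph. Everything in it is an assumption made for illustration: the `EventGraph` class, the `refine` function, the `max_iters` cap, and the rule-based critic and editor that stand in for the paper's LLM agents. It shows only the control flow, not PLOTTER's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str]
Issue = Tuple[str, Edge]  # (issue type, affected causal edge)


@dataclass
class EventGraph:
    # Hypothetical minimal representation: ordered event ids plus causal edges.
    events: List[str] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)


def refine(
    graph: EventGraph,
    critic: Callable[[EventGraph], List[Issue]],
    editor: Callable[[EventGraph, Issue], EventGraph],
    max_iters: int = 3,
) -> EventGraph:
    """Evaluate the graph, then revise it, until no issues remain or the cap is hit."""
    for _ in range(max_iters):
        issues = critic(graph)      # Evaluate: diagnose structural problems
        if not issues:
            break
        for issue in issues:        # Plan + Revise: one graph edit per issue
            graph = editor(graph, issue)
    return graph


def make_toy_critic(discontinuous: Set[Edge]) -> Callable[[EventGraph], List[Issue]]:
    """Stand-in for an LLM critic: flags edges known to lack a transition."""
    def critic(graph: EventGraph) -> List[Issue]:
        return [("Discontinuity", e) for e in graph.edges if e in discontinuous]
    return critic


def toy_editor(graph: EventGraph, issue: Issue) -> EventGraph:
    """Stand-in for a graph editor: splits the flagged edge with a bridge event."""
    _, (u, v) = issue
    bridge = f"{u}a"
    i = graph.events.index(v)
    events = graph.events[:i] + [bridge] + graph.events[i:]
    edges = [e for e in graph.edges if e != (u, v)] + [(u, bridge), (bridge, v)]
    return EventGraph(events, edges)


g = EventGraph(["E1", "E2", "E3"], [("E1", "E2"), ("E2", "E3")])
refined = refine(g, make_toy_critic({("E2", "E3")}), toy_editor)
print(refined.events)  # ['E1', 'E2', 'E2a', 'E3']
```

The loop stops as soon as the critic reports nothing, so the iteration cap only matters when an edit fails to resolve the issue it was meant to fix.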

If this is right

LLMs achieve stronger long-context reasoning by first optimizing narrative structure at the graph level.
Structural fractures and logical breaks can be identified and fixed before any full story text is produced.
Character development and causal chains remain consistent because revisions operate on explicit graph relations rather than implicit text patterns.
The approach generalizes across different narrative scenarios without requiring changes to the underlying language model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar graph-first planning could extend to other long-sequence generation tasks such as multi-step reasoning chains or procedural instructions.
Interactive tools could let users edit the event or character graph directly and then regenerate text from the revised structure.
The method might lower the cost of post-editing by catching coherence failures early rather than after full text is written.
Scaling the graphs to very long narratives could test whether explicit topology maintenance remains feasible as story length grows.

Load-bearing premise

Event and character graphs can be constructed and revised accurately enough to capture and enforce global narrative causality and coherence under the model's logical constraints.

What would settle it

Running the same complex narrative tasks with direct text planning versus graph-based planning and finding no measurable gain in coherence or causality metrics, or observing that the model's graph revisions frequently introduce new inconsistencies not present in the original text plan.

Figures

Figures reproduced from arXiv: 2604.21253 by Chao Guo, Hanwen Gu, Junle Wang, Wenda Xie, Yisheng Lv.

**Figure 1.** Overview of the PLOTTER framework. (1) Graph-Based Script Planning initializes the narrative backbone comprising Event (G_e) and Character (G_c) graphs. (2) Iterative Graph Refinement employs a Multi-Agent Critic (C) to diagnose structural issues, which are resolved by a Constrained Graph Editor (R) to produce an optimized graph (G*). (3) Graph-Grounded Script Synthesis serializes the graph via Event Serial…

**Figure 2.** Module Necessity Analysis. Each cell shows the win rate (%) of the full model against the corresponding ablated variant. Every reasoning agent within the narrative graph is critical for the holistic quality. The “1+1>2” Synergy Effect. To verify inter-module coordination, we compared the Full Module against single-agent variants (Character, Plot, or Theme) in Stage 2 by measuring their respective win …

**Figure 3.** Synergy Effect Analysis. The incremental gains from individual modules sum to significantly less than the total performance improvement, indicating the synergy effect of modules during Stage 2 narrative graph refinement. Sensitivity Analysis of Iteration Count K. Table 3 reports how the maximum number of refinement iterations affects quality and generation scale (GPT-4.1 as backbone). Diversity (Distinct…

**Figure 4.** Event Graph Evolution (simplified view). (a) Initial graph G_e^(0) with Critic diagnoses: Discontinuity (E2→E3 lacks transition), Arc-Abrupt (Elena shifts abruptly), No-Turning-Point (no reversal before E4). (b) Refined graph G_e^* with Editor operations: Add-Plot-Bridge (E2a inserted), Insert-Twist (E3a airstrike), Add-Suspense (E1→E3), Add-Foreshadow (E2→E5). Dashed red borders indicate newly inserted nod…

**Figure 5.** Context-Aware Diagnosis and Retrieval. (A) Macro-Level Topology: The Multi-Agent Critic identifies a logical break between the defeat in E7 and the confidence in E8, flagging a Motivation Gap (see Issue Types in …

**Figure 6.** From Single Node to Causal Chain. (A) Structural Diagnosis: A simple single-node insertion (Ins-A) fails because it only addresses internal feelings. (B) The "Trinity" of Action: The Full Module generates a multi-hop causal chain by iteratively applying Add-Plot-Bridge operations: Motivation (Ins-A) → Trust (Ins-B) → Planning (Ins-C). This ensures the victory in E8 is logically earned …
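
The multi-hop chain in Figure 6 can be pictured as repeated edge splitting. The sketch below is a guess at the mechanics, not the paper's code: `add_plot_bridge` and `causal_chain` are hypothetical helpers, and it assumes E7 and E8 start out joined by a single causal edge.

```python
from typing import List, Tuple

Edge = Tuple[str, str]


def add_plot_bridge(edges: List[Edge], src: str, dst: str, new_node: str) -> List[Edge]:
    """Replace the edge src -> dst with src -> new_node -> dst (hypothetical helper)."""
    if (src, dst) not in edges:
        raise ValueError(f"no edge {src} -> {dst} to bridge")
    kept = [e for e in edges if e != (src, dst)]
    return kept + [(src, new_node), (new_node, dst)]


def causal_chain(edges: List[Edge], start: str) -> List[str]:
    """Follow successors from start; assumes a linear, acyclic chain."""
    successor = dict(edges)
    chain = [start]
    while chain[-1] in successor:
        chain.append(successor[chain[-1]])
    return chain


# Assumed starting point: the defeat (E7) leads directly to the victory (E8).
edges = [("E7", "E8")]
edges = add_plot_bridge(edges, "E7", "E8", "Ins-A")     # Motivation
edges = add_plot_bridge(edges, "Ins-A", "E8", "Ins-B")  # Trust
edges = add_plot_bridge(edges, "Ins-B", "E8", "Ins-C")  # Planning

chain = causal_chain(edges, "E7")
print(" -> ".join(chain))  # E7 -> Ins-A -> Ins-B -> Ins-C -> E8
```

Each call bridges the last remaining gap before E8, which is why the second and third insertions target the edge created by the previous one rather than the original E7→E8 edge.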

Original abstract

While LLMs demonstrate remarkable fluency in narrative generation, existing methods struggle to maintain global narrative coherence, contextual logical consistency, and smooth character development, often producing monotonous scripts with structural fractures. To this end, we introduce PLOTTER, a framework that performs narrative planning on structural graph representations instead of the direct sequential text representations used in existing work. Specifically, PLOTTER executes the Evaluate-Plan-Revise cycle on the event graph and character graph. By diagnosing and repairing issues of the graph topology under rigorous logical constraints, the model optimizes the causality and narrative skeleton before complete context generation. Experiments demonstrate that PLOTTER significantly outperforms representative baselines across diverse narrative scenarios. These findings verify that planning narratives on structural graph representations, rather than directly on text, is crucial to enhance the long-context reasoning of LLMs in complex narrative generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PLOTTER moves narrative planning to event and character graphs with a revise cycle, but the abstract gives no metrics or details to show the gains.

PLOTTER is a framework that plans narratives using event and character graphs with an Evaluate-Plan-Revise cycle to fix topology and causality before generating text. The main claim is that this structural approach improves long-context reasoning over direct text methods. The new element is this particular cycle applied to dual graphs for stories. It combines existing techniques in a way that targets the coherence problems in LLM outputs.

The paper does well at pointing out the weaknesses in current narrative generation, such as monotonous scripts and structural fractures, and at suggesting a graph-based fix. The soft spots center on the results. The paper says the method significantly outperforms baselines but provides no metrics or details on how that was measured. This leaves the actual gains unclear until the experiments are reviewed. The key assumption is that graphs can accurately enforce global constraints; it holds up conceptually but requires proof that the revisions work as intended without side effects.

This paper is for people working on AI narrative generation and planning systems. Readers dealing with creative applications would find value in the approach if the data supports it. It deserves a serious referee to check the full methods and results. I would recommend putting it through peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces PLOTTER, a framework for complex narrative generation with LLMs that performs planning on structural graph representations (an event graph and a character graph) rather than direct sequential text. It executes an Evaluate-Plan-Revise cycle to diagnose and repair graph topology issues under logical constraints, optimizing causality and the narrative skeleton before full text generation. The authors claim that this approach significantly outperforms representative baselines across diverse narrative scenarios and verify that graph-based planning is crucial for enhancing long-context reasoning in LLMs.

Significance. If the results hold, the work offers a plausible engineering extension of structured planning methods to address known limitations in LLM narrative coherence and causality. Shifting to explicit graph representations for global constraints could improve robustness in long-form generation, and the Evaluate-Plan-Revise loop on topology provides a concrete mechanism that might be reusable. No parameter-free derivations or machine-checked proofs are present, but the framework's emphasis on pre-text revision is a clear contribution if empirically supported.

major comments (2)

[Abstract and Experiments] The claim of significant outperformance across scenarios is asserted without any reported metrics, baseline descriptions, statistical tests, or error analysis. This directly undermines evaluation of the central claim that graph-based planning enhances long-context reasoning, as no quantitative evidence is visible to support the superiority.
[Framework description (likely §3)] The assumption that LLM-driven revision of event and character graphs can reliably enforce global narrative causality without introducing new inconsistencies is load-bearing but lacks concrete details on graph initialization, the exact logical constraints applied, or validation that revisions preserve coherence. This makes the weakest assumption difficult to assess from the provided description.

minor comments (2)

[Introduction and Methods] The paper introduces several new terms (PLOTTER, event graph, character graph) without a dedicated notation or definition table; adding one would improve clarity for readers.
[Figures] Figure captions and graph examples, if present, should explicitly show before/after revision states to illustrate the Evaluate-Plan-Revise cycle.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and insightful comments. We address each major comment point by point below, agreeing that clarifications and additions are warranted to better support our claims. We will revise the manuscript to incorporate these improvements.

Referee: [Abstract and Experiments] The claim of significant outperformance across scenarios is asserted without any reported metrics, baseline descriptions, statistical tests, or error analysis. This directly undermines evaluation of the central claim that graph-based planning enhances long-context reasoning, as no quantitative evidence is visible to support the superiority.

Authors: The referee correctly notes that the abstract lacks specific metrics. Although the Experiments section presents comparative results against baselines and demonstrates outperformance in narrative coherence and causality, we recognize the need for more explicit reporting. In the revision, we will update the abstract to include key quantitative findings (such as average improvements in human-rated coherence and automatic metrics like entity consistency scores) and enhance the Experiments section with detailed baseline descriptions, statistical significance tests, and error analysis to provide robust evidence for the advantages of graph-based planning.
revision: yes

Referee: [Framework description (likely §3)] The assumption that LLM-driven revision of event and character graphs can reliably enforce global narrative causality without introducing new inconsistencies is load-bearing but lacks concrete details on graph initialization, the exact logical constraints applied, or validation that revisions preserve coherence. This makes the weakest assumption difficult to assess from the provided description.

Authors: We agree that the framework description would benefit from greater specificity to allow readers to evaluate the reliability of the graph revision process. We will expand the relevant section to detail the graph initialization procedure (starting from LLM-extracted events and characters), the precise logical constraints enforced (including causality via directed acyclic graphs, character attribute consistency, and event ordering), and the validation mechanisms (such as iterative consistency checks within the Evaluate-Plan-Revise loop and empirical verification through ablation studies showing no introduced inconsistencies). This will strengthen the assessment of our core assumption.
revision: yes
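
One way to picture the acyclicity constraint mentioned in the response above is a topological sort over the event graph's causal edges. The sketch below uses Kahn's algorithm; the function name and the edge-list representation are assumptions for illustration, and the paper may check causality differently.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def topological_order(nodes: List[str], edges: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Return a causal ordering of events, or None if the edges contain a cycle."""
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    # Start from events that have no cause inside the graph.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # If some events were never released, they sit on a causal cycle.
    return order if len(order) == len(nodes) else None


events = ["E1", "E2", "E3"]
valid = topological_order(events, [("E1", "E2"), ("E2", "E3")])
broken = topological_order(events, [("E1", "E2"), ("E2", "E3"), ("E3", "E1")])
print(valid)   # ['E1', 'E2', 'E3']
print(broken)  # None
```

A check of this kind could run after every graph edit, so that a revision introducing a circular cause is rejected before any text is generated.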

Circularity Check

0 steps flagged

No significant circularity; framework is an independent engineering contribution

The paper introduces PLOTTER as a new framework performing Evaluate-Plan-Revise cycles on event and character graphs for narrative planning, with claims supported by experimental outperformance over baselines. No equations, fitted parameters, or derivations are present that reduce results to inputs by construction. No self-citation chains or uniqueness theorems are invoked as load-bearing premises. The approach is presented as a self-contained methodological extension verified externally via benchmarks, satisfying the criteria for a non-circular finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim depends on the unproven premise that graph topology can be diagnosed and repaired to capture narrative logic, plus the assumption that LLMs can execute this cycle effectively.

axioms (1)

domain assumption: Structural graphs can faithfully represent narrative causality and character development.
Invoked when the framework diagnoses and repairs graph issues before text generation.

invented entities (3)

PLOTTER framework · no independent evidence
purpose: Narrative planning via graph representations
Newly introduced system for the Evaluate-Plan-Revise cycle.

event graph · no independent evidence
purpose: Represent events and causal relations
Structural representation introduced to enable topology diagnosis.

character graph · no independent evidence
purpose: Represent character relations and development
Structural representation introduced to enable topology diagnosis.
