Planning Beyond Text: Graph-based Reasoning for Complex Narrative Generation
Pith reviewed 2026-05-09 22:08 UTC · model grok-4.3
The pith
Planning narratives on event and character graphs rather than raw text helps LLMs sustain global coherence and causality over long stories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PLOTTER executes the Evaluate-Plan-Revise cycle on the event graph and character graph. By diagnosing and repairing issues of the graph topology under rigorous logical constraints, the model optimizes the causality and narrative skeleton before complete context generation. Experiments demonstrate that PLOTTER significantly outperforms representative baselines across diverse narrative scenarios.
What carries the argument
PLOTTER framework that performs an Evaluate-Plan-Revise cycle on an event graph and a character graph to enforce narrative causality and coherence prior to text generation.
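To make the cycle concrete, here is a minimal sketch of an Evaluate-Plan-Revise loop over an event graph. It is not the paper's implementation. The `EventGraph` class, the function names, and the single rule-based check (every event after the first needs an incoming causal edge) are assumptions chosen for illustration. In PLOTTER the diagnosis and repair are LLM-driven, and a character graph is revised as well. Both are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class EventGraph:
    """Hypothetical container: events in story order plus causal edges."""
    events: dict = field(default_factory=dict)  # event id -> description
    edges: set = field(default_factory=set)     # (cause id, effect id)


def evaluate(graph):
    """Diagnose: list events (other than the first) with no incoming cause."""
    ids = list(graph.events)
    has_cause = {effect for _, effect in graph.edges}
    return [e for e in ids[1:] if e not in has_cause]


def plan(graph, issues):
    """Plan: propose linking each unmotivated event to its predecessor."""
    ids = list(graph.events)
    return [(ids[ids.index(e) - 1], e) for e in issues]


def revise(graph, proposals):
    """Revise: apply the proposed causal edges to the graph."""
    graph.edges.update(proposals)


def evaluate_plan_revise(graph, max_rounds=5):
    """Repeat the cycle until no issue remains; return the rounds used."""
    for round_no in range(max_rounds):
        issues = evaluate(graph)
        if not issues:
            return round_no
        revise(graph, plan(graph, issues))
    return max_rounds


story = EventGraph(
    events={"e1": "Beginning", "e2": "Rising Action", "e3": "Ending"},
    edges={("e1", "e2")},
)
rounds = evaluate_plan_revise(story)
```

The point of the sketch is the control flow: all repairs happen on the graph, so text generation would only start once `evaluate` reports no remaining issues.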
If this is right
- LLMs achieve stronger long-context reasoning by first optimizing narrative structure at the graph level.
- Structural fractures and logical breaks can be identified and fixed before any full story text is produced.
- Character development and causal chains remain consistent because revisions operate on explicit graph relations rather than implicit text patterns.
- The approach generalizes across different narrative scenarios without requiring changes to the underlying language model.
Where Pith is reading between the lines
- Similar graph-first planning could extend to other long-sequence generation tasks such as multi-step reasoning chains or procedural instructions.
- Interactive tools could let users edit the event or character graph directly and then regenerate text from the revised structure.
- The method might lower the cost of post-editing by catching coherence failures early rather than after full text is written.
- Scaling the graphs to very long narratives could test whether explicit topology maintenance remains feasible as story length grows.
Load-bearing premise
Event and character graphs can be constructed and revised accurately enough to capture and enforce global narrative causality and coherence under the model's logical constraints.
What would settle it
Running the same complex narrative tasks with direct text planning versus graph-based planning and finding no measurable gain in coherence or causality metrics, or observing that the model's graph revisions frequently introduce new inconsistencies not present in the original text plan.
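One way to decide whether a gain is "measurable" in such a head-to-head comparison is an exact sign test over pairwise preferences. The sketch below assumes each narrative task yields one verdict (graph-based plan preferred, text plan preferred, or tie) and that ties are dropped before counting. The function name and the counts in the usage lines are hypothetical, not results from the paper.

```python
from math import comb


def sign_test_p_value(wins_graph, wins_text):
    """Two-sided exact sign test on pairwise wins (ties already removed)."""
    n = wins_graph + wins_text
    if n == 0:
        return 1.0  # no decisive comparisons, so no evidence either way
    k = min(wins_graph, wins_text)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only.
p_even = sign_test_p_value(5, 5)
p_sweep = sign_test_p_value(10, 0)
```

A p-value near 1 for an even split would match the "no measurable gain" outcome described above, while a small p-value in favour of text planning would count against the graph-based approach.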
Original abstract
While LLMs demonstrate remarkable fluency in narrative generation, existing methods struggle to maintain global narrative coherence, contextual logical consistency, and smooth character development, often producing monotonous scripts with structural fractures. To this end, we introduce PLOTTER, a framework that performs narrative planning on structural graph representations instead of the direct sequential text representations used in existing work. Specifically, PLOTTER executes the Evaluate-Plan-Revise cycle on the event graph and character graph. By diagnosing and repairing issues of the graph topology under rigorous logical constraints, the model optimizes the causality and narrative skeleton before complete context generation. Experiments demonstrate that PLOTTER significantly outperforms representative baselines across diverse narrative scenarios. These findings verify that planning narratives on structural graph representations, rather than directly on text, is crucial to enhance the long-context reasoning of LLMs in complex narrative generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PLOTTER, a framework for complex narrative generation with LLMs that performs planning on structural graph representations (an event graph and a character graph) rather than direct sequential text. It executes an Evaluate-Plan-Revise cycle to diagnose and repair graph topology issues under logical constraints, optimizing causality and the narrative skeleton before full text generation. The authors claim that this approach significantly outperforms representative baselines across diverse narrative scenarios and verify that graph-based planning is crucial for enhancing long-context reasoning in LLMs.
Significance. If the results hold, the work offers a plausible engineering extension of structured planning methods to address known limitations in LLM narrative coherence and causality. Shifting to explicit graph representations for global constraints could improve robustness in long-form generation, and the Evaluate-Plan-Revise loop on topology provides a concrete mechanism that might be reusable. No parameter-free derivations or machine-checked proofs are present, but the framework's emphasis on pre-text revision is a clear contribution if empirically supported.
major comments (2)
- [Abstract and Experiments] The claim of significant outperformance across scenarios is asserted without any reported metrics, baseline descriptions, statistical tests, or error analysis. This directly undermines evaluation of the central claim that graph-based planning enhances long-context reasoning, as no quantitative evidence is visible to support the superiority.
- [Framework description (likely §3)] The assumption that LLM-driven revision of event and character graphs can reliably enforce global narrative causality without introducing new inconsistencies is load-bearing but lacks concrete details on graph initialization, the exact logical constraints applied, or validation that revisions preserve coherence. This makes the weakest assumption difficult to assess from the provided description.
minor comments (2)
- [Introduction and Methods] The paper introduces several new terms (PLOTTER, event graph, character graph) without a dedicated notation or definition table; adding one would improve clarity for readers.
- [Figures] Figure captions and graph examples, if present, should explicitly show before/after revision states to illustrate the Evaluate-Plan-Revise cycle.
Simulated Author's Rebuttal
We thank the referee for their thorough review and insightful comments. We address each major comment point by point below, agreeing that clarifications and additions are warranted to better support our claims. We will revise the manuscript to incorporate these improvements.
- Referee: [Abstract and Experiments] The claim of significant outperformance across scenarios is asserted without any reported metrics, baseline descriptions, statistical tests, or error analysis. This directly undermines evaluation of the central claim that graph-based planning enhances long-context reasoning, as no quantitative evidence is visible to support the superiority.
  Authors: The referee correctly notes that the abstract lacks specific metrics. Although the Experiments section presents comparative results against baselines and demonstrates outperformance in narrative coherence and causality, we recognize the need for more explicit reporting. In the revision, we will update the abstract to include key quantitative findings (such as average improvements in human-rated coherence and automatic metrics like entity consistency scores) and enhance the Experiments section with detailed baseline descriptions, statistical significance tests, and error analysis to provide robust evidence for the advantages of graph-based planning.
  Revision: yes
- Referee: [Framework description (likely §3)] The assumption that LLM-driven revision of event and character graphs can reliably enforce global narrative causality without introducing new inconsistencies is load-bearing but lacks concrete details on graph initialization, the exact logical constraints applied, or validation that revisions preserve coherence. This makes the weakest assumption difficult to assess from the provided description.
  Authors: We agree that the framework description would benefit from greater specificity to allow readers to evaluate the reliability of the graph revision process. We will expand the relevant section to detail the graph initialization procedure (starting from LLM-extracted events and characters), the precise logical constraints enforced (including causality via directed acyclic graphs, character attribute consistency, and event ordering), and the validation mechanisms (such as iterative consistency checks within the Evaluate-Plan-Revise loop and empirical verification through ablation studies showing no introduced inconsistencies). This will strengthen the assessment of our core assumption.
  Revision: yes
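The rebuttal names "causality via directed acyclic graphs" and "event ordering" as constraints. A minimal sketch of how such a check could look is given below, using Kahn's algorithm for topological sorting. The function name and the toy event ids are assumptions. The source does not say how PLOTTER actually validates acyclicity, and this sketch expects every edge endpoint to appear in `nodes`.

```python
from collections import defaultdict, deque


def topological_order(nodes, edges):
    """Return a causal ordering of nodes, or None if the edges form a cycle."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    # Start from events that have no cause inside the graph.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node left unvisited sits on a causal cycle.
    return order if len(order) == len(nodes) else None


# Hypothetical event ids for illustration.
valid = topological_order(["a", "b", "c"], [("a", "b"), ("b", "c")])
cyclic = topological_order(["a", "b"], [("a", "b"), ("b", "a")])
```

Running a check like this after every revision step is one cheap way to confirm that a repair did not introduce a causal loop, which speaks directly to the referee's concern about new inconsistencies.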
Circularity Check
No significant circularity; framework is an independent engineering contribution
The paper introduces PLOTTER as a new framework performing Evaluate-Plan-Revise cycles on event and character graphs for narrative planning, with claims supported by experimental outperformance over baselines. No equations, fitted parameters, or derivations are present that reduce results to inputs by construction. No self-citation chains or uniqueness theorems are invoked as load-bearing premises. The approach is presented as a self-contained methodological extension verified externally via benchmarks, satisfying the criteria for a non-circular finding.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Structural graphs can faithfully represent narrative causality and character development
invented entities (3)
- PLOTTER framework: no independent evidence
- event graph: no independent evidence
- character graph: no independent evidence