
arxiv: 2605.19681 · v1 · pith:S3DERBCKnew · submitted 2026-05-19 · 💻 cs.HC

TombWriter: Scaffolding Story Archeology through Beat-Level Interaction in Human-AI Co-Writing

Pith reviewed 2026-05-20 04:18 UTC · model grok-4.3

classification 💻 cs.HC
keywords human-AI co-writing · story archeology · beat-level interaction · creative writing tools · LLM-assisted storytelling · author agency · narrative structure · story beats

The pith

Writers can maintain ownership of their stories by refining persistent story beats with AI rather than discarding one-off prompts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces story archeology as a method where AI prompts become lasting instruments that writers refine to uncover the latent narrative they intend to tell. Instead of generating prose directly, users create and edit beats that represent character actions within scenes, allowing the system to simulate possibilities or let writers nudge outcomes. These beats feed into prose generation tied to chosen style and genre, keeping structure separate from expression. A qualitative study with five writers using the TombWriter tool showed they treated the AI primarily as an engine for generating options, asserted ownership of the results, yet noted some loss of personal voice while appreciating the help with overall story architecture. This setup aims to restore agency by giving writers ongoing control at a structural level.

Core claim

The central claim is that LLM-based story archeology, grounded in the idea that stories exist as latent structures writers excavate, increases author agency and ownership when interaction happens at the editable beat level instead of through disposable prompts. Writers generate or nudge character actions in scenes to explore emergent possibilities, then iteratively edit the resulting beats before prose is produced from them according to style and genre. The TombWriter interface supports this through a five-stage pipeline that renders stories as navigable cards for characters, scenes, and beats.

What carries the argument

The five-stage narrative pipeline that renders stories as navigable cards for characters, scenes, and beats, with beats serving as the editable unit that separates structural discovery from style-specific prose generation.
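
The card hierarchy described above can be pictured as a small data model. The following is a minimal sketch only, not TombWriter's actual implementation: every class and field name is hypothetical, the `revisions` history is an assumption about how a "persistent, refinable" beat might be stored, and prose generation is left out because the source does not specify how that stage is built.

```python
from dataclasses import dataclass, field
from enum import Enum


class BeatSource(Enum):
    """How a beat was created (the three mechanisms named in Figure 2)."""
    SIMULATION = "simulation"  # the LLM role-plays characters from traits and goals
    NUDGING = "nudging"        # the writer specifies a desired outcome
    MANUAL = "manual"          # the writer authors the beat directly


@dataclass
class Character:
    name: str
    traits: list[str] = field(default_factory=list)
    goals: list[str] = field(default_factory=list)


@dataclass
class Beat:
    actor: str           # name of the acting character
    action: str          # structural description of what happens, not prose
    source: BeatSource
    revisions: list[str] = field(default_factory=list)  # earlier wordings (assumed)

    def edit(self, new_action: str) -> None:
        """Refine the beat in place while keeping its earlier wording."""
        self.revisions.append(self.action)
        self.action = new_action


@dataclass
class Scene:
    situation: str       # the initial situation the writer defines
    cast: list[str]      # names of participating characters
    beats: list[Beat] = field(default_factory=list)


@dataclass
class Story:
    premise: str
    characters: dict[str, Character] = field(default_factory=dict)
    scenes: list[Scene] = field(default_factory=list)


# Usage: a simulated beat is kept and refined rather than discarded.
story = Story(premise="A lighthouse keeper hides a shipwreck survivor.")
story.characters["Mara"] = Character("Mara", traits=["stubborn"], goals=["keep the light burning"])
scene = Scene(situation="A storm hits the coast.", cast=["Mara"])
beat = Beat(actor="Mara", action="refuses to open the door", source=BeatSource.SIMULATION)
scene.beats.append(beat)
story.scenes.append(scene)
beat.edit("opens the door a crack")
```

The point of the sketch is that the beat, not the prompt that produced it, is the object the writer keeps and edits.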

If this is right

  • Writers can discover emergent story possibilities by generating and then editing beats that represent character actions within scenes.
  • Prose output can be regenerated from the same beats using different styles or genres without altering the underlying structure.
  • The AI functions as a simulation engine for testing narrative options rather than as a direct writing partner.
  • Writers retain ownership by exercising repeated control over beat edits and refinements across the story hierarchy.
  • Value is placed on structural discovery and organization more than on the production of finished sentences.
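
To make the separation of structure from style in the list above concrete, here is a hedged sketch of regenerating prose from one fixed beat sequence. It is an illustration under stated assumptions, not the paper's method: `Beat`, `build_prompt`, `render_prose`, and `fake_llm` are hypothetical names, the prompt wording is invented, and `fake_llm` is a placeholder so the example runs without any model.

```python
from typing import Callable, NamedTuple, Sequence


class Beat(NamedTuple):
    actor: str   # acting character
    action: str  # what happens, stated structurally rather than as prose


# Hypothetical stand-in for an LLM call: takes a prompt, returns prose.
ProseGenerator = Callable[[str], str]


def build_prompt(beats: Sequence[Beat], style: str, genre: str) -> str:
    """Compose a prompt in which only the first line depends on style and genre."""
    header = f"Write this scene as {genre} fiction in a {style} style."
    rule = "Follow the beats in order; do not add or drop events."
    lines = [f"{i}. {b.actor}: {b.action}" for i, b in enumerate(beats, start=1)]
    return "\n".join([header, rule, *lines])


def render_prose(beats: Sequence[Beat], style: str, genre: str,
                 generate: ProseGenerator) -> str:
    """Generate prose from beats; the beats themselves are never modified."""
    return generate(build_prompt(beats, style, genre))


def fake_llm(prompt: str) -> str:
    """Placeholder generator (assumption) that only reports the beat count."""
    return f"<prose from {len(prompt.splitlines()) - 2} beats>"


beats = (
    Beat("Mara", "refuses to open the lighthouse door"),
    Beat("Jonas", "slides a soaked letter under it"),
)
noir = render_prose(beats, "terse", "noir", fake_llm)
gothic = render_prose(beats, "ornate", "gothic", fake_llm)
```

Because the beats are immutable tuples and style enters only through the prompt header, switching style or genre cannot change the underlying structure in this sketch.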

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation of beats from prose could be tested in domains such as game narrative or film scripting where structural planning precedes dialogue.
  • Tools built on this model might add branching simulations that let writers compare multiple beat sequences side by side.
  • The reported voice loss points to a possible need for style-transfer mechanisms that apply writer-specific language patterns directly to beat-derived prose.
  • Longer-term use studies could show whether repeated beat refinement leads to writers internalizing new structural habits even without the tool.

Load-bearing premise

That shifting co-writing work to the level of editable story beats rather than disposable prose prompts will inherently give writers greater agency and ownership.

What would settle it

A follow-up study in which writers using beat-level editing report agency and ownership levels no higher than those using standard prompt-based LLM tools.

Figures

Figures reproduced from arXiv: 2605.19681 by Hugo Andersson, Niklas Elmqvist.

Figure 1: The TombWriter five-stage pipeline (schematic representation). (a) Premise: the writer articulates a high-level concept capturing the story’s core conflict, theme, or dramatic question. (b) Characters: the writer creates character cards specifying traits, goals, and personality. (c) Scenes: the writer defines an initial situation and selects participating characters. (d) Beats: dramatic actions are generat…
Figure 2: Three mechanisms for beat generation. From a given situation, writers can: (a) Simulation: the LLM role-plays characters based on their traits and goals; (b) Nudging: the writer specifies a desired outcome, which the LLM adapts to; or (c) Manual: the writer authors beat content directly.
Original abstract

The dominant paradigm for LLM interaction in AI co-writing uses disposable prompts that vanish after use. This may lead to imprecise results, cumbersome workflows, and diminished author agency and ownership. We propose LLM-based story archeology, where prompts serve as a hierarchical story instrument refined over time to extract the writer's intended story. Drawing on the fossil theory of story- telling, where stories exist as latent structures that writers excavate through their craft, this approach supports agency and ownership through high involvement and control. Writers work at the level of story beats rather than prose. They generate character actions in scenes to discover emergent possibilities, simulated by the LLM or directly nudged, then edit resulting beats to refine scenes iteratively. Prose is generated from beats based on style and genre, separating structure from style. We developed TombWriter, a web-based tool that visualizes stories as navigable cards -- characters, scenes, and beats -- through a five-stage narrative pipeline. We conducted a qual- itative study with five experienced writers who used the system over three days. Through semi-structured interviews, we found that writers framed AI as a generation engine rather than collabo- rator, claimed ownership while reporting voice loss, and valued the system for structural discovery rather than prose production. We contribute the story archeology approach, the TombWriter system, and qualitative findings on beat-level human-AI co-writing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that disposable prompt-based LLM interaction in co-writing diminishes author agency and ownership, and proposes 'LLM-based story archeology' as an alternative where writers refine hierarchical story beats (rather than prose) to excavate latent narrative structures, drawing on fossil theory. The TombWriter web tool implements a five-stage pipeline visualizing characters, scenes, and beats; a qualitative study with five experienced writers over three days, using semi-structured interviews, found that writers framed AI as a generation engine, claimed ownership while reporting voice loss, and valued the system for structural discovery rather than prose production.

Significance. If the central claims hold, the work could meaningfully shift HCI practices in human-AI co-writing toward controllable, structure-first interfaces that separate narrative scaffolding from style generation. The concrete TombWriter implementation and direct interview evidence on writer perceptions (AI as engine, preference for beats) are strengths; however, the small sample and absence of controls limit the strength of claims about increased agency and ownership.

major comments (2)
  1. [User Study] User Study section: The qualitative study (n=5, three days, semi-structured interviews) reports self-perceived ownership and preference for structural discovery but includes no control condition or baseline using conventional LLM prompting interfaces. Without this comparison, reported benefits in agency and ownership cannot be attributed specifically to the beat-level scaffolding paradigm versus novelty effects, tool features, or participant expectations. This directly undermines the central argument that beat-level interaction inherently supports greater agency and ownership.
  2. [Abstract and Findings] Abstract and Findings: Participants claimed ownership while also reporting voice loss; the manuscript should explicitly reconcile these perceptions with the claim that high involvement and control at the beat level increases agency, as the current presentation leaves open whether voice loss indicates reduced ownership in practice.
minor comments (2)
  1. [Abstract] Abstract: The hyphenated 'story- telling' and 'qual- itative' appear to be formatting artifacts; ensure consistent typography in the final version.
  2. [Discussion] The connection between the fossil theory framing and the empirical findings is mentioned but could be clarified in the discussion to avoid any appearance of post-hoc interpretation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment below, offering clarifications grounded in the exploratory nature of our qualitative study and indicating where we will revise the manuscript for greater precision.

  1. Referee: [User Study] User Study section: The qualitative study (n=5, three days, semi-structured interviews) reports self-perceived ownership and preference for structural discovery but includes no control condition or baseline using conventional LLM prompting interfaces. Without this comparison, reported benefits in agency and ownership cannot be attributed specifically to the beat-level scaffolding paradigm versus novelty effects, tool features, or participant expectations. This directly undermines the central argument that beat-level interaction inherently supports greater agency and ownership.

    Authors: We designed the study as an in-depth qualitative exploration of writers' lived experiences with TombWriter's beat-level pipeline rather than a controlled experiment testing causal superiority. The reported findings focus on participants' self-perceptions (e.g., framing AI as a generation engine and valuing structural discovery), which are valid for an initial investigation of a novel interaction paradigm. We agree that the lack of a baseline condition prevents strong attribution of agency gains specifically to beat-level scaffolding versus other factors. In the revised manuscript we will expand the limitations and future work sections to explicitly acknowledge this and recommend comparative studies with conventional prompting interfaces. revision: yes

  2. Referee: [Abstract and Findings] Abstract and Findings: Participants claimed ownership while also reporting voice loss; the manuscript should explicitly reconcile these perceptions with the claim that high involvement and control at the beat level increases agency, as the current presentation leaves open whether voice loss indicates reduced ownership in practice.

    Authors: We will revise both the abstract and the findings/discussion sections to directly address this tension. Ownership is tied to writers' sustained control over the hierarchical beats and emergent narrative structure (consistent with the fossil-theory framing of excavating latent story elements). Voice loss, by contrast, refers to stylistic qualities in the final prose output, which our five-stage pipeline deliberately separates from beat-level structure work. This separation is intended to preserve agency at the structural level while allowing the LLM to handle style. We will make this distinction explicit so the agency claim is not left ambiguous. revision: yes

Circularity Check

0 steps flagged

Minor conceptual framing from fossil theory; user-study findings remain independent


This is a qualitative HCI paper describing the TombWriter system and reporting results from a study with five writers. The central claims about perceived agency, ownership, and preference for structural discovery derive directly from semi-structured interview data rather than any derivation, fitted parameter, or self-referential equation. The fossil theory is invoked only for high-level motivation and does not serve as a load-bearing premise that the empirical results reduce to by construction. No mathematical steps, predictions, or uniqueness theorems appear in the manuscript, so the work is self-contained against external benchmarks with only minor reliance on prior conceptual framing.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on the fossil theory of storytelling as a domain assumption and introduces story archeology as a new framing without independent falsifiable evidence beyond the reported user study.

axioms (1)
  • Domain assumption: stories exist as latent structures that writers excavate through their craft (fossil theory of storytelling).
    Invoked early to motivate the shift from disposable prompts to refinable hierarchical instruments.
invented entities (1)
  • Story archeology (no independent evidence)
    purpose: A proposed paradigm in which prompts become persistent, refinable story instruments for extracting intended narrative structure.
    New concept introduced to organize the beat-level workflow; no external validation or falsifiable prediction supplied outside the study.

pith-pipeline@v0.9.0 · 5780 in / 1401 out tokens · 44551 ms · 2026-05-20T04:18:11.183341+00:00 · methodology



Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 1 internal anchor

  1. [1]

    James Scott Bell. 2011. Plot & Structure: Techniques and Exercises for Crafting a Plot That Grips Readers from Start to Finish (2nd ed.). Writer’s Digest Books, Cincinnati, OH, USA.

  2. [2]

    Oloff C. Biermann, Ning F. Ma, and Dongwook Yoon. 2022. From Tool to Companion: Storywriters Want AI Writers to Respect Their Personal Values and Writing Strategies. In Proceedings of the ACM Conference on Designing Interactive Systems. ACM, New York, NY, USA, 1209–1227. doi:10.1145/3532106.3533506

  3. [3]

    Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 2 (2006), 77–101.

  4. [4]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...

  5. [5]

    Zenan Chen and Jason Chan. 2024. Large language model in creative work: The role of collaboration modality and user expertise. Management Science 70, 12 (2024), 9101–9117. doi:10.1287/mnsc.2023.03014

    AVI ’26, June 08–12, 2026, Venice, Italy. Andersson & Elmqvist

  6. [6]

    John Joon Young Chung, Wooseok Kim, Kang Min Yoo, Hwaran Lee, Eytan Adar, and Minsuk Chang. 2022. TaleBrush: Sketching Stories with Generative Pretrained Language Models. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 209:1–209:19. doi:10.1145/3491102.3501819

  7. [7]

    Peter Dalsgaard. 2025. Creative Ambiguity and Cognitive Tension in Generative AI Tools. In Proceedings of the Annual Conference of the European Association of Cognitive Ergonomics. ACM, New York, NY, USA, 4:1–4:10. doi:10.1145/3746175.3746183

  8. [8]

    Fiona Draxler, Anna Werner, Florian Lehmann, Matthias Hoppe, Albrecht Schmidt, Daniel Buschek, and Robin Welsch. 2024. The AI Ghostwriter Effect: When Users do not Perceive Ownership of AI-Generated Text but Self-Declare as Authors. ACM Transactions on Computer-Human Interaction 31, 2 (April 2024), 1–40. doi:10.1145/3637875

  9. [9]

    Paolo Grigis and Antonella De Angeli. 2024. Playwriting with large language models: Perceived features, interaction strategies and outcomes. In Proceedings of the ACM Conference on Advanced Visual Interfaces. ACM, New York, NY, USA, 1–9. doi:10.1145/3656650.3656688

  10. [10]

    Md. Naimul Hoque, Bhavya Ghai, and Niklas Elmqvist. 2022. DramatVis Personae: Visual Text Analytics for Identifying Social Biases in Creative Writing. In Proceedings of the ACM Conference on Designing Interactive Systems. ACM, New York, NY, USA, 1260–1276. doi:10.1145/3532106.3533526

  11. [11]

    Md. Naimul Hoque, Bhavya Ghai, Kari Kraus, and Niklas Elmqvist. 2023. Portrayal: Leveraging NLP and Visualization for Analyzing Fictional Characters. In Proceedings of the ACM Conference on Designing Interactive Systems. ACM, New York, NY, USA, 74–94. doi:10.1145/3563657.3596000

  12. [12]

    Md Naimul Hoque, Tasfia Mashiat, Bhavya Ghai, Cecilia Shelton, Fanny Chevalier, Kari Kraus, and Niklas Elmqvist. 2024. The HaLLMark Effect: Supporting Provenance and Transparent Use of Large Language Models in Writing through Interactive Visualization. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 1...

  13. [13]

    Runsheng Huang, Lara J. Martin, and Chris Callison-Burch. 2024. WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models. CoRR abs/2412.10582 (2024), 21 pages. arXiv:2412.10582 doi:10.48550/ARXIV.2412.10582

  14. [14]

    Stephen King. 2000. On Writing: A Memoir of the Craft. Scribner, New York, NY, USA.

  15. [15]

    Max Kreminski. 2024. The Dearth of the Author in AI-Supported Writing. CoRR abs/2404.10289 (2024), 3 pages. arXiv:2404.10289 doi:10.48550/ARXIV.2404.10289

  16. [16]

    Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L. C. Guo, Md. Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Antonette Shibani, Disha Shrivastava, Lila S...

  17. [17]

    Zhuoran Lu, Qian Zhou, and Yi Wang. 2025. WhatELSE: Shaping narrative spaces at configurable level of abstraction for AI-bridged interactive storytelling. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 333:1–333:18. doi:10.1145/3706598.3713363

  18. [18]

    Teresa Luther, Joachim Kimmerle, and Ulrike Cress. 2024. Teaming Up with an AI: Exploring Human–AI Collaboration in a Writing Scenario with ChatGPT. AI 5, 3 (2024), 1357–1376. doi:10.3390/ai5030065

  19. [19]

    Damien Masson, Zixin Zhao, and Fanny Chevalier. 2025. Visual Story-Writing: Writing by Manipulating Visual Representations of Stories. In Proceedings of the ACM Symposium on User Interface Software and Technology. ACM, New York, NY, USA, 70:1–70:15. doi:10.1145/3746059.3747758

  20. [20]

    Piotr Mirowski, Kory W. Mathewson, Jaylen Pittman, and Richard Evans. 2023. Co-writing screenplays and theatre scripts with language models: Evaluation by industry professionals. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 335:1–335:34. doi:10.1145/3544548.3581225

  21. [21]

    Mohi Reza, Nathan M. Laundry, Ilya Musabirov, Peter Dushniku, Zhi Yuan "Michael" Yu, Kashish Mittal, Tovi Grossman, Michael Liut, Anastasia Kuzminykh, and Joseph Jay Williams. 2024. ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models. In Proceedings of the ACM Conference on Hu- ...

  22. [22]

    Mohi Reza, Jeb Thomas-Mitchell, Peter Dushniku, Nathan Laundry, Joseph Jay Williams, and Anastasia Kuzminykh. 2025. Co-Writing with AI, on Human Terms: Aligning Research with User Demands Across the Writing Process. CoRR abs/2504.12488 (2025), 37 pages. arXiv:2504.12488 doi:10.48550/ARXIV.2504.12488

  23. [23]

    Lucy A. Suchman. 1987. Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge University Press, Cambridge, UK.

  24. [24]

    Kevin Yang, Dan Klein, Nanyun Peng, and Yuandong Tian. 2023. DOC: Improving Long Story Coherence With Detailed Outline Control. In Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, Stroudsburg, PA, USA, 3378–3465. doi:10.18653/V1/2023.ACL-LONG.190

  25. [25]

    Kevin Yang, Yuandong Tian, Nanyun Peng, and Dan Klein. 2022. Re3: Generating Longer Stories With Recursive Reprompting and Revision. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. ACL, Stroudsburg, PA, USA, 4393–4479. doi:10.18653/V1/2022.EMNLP-MAIN.296

  26. [26]

    Ann Yuan, Andy Coenen, Emily Reif, and Daphne Ippolito. 2022. Wordcraft: Story Writing With Large Language Models. In Proceedings of the ACM Conference on Intelligent User Interface. ACM, New York, NY, USA, 841–852. doi:10.1145/3490099.3511105

  27. [27]

    Andrew Zhu, Lara J. Martin, Andrew Head, and Chris Callison-Burch. 2023. CALYPSO: LLMs as Dungeon Master’s Assistants. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. AAAI Press, Washington, D.C., USA, 380–390. doi:10.1609/AIIDE.V19I1.27534