TombWriter: Scaffolding Story Archeology through Beat-Level Interaction in Human-AI Co-Writing
Pith reviewed 2026-05-20 04:18 UTC · model grok-4.3
The pith
Writers can maintain ownership of their stories by refining persistent story beats with AI rather than discarding one-off prompts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that LLM-based story archeology, grounded in the idea that stories exist as latent structures writers excavate, increases author agency and ownership when interaction happens at the editable beat level instead of through disposable prompts. Writers generate or nudge character actions in scenes to explore emergent possibilities, then iteratively edit the resulting beats before prose is produced from them according to style and genre. The TombWriter interface supports this through a five-stage pipeline that renders stories as navigable cards for characters, scenes, and beats.
What carries the argument
The five-stage narrative pipeline that renders stories as navigable cards for characters, scenes, and beats, with beats serving as the editable unit that separates structural discovery from style-specific prose generation.
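As a rough illustration of the card hierarchy described above, the sketch below models characters, scenes, and beats as plain Python dataclasses. Every name in it (`Character`, `Scene`, `Beat`, `Story`, `edit_beat`, and the sample story content) is hypothetical and not taken from TombWriter, and the five pipeline stages are deliberately not modeled because this page does not name them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Character:
    name: str
    description: str = ""


@dataclass
class Beat:
    """One structural unit: a character action within a scene (no prose)."""
    actor: str
    action: str
    revisions: List[str] = field(default_factory=list)  # earlier wordings


@dataclass
class Scene:
    title: str
    beats: List[Beat] = field(default_factory=list)


@dataclass
class Story:
    characters: List[Character] = field(default_factory=list)
    scenes: List[Scene] = field(default_factory=list)


def edit_beat(beat: Beat, new_action: str) -> None:
    """Refine a beat in place, keeping its history instead of discarding it."""
    beat.revisions.append(beat.action)
    beat.action = new_action


# Hypothetical sample data, for illustration only.
story = Story(
    characters=[Character("Mara"), Character("Ilse")],
    scenes=[Scene("The cellar", beats=[Beat("Mara", "finds a locked door")])],
)
edit_beat(story.scenes[0].beats[0], "forces the locked door open")
```

Keeping earlier wordings in `revisions` is one possible way to make a beat persistent rather than disposable; the actual tool may store history differently or not at all.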
If this is right
- Writers can discover emergent story possibilities by generating and then editing beats that represent character actions within scenes.
- Prose output can be regenerated from the same beats using different styles or genres without altering the underlying structure.
- The AI functions as a simulation engine for testing narrative options rather than as a direct writing partner.
- Writers retain ownership by exercising repeated control over beat edits and refinements across the story hierarchy.
- Value is placed on structural discovery and organization more than on the production of finished sentences.
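The claim in the list above that prose can be regenerated from unchanged beats can be sketched as a toy example. This is an assumption-laden illustration, not the paper's method: `build_prose_prompt` is a hypothetical helper that only assembles a prompt string, the `(actor, action)` tuple is an invented minimal beat form, and no real LLM API is called.

```python
from typing import List, Tuple

Beat = Tuple[str, str]  # (actor, action); hypothetical minimal beat form


def build_prose_prompt(beats: List[Beat], style: str, genre: str) -> str:
    """Assemble a prose-generation prompt; the beats are read, never modified."""
    lines = [f"{i}. {actor} {action}" for i, (actor, action) in enumerate(beats, 1)]
    return (
        f"Write a {genre} scene in a {style} style.\n"
        "Follow these beats in order without adding or removing events:\n"
        + "\n".join(lines)
    )


# Hypothetical sample beats, for illustration only.
beats = [("Mara", "forces the locked door open"), ("Ilse", "warns her to stop")]
noir = build_prose_prompt(beats, style="terse", genre="noir")
gothic = build_prose_prompt(beats, style="ornate", genre="gothic")
```

Both prompts share the same beat lines and differ only in the style and genre header, which is the structure/style separation the list describes.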
Where Pith is reading between the lines
- The separation of beats from prose could be tested in domains such as game narrative or film scripting where structural planning precedes dialogue.
- Tools built on this model might add branching simulations that let writers compare multiple beat sequences side by side.
- The reported voice loss points to a possible need for style-transfer mechanisms that apply writer-specific language patterns directly to beat-derived prose.
- Longer-term use studies could show whether repeated beat refinement leads to writers internalizing new structural habits even without the tool.
Load-bearing premise
That shifting co-writing work to the level of editable story beats rather than disposable prose prompts will inherently give writers greater agency and ownership.
What would settle it
A follow-up study in which writers using beat-level editing report agency and ownership levels no higher than those using standard prompt-based LLM tools.
Original abstract
The dominant paradigm for LLM interaction in AI co-writing uses disposable prompts that vanish after use. This may lead to imprecise results, cumbersome workflows, and diminished author agency and ownership. We propose LLM-based story archeology, where prompts serve as a hierarchical story instrument refined over time to extract the writer's intended story. Drawing on the fossil theory of storytelling, where stories exist as latent structures that writers excavate through their craft, this approach supports agency and ownership through high involvement and control. Writers work at the level of story beats rather than prose. They generate character actions in scenes to discover emergent possibilities, simulated by the LLM or directly nudged, then edit resulting beats to refine scenes iteratively. Prose is generated from beats based on style and genre, separating structure from style. We developed TombWriter, a web-based tool that visualizes stories as navigable cards -- characters, scenes, and beats -- through a five-stage narrative pipeline. We conducted a qualitative study with five experienced writers who used the system over three days. Through semi-structured interviews, we found that writers framed AI as a generation engine rather than collaborator, claimed ownership while reporting voice loss, and valued the system for structural discovery rather than prose production. We contribute the story archeology approach, the TombWriter system, and qualitative findings on beat-level human-AI co-writing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that disposable prompt-based LLM interaction in co-writing diminishes author agency and ownership, and proposes 'LLM-based story archeology' as an alternative where writers refine hierarchical story beats (rather than prose) to excavate latent narrative structures, drawing on fossil theory. The TombWriter web tool implements a five-stage pipeline visualizing characters, scenes, and beats; a qualitative study with five experienced writers over three days, using semi-structured interviews, found that writers framed AI as a generation engine, claimed ownership while reporting voice loss, and valued the system for structural discovery rather than prose production.
Significance. If the central claims hold, the work could meaningfully shift HCI practices in human-AI co-writing toward controllable, structure-first interfaces that separate narrative scaffolding from style generation. The concrete TombWriter implementation and direct interview evidence on writer perceptions (AI as engine, preference for beats) are strengths; however, the small sample and absence of controls limit the strength of claims about increased agency and ownership.
major comments (2)
- [User Study] User Study section: The qualitative study (n=5, three days, semi-structured interviews) reports self-perceived ownership and preference for structural discovery but includes no control condition or baseline using conventional LLM prompting interfaces. Without this comparison, reported benefits in agency and ownership cannot be attributed specifically to the beat-level scaffolding paradigm versus novelty effects, tool features, or participant expectations. This directly undermines the central argument that beat-level interaction inherently supports greater agency and ownership.
- [Abstract and Findings] Abstract and Findings: Participants claimed ownership while also reporting voice loss; the manuscript should explicitly reconcile these perceptions with the claim that high involvement and control at the beat level increases agency, as the current presentation leaves open whether voice loss indicates reduced ownership in practice.
minor comments (2)
- [Abstract] Abstract: The hyphenated 'story- telling' and 'qual- itative' appear to be formatting artifacts; ensure consistent typography in the final version.
- [Discussion] The connection between the fossil theory framing and the empirical findings is mentioned but could be clarified in the discussion to avoid any appearance of post-hoc interpretation.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment below, offering clarifications grounded in the exploratory nature of our qualitative study and indicating where we will revise the manuscript for greater precision.
Point-by-point responses
- Referee: [User Study] User Study section: The qualitative study (n=5, three days, semi-structured interviews) reports self-perceived ownership and preference for structural discovery but includes no control condition or baseline using conventional LLM prompting interfaces. Without this comparison, reported benefits in agency and ownership cannot be attributed specifically to the beat-level scaffolding paradigm versus novelty effects, tool features, or participant expectations. This directly undermines the central argument that beat-level interaction inherently supports greater agency and ownership.
  Authors: We designed the study as an in-depth qualitative exploration of writers' lived experiences with TombWriter's beat-level pipeline rather than a controlled experiment testing causal superiority. The reported findings focus on participants' self-perceptions (e.g., framing AI as a generation engine and valuing structural discovery), which are valid for an initial investigation of a novel interaction paradigm. We agree that the lack of a baseline condition prevents strong attribution of agency gains specifically to beat-level scaffolding versus other factors. In the revised manuscript we will expand the limitations and future work sections to explicitly acknowledge this and recommend comparative studies with conventional prompting interfaces.
  Revision: yes
- Referee: [Abstract and Findings] Abstract and Findings: Participants claimed ownership while also reporting voice loss; the manuscript should explicitly reconcile these perceptions with the claim that high involvement and control at the beat level increases agency, as the current presentation leaves open whether voice loss indicates reduced ownership in practice.
  Authors: We will revise both the abstract and the findings/discussion sections to directly address this tension. Ownership is tied to writers' sustained control over the hierarchical beats and emergent narrative structure (consistent with the fossil-theory framing of excavating latent story elements). Voice loss, by contrast, refers to stylistic qualities in the final prose output, which our five-stage pipeline deliberately separates from beat-level structure work. This separation is intended to preserve agency at the structural level while allowing the LLM to handle style. We will make this distinction explicit so the agency claim is not left ambiguous.
  Revision: yes
Circularity Check
Minor conceptual framing from fossil theory; user-study findings remain independent
This is a qualitative HCI paper describing the TombWriter system and reporting results from a study with five writers. The central claims about perceived agency, ownership, and preference for structural discovery derive directly from semi-structured interview data rather than any derivation, fitted parameter, or self-referential equation. The fossil theory is invoked only for high-level motivation and does not serve as a load-bearing premise that the empirical results reduce to by construction. No mathematical steps, predictions, or uniqueness theorems appear in the manuscript, so the work is self-contained against external benchmarks with only minor reliance on prior conceptual framing.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Stories exist as latent structures that writers excavate through their craft (fossil theory of storytelling).
invented entities (1)
- Story archeology: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "five-stage narrative pipeline... simulation of character actions"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] James Scott Bell. 2011. Plot & Structure: Techniques and Exercises for Crafting a Plot That Grips Readers from Start to Finish (2nd ed.). Writer’s Digest Books, Cincinnati, OH, USA.
- [2] Oloff C. Biermann, Ning F. Ma, and Dongwook Yoon. 2022. From Tool to Companion: Storywriters Want AI Writers to Respect Their Personal Values and Writing Strategies. In Proceedings of the ACM Conference on Designing Interactive Systems. ACM, New York, NY, USA, 1209–1227. doi:10.1145/3532106.3533506
- [3] Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative research in psychology 3, 2 (2006), 77–101.
- [4] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...
- [5] Zenan Chen and Jason Chan. 2024. Large language model in creative work: The role of collaboration modality and user expertise. Management Science 70, 12 (2024), 9101–9117. doi:10.1287/mnsc.2023.03014
  AVI ’26, June 08–12, 2026, Venice, Italy. Andersson & Elmqvist
- [6] John Joon Young Chung, Wooseok Kim, Kang Min Yoo, Hwaran Lee, Eytan Adar, and Minsuk Chang. 2022. TaleBrush: Sketching Stories with Generative Pretrained Language Models. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 209:1–209:19. doi:10.1145/3491102.3501819
- [7] Peter Dalsgaard. 2025. Creative Ambiguity and Cognitive Tension in Generative AI Tools. In Proceedings of the Annual Conference of the European Association of Cognitive Ergonomics. ACM, New York, NY, USA, 4:1–4:10. doi:10.1145/3746175.3746183
- [8] Fiona Draxler, Anna Werner, Florian Lehmann, Matthias Hoppe, Albrecht Schmidt, Daniel Buschek, and Robin Welsch. 2024. The AI Ghostwriter Effect: When Users do not Perceive Ownership of AI-Generated Text but Self-Declare as Authors. ACM Transactions on Computer-Human Interaction 31, 2 (April 2024), 1–40. doi:10.1145/3637875
- [9] Paolo Grigis and Antonella De Angeli. 2024. Playwriting with large language models: Perceived features, interaction strategies and outcomes. In Proceedings of the ACM Conference on Advanced Visual Interfaces. ACM, New York, NY, USA, 1–9. doi:10.1145/3656650.3656688
- [10] Md. Naimul Hoque, Bhavya Ghai, and Niklas Elmqvist. 2022. DramatVis Personae: Visual Text Analytics for Identifying Social Biases in Creative Writing. In Proceedings of the ACM Conference on Designing Interactive Systems. ACM, New York, NY, USA, 1260–1276. doi:10.1145/3532106.3533526
- [11] Md. Naimul Hoque, Bhavya Ghai, Kari Kraus, and Niklas Elmqvist. 2023. Portrayal: Leveraging NLP and Visualization for Analyzing Fictional Characters. In Proceedings of the ACM Conference on Designing Interactive Systems. ACM, New York, NY, USA, 74–94. doi:10.1145/3563657.3596000
- [12] Md Naimul Hoque, Tasfia Mashiat, Bhavya Ghai, Cecilia Shelton, Fanny Chevalier, Kari Kraus, and Niklas Elmqvist. 2024. The HaLLMark Effect: Supporting Provenance and Transparent Use of Large Language Models in Writing through Interactive Visualization. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 1...
- [13] Runsheng Huang, Lara J. Martin, and Chris Callison-Burch. 2024. WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models. CoRR abs/2412.10582 (2024), 21 pages. arXiv:2412.10582 doi:10.48550/ARXIV.2412.10582
- [14] Stephen King. 2000. On Writing: A Memoir of the Craft. Scribner, New York, NY, USA.
- [15] Max Kreminski. 2024. The Dearth of the Author in AI-Supported Writing. CoRR abs/2404.10289 (2024), 3 pages. arXiv:2404.10289 doi:10.48550/ARXIV.2404.10289
- [16] Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L. C. Guo, Md. Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Antonette Shibani, Disha Shrivastava, Lila S...
- [17] Zhuoran Lu, Qian Zhou, and Yi Wang. 2025. WhatELSE: Shaping narrative spaces at configurable level of abstraction for AI-bridged interactive storytelling. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 333:1–333:18. doi:10.1145/3706598.3713363
- [18] Teresa Luther, Joachim Kimmerle, and Ulrike Cress. 2024. Teaming Up with an AI: Exploring Human–AI Collaboration in a Writing Scenario with ChatGPT. AI 5, 3 (2024), 1357–1376. doi:10.3390/ai5030065
- [19] Damien Masson, Zixin Zhao, and Fanny Chevalier. 2025. Visual Story-Writing: Writing by Manipulating Visual Representations of Stories. In Proceedings of the ACM Symposium on User Interface Software and Technology. ACM, New York, NY, USA, 70:1–70:15. doi:10.1145/3746059.3747758
- [20] Piotr Mirowski, Kory W. Mathewson, Jaylen Pittman, and Richard Evans. 2023. Co-writing screenplays and theatre scripts with language models: Evaluation by industry professionals. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 335:1–335:34. doi:10.1145/3544548.3581225
- [21] Mohi Reza, Nathan M. Laundry, Ilya Musabirov, Peter Dushniku, Zhi Yuan "Michael" Yu, Kashish Mittal, Tovi Grossman, Michael Liut, Anastasia Kuzminykh, and Joseph Jay Williams. 2024. ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models. In Proceedings of the ACM Conference on Hu- ...
- [22] Mohi Reza, Jeb Thomas-Mitchell, Peter Dushniku, Nathan Laundry, Joseph Jay Williams, and Anastasia Kuzminykh. 2025. Co-Writing with AI, on Human Terms: Aligning Research with User Demands Across the Writing Process. CoRR abs/2504.12488 (2025), 37 pages. arXiv:2504.12488 doi:10.48550/ARXIV.2504.12488
- [23] Lucy A. Suchman. 1987. Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge University Press, Cambridge, UK.
- [24] Kevin Yang, Dan Klein, Nanyun Peng, and Yuandong Tian. 2023. DOC: Improving Long Story Coherence With Detailed Outline Control. In Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, Stroudsburg, PA, USA, 3378–3465. doi:10.18653/V1/2023.ACL-LONG.190
- [25] Kevin Yang, Yuandong Tian, Nanyun Peng, and Dan Klein. 2022. Re3: Generating Longer Stories With Recursive Reprompting and Revision. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. ACL, Stroudsburg, PA, USA, 4393–4479. doi:10.18653/V1/2022.EMNLP-MAIN.296
- [26]
- [27] Andrew Zhu, Lara J. Martin, Andrew Head, and Chris Callison-Burch. 2023. CALYPSO: LLMs as Dungeon Master’s Assistants. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. AAAI Press, Washington, D.C., USA, 380–390. doi:10.1609/AIIDE.V19I1.27534