Frankentext: Stitching random text fragments into long-form narratives

Chau Minh Pham; Dzung Pham; Jenna Russell; Mohit Iyyer

arxiv: 2505.18128 · v4 · submitted 2025-05-23 · 💻 cs.CL

Pith reviewed 2026-05-19 12:52 UTC · model grok-4.3

classification 💻 cs.CL

keywords long-form narrative generation · text stitching · AI text detection · verbatim copying · LLM composition · originality evaluation · combinatorial selection

The pith

LLMs generate higher-quality long stories by mostly copying and stitching random human text fragments rather than writing freely.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Frankentexts, a generation method that forces an LLM to create long narratives by selecting from thousands of random human-written snippets and copying the great majority of tokens directly from them. The model must implicitly search the huge space of possible selections and orderings before doing minimal edits to connect the pieces into a coherent story that fits the prompt. Evaluations show these outputs score better than standard LLM writing on quality, diversity, and originality while staying relevant and readable. The same constraint makes the results much harder for current detectors to flag as machine-generated. This approach also surfaces open questions about who owns stories built primarily from existing human text.

Core claim

Frankentexts are produced by giving an LLM a writing prompt plus thousands of randomly sampled human paragraphs and instructing it to form a narrative while copying most tokens verbatim from the supplied fragments. The LLM explores the combinatorial possibilities of choosing and sequencing these snippets, then stitches them with light edits to maintain coherence and prompt relevance. Automatic and human evaluations find that the resulting texts exceed vanilla LLM outputs in writing quality, diversity, and originality, remain coherent, and are frequently misclassified as human-written by state-of-the-art detectors such as Pangram.

What carries the argument

The Frankentext procedure that treats the LLM as a composer selecting, ordering, and minimally stitching random human snippets under a high verbatim-copy constraint.
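
A minimal sketch of how this procedure could be wired together, assuming a generic `llm` callable (prompt in, text out) and a `copy_rate` scorer, both hypothetical stand-ins. The revision wording is the one reproduced on this page, and the `NO EDITS` sentinel follows the paper's editing prompt. The generation and editing prompt strings, the default of 1,500 snippets, and the choice to gate revision on a measured copy rate are illustrative assumptions, not the authors' confirmed implementation.

```python
import random
from typing import Callable, List

# Revision prompt wording as reproduced on this page.
REVISE_PROMPT = (
    "This story contains way too much of your own writing! "
    "It's not even close to {verbatim_perc}% snippet use. "
    "Can you edit your story as needed to get much closer to the "
    "{verbatim_perc}% threshold? Output only the edited story"
)


def build_generation_prompt(writing_prompt: str, snippets: List[str],
                            verbatim_perc: int) -> str:
    """Hypothetical, shortened prompt layout; the paper's prompt is longer."""
    joined = "\n".join(snippets)
    return (
        f"Write a story in which about {verbatim_perc}% of the text is "
        f"copied verbatim from the snippets.\n"
        f"# Writing prompt: {writing_prompt}\n# Snippets: {joined}"
    )


def frankentext(writing_prompt: str,
                corpus: List[str],
                llm: Callable[[str], str],
                copy_rate: Callable[[str, List[str]], float],
                n_snippets: int = 1500,      # assumed pool size
                verbatim_perc: int = 90,
                max_revisions: int = 2,      # assumed retry budget
                seed: int = 0) -> str:
    # 1. Sample random human-written paragraphs from the corpus.
    rng = random.Random(seed)
    snippets = rng.sample(corpus, min(n_snippets, len(corpus)))

    # 2. Ask the model for a first draft under the verbatim constraint.
    draft = llm(build_generation_prompt(writing_prompt, snippets, verbatim_perc))

    # 3. Ask for a revision while the measured copy rate is below target
    #    (gating on a measurement is an assumption of this sketch).
    for _ in range(max_revisions):
        if 100 * copy_rate(draft, snippets) >= verbatim_perc:
            break
        draft = llm(f"{draft}\n\n{REVISE_PROMPT.format(verbatim_perc=verbatim_perc)}")

    # 4. Coherence edit; the model may answer "NO EDITS" to keep the draft.
    edited = llm(f"Edit this story for coherence with minimal additions:\n{draft}")
    return draft if edited.strip() == "NO EDITS" else edited
```

Passing `llm` and `copy_rate` in as arguments keeps the sketch independent of any particular model API or overlap metric.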

If this is right

Long-form generation can achieve higher originality and diversity without increasing the amount of newly generated text.
Current AI-text detectors become significantly less reliable when outputs are built by stitching existing human fragments.
Questions of authorship and copyright intensify when the raw material is human-written and the LLM only orchestrates it.
The method offers a way to control stylistic and tonal variety by varying the source snippet pool.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This stitching approach may generalize to other creative tasks such as script or poetry generation where source material is abundant.
Training data policies for LLMs could need revision if outputs are shown to be largely derivative of specific human sources.
New legal or technical mechanisms for attributing contributions from the original snippet authors may become necessary.

Load-bearing premise

An LLM can implicitly explore the combinatorial space of selecting and ordering thousands of random snippets to form a coherent story while copying most tokens verbatim with only minimal new text.
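
To give a rough sense of the space this premise refers to: choosing k snippets out of n and putting them in order allows n!/(n-k)! arrangements. The sketch below evaluates that count with Python's `math.perm`. The values n = 1500 and k = 20 are illustrative assumptions, not figures from the paper, and the count ignores partial-snippet copying and edits, so it is only a crude lower-bound illustration.

```python
import math


def ordered_selections(n_snippets: int, k_used: int) -> int:
    """Ways to pick k_used snippets from n_snippets and arrange them in order."""
    return math.perm(n_snippets, k_used)


# Illustrative values only (assumed, not taken from the paper).
n, k = 1500, 20
count = ordered_selections(n, k)
print(f"{n} snippets, {k} used: roughly 10^{len(str(count)) - 1} ordered selections")
```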

What would settle it

A head-to-head comparison in which the same set of random snippets is given to skilled human writers who must also copy most tokens and produce stories; if humans produce clearly superior results or if improved detectors correctly flag most Frankentexts, the performance claims would not hold.

Figures

Figures reproduced from arXiv: 2505.18128 by Chau Minh Pham, Dzung Pham, Jenna Russell, Mohit Iyyer.

**Figure 1.** The Frankentexts pipeline. First, random paragraphs are sampled from a large corpus of human-written books. Then, an LLM is prompted with the paragraphs, a writing prompt, and instructions to include a certain amount of human text verbatim, to generate the first draft of a Frankentext, which is further edited into a coherent and faithful final version (see Algorithm 1). The highlighted texts are the human-…

**Figure 3.** Effects of varying the percentage of required …

**Figure 4.** Example of the consent form provided to participants.

**Figure 5.** Label Studio Single Story Annotation Interface.

**Figure 6.** Label Studio Pairwise Story Annotation Interface.

**Figure 7.** An example of fictional Frankentexts.

**Figure 8.** An example of nonfiction Frankentexts.

**Figure 9.** Detection rates among annotators and detec…

**Figure 11.** Copy rate and Pangram detection rate on longer Frankentexts.

**Figure 12.** Pangram detection rate and copy rate throughout the texts, aggregated across models. [Plot: x-axis section 1 to section 4; y-axis Percentage (%); series Attribution (%) and Pangram (%).]

**Figure 13.** System prompt for MCP calls.

**Figure 14.** Copy rate and Pangram AI fraction across … [Plot: series Copy Rate and Average AI Fraction; y-axis Percentage (%); categories r/WritingPrompts and Tell me a story; bar labels 74.8%, 15.5%, 68.4%, 23.1%.]

**Figure 15.** Total occurrences of AI-related keywords …

**Figure 18.** Prompt for judging text coherence.

**Figure 19.** Prompt for judging text relevance. Prompt text: You are given a story and its premise. Your task is to determine whether the story is faithful to the premise or not. To be considered unfaithful, the story must contain elements that make it completely unrelated to the premise. Here are some popular types of unfaithfulness: 1. Ignoring or misinterpretating the premise: Key plot events, c…

**Figure 20.** Prompt for Claude-as-a-judge, adapted from (…

**Figure 21.** Prompt for generation.

**Figure 22.** Prompt for generation revision. Prompt text: This story contains way too much of your own writing! It’s not even close to {verbatim_perc}% snippet use. Can you edit your story as needed to get much closer to the {verbatim_perc}% threshold? Output only the edited story

**Figure 23.** Prompt for editing the first draft of Frankentexts.

**Figure 24.** Prompt for nonfiction generation.

**Figure 25.** Prompt for nonfiction generation revise. Prompt text: This news article contains way too much of your own writing! It’s not even close to {verbatim_perc}% snippet use. Can you edit your news article as needed to get much closer to the {verbatim_perc}% threshold? Output only the edited news article

**Figure 26.** Prompt for nonfiction edit.

**Figure 27.** Prompt for generating vanilla stories. Prompt text: Your task is to write a coherent and engaging story based on the provided writing prompt. Your story should contain approximately {num_words} words. First, read the writing prompt carefully: {writing_prompt} Next, write the corresponding story. You should only return the story text and not any other irrelevant details (e.g. chapter indicator…

Original abstract

We introduce Frankentexts, a long-form narrative generation paradigm that treats an LLM as a composer of existing texts rather than as an author. Given a writing prompt and thousands of randomly sampled human-written snippets, the model is asked to produce a narrative under the extreme constraint that most tokens (e.g., 90%) must be copied verbatim from the provided paragraphs. This task is effectively intractable for humans: selecting and ordering snippets yields a combinatorial search space that an LLM implicitly explores, before minimally editing and stitching together selected fragments into a coherent long-form story. Despite the extreme challenge of the task, we observe through extensive automatic and human evaluation that Frankentexts significantly improve over vanilla LLM generations in terms of writing quality, diversity, and originality while remaining coherent and relevant to the prompt. Furthermore, Frankentexts pose a fundamental challenge to detectors of AI-generated text: 72% of Frankentexts produced by our best Gemini 2.5 Pro configuration are misclassified as human-written by Pangram, a state-of-the-art detector. Human annotators praise Frankentexts for their inventive premises, vivid descriptions, and dry humor; on the other hand, they identify issues with abrupt tonal shifts and uneven grammar across segments, particularly in longer pieces. The emergence of high-quality Frankentexts raises serious questions about authorship and copyright: when humans provide the raw materials and LLMs orchestrate them into new narratives, who truly owns the result?

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces Frankentexts, a long-form narrative generation method in which an LLM is prompted to select, order, and minimally stitch thousands of randomly sampled human-written text snippets into a coherent story, subject to an explicit constraint that the great majority of output tokens (target ~90%) must be copied verbatim from the provided fragments. The central empirical claims are that the resulting texts outperform standard LLM generations on writing quality, diversity, and originality while preserving coherence and prompt relevance, and that they substantially evade state-of-the-art AI-text detectors (72% misclassification rate by Pangram for the best Gemini 2.5 Pro configuration).

Significance. If the results are robust, the work demonstrates that LLMs can implicitly solve a large-scale combinatorial selection-and-ordering task over noisy human fragments while satisfying a strict verbatim-copy constraint. This has direct implications for constrained generation, hybrid human-AI authorship, copyright questions, and the reliability of current detectors. The combination of automatic metrics, human judgments, and a concrete detector-evasion result supplies a falsifiable empirical contribution.

major comments (1)

[Method / Experimental Setup] (the section is not numbered in the abstract; it is identified here by the 90% verbatim instruction): the manuscript states that the LLM is instructed to copy most tokens verbatim yet reports no post-generation verification of the realized copy rate. No longest common subsequence (LCS), n-gram overlap, or normalized edit-distance statistics against the input snippet pool are provided. The quality, diversity, and 72% detector-evasion claims all rest on the premise that the model is performing true stitching rather than ordinary generation with incidental fragment insertion, so the absence of this measurement undermines the central interpretation.

minor comments (2)

[Evaluation] Human evaluation protocol: specify the exact number of annotators, inter-annotator agreement (e.g., Krippendorff’s alpha or Fleiss’ kappa), and the precise rating scales used for coherence, originality, and tonal-shift judgments.
[Results] Detector result: state the total number of Frankentexts evaluated for the 72% Pangram figure, the exact prompt template and snippet count for the “best Gemini 2.5 Pro configuration,” and whether the same detector was run on the vanilla-LLM baseline for direct comparison.
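
For reference, the agreement statistic named in the first minor comment can be computed in a few lines. The sketch below implements the standard Fleiss' kappa formula over a count matrix. The input layout (`counts[i][j]` = number of annotators who put story `i` in category `j`) is an assumption for illustration and says nothing about how the paper's annotations are actually stored.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: mean over items of pairwise rater agreement.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)
```

A value of 1 indicates perfect agreement, 0 indicates agreement at chance level, and negative values indicate agreement below chance.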

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for highlighting the importance of verifying the core mechanism underlying Frankentext generation. We address the single major comment below and will revise the manuscript accordingly to strengthen the empirical foundation of our claims.

Point-by-point responses

Referee: the manuscript states that the LLM is instructed to copy most tokens verbatim yet reports no post-generation verification of the realized copy rate. No LCS, n-gram overlap, or normalized edit-distance statistics against the input snippet pool are provided. Because the quality, diversity, and 72% detector-evasion claims rest on the premise that the model is performing true stitching rather than ordinary generation with incidental fragment insertion, the absence of this measurement is load-bearing for the central interpretation.

Authors: We agree that post-generation verification of the verbatim copy rate is necessary to confirm that the outputs result from the intended large-scale stitching process under the explicit constraint rather than from standard unconstrained generation. The submitted manuscript relied on the prompt's explicit instruction to copy ~90% of tokens verbatim and on qualitative inspection of samples, but did not report quantitative overlap statistics. In the revised version we will add a dedicated analysis (in the Experiments section or Appendix) that computes longest common subsequence (LCS) lengths, normalized edit distance, and n-gram overlap (e.g., 5-grams) between each generated Frankentext and the full pool of input snippets. This will quantify the realized copy rate, demonstrate adherence to the constraint, and directly support the interpretation of the quality, diversity, and detector-evasion results.

Revision: yes
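
One of the overlap statistics named in this exchange, 5-gram overlap against the snippet pool, could be sketched as follows. Lowercased whitespace tokenization and the rule "a token counts as copied if it lies inside any n-gram that also occurs in some snippet" are assumptions of this sketch. The paper's own copy-rate or attribution metric may be defined differently.

```python
from typing import Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Assumed tokenization: lowercase, split on whitespace.
    return text.lower().split()


def ngram_pool(snippets: Iterable[str], n: int) -> Set[Tuple[str, ...]]:
    """All n-grams that occur in any snippet."""
    pool: Set[Tuple[str, ...]] = set()
    for snippet in snippets:
        tokens = tokenize(snippet)
        for i in range(len(tokens) - n + 1):
            pool.add(tuple(tokens[i:i + n]))
    return pool


def copy_rate(output: str, snippets: Iterable[str], n: int = 5) -> float:
    """Fraction of output tokens covered by an n-gram found in the pool."""
    tokens = tokenize(output)
    if not tokens:
        return 0.0
    pool = ngram_pool(snippets, n)
    covered = [False] * len(tokens)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in pool:
            # Mark every token inside the matching window as copied.
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(tokens)
```

Comparing this value to the instructed threshold (for example 0.9) gives a direct check of constraint adherence. A smaller `n` is more permissive and a larger `n` is stricter.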

Circularity Check

0 steps flagged

No significant circularity in empirical task and evaluation

Full rationale

The paper introduces Frankentexts as an empirical generation paradigm and supports its claims of improved quality, diversity, originality, coherence, and detector evasion through automatic metrics and human evaluation. No mathematical derivations, equations, fitted parameters, or predictions are presented that reduce to inputs by construction. The task definition (copying ~90% tokens from random snippets) and results are externally verifiable via the described experiments rather than self-referential. This is a standard non-circular empirical systems paper with independent content in its evaluations.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the LLM's ability to perform implicit combinatorial search over snippet selection and ordering under a strict copy constraint; no complex mathematical models or additional fitted constants beyond the task definition are introduced.

free parameters (1)

verbatim copy rate = 90%
The 90% threshold is chosen by the authors to create an extreme stitching constraint.

axioms (1)

domain assumption: LLMs can implicitly explore the combinatorial space of selecting and ordering thousands of random snippets to form coherent narratives
This capability is presupposed for the model to succeed at the stated task.

invented entities (1)

Frankentext (no independent evidence)
purpose: A long-form narrative generated under the verbatim-stitching constraint
New term and generation paradigm introduced by the authors.

pith-pipeline@v0.9.0 · 5800 in / 1454 out tokens · 64598 ms · 2026-05-19T12:52:29.992036+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: Given a writing prompt and thousands of randomly sampled human-written snippets, the model is asked to produce a narrative under the extreme constraint that most tokens (e.g., 90%) must be copied verbatim

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: Frankentexts improve over vanilla LLM generations in key writing quality metrics such as diversity and novelty

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

All that‘s ‘human’ is not gold: Evaluating human evaluation of generated text. InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7282–7296, Online. Association for Computational Linguistics. Isaac David and Arth...

work page arXiv 2025

[2] [2]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Deepseek-r1: Incentivizing reasoning capa- bility in llms via reinforcement learning.Preprint, arXiv:2501.12948. Liam Dugan, Alyssa Hwang, Filip Trhlík, Andrew Zhu, Josh Magnus Ludan, Hainiu Xu, Daphne Ip- polito, and Chris Callison-Burch. 2024. RAID: A shared benchmark for robust evaluation of machine- generated text detectors. InProceedings of the 62nd ...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[3] [3]

InProceedings of the 41st International Conference on Machine Learning, ICML’24

Spotting llms with binoculars: zero-shot detection of machine-generated text. InProceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org. Jessica He, Stephanie Houde, and Justin D. Weisz. 2025. Which contributions deserve credit? perceptions of attribution in human-ai co-creation. InProceedings of the 2025 CHI Conference on ...

work page arXiv 2025

[4] [4]

Outfox: Llm-generated essay detection through in-context learning with adversarially gen- erated examples. InProceedings of the Thirty- Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applica- tions of Artificial Intelligence and Fourteenth Sym- posium on Educational Advances in Artificial Intelli- gence, AAAI’2...


[5] [5]

AI use in American newspapers is widespread, uneven, and rarely disclosed

Towards coherent and consistent use of entities in narrative generation. In International Conference on Machine Learning, pages 17278–17294. PMLR. Chau Minh Pham, Simeng Sun, and Mohit Iyyer. 2024. Suri: Multi-constraint instruction following in long-form text generation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1722–...


[6] [6]

stitched

Skywork: A more open bilingual foundation model. Preprint, arXiv:2310.19341. Yuhao Wu, Ming Shan Hee, Zhiqiang Hu, and Roy Ka-Wei Lee. 2025. Longgenbench: Benchmarking long-form generation in long context LLMs. In The Thirteenth International Conference on Learning Representations. Zhuohan Xie, Trevor Cohn, and Jey Han Lau. 2023. The next chapter: A s...


[7] [7]

search – query the FAISS semantic index


[8] [8]

For each prompt iteration, you must make at least 20 calls to the MCP server to get enough materials to write a story

fetch – retrieve the full excerpt/passage for a selected result. For each prompt iteration, you must make at least 20 calls to the MCP server to get enough materials to write a story. Figure 13: System prompt for MCP calls. [Bar chart: Copy Rate and Average AI Fraction, in percent, for r/WritingPrompts and Tell me a story; plotted values 74.8%, 15.5%, 68.4%, 23.1%] Fi...


[9] [9]

Plot/Event Incoherence: Events that happen without believable causes or effects, or an outcome contradicts earlier set-ups


[10] [10]

Character Incoherence: A character’s characteristics (personality, knowledge, or abilities) and actions suddenly change without explanations


[11] [11]

Spatial Incoherence: The physical layout of settings (rooms, cities, or worlds) changes suddenly


[12] [12]

Thematic Incoherence: Central messages clash or disappear; symbolism introduced early never pays off, themes collide, The mood, register, or genre conventions shift without motivation


[13] [13]

First, read the story: {story} Answer TRUE if the story is coherent

Surface-Level Incoherence: Pronouns, tense, narrative voice, or names flip mid-sentence; repeated or missing words; malformed sentences. First, read the story: {story} Answer TRUE if the story is coherent. Answer FALSE if the story is incoherent, i.e. contains issues that, if left unresolved, significantly affect the reader’s ability to understand the mai...


[14] [14]

Ignoring or misinterpreting the premise: Key plot events, characters, or settings required by the premise are not included or falsely represented in the story


[15] [15]

Hallucinating details that contradict the premise: The story introduces details that make the premise impossible


[16] [16]

First, read the premise: {writing_prompt} Next, read the story: {story} Answer TRUE if the story is faithful to the premise

Failure to maintain the specified tones, genres, or other constraints: The story does not use the surface-level constraints (correct tones, genres, point of views, length, etc.), as required by the premise. First, read the premise: {writing_prompt} Next, read the story: {story} Answer TRUE if the story is faithful to the premise. Answer FALSE if the story c...


[17] [17]

Penalize neat, overly structured, or cinematic arcs that feel artificial or generic

Plot: Favor stories with surprising turns and creative structures. Penalize neat, overly structured, or cinematic arcs that feel artificial or generic


[18] [18]

Penalize reliance on cliches, tropes, or smooth but unremarkable devices

Creativity: Reward originality of perspective, voice, and risk-taking. Penalize reliance on cliches, tropes, or smooth but unremarkable devices


[19] [19]

Do not reward over-explained or archetypal development

Development: Characters and settings should feel psychologically complex. Do not reward over-explained or archetypal development


[20] [20]

Penalize polished, ornamental, or overly literary prose that feels mechanical or detached

Language Use: Prefer authentic, striking, and emotionally charged expression, even if rough, fragmented, or unusual. Penalize polished, ornamental, or overly literary prose that feels mechanical or detached. Provide a detailed assessment of the story in terms of these four dimensions. Conclude your assessment with scores using the template below. Do not ...


[21] [22]

Many snippets are from serious dramas, historical fiction, or thrillers, so careful selection and modification will be needed

Select snippets that can be woven together to create a coherent narrative fitting the prompt. Many snippets are from serious dramas, historical fiction, or thrillers, so careful selection and modification will be needed. Consider all provided snippets before moving onto the next step


[22] [24]

Combine the snippets into a narrative, adding or changing words (bolded) if necessary for coherence


[23] [25]

Ensure that you do not have story beats that are primarily written by yourself (i.e., every story beat should consist mainly of text taken from snippets)


[24] [27]

Do not output story title or any irrelevant details


[25] [28]

NO EDITS

Review the final story for adherence to the ~{verbatim_perc}% rule and coherence, and edit it if you have produced too many tokens of your own or if the story is too incoherent. # Writing prompt: {writing_prompt} # Snippets: {snippets} Figure 21: Prompt for generation Prompt for generation revise This story contains way too much of your own writing! It’s ...


[26] [29]

Read the generated story and writing prompt to understand the established context, plot, characters, and tone


[27] [32]

Keep additions minimal, but feel free to delete larger spans (phrases, sentences, paragraphs, etc.) whenever material is irrelevant or incoherent

Implement the changes. Keep additions minimal, but feel free to delete larger spans (phrases, sentences, paragraphs, etc.) whenever material is irrelevant or incoherent


[28] [33]

Review the final story for coherence adherence to the ~{verbatim_perc}% rule and coherence, and edit it if you have produced too many tokens of your own or if the story is too incoherent
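
The ~{verbatim_perc}% rule asks that most of the story's tokens come from the supplied snippets. A rough way to check a draft against such a threshold is sketched below. This is only an illustrative approximation and not the paper's copy-rate metric: the function names `verbatim_fraction` and `needs_revision`, the 4-gram window, and the default 90% target are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def verbatim_fraction(story, snippets, n=4):
    """Fraction of story tokens covered by an n-gram that also occurs in a snippet.

    Hypothetical helper: a crude n-gram overlap estimate, not the paper's metric.
    """
    # Collect every n-gram that appears in any snippet.
    snippet_ngrams = set()
    for snippet in snippets:
        toks = tokenize(snippet)
        for i in range(len(toks) - n + 1):
            snippet_ngrams.add(tuple(toks[i:i + n]))

    story_toks = tokenize(story)
    if not story_toks:
        return 0.0

    # Mark each story token that sits inside a matching n-gram.
    covered = [False] * len(story_toks)
    for i in range(len(story_toks) - n + 1):
        if tuple(story_toks[i:i + n]) in snippet_ngrams:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(story_toks)


def needs_revision(story, snippets, verbatim_perc=90, n=4):
    """Return True when the estimated copied share falls below the target percentage."""
    return verbatim_fraction(story, snippets, n) * 100 < verbatim_perc
```

A draft for which `needs_revision` returns `True` could then be sent back through a revision prompt; the choice of `n` trades off tolerance for light edits against false matches on common phrases.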


[29] [34]

NO EDITS

Output the edited writing and no other details. If there is no edit to be made, output "NO EDITS" Figure 23: Prompt for editing the first draft of Frankentexts Prompt for nonfiction generation You’re writing a news article by repurposing a provided collection of snippets from other stories. Your news article will only be accepted for publication if it i...


[30] [35]

Read through the prompt and snippets carefully to understand the tone and available material


[31] [36]

Many snippets are from serious dramas, historical fiction, or thrillers, so careful selection and modification will be needed

Select snippets that can be woven together to create a coherent and factual narrative fitting the prompt. Many snippets are from serious dramas, historical fiction, or thrillers, so careful selection and modification will be needed. Consider all provided snippets before moving onto the next step


[32] [37]

Ensure character names, descriptions (like height), and actions align with the prompt

Modify the chosen snippets, bolding all changes. Ensure character names, descriptions (like height), and actions align with the prompt


[33] [38]

Combine the snippets into a narrative, adding or changing words (bolded) if necessary for coherence and factuality


[34] [39]

Ensure that you do not have news article beats that are primarily written by yourself (i.e., every news article beat should consist mainly of text taken from snippets)


[35] [40]

Track the word count, aiming for around {num_words} words


[36] [41]

Do not output news article title or any irrelevant details


[37] [42]

NO EDITS

Review the final news article for adherence to the ~{verbatim_perc}% rule, factuality and coherence, and edit it if you have produced too many tokens of your own or if the news article is too incoherent or non-factual. # Writing prompt: {writing_prompt} # Snippets: {snippets} Figure 24: Prompt for nonfiction generation Prompt for nonfiction generation rev...


[38] [43]

Read the generated news article and writing prompt to understand the established context, plot, characters, and tone


[39] [44]

For each sentence in the text, identify the specific spans of inconsistency within the generated text


[40] [45]

- Contradictions: Information that conflicts with other details within the text (e.g., character traits, setting descriptions, established facts)

Identify minimal edits needed to correct these inconsistencies while respecting the {verbatim_perc}% rule. - Contradictions: Information that conflicts with other details within the text (e.g., character traits, setting descriptions, established facts). - Continuity errors: Actions or details that conflict with the established timeline or sequence of eve...


[41] [46]

Keep additions minimal, but feel free to delete larger spans (phrases, sentences, paragraphs, etc.) whenever material is irrelevant, incoherent, or non-factual

Implement the changes. Keep additions minimal, but feel free to delete larger spans (phrases, sentences, paragraphs, etc.) whenever material is irrelevant, incoherent, or non-factual


[42] [47]

Review the final news article for coherence adherence to the ~{verbatim_perc}% rule and coherence, and edit it if you have produced too many tokens of your own or if the news article is too incoherent or non-factual


[43] [48]

NO EDITS

Output the edited writing and no other details. If there is no edit to be made, output "NO EDITS". Figure 26: Prompt for nonfiction edit Prompt for generating vanilla stories Your task is to write a coherent and engaging story based on the provided writing prompt. Your story should contain approximately {num_words} words. First, read the writing prompt ca...
