pith.

arxiv: 2604.22142 · v1 · submitted 2026-04-24 · 💻 cs.CL · cs.CY

Voice Under Revision: Large Language Models and the Normalization of Personal Narrative

Pith reviewed 2026-05-08 12:09 UTC · model grok-4.3

classification 💻 cs.CL cs.CY
keywords: large language models · stylistic normalization · personal narratives · voice · computational stylistics · digital humanities · text rewriting · narrative texture

The pith

Large language models rewrite personal narratives toward a more polished, less personal style across models and prompts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how frontier LLMs alter personal narratives when asked to improve them, rewrite them, or revise them while preserving the author's voice. It measures shifts using 13 linguistic markers, including function words, contractions, first-person pronouns, vocabulary diversity, word length, and punctuation. Across three models and all three prompt conditions, rewrites consistently reduce markers of situated, informal voice while increasing elaboration and abstraction. Voice-preserving instructions only weaken the effect without reversing its direction, and rewritten texts converge stylometrically, becoming harder to trace back to their originals. The author argues that this normalization is a form of textual mediation that affects how digital humanities scholars analyze style, authorship, and narrative texture.

Core claim

Across models and prompt conditions, LLM rewriting produces a consistent pattern of stylistic normalization. Function words, contractions, and first-person pronouns decrease, while vocabulary diversity, word length, and punctuation elaboration increase. These shifts occur whether the prompt asks the model to improve the text or simply to rewrite it. Voice-preserving prompts reduce the magnitude of the changes but do not eliminate their direction. Stylometric analysis shows that rewritten texts converge in feature space and become harder to match back to their source texts. Additional narrative markers indicate a shift from embedded to distanced narration, and from explicit causal reasoning to compressed abstraction.

What carries the argument

Stylometric measurement of 13 markers from computational stylistics, tracking changes in function words, pronouns, contractions, lexical diversity, and punctuation to detect a directional normalization of narrative voice.
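To make the measurement concrete, here is a minimal Python sketch of how six of the 13 markers could be computed for a single text. It is an illustration under stated assumptions, not the paper's pipeline. The word lists, the tokenizer regex, and the function name `marker_profile` are hypothetical. Emotion words are omitted because they need a lexicon. Count-based markers are expressed per 1,000 tokens so that texts of different lengths can be compared.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only; the
# paper's actual lexicons and tooling are not described on this page.
FUNCTION_WORDS = {
    "the", "a", "an", "and", "but", "or", "of", "to", "in", "on", "at",
    "for", "with", "that", "this", "it", "was", "is", "so",
}
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}

# Words, optionally with one internal apostrophe ("didn't", "I'm").
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
PUNCT_RE = re.compile(r'[.,;:!?()"-]')


def marker_profile(text: str) -> dict[str, float]:
    """Return six stylistic markers; count-based ones are rates per 1,000 tokens."""
    text = text.replace("\u2019", "'")  # treat curly apostrophes as straight ones
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    n = len(tokens)
    if n == 0:
        raise ValueError("text contains no word tokens")
    per_k = 1000.0 / n
    return {
        "function_words": sum(t in FUNCTION_WORDS for t in tokens) * per_k,
        "first_person": sum(t in FIRST_PERSON for t in tokens) * per_k,
        "contractions": sum("'" in t for t in tokens) * per_k,
        "punctuation": len(PUNCT_RE.findall(text)) * per_k,
        "mean_word_length": sum(len(t) for t in tokens) / n,
        # Raw TTR falls as texts get longer, so compare texts of similar length.
        "type_token_ratio": len(set(tokens)) / n,
    }


# Invented example sentences, not drawn from the paper's data.
original = "I was so tired, and I just didn't care anymore."
rewrite = "Exhausted, the narrator experienced a profound sense of indifference."
for label, sample in (("original", original), ("rewrite", rewrite)):
    print(label, marker_profile(sample))
```

On the two invented sentences, the rewrite has no first-person pronouns and no contractions, and its mean word length is higher. A real study would use validated lexicons and a proper tokenizer.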

If this is right

  • Rewritten texts converge in stylometric feature space and become harder to attribute to their original sources.
  • Narrative style shifts from embedded, situated accounts toward distanced, abstract ones.
  • Markers such as function words, pronouns, contractions, and punctuation lose reliability as evidence for voice, authorship, and corpus integrity in computational text analysis.
  • LLM revision functions as textual mediation rather than neutral editing, with direct consequences for digital humanities scholarship.
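The first consequence, convergence combined with weaker attribution, can be sketched with a Delta-style distance applied to marker profiles. This distance is the mean absolute difference of z-scored features, in the spirit of Burrows' Delta. The sketch is an assumption-laden illustration in Python with NumPy, not the paper's method: Figure 3 describes a PCA projection, and the exact features and distance are not given on this page. The helper names are hypothetical, and the two matrices are assumed to be row-aligned, so that rewrite i comes from original i.

```python
import numpy as np


def zscore(originals, rewrites):
    """Standardize both sets using the originals' per-feature mean and SD."""
    mu = originals.mean(axis=0)
    sd = originals.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # guard against constant features
    return (originals - mu) / sd, (rewrites - mu) / sd


def delta(a, b):
    """Delta-style distance: mean absolute z-score difference for every (a_i, b_j)."""
    return np.abs(a[:, None, :] - b[None, :, :]).mean(axis=2)


def mean_pairwise_distance(z):
    """Average distance between distinct texts; lower means more convergence."""
    n = len(z)
    return float(delta(z, z).sum() / (n * (n - 1)))  # diagonal entries are zero


def source_match_accuracy(z_orig, z_rew):
    """Share of rewrites whose nearest original is their own source (row i <-> row i)."""
    nearest = delta(z_rew, z_orig).argmin(axis=1)
    return float((nearest == np.arange(len(z_rew))).mean())


# Synthetic demo: 10 "texts" x 6 markers; rewrites are pulled toward the centroid.
rng = np.random.default_rng(0)
orig = rng.normal(size=(10, 6))
rew = 0.2 * orig + 0.8 * orig.mean(axis=0) + rng.normal(scale=0.3, size=(10, 6))
z_o, z_r = zscore(orig, rew)
print("spread:", mean_pairwise_distance(z_o), "->", mean_pairwise_distance(z_r))
print("source match:", source_match_accuracy(z_o, z_o), "->", source_match_accuracy(z_o, z_r))
```

Two numbers summarize the effect. A drop in mean pairwise distance indicates convergence, and a drop in source-match accuracy indicates that rewrites are harder to trace to their originals. Because the synthetic rewrites are pulled toward the centroid, both would typically fall, though no specific output is claimed for the random demo.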

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Archives or datasets containing LLM-revised personal stories may require new attribution methods to avoid misidentifying normalized texts as original authorial voices.
  • Writers using LLMs for editing personal essays or memoirs could unintentionally lose distinctive markers of their individual style.
  • Studies of style in large corpora should screen for AI-revised content to prevent the normalization effect from skewing results on authorship or period-specific features.

Load-bearing premise

That observed changes in the chosen linguistic markers reliably reflect alterations in voice and narrative texture caused by the LLMs rather than by differences in prompt wording, text length, or model-specific artifacts.

What would settle it

A comparison of originals and rewrites across multiple models and prompts would falsify the normalization claim if it showed no consistent decrease in first-person pronouns and function words, or no consistent increase in lexical diversity.
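This falsification criterion can be expressed as a sign check on paired effect sizes. The sketch below assumes Cohen's d_z for paired samples, which is the mean difference divided by the SD of the differences. Figure 2 reports signed Cohen's effect sizes, but the variant used is not stated on this page. The marker names and the `EXPECTED_SIGN` table are hypothetical labels for the directions listed in the abstract.

```python
import numpy as np

# Predicted directions from the abstract: -1 = decreases after rewriting, +1 = increases.
EXPECTED_SIGN = {
    "function_words": -1,
    "contractions": -1,
    "first_person": -1,
    "type_token_ratio": +1,
    "mean_word_length": +1,
    "punctuation": +1,
}


def cohens_dz(before, after):
    """Paired Cohen's d_z: mean of (after - before) divided by the SD of those differences."""
    diff = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    return float(diff.mean() / diff.std(ddof=1))


def direction_holds(effects):
    """Map each marker to True if every condition's effect has the predicted sign.

    `effects` maps a condition label (e.g. "model_a/voice") to {marker: signed d}.
    An effect of exactly zero counts as not holding, since it shows no direction.
    """
    return {
        marker: all(np.sign(cond[marker]) == sign for cond in effects.values())
        for marker, sign in EXPECTED_SIGN.items()
    }


before = [12.0, 15.0, 9.0, 11.0]  # hypothetical contraction rates per 1,000 tokens
after = [7.0, 10.0, 8.0, 6.0]
print(cohens_dz(before, after))  # -2.0
```

Passing `direction_holds` one effect table per model and prompt condition shows, for each marker, whether the predicted direction held in every condition. A False for a marker would count against the normalization claim for that marker.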

Figures

Figures reproduced from arXiv: 2604.22142 by Tom van Nuenen.

Figure 1: Radar plots showing the normalization pattern across 13 markers for each prompt condition.
Figure 2: Normalization effects across 13 linguistic markers. (A) Signed Cohen’s …
Figure 3: Stylometric convergence under LLM rewriting. (A) PCA projection based on character …
Figure 4: Narrative stance shift under LLM rewriting. Bars show signed Cohen’s …
Original abstract

This study examines how large language model rewriting alters the style and narrative texture of personal narratives. It analyzes 300 personal narratives rewritten by three frontier LLMs under three prompt conditions: generic improvement, rewrite-only, and voice-preserving revision. Change is measured across 13 linguistic markers drawn from computational stylistics, including function words, vocabulary diversity, word length, punctuation, contractions, first-person pronouns, and emotion words. Across models and prompt conditions, LLM rewriting produces a consistent pattern of stylistic normalization. Function words, contractions, and first-person pronouns decrease, while vocabulary diversity, word length, and punctuation elaboration increase. These shifts occur whether the prompt asks the model to "improve" the text or simply to "rewrite" it. Voice-preserving prompts reduce the magnitude of the changes but do not eliminate their direction. Stylometric analysis shows that rewritten texts converge in feature space and become harder to match back to their source texts. Additional narrative markers indicate a shift from embedded to distanced narration, and from explicit causal reasoning to compressed abstraction. The findings suggest that contemporary LLMs exert a directional pull toward a more polished, less situated register. This has consequences for digital humanities and computational text analysis, where features such as function words, pronouns, contractions, and punctuation often serve as evidence for style, voice, authorship, and corpus integrity. LLM revision should therefore be understood not merely as surface-level editing, but as a consequential form of textual mediation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. This paper examines how large language model rewriting alters the style and narrative texture of personal narratives. It analyzes 300 personal narratives rewritten by three frontier LLMs under three prompt conditions (generic improvement, rewrite-only, voice-preserving) and measures change across 13 linguistic markers from computational stylistics. The central claim is that LLM rewriting produces consistent stylistic normalization: decreases in function words, contractions, and first-person pronouns, with increases in vocabulary diversity, word length, and punctuation elaboration. Voice-preserving prompts attenuate but do not reverse the direction. Rewritten texts converge in feature space, become harder to attribute to sources, and shift toward distanced narration and compressed abstraction, with implications for digital humanities and computational text analysis.

Significance. If the central claim holds after confounds are addressed, the work would be significant for computational linguistics and digital humanities, documenting a directional pull of frontier LLMs toward a polished register in personal narratives. Strengths include the scale (300 texts), the consistency of results across three models and three prompt conditions, and an observational design that involves no parameter fitting and no circularity with prior self-citations. The use of established stylometric markers, together with the reported convergence and reduced source-matching accuracy, yields falsifiable patterns relevant to authorship attribution and corpus integrity.

major comments (2)
  1. [Abstract] Abstract: Several of the reported shifts rest on length-sensitive metrics. Raw type-token ratio falls as a text grows, because repeated words accumulate, and raw counts of punctuation marks scale with token count. The abstract gives no indication that token counts were matched between originals and rewrites, that count-based metrics were normalized (e.g., per 1,000 words), or that length was entered as a covariate. If LLM outputs differ systematically in length from the originals under all prompt conditions (for instance, shorter and more compressed rewrites would inflate raw type-token ratio), the headline increases, especially in vocabulary diversity and punctuation elaboration, could partly be artifacts of length rather than of stylistic normalization.
  2. [Abstract] Abstract: The support for consistent directional changes across models and prompt conditions is weakened by the absence of details on statistical tests, controls for confounds such as text length, inter-annotator reliability (if manual coding was involved), and exact exclusion criteria for the 300 texts. These omissions make it difficult to assess whether the observed patterns are robust or driven by unaccounted variables.
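The length sensitivity behind the first major comment is easy to demonstrate. In this hypothetical Python sketch, raw type-token ratio drops as a text grows, because each new token is increasingly likely to repeat an earlier word. A moving-average type-token ratio (MATTR), which averages TTR over fixed-size windows, stays stable. The window size of 20 and the cyclic synthetic vocabulary are assumptions chosen to make the contrast obvious. Neither is taken from the paper.

```python
def ttr(tokens):
    """Raw type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: mean raw TTR over every window of `window` tokens.

    Because every window has the same size, the score does not drift with
    overall text length the way raw TTR does. The default window is an assumption.
    """
    if len(tokens) <= window:
        return ttr(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Synthetic "texts" cycling through a fixed 40-word vocabulary.
short_text = [f"w{i % 40}" for i in range(40)]
long_text = [f"w{i % 40}" for i in range(400)]
print(ttr(short_text), ttr(long_text))              # 1.0 0.1
print(mattr(short_text, 20), mattr(long_text, 20))  # 1.0 1.0
```

Other length-robust diversity measures, such as MTLD, also exist. Which one suits short personal narratives is a design choice the authors would need to justify.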
minor comments (1)
  1. [Abstract] Abstract: The "additional narrative markers" behind the shifts from embedded to distanced narration and from explicit causal reasoning to compressed abstraction are mentioned but not enumerated, which reduces clarity for readers unfamiliar with the specific operationalizations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for these constructive comments. We agree that the abstract requires greater clarity on potential length confounds and methodological details to strengthen the presentation of our findings. We respond to each point below and will make the indicated revisions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: Several of the reported shifts rest on length-sensitive metrics. Raw type-token ratio falls as a text grows, because repeated words accumulate, and raw counts of punctuation marks scale with token count. The abstract gives no indication that token counts were matched between originals and rewrites, that count-based metrics were normalized (e.g., per 1,000 words), or that length was entered as a covariate. If LLM outputs differ systematically in length from the originals under all prompt conditions (for instance, shorter and more compressed rewrites would inflate raw type-token ratio), the headline increases, especially in vocabulary diversity and punctuation elaboration, could partly be artifacts of length rather than of stylistic normalization.

    Authors: We thank the referee for highlighting this important potential confound. We will report count-based metrics (such as punctuation density) per 1,000 tokens, replace raw type-token ratio with a length-robust measure of lexical diversity such as a moving-average type-token ratio, and include text length as a covariate in our statistical models. The abstract will be revised to note these controls explicitly and to indicate that the reported directional shifts were assessed after accounting for length differences. revision: yes

  2. Referee: [Abstract] Abstract: The support for consistent directional changes across models and prompt conditions is weakened by the absence of details on statistical tests, controls for confounds such as text length, inter-annotator reliability (if manual coding was involved), and exact exclusion criteria for the 300 texts. These omissions make it difficult to assess whether the observed patterns are robust or driven by unaccounted variables.

    Authors: We agree that additional methodological transparency is needed in the abstract. We will revise the abstract to briefly describe the statistical tests used, the inclusion of text length and other confounds as controls, the fact that all 13 markers were derived via automated computational tools (rendering inter-annotator reliability inapplicable), and the exclusion criteria applied when selecting the 300 texts. These details will also be expanded in the methods section of the revised manuscript. revision: yes
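The promised length covariate can be sketched as an ordinary least-squares regression of each text's marker change on its log length ratio. In this setup, the intercept estimates the shift expected for a rewrite of unchanged length. This is a hypothetical Python/NumPy illustration with made-up numbers, not the authors' model. A fuller analysis could instead use mixed-effects models with random effects for source text and LLM.

```python
import numpy as np


def length_adjusted_shift(marker_change, log_length_ratio):
    """Regress per-text marker change on log(rewrite tokens / original tokens).

    Returns (intercept, slope). The intercept estimates the change expected for a
    rewrite of unchanged length, i.e. the part of the shift not explained by length.
    """
    y = np.asarray(marker_change, dtype=float)
    x = np.asarray(log_length_ratio, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), float(coef[1])


# Hypothetical data for five texts: token counts before/after and the change in a marker.
orig_tokens = np.array([220, 180, 300, 150, 260])
rew_tokens = np.array([240, 170, 330, 190, 250])
ttr_change = np.array([0.06, 0.09, 0.05, 0.03, 0.08])
print(length_adjusted_shift(ttr_change, np.log(rew_tokens / orig_tokens)))
```

If the intercept keeps the same sign as the raw mean change, the shift persists after length is accounted for. A formal test would also need standard errors, which this sketch omits.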

Circularity Check

0 steps flagged

No significant circularity; observational empirical analysis using predefined markers

full rationale

The paper conducts an empirical study by rewriting 300 personal narratives with three LLMs under three prompt conditions and measuring shifts across 13 fixed linguistic markers drawn from existing computational stylistics literature. No equations, parameters, or derivations reduce the reported normalization pattern to fitted inputs or to self-definitions by construction. The central claims rest on direct observation of directional changes in function words, pronouns, lexical diversity, and other markers, without load-bearing self-citation chains or ansatzes that would make the results equivalent to their inputs. The analysis does not depend on its own outputs: it applies standard, non-fitted stylometric features to new data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard domain assumptions in computational stylistics rather than new free parameters or invented entities.

axioms (1)
  • domain assumption: The 13 linguistic markers (including function words, vocabulary diversity, word length, punctuation, contractions, first-person pronouns, and emotion words) are valid and sufficient indicators of changes in narrative voice and style.
    These markers are used throughout the measurement of stylistic normalization.

pith-pipeline@v0.9.0 · 5554 in / 1407 out tokens · 40725 ms · 2026-05-08T12:09:30.593374+00:00 · methodology

