Voice Under Revision: Large Language Models and the Normalization of Personal Narrative
Pith reviewed 2026-05-08 12:09 UTC · model grok-4.3
The pith
Large language models rewrite personal narratives toward a more polished, less personal style across models and prompts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across models and prompt conditions, LLM rewriting produces a consistent pattern of stylistic normalization. Function words, contractions, and first-person pronouns decrease, while vocabulary diversity, word length, and punctuation elaboration increase. These shifts occur whether the prompt asks the model to improve the text or simply to rewrite it. Voice-preserving prompts reduce the magnitude of the changes but do not eliminate their direction. Stylometric analysis shows that rewritten texts converge in feature space and become harder to match back to their source texts. Additional narrative markers indicate a shift from embedded to distanced narration, and from explicit causal reasoning to compressed abstraction.
What carries the argument
Stylometric measurement across 13 markers from computational stylistics, tracking changes in function words, pronouns, contractions, lexical diversity, and punctuation to detect directional normalization in narrative voice.
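As a rough illustration of how markers of this kind can be computed, here is a minimal Python sketch covering several of the feature types named above. It is not the study's pipeline: the word lists, the tokenizer regex, the set of punctuation marks counted as "elaboration", and the function name `markers` are all hypothetical choices made for this example.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration; the study's
# actual lexicons are not given in this summary.
FUNCTION_WORDS = {"the", "a", "an", "and", "but", "or", "of", "to", "in", "on",
                  "at", "for", "with", "that", "it", "is", "was", "so", "because"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}
# Assumed "elaborate" punctuation: sentence-final periods are deliberately excluded.
ELABORATE_PUNCT = set(',;:!?-()"') | {"\u2014"}

# A word is a run of letters, optionally with one internal apostrophe (don't, I'm).
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def markers(text: str) -> dict:
    """Return a few stylometric markers; count-based ones are rates per 1,000 tokens."""
    text = text.replace("\u2019", "'")  # treat curly apostrophes as straight ones
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    n = len(tokens)
    if n == 0:
        raise ValueError("text contains no word tokens")
    per_k = 1000.0 / n
    # Part before the apostrophe, so "i'm" counts as "i" and "it's" as "it".
    bases = [t.split("'")[0] for t in tokens]
    return {
        "tokens": n,
        "function_words_per_k": sum(b in FUNCTION_WORDS for b in bases) * per_k,
        "first_person_per_k": sum(b in FIRST_PERSON for b in bases) * per_k,
        "contractions_per_k": sum("'" in t for t in tokens) * per_k,
        "punctuation_per_k": sum(ch in ELABORATE_PUNCT for ch in text) * per_k,
        "mean_word_length": sum(len(t.replace("'", "")) for t in tokens) / n,
        "type_token_ratio": len(set(tokens)) / n,
    }


# An invented original/rewrite pair, compared marker by marker.
original = "I'm sure it was the rain, and I left."
rewrite = "The rain, perhaps, prompted an early departure."
m_orig, m_new = markers(original), markers(rewrite)
changes = {k: m_new[k] - m_orig[k] for k in m_orig}
```

The per-marker differences in `changes` are the quantities whose sign the normalization claim is about; in this toy pair, first-person pronouns and contractions fall while mean word length and punctuation rise.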
If this is right
- Rewritten texts converge in stylometric feature space and become harder to attribute to their original sources.
- Narrative style shifts from embedded, situated accounts toward distanced, abstract ones.
- Markers such as function words, pronouns, contractions, and punctuation lose reliability as evidence for voice, authorship, and corpus integrity in computational text analysis.
- LLM revision functions as textual mediation rather than neutral editing, with direct consequences for digital humanities scholarship.
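Convergence and reduced attributability can be made concrete with a small nearest-neighbour sketch. The summary does not say which stylometric method the paper uses; this example assumes a Burrows-style Delta (mean absolute difference of z-scores), applied here to marker vectors rather than to the most-frequent-word profiles Delta was designed for. The toy vectors are invented, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to mean 0, sd 1 across all rows."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero on a constant column
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def delta(a, b):
    """Burrows-style Delta: mean absolute difference between two z-scored vectors."""
    return mean(abs(x - y) for x, y in zip(a, b))


def dispersion(rows):
    """Mean Delta from each text to the group centroid; smaller means more converged."""
    centroid = [mean(c) for c in zip(*rows)]
    return mean(delta(row, centroid) for row in rows)


def match_accuracy(originals, rewrites):
    """Share of rewrites whose nearest original (by Delta) is their own source.
    Assumes rewrites[i] was produced from originals[i]."""
    hits = 0
    for i, rw in enumerate(rewrites):
        nearest = min(range(len(originals)), key=lambda j: delta(rw, originals[j]))
        hits += nearest == i
    return hits / len(rewrites)


# Toy vectors (function words per 1k, first-person per 1k, TTR), invented for illustration.
originals = [[520, 60, 0.40], [480, 20, 0.55], [560, 90, 0.35]]
rewrites = [[470, 30, 0.58], [465, 25, 0.60], [475, 35, 0.57]]

z = zscore_columns(originals + rewrites)  # pool so both sets share one scale
orig_z, rew_z = z[:3], z[3:]
print(f"dispersion: originals {dispersion(orig_z):.3f}, rewrites {dispersion(rew_z):.3f}")
print(f"source-matching accuracy: {match_accuracy(orig_z, rew_z):.2f}")
```

With these toy numbers the rewrites sit closer to their own centroid than the originals do, and only one of the three rewrites is matched back to its true source. On real data the same two quantities would be computed separately per model and prompt condition.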
Where Pith is reading between the lines
- Archives or datasets containing LLM-revised personal stories may require new attribution methods to avoid misidentifying normalized texts as original authorial voices.
- Writers using LLMs for editing personal essays or memoirs could unintentionally lose distinctive markers of their individual style.
- Studies of style in large corpora should screen for AI-revised content to prevent the normalization effect from skewing results on authorship or period-specific features.
Load-bearing premise
That observed changes in the chosen linguistic markers reliably reflect alterations in voice and narrative texture caused by the LLMs rather than by differences in prompt wording, text length, or model-specific artifacts.
What would settle it
A comparison of originals and rewrites across multiple models and prompts that shows no consistent directional decrease in first-person pronouns and function words or no increase in lexical diversity would falsify the normalization claim.
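One way to operationalize this criterion is a one-sided exact sign test on the paired (rewrite minus original) differences for each marker, run separately in each of the nine model × prompt cells. The summary does not state which tests the paper used (a gap the referee report also notes), so this is an assumed choice; the function name is hypothetical and the data are invented.

```python
from math import comb


def sign_test(diffs, predicted="decrease"):
    """One-sided exact sign test on paired (rewrite - original) differences.

    Returns (k, n, p): k differences in the predicted direction out of n nonzero
    ones, and P(X >= k) for X ~ Binomial(n, 0.5). Ties (zeros) are dropped.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if predicted == "decrease":
        k = sum(d < 0 for d in nonzero)
    elif predicted == "increase":
        k = sum(d > 0 for d in nonzero)
    else:
        raise ValueError("predicted must be 'decrease' or 'increase'")
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p


# Invented per-text changes in first-person pronouns per 1,000 tokens
# for a single model x prompt cell.
diffs = [-12, -8, -15, 3, -6, -9, -11, -4, -7, 0]
print(sign_test(diffs, "decrease"))  # (8, 9, 0.01953125)
```

A cell where k falls near n/2, or where most differences run the other way, would count against the normalization claim; with nine cells and 13 markers, a multiple-comparison correction such as Holm's would also be needed.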
Original abstract
This study examines how large language model rewriting alters the style and narrative texture of personal narratives. It analyzes 300 personal narratives rewritten by three frontier LLMs under three prompt conditions: generic improvement, rewrite-only, and voice-preserving revision. Change is measured across 13 linguistic markers drawn from computational stylistics, including function words, vocabulary diversity, word length, punctuation, contractions, first-person pronouns, and emotion words. Across models and prompt conditions, LLM rewriting produces a consistent pattern of stylistic normalization. Function words, contractions, and first-person pronouns decrease, while vocabulary diversity, word length, and punctuation elaboration increase. These shifts occur whether the prompt asks the model to "improve" the text or simply to "rewrite" it. Voice-preserving prompts reduce the magnitude of the changes but do not eliminate their direction. Stylometric analysis shows that rewritten texts converge in feature space and become harder to match back to their source texts. Additional narrative markers indicate a shift from embedded to distanced narration, and from explicit causal reasoning to compressed abstraction. The findings suggest that contemporary LLMs exert a directional pull toward a more polished, less situated register. This has consequences for digital humanities and computational text analysis, where features such as function words, pronouns, contractions, and punctuation often serve as evidence for style, voice, authorship, and corpus integrity. LLM revision should therefore be understood not merely as surface-level editing, but as a consequential form of textual mediation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper examines how large language model rewriting alters the style and narrative texture of personal narratives. It analyzes 300 personal narratives rewritten by three frontier LLMs under three prompt conditions (generic improvement, rewrite-only, voice-preserving) and measures change across 13 linguistic markers from computational stylistics. The central claim is that LLM rewriting produces consistent stylistic normalization: decreases in function words, contractions, and first-person pronouns, with increases in vocabulary diversity, word length, and punctuation elaboration. Voice-preserving prompts attenuate but do not reverse the direction. Rewritten texts converge in feature space, become harder to attribute to sources, and shift toward distanced narration and compressed abstraction, with implications for digital humanities and computational text analysis.
Significance. If the central claim holds after addressing confounds, the work would be significant for computational linguistics and digital humanities by documenting a directional pull of frontier LLMs toward a polished register in personal narratives. Strengths include the scale (300 texts), consistency across three models and three prompt conditions, and the observational design that avoids parameter fitting or circularity with prior self-citations. The use of established stylometric markers and the reported convergence/harder matching provide falsifiable patterns relevant to authorship attribution and corpus integrity.
major comments (2)
- [Abstract] The reported increases in vocabulary diversity, average word length, and punctuation elaboration rest partly on length-sensitive metrics: type-token ratio falls as texts grow longer, and raw punctuation counts scale with length. The abstract gives no indication that token counts were matched between originals and rewrites, normalized (e.g., per 1,000 words), or entered as covariates. If LLM outputs differ systematically in length from the originals under all prompt conditions (shorter rewrites, for instance, would inflate type-token ratio on their own), these headline shifts could be artifacts of length rather than evidence of stylistic normalization.
- [Abstract] The support for consistent directional changes across models and prompt conditions is weakened by the absence of details on statistical tests, controls for confounds such as text length, inter-annotator reliability (if manual coding was involved), and exact exclusion criteria for the 300 texts. These omissions make it difficult to assess whether the observed patterns are robust or driven by unaccounted variables.
minor comments (1)
- [Abstract] The abstract cites "additional narrative markers" as evidence for shifts from embedded to distanced narration and from explicit causal reasoning to compressed abstraction, but does not enumerate them, which reduces clarity for readers unfamiliar with the specific operationalizations.
Simulated Author's Rebuttal
We are grateful to the referee for these constructive comments. We agree that the abstract requires greater clarity on potential length confounds and methodological details to strengthen the presentation of our findings. We respond to each point below and will make the indicated revisions.
Point-by-point responses
- Referee: [Abstract] The reported increases in vocabulary diversity, average word length, and punctuation elaboration rest partly on length-sensitive metrics: type-token ratio falls as texts grow longer, and raw punctuation counts scale with length. The abstract gives no indication that token counts were matched between originals and rewrites, normalized (e.g., per 1,000 words), or entered as covariates. If LLM outputs differ systematically in length from the originals under all prompt conditions (shorter rewrites, for instance, would inflate type-token ratio on their own), these headline shifts could be artifacts of length rather than evidence of stylistic normalization.
Authors: We thank the referee for highlighting this important potential confound. We will express count-based metrics such as punctuation density as rates per 1,000 tokens, replace raw type-token ratio with a length-robust variant computed over fixed-size windows, and include text length as a covariate in our statistical models. The abstract will be revised to note these controls explicitly and to indicate that the reported directional shifts were assessed after accounting for length differences. revision: yes
- Referee: [Abstract] The support for consistent directional changes across models and prompt conditions is weakened by the absence of details on statistical tests, controls for confounds such as text length, inter-annotator reliability (if manual coding was involved), and exact exclusion criteria for the 300 texts. These omissions make it difficult to assess whether the observed patterns are robust or driven by unaccounted variables.
Authors: We agree that additional methodological transparency is needed in the abstract. We will revise the abstract to briefly describe the statistical tests used, the inclusion of text length and other confounds as controls, the fact that all 13 markers were derived via automated computational tools (rendering inter-annotator reliability inapplicable), and the exclusion criteria applied when selecting the 300 texts. These details will also be expanded in the methods section of the revised manuscript. revision: yes
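Normalizing per 1,000 tokens suits count-based markers, but type-token ratio is already a ratio and still declines as texts grow longer. A common length-robust alternative is the moving-average type-token ratio (MATTR), sketched below under the assumption of a 50-token window; the function names and the synthetic text are illustrative only, not the authors' stated procedure.

```python
from collections import Counter


def ttr(tokens):
    """Plain type-token ratio: distinct tokens / total tokens."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average type-token ratio: mean TTR over every run of `window` consecutive tokens."""
    if len(tokens) < window:
        raise ValueError("text is shorter than the window")
    counts = Counter(tokens[:window])
    type_sum = len(counts)  # distinct types in the first window
    for i in range(window, len(tokens)):
        # Slide the window one token: drop the oldest token, add the next one.
        leaving, entering = tokens[i - window], tokens[i]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        counts[entering] += 1
        type_sum += len(counts)
    n_windows = len(tokens) - window + 1
    return type_sum / n_windows / window


# Synthetic text cycling through 40 word types: the same "style" at two lengths.
short_text = [f"w{i % 40}" for i in range(100)]
long_text = [f"w{i % 40}" for i in range(1000)]
print(ttr(short_text), ttr(long_text))      # 0.4 0.04
print(mattr(short_text), mattr(long_text))  # 0.8 0.8
```

Because short personal narratives may not reach 1,000 tokens, a small sliding window keeps every text measurable, though the window size then becomes a parameter that should be reported.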
Circularity Check
No significant circularity; observational empirical analysis using predefined markers
Full rationale
The paper conducts an empirical study by rewriting 300 personal narratives with three LLMs under three prompt conditions and measuring shifts across 13 fixed linguistic markers drawn from existing computational stylistics literature. No equations, parameters, or derivations are present that reduce the reported normalization pattern to fitted inputs or self-definitions by construction. The central claims rest on direct observation of directional changes in function words, pronouns, lexical diversity, and the other markers, without any load-bearing self-citation chains or ansatzes that would make results equivalent to inputs. Because the features are standard and non-fitted, taken from prior literature and applied to new data, the analysis does not define its measures in terms of its own results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The 13 linguistic markers (among them function words, vocabulary diversity, word length, punctuation, contractions, first-person pronouns, and emotion words) are valid and sufficient indicators of changes in narrative voice and style.