Directional Alignment and Narrative Agency in Human-LLM Co-Writing

Halfdan Nordahl Fundal; Yuri Bizzoni

arxiv: 2604.23676 · v1 · submitted 2026-04-26 · 💻 cs.HC

Directional Alignment and Narrative Agency in Human-LLM Co-Writing

Halfdan Nordahl Fundal , Yuri Bizzoni This is my paper

Pith reviewed 2026-05-08 05:43 UTC · model grok-4.3

classification 💻 cs.HC

keywords human-LLM co-writingnarrative agencysemantic noveltysentiment alignmentdirectional influencecreative collaborationturn-taking analysisstory development

0 comments

The pith

Humans introduce more semantic novelty and steer narrative direction in turn-based co-writing with LLMs, while the models mainly elaborate on those inputs with stronger emotional adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies narrative agency in collaborative story writing between humans and large language models by collecting 87 co-written stories and applying quantitative measures of semantic novelty, sentiment alignment, and directional influence across turns. It finds an asymmetric pattern in which human contributions add more new semantic content that tends to determine what follows, whereas LLM turns largely develop and expand on the ideas already introduced by the human. Sentiment tracking shows a more bidirectional but still uneven emotional exchange, with LLMs adjusting more sharply to the prior turn yet also displaying a consistent positive tilt. The overall result points to a division of labor that could inform how such partnerships are structured for creative work.

Core claim

Our results show asymmetric influence: human turns introduce greater semantic novelty and are more likely to shape subsequent developments, whereas LLM contributions predominantly elaborate on human-introduced elements. At the sentiment level, alignment is also asymmetric, but more bidirectional: LLMs exhibit stronger turn-level emotional adaptation than humans, but both agents track each other's emotional valence and LLMs show an independent tendency to more positive emotional baselines. These findings indicate a complementary division of labor in human-LLM co-writing, where humans drive narrative innovation and direction, while LLMs act as adaptive amplifiers that sustain coherence and el

What carries the argument

Directional measures of semantic novelty and affective alignment applied to successive turns in a corpus of 87 human-LLM stories, which track how each agent's input alters the trajectory of the developing narrative.

Load-bearing premise

The chosen sentiment and semantic modeling techniques, along with the directional measures, accurately capture narrative agency and alignment without significant distortion from model-specific biases or corpus construction choices.

What would settle it

Re-running the analysis on the same stories with alternative semantic embedding models or different sentiment lexicons that reverses the result that human turns carry higher novelty and directional influence would undermine the central claim.

Figures

Figures reproduced from arXiv: 2604.23676 by Halfdan Nordahl Fundal, Yuri Bizzoni.

**Figure 1.** Figure 1: Dyadic task flow, visualizing how the partici view at source ↗

**Figure 2.** Figure 2: The user interface of the platform used in the view at source ↗

**Figure 5.** Figure 5: Linear regression of response valence as a view at source ↗

**Figure 4.** Figure 4: Example of the valence trajectories through view at source ↗

**Figure 6.** Figure 6: Distributions of surprisal (novelty) for each view at source ↗

**Figure 7.** Figure 7: Linear regression of resonance as a function view at source ↗

**Figure 8.** Figure 8: Frequency of directional alignment, shown as view at source ↗

**Figure 9.** Figure 9: shows the persistence of alignment streaks (survival-style retention curves). Streaks were longer in the User→LLM direction (mean = 2.19, median = 2, max = 7) than in the LLM→User direction (mean = 1.84, median = 1.5, max = 6). A Wilcoxon rank-sum test indicated a reliable difference in duration distributions (p = 0.03943) view at source ↗

**Figure 10.** Figure 10: Semantic self-alignment of users (blue) com view at source ↗

**Figure 12.** Figure 12: Resonance as a function of novelty for LLM view at source ↗

**Figure 13.** Figure 13: Transience as a function of novelty, with view at source ↗

read the original abstract

We investigate narrative agency in human-LLM creative co-writing, asking who drives story development in turn-based collaboration. Using a new corpus of 87 human-LLM co-written stories, we apply sentiment and semantic modeling to quantify affective alignment and semantic novelty in turn-taking, and directional measures to assess which agent shapes narrative progression. Our results show asymmetric influence: human turns introduce greater semantic novelty and are more likely to shape subsequent developments, whereas LLM contributions predominantly elaborate on human-introduced elements. At the sentiment level, alignment is also asymmetric, but more bidirectional: LLMs exhibit stronger turn-level emotional adaptation than humans, but both agents track each other's emotional valence and LLMs show an independent tendency to more positive emotional baselines. These findings indicate a complementary division of labor in human-LLM co-writing, where humans drive narrative innovation and direction, while LLMs act as adaptive amplifiers that sustain coherence and elaborate emerging narratives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper reports asymmetric agency in human-LLM story co-writing from a new 87-story corpus, with humans adding more semantic novelty and direction while LLMs elaborate and adapt emotionally.

read the letter

The main point here is that humans appear to steer the story more through novelty and progression, while the LLM mostly fills in and tracks emotions. They collected 87 turn-based co-written stories and ran sentiment plus semantic measures plus some directional stats to show this split in roles. That gives a concrete picture of how the collaboration actually breaks down turn by turn, which extends earlier human-AI interaction work by focusing on the sequence of contributions rather than just overall output quality. The complementary division they describe—humans for innovation, LLMs for coherence and positive tone—lines up with what a lot of people observe in practice and could help when designing better co-writing interfaces. The new corpus itself is a usable addition for anyone who wants to test similar ideas. The soft spots sit mostly in the methods. The abstract gives no specifics on the exact sentiment or embedding models, how the directional measures are defined or calculated, what statistical tests were used, or whether they checked the outputs against human judgments. With only 87 stories and no mention of cross-validation or alternative metrics, the asymmetry could partly reflect choices in the modeling pipeline or how the prompts were set up rather than a stable pattern. If the semantic tools share biases with the LLM used for writing, that would weaken the novelty claims. The stress-test note on possible distortion from embeddings or corpus construction holds up based on what's shown. This is aimed at HCI researchers and people working on creative AI tools who care about actual collaboration dynamics. Readers running their own experiments with LLMs could borrow the corpus or the turn-level framing. It has enough new data and a clear question to deserve a serious referee, though the review would need to push for full method details, robustness checks, and some external validation of the measures. I would send it to peer review rather than desk reject.

Referee Report

3 major / 1 minor

Summary. The manuscript investigates narrative agency in human-LLM turn-based co-writing. Using a new corpus of 87 co-written stories, the authors apply sentiment and semantic modeling to quantify affective alignment and semantic novelty, along with directional measures to assess which agent shapes narrative progression. The central claims are asymmetric influence (humans introduce greater semantic novelty and are more likely to shape subsequent developments; LLMs elaborate on human-introduced elements) and more bidirectional but asymmetric sentiment alignment (LLMs show stronger turn-level emotional adaptation and a tendency toward positive baselines).

Significance. If the modeling choices prove robust, the work provides quantitative evidence for a complementary division of labor in creative co-writing, with humans driving innovation and LLMs acting as adaptive amplifiers. This could inform the design of collaborative writing tools in HCI. The use of a dedicated corpus and multiple metrics (sentiment, semantic novelty, directional measures) is a strength, though the small scale and lack of reported validation limit immediate impact.

major comments (3)

[Abstract] Abstract: the description of methods and results provides no details on the specific sentiment/semantic models, their hyperparameters, statistical tests, error handling, or data exclusion rules. These choices are load-bearing for the central claim of asymmetric human agency, as the observed differences in semantic novelty and directional shaping could be artifacts of the chosen embeddings or similarity metrics.
[Abstract] Corpus and data collection (implied in Abstract): no information is given on prompt designs, starting conditions, or collection procedure for the 87 stories. If prompts already bias toward human-led narrative starts or if the LLM family used for co-writing overlaps with the embedding models, the reported asymmetry (humans introduce novelty, LLMs elaborate) risks being circular rather than evidence of true division of labor.
[Abstract] Results (implied): the claims of greater human semantic novelty and shaping of subsequent turns rest on unvalidated application of the directional measures to a small corpus. No cross-validation against human judgments or alternative metrics is mentioned, leaving open the possibility that LLM embedding biases distort the novelty and alignment scores.

minor comments (1)

[Abstract] Abstract: consider specifying the exact number of turns per story or average corpus statistics to help readers assess the scale of the turn-taking analysis.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which identifies key areas where greater transparency and validation will strengthen the manuscript. We address each major comment below and outline specific revisions.

read point-by-point responses

Referee: [Abstract] Abstract: the description of methods and results provides no details on the specific sentiment/semantic models, their hyperparameters, statistical tests, error handling, or data exclusion rules. These choices are load-bearing for the central claim of asymmetric human agency, as the observed differences in semantic novelty and directional shaping could be artifacts of the chosen embeddings or similarity metrics.

Authors: We agree the abstract is too high-level. The full manuscript (Methods section) specifies VADER for sentiment with default hyperparameters, Sentence-BERT (all-MiniLM-L6-v2) embeddings with cosine similarity for semantic novelty, paired t-tests with Bonferroni correction for statistical comparisons, and no data exclusions beyond incomplete sessions. We will revise the abstract to concisely name the models, metrics, and tests while retaining brevity, ensuring the central claims are not presented without methodological context. revision: yes
Referee: [Abstract] Corpus and data collection (implied in Abstract): no information is given on prompt designs, starting conditions, or collection procedure for the 87 stories. If prompts already bias toward human-led narrative starts or if the LLM family used for co-writing overlaps with the embedding models, the reported asymmetry (humans introduce novelty, LLMs elaborate) risks being circular rather than evidence of true division of labor.

Authors: The full paper contains a dedicated Data Collection section describing neutral starting prompts (e.g., 'Begin a short story'), turn-based alternation via a custom interface, and use of GPT-4 for co-writing with a distinct embedding model (all-MiniLM-L6-v2) to avoid overlap. We will add a one-sentence summary of the procedure and model separation to the abstract to eliminate any appearance of circularity and confirm the asymmetry arises from interaction dynamics. revision: yes
Referee: [Abstract] Results (implied): the claims of greater human semantic novelty and shaping of subsequent turns rest on unvalidated application of the directional measures to a small corpus. No cross-validation against human judgments or alternative metrics is mentioned, leaving open the possibility that LLM embedding biases distort the novelty and alignment scores.

Authors: We acknowledge that the directional measures (turn-transition novelty and influence via embedding shifts) lack explicit human validation in the current draft. While robustness was checked across two embedding models, we agree this is insufficient. We will add a human validation subsection: two independent raters will score novelty and directional influence on a 20-story subset, with inter-rater agreement reported. This directly addresses potential embedding biases and bolsters the small-corpus claims. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical observational study with independent measurements

full rationale

The paper collects a new corpus of 87 human-LLM co-written stories and applies standard sentiment and semantic modeling plus directional measures to quantify alignment, novelty, and agency. No equations, derivations, or fitted parameters are presented that reduce the central claims to inputs by construction. The analysis relies on external NLP techniques applied to collected data rather than self-definitional loops, fitted-input predictions, or load-bearing self-citations. The findings emerge from direct measurement on the corpus without renaming known results or smuggling ansatzes via prior work.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Analysis depends on standard assumptions in NLP sentiment and semantic similarity tools; specific modeling parameters for novelty and directional influence are not detailed in abstract.

free parameters (2)

semantic novelty threshold or window size
Choice of how to quantify novelty in turn-taking likely involves tunable parameters in embedding models or similarity metrics.
sentiment model hyperparameters
Emotional valence and adaptation measures depend on choices in the underlying sentiment analysis model.

axioms (2)

domain assumption Sentiment and semantic embeddings from off-the-shelf models accurately reflect narrative elements and emotional valence in co-written stories.
Invoked when applying modeling to quantify alignment and novelty.
domain assumption Turn-based alternation in the corpus represents typical human-LLM creative collaboration.
Basis for generalizing findings on agency.

pith-pipeline@v0.9.0 · 5453 in / 1418 out tokens · 52789 ms · 2026-05-08T05:43:19.336936+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1]

um" to "yeah

From" um" to" yeah": Producing, predicting, and regulat- ing information flow in human conversation.arXiv preprint arXiv:2403.08890. Yuri Bizzoni and Pascale Feldkamp

work page arXiv
[2]

In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computa- tional Linguistics: Human Language Technologies, pages 3566–3575

Choose your own adventure: Paired suggestions in collaborative writing for evaluating story generation models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computa- tional Linguistics: Human Language Technologies, pages 3566–3575. Pascale Feldkamp, Jan Kostkan, Ea Overgaard, Mia Ja- cobsen, and Yuri Bizzoni

work page 2021
[3]

Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares

Continuous sentiment scores for literary and multi- lingual contexts.arXiv preprint arXiv:2508.14620. Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares

work page arXiv
[4]

Association for Computational Linguistics

Are large language mod- els capable of generating human-level narratives? InProceedings of the 2024 Conference on Empiri- cal Methods in Natural Language Processing, pages 17659–17681, Miami, Florida, USA. Association for Computational Linguistics. Giovanna Varni, Isabelle Hupont, Chloe Clavel, and Mohamed Chetouani

work page 2024
[5]

ai love you

“ai love you”: Linguistic convergence in human-chatbot relationship develop- ment. InAcademy of Management Proceedings, vol- ume 2022, page 17063. Academy of Management Briarcliff Manor, NY 10510. Sergio E Zanotto and Segun Aroyehun

work page 2022
[6]

and Aroyehun, Segun , month = dec, year =

Human variability vs. machine consistency: A linguistic anal- ysis of texts generated by humans and large language models.arXiv preprint arXiv:2412.03025. Eric B Zhou, Dokyun Lee, and Bin Gu

work page arXiv

[1] [1]

um" to "yeah

From" um" to" yeah": Producing, predicting, and regulat- ing information flow in human conversation.arXiv preprint arXiv:2403.08890. Yuri Bizzoni and Pascale Feldkamp

work page arXiv

[2] [2]

In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computa- tional Linguistics: Human Language Technologies, pages 3566–3575

Choose your own adventure: Paired suggestions in collaborative writing for evaluating story generation models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computa- tional Linguistics: Human Language Technologies, pages 3566–3575. Pascale Feldkamp, Jan Kostkan, Ea Overgaard, Mia Ja- cobsen, and Yuri Bizzoni

work page 2021

[3] [3]

Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares

Continuous sentiment scores for literary and multi- lingual contexts.arXiv preprint arXiv:2508.14620. Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares

work page arXiv

[4] [4]

Association for Computational Linguistics

Are large language mod- els capable of generating human-level narratives? InProceedings of the 2024 Conference on Empiri- cal Methods in Natural Language Processing, pages 17659–17681, Miami, Florida, USA. Association for Computational Linguistics. Giovanna Varni, Isabelle Hupont, Chloe Clavel, and Mohamed Chetouani

work page 2024

[5] [5]

ai love you

“ai love you”: Linguistic convergence in human-chatbot relationship develop- ment. InAcademy of Management Proceedings, vol- ume 2022, page 17063. Academy of Management Briarcliff Manor, NY 10510. Sergio E Zanotto and Segun Aroyehun

work page 2022

[6] [6]

and Aroyehun, Segun , month = dec, year =

Human variability vs. machine consistency: A linguistic anal- ysis of texts generated by humans and large language models.arXiv preprint arXiv:2412.03025. Eric B Zhou, Dokyun Lee, and Bin Gu

work page arXiv