Stable Personas: Dual-Assessment of Temporal Stability in LLM-Based Human Simulation

Jana Gonnermann-Müller; Jennifer Haase; Nicolas Leins; Sebastian Pokutta; Thomas Kosch

arxiv: 2601.22812 · v2 · pith:5P6RQBOZnew · submitted 2026-01-30 · 💻 cs.HC

Pith reviewed 2026-05-21 14:53 UTC · model grok-4.3

keywords: LLM personas, temporal stability, human simulation, behavioral research, self-reports, observer ratings, ADHD presentations

The pith

LLM personas produce stable self-reports across conversations but show declining observer-rated expression in longer dialogues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether LLMs can maintain consistent personas over time, a requirement for using them as scalable stand-ins in behavioral research. It applies a dual-assessment method that tracks both the models' self-reported traits and independent observer ratings of how the persona appears in generated text. Self-reports prove highly consistent both across separate conversations and inside long single sessions. Observer ratings, however, detect a weakening of persona expression as conversations extend. The work treats self-report stability as support for LLM use in research while marking the regression pattern as a limit for multi-turn social simulations.

Core claim

Persona-instructed LLMs produce stable, persona-aligned self-reports both between conversations and within extended ones, yet observer ratings show a tendency for persona expressions to decline during longer interactions, establishing this regression as a boundary condition for multi-agent social simulation.

What carries the argument

Dual-assessment framework that measures self-reported characteristics separately from observer-rated persona expression in LLM outputs.

Load-bearing premise

Observer ratings accurately and independently capture the LLM's persona expression without systematic bias from the rating task or output artifacts.

What would settle it

Re-running the observer rating task with new raters, blinded conditions, or an alternative scale that finds no decline over conversation turns would indicate the drop stems from rating artifacts rather than true persona regression.

Original abstract

Large Language Models (LLMs) acting as artificial agents offer the potential for scalable behavioral research, yet their validity depends on whether LLMs can maintain stable personas across extended conversations. We address this point using a dual-assessment framework measuring both self-reported characteristics and observer-rated persona expression. Across two experiments testing four persona conditions (default, high, moderate, and low ADHD presentations), seven LLMs, and three semantically equivalent persona prompts, we examine between-conversation stability (3,473 conversations) and within-conversation stability (1,370 conversations and 18 turns). Self-reports remain highly stable both between and within conversations. However, observer ratings reveal a tendency for persona expressions to decline during extended conversations. These findings suggest that persona-instructed LLMs produce stable, persona-aligned self-reports, an important prerequisite for behavioral research, while identifying this regression tendency as a boundary condition for multi-agent social simulation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that LLM-based personas maintain high temporal stability in self-reported characteristics both between conversations (3,473 dialogues) and within conversations (1,370 dialogues across 18 turns), across seven LLMs and four persona conditions (default, high, moderate, and low ADHD presentations) using three semantically equivalent prompts. A dual-assessment framework shows stable self-reports but a decline in observer-rated persona expression during extended interactions, positioning the latter as a boundary condition for multi-agent social simulation while supporting use for self-report behavioral research.

Significance. If the observer ratings are shown to be independent of rater artifacts, the work supplies large-scale empirical evidence on persona stability that directly informs the validity of LLM agents for scalable behavioral research. The sample sizes and explicit condition counts provide a solid empirical base for the self-report stability result, which is a prerequisite for many simulation applications.

major comments (2)

[Methods] Observer rating procedure (Methods): The manuscript provides no inter-rater reliability statistics and no validation details for the observer prompts. This is load-bearing for the central claim that declining observer scores demonstrate genuine persona regression rather than rater LLM context-length or recency effects, especially since the rating task supplies full conversation history.
[Results] Within-conversation results: The reported decline in observer scores is interpreted as a boundary condition, yet no ablation is described that varies rater prompt length, uses human raters on a subsample, or tests rating prompts that summarize rather than supply full history. Without such controls the decline cannot be confidently attributed to the target persona.

minor comments (1)

[Abstract] Abstract: The phrase 'observer ratings reveal a tendency' is vague; a quantitative description of the magnitude or statistical test for the decline would improve clarity.
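One way to make the "tendency" quantitative is to report a slope of the observer rating over conversation turns. The sketch below is a minimal illustration, not the authors' analysis: it fits an ordinary least-squares slope to mean observer ratings per turn. The function name `ols_slope` and the six example ratings are hypothetical and are not data from the paper.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical mean observer ratings at six turns (illustrative only,
# not values reported in the paper).
turns = [1, 2, 3, 4, 5, 6]
ratings = [4.0, 3.8, 3.6, 3.4, 3.2, 3.0]

slope = ols_slope(turns, ratings)                # rating change per turn
total_change = slope * (turns[-1] - turns[0])    # change over the observed span
print(f"slope per turn: {slope:.3f}, change over span: {total_change:.3f}")
```

A negative slope with an accompanying significance test or confidence interval would state the magnitude of the decline directly. With repeated measures per conversation, a mixed model with a turn effect would likely be more appropriate than this pooled fit.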

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the robustness of our dual-assessment approach. Below we provide point-by-point responses to the major comments and indicate planned revisions to strengthen the manuscript.

Referee: [Methods] Observer rating procedure (Methods): The manuscript provides no inter-rater reliability statistics and no validation details for the observer prompts. This is load-bearing for the central claim that declining observer scores demonstrate genuine persona regression rather than rater LLM context-length or recency effects, especially since the rating task supplies full conversation history.

Authors: We agree that explicit inter-rater reliability and prompt validation details are necessary to support interpretation of the observer ratings. In the revised version we will add inter-rater reliability statistics obtained by having two independent LLM raters score a stratified random subsample of 200 conversations, reporting agreement via Cohen’s kappa and percentage agreement. We will also expand the Methods section with a description of the iterative prompt-validation process used to ensure the observer instructions prioritize persona expression over recency or context-length cues. These additions directly address the concern that the observed decline may reflect rater artifacts.
revision: yes

Referee: [Results] Within-conversation results: The reported decline in observer scores is interpreted as a boundary condition, yet no ablation is described that varies rater prompt length, uses human raters on a subsample, or tests rating prompts that summarize rather than supply full history. Without such controls the decline cannot be confidently attributed to the target persona.

Authors: We concur that additional controls would increase confidence in attributing the decline to persona regression. We will include a new ablation in the revision that re-rates a subsample of conversations under both full-history and condensed-summary prompt conditions and compares the resulting trajectories. We will also report human ratings on a small random subsample (approximately 50 conversations) to provide an external validity check on the LLM observer scores. While full-scale human rating of the entire corpus remains impractical, these targeted controls should help isolate the contribution of the target persona from rater-specific effects.
revision: partial
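The agreement statistics named in the first response (Cohen's kappa and percentage agreement between two raters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names and the eight example labels are hypothetical, and the kappa here is the unweighted form, which treats rating categories as nominal. If the observer scale is ordinal, a weighted kappa may be the better choice.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which two raters assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(a)
    p_o = percent_agreement(a, b)                  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two raters on eight conversations
# (illustrative only, not data from the paper).
rater_1 = ["high", "high", "low", "low", "high", "low", "high", "low"]
rater_2 = ["high", "high", "low", "high", "high", "low", "low", "low"]

print(f"percent agreement: {percent_agreement(rater_1, rater_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_1, rater_2):.2f}")
```

The same two functions would apply to the proposed human subsample by passing human labels as one rater and LLM observer labels as the other.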

Circularity Check

0 steps flagged

No significant circularity in empirical measurement study

The paper conducts a purely empirical investigation of LLM persona stability via direct experimentation, collecting self-report and observer-rating data across thousands of conversations without any equations, derivations, fitted parameters, or first-principles claims that reduce to inputs by construction. Stability conclusions emerge from observed patterns in the collected data rather than from re-expression of prior definitions or self-citation chains. No load-bearing self-citations, ansatz smuggling, or renaming of known results appear in the measurement framework; the dual-assessment approach is self-contained against external benchmarks of conversation logs and ratings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This empirical study introduces no free parameters, no ad-hoc axioms, and no invented entities; it relies only on standard assumptions from experimental psychology about the validity of self-report and observer measures for trait expression.

axioms (1)

domain assumption: Self-report and observer ratings are valid and complementary measures of persona trait expression in artificial agents.
Note: The dual-assessment framework treats both measures as appropriate for detecting temporal stability without further justification in the abstract.

pith-pipeline@v0.9.0 · 5696 in / 1207 out tokens · 59411 ms · 2026-05-21T14:53:07.625975+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Self-reports remain highly stable both between and within conversations. However, observer ratings reveal a tendency for persona expressions to decline during extended conversations.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat ≃ Nat recovery · unclear
Paper passage: Variance decomposition via linear mixed models quantifies the contribution of each experimental factor

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLM-Based Educational Simulation: Evaluating Temporal Student Persona Stability Across ADHD Profiles
cs.HC · 2026-05 · unverdicted · novelty 5.0

LLM-simulated ADHD student personas show stable self-reported traits but behavioral drift in unscripted interactions that explicit task prompts fully eliminate.