Old Habits Die Hard: How Conversational History Geometrically Traps LLMs

Adi Simhi; Fazl Barez; Martin Tutek; Shay B. Cohen; Yonatan Belinkov

arxiv: 2603.03308 · v2 · pith:XO5TSOFY · submitted 2026-02-08 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-21 12:56 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords: conversational history, LLM bias, latent space, Markov chains, hidden representations, geometric traps, behavioral persistence

The pith

Conversational history traps LLMs: gaps in their latent space confine internal trajectories to paths set by earlier turns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how earlier turns in a conversation shape what large language models generate next. It compares two ways of measuring how much the model sticks to previous choices: one models the sequence of answers as a chain of states and measures how often the model stays in the same state from one turn to the next; the other checks whether the model's internal hidden vectors stay close together from one turn to the next. The two measures turn out to be tightly linked across many models and tasks. This link suggests that the model repeats old habits because empty regions in its internal space act as walls that keep its path from straying far from where it started. If this view is correct, it explains why simply telling the model to forget the past often fails to change its behavior.
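The first measure can be sketched in a few lines. This is a minimal illustration, not the authors' code (which lives in the repository linked in the abstract): it assumes each turn has already been labeled with one of two behavioral states, encoded here as 1 for ϕ+ and 0 for ϕ−, and it estimates the transition matrix from a single labeled sequence by counting transitions. The function names `transition_matrix` and `state_consistency` are hypothetical. The score is the trace of the matrix, the sum of the two self-transition probabilities.

```python
import numpy as np


def transition_matrix(states):
    """Estimate a 2x2 transition matrix from a sequence of binary states.

    states: sequence of 0/1 labels, one per conversational turn
    (hypothetical encoding: 1 = phi+ behavior, 0 = phi- behavior).
    Row i holds P(next state | current state = i).
    """
    counts = np.zeros((2, 2))
    for prev, nxt in zip(states[:-1], states[1:]):
        counts[prev, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def state_consistency(states):
    """Tr(T) = P(s+|s+) + P(s-|s-): the sum of self-transition probabilities."""
    return float(np.trace(transition_matrix(states)))
```

For example, `state_consistency([1, 1, 1, 0, 0, 0])` gives 5/3 (about 1.67), while a strictly alternating sequence gives 0. Pooling transition counts over many conversations before normalizing is a natural extension; whether the paper does so is not stated on this page.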

Core claim

By modeling conversations as Markov chains to quantify state consistency and measuring the similarity of consecutive hidden representations, the work shows that behavioral persistence in LLMs arises as a geometric trap where gaps in the latent space confine the model's trajectory to paths set by prior interactions.

What carries the argument

Gaps in the latent space that restrict the trajectory of hidden representations and thereby sustain consistent response patterns across turns.

If this is right

Earlier errors such as hallucinations continue to shape later answers even when the immediate prompt has moved on.
The strength of this locking can be read out directly from how close successive hidden representations remain to each other.
The same pattern of confinement appears across different model families and across datasets that cover many kinds of conversational phenomena.
Quantifying the size of the gaps offers a concrete way to predict how strongly any given history will bias future outputs.
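The geometric readout in the second point can be illustrated the same way. A minimal sketch, assuming one hidden vector per turn (which layer and pooling the paper uses is not stated on this page) and cosine similarity as the closeness measure, which the referee report also assumes; the function name `consecutive_similarity` is hypothetical.

```python
import numpy as np


def consecutive_similarity(hidden):
    """Mean cosine similarity between consecutive hidden representations.

    hidden: array-like of shape (num_turns, hidden_dim), one vector per turn
    (assumption: e.g. a single layer's state, pooled per turn).
    A zero vector would make the normalization undefined (nan).
    """
    h = np.asarray(hidden, dtype=float)
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    # Row-wise dot products of the unit vectors for turn t and turn t+1.
    sims = np.sum(h[:-1] * h[1:], axis=1)
    return float(sims.mean())
```

A value near 1 means successive turns barely move in representation space; lower values mean the trajectory travels further between turns.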

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Methods that deliberately nudge hidden states across the identified gaps could reduce the unwanted carry-over of past behaviors.
Comparable geometric barriers might limit flexibility in other sequential tasks such as long-form story writing or multi-step reasoning chains.
Measuring gap sizes on new models before deployment could serve as a diagnostic for how history-dependent those models will be.

Load-bearing premise

The observed correlation between Markov-chain state consistency and the similarity of consecutive hidden representations reflects a causal geometric trap rather than a surface-level statistical association arising from the model's training or architecture.

What would settle it

Observing strong behavioral persistence on new data or models while finding little or no correlation between the Markov-chain consistency scores and the hidden-representation similarity scores would falsify the geometric-trap account.
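The falsification test above amounts to comparing the two scores across conditions. A minimal sketch, assuming each condition (for instance a model-dataset pair) has already been reduced to one consistency score and one similarity score, and using Pearson correlation; the function names and both thresholds are hypothetical illustrations, not values from the paper.

```python
import numpy as np


def cross_view_correlation(consistency_scores, similarity_scores):
    """Pearson r between per-condition Markov consistency (e.g. Tr(T)) and
    mean consecutive hidden-state similarity. Each index is one condition.
    Returns nan (with a NumPy warning) if either input is constant."""
    x = np.asarray(consistency_scores, dtype=float)
    y = np.asarray(similarity_scores, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def matches_falsifying_pattern(consistency_scores, similarity_scores,
                               persistence_threshold=1.5, corr_threshold=0.3):
    """Flag strong persistence (high mean Tr(T)) alongside a weak cross-view link.
    Both thresholds are illustrative assumptions, not values from the paper."""
    strong_persistence = np.mean(consistency_scores) >= persistence_threshold
    r = cross_view_correlation(consistency_scores, similarity_scores)
    return bool(strong_persistence and abs(r) < corr_threshold)
```

In practice the threshold choice matters less than reporting r with an uncertainty estimate, since only a handful of conditions may be available.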

Original abstract

How does the conversational past of large language models (LLMs) influence their future performance? Recent work suggests that LLMs are affected by their conversational history in unexpected ways. For instance, hallucinations in prior interactions may influence subsequent model responses. In this work, we introduce History-Echoes, a framework that investigates how conversational history biases subsequent generations. The framework explores this bias from two perspectives: probabilistically, we model conversations as Markov chains to quantify state consistency; geometrically, we measure the consistency of consecutive hidden representations. Across three model families and six datasets spanning diverse phenomena, our analysis reveals a strong correlation between the two perspectives. By bridging these perspectives, we demonstrate that behavioral persistence manifests as a geometric trap, where gaps in the latent space confine the model's trajectory. Code available at https://github.com/technion-cs-nlp/OldHabitsDieHard.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a consistent correlation between Markov-modeled conversation state consistency and similarity of consecutive hidden states but the geometric-trap interpretation stays correlational.

The main takeaway is that the authors find a strong correlation between two measurements, Markov-chain state consistency across turns and the similarity of successive hidden representations, and then frame the result as a geometric trap in latent space. They test this across three model families and six datasets covering different phenomena and release the code, which is useful for checking the measurements directly. The History-Echoes framework is new in how it pairs the probabilistic and geometric views in one setup, and the empirical pattern appears repeatable enough to be worth noting. The correlation itself is presented as an observation rather than derived from prior parameters, so it avoids obvious circularity. The softer spot is the move from that correlation to the claim that gaps in the latent space actively confine trajectories. The abstract gives no intervention, ablation, or counterfactual that would show the geometry is the operative cause rather than a downstream reflection of training data or the autoregressive objective. Without that, the trap language reads as an interpretation layered on top of the numbers rather than a mechanism isolated by the experiments. This work is aimed at people studying history effects in dialogue models or representation geometry in LLMs, and readers who want concrete measurements to build on or test further would get value from it. It has a clear enough empirical core and public code to merit a full review rather than a quick pass.

Referee Report

2 major / 2 minor

Summary. The paper introduces the History-Echoes framework to study how conversational history biases subsequent LLM generations. Conversations are modeled as Markov chains to obtain a scalar measure of state consistency (probabilistic view) while cosine similarity (or equivalent) is computed between consecutive hidden representations (geometric view). Across three model families and six datasets, a strong correlation is reported between these quantities. The central claim is that this correlation demonstrates behavioral persistence as a 'geometric trap' in which gaps in the latent space confine the model's trajectory.

Significance. If substantiated, the work offers a useful bridge between probabilistic modeling of dialogue dynamics and geometric analysis of internal representations. The multi-model, multi-dataset scope and public code release are strengths that support reproducibility and generality. The findings could inform mitigation strategies for history-induced biases such as persistent hallucinations, provided the geometric-trap interpretation is shown to be more than a correlational byproduct of autoregressive training.

major comments (2)

[Abstract] Abstract (final paragraph) and corresponding discussion: the assertion that the reported correlation demonstrates an active 'geometric trap' in which 'gaps in the latent space confine the model's trajectory' is load-bearing for the central claim yet rests only on an observed association between Markov-chain consistency and hidden-state similarity. No intervention, ablation, or counterfactual experiment (e.g., perturbing trajectories while holding token probabilities fixed) is described that would distinguish causal geometric confinement from a passive statistical association arising from the training objective or data statistics. This distinction is required to elevate the result beyond correlation.
[Experiments] Experiments section (results tables/figures): quantitative values, error bars, dataset sizes, and controls for confounding factors such as model scale or prompt length are not referenced in the abstract and must be explicitly reported with statistical tests to support the 'strong correlation' claim across the three model families and six datasets.
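One conventional way to attach error bars to a correlation computed over a small number of conditions is a percentile bootstrap. This is an illustration of that generic technique, not the procedure the paper uses; the function name `bootstrap_corr_ci` and its defaults are hypothetical.

```python
import numpy as np


def bootstrap_corr_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Pearson r.

    Resamples (x_i, y_i) pairs with replacement. Resamples in which either
    variable is constant (r undefined) are skipped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    rs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))
        xs, ys = x[idx], y[idx]
        if xs.max() == xs.min() or ys.max() == ys.min():
            continue
        rs.append(np.corrcoef(xs, ys)[0, 1])
    lo, hi = np.quantile(rs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

With very few conditions the interval will be wide, which is itself informative when judging a "strong correlation" claim.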

minor comments (2)

[Methods] Clarify the precise definition of the Markov state and the hidden-representation similarity metric (e.g., which layer(s) and pooling method) in the methods section to improve reproducibility.
[Abstract] The title and abstract use the term 'geometrically traps' before the supporting analysis is presented; consider softening to 'suggests a geometric component' until the causal evidence is strengthened.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important distinctions between correlation and causation as well as the need for clearer reporting of quantitative results. We address each major comment below and describe the revisions planned for the next manuscript version.

Referee: [Abstract] Abstract (final paragraph) and corresponding discussion: the assertion that the reported correlation demonstrates an active 'geometric trap' in which 'gaps in the latent space confine the model's trajectory' is load-bearing for the central claim yet rests only on an observed association between Markov-chain consistency and hidden-state similarity. No intervention, ablation, or counterfactual experiment (e.g., perturbing trajectories while holding token probabilities fixed) is described that would distinguish causal geometric confinement from a passive statistical association arising from the training objective or data statistics. This distinction is required to elevate the result beyond correlation.

Authors: We agree that the current evidence consists of a strong observed correlation rather than direct causal interventions, and that the manuscript language in the abstract and discussion overstates the causal interpretation. The multi-model, multi-dataset consistency provides supporting evidence for the geometric-trap view but cannot by itself rule out passive associations from autoregressive training. In the revised manuscript we will (1) revise the abstract and discussion to state that the correlation is consistent with behavioral persistence manifesting as a geometric trap, (2) add an explicit limitations paragraph acknowledging the absence of counterfactual or ablation experiments, and (3) outline possible future directions for causal tests such as controlled trajectory perturbations. These changes will be made without introducing new experiments. revision: yes
Referee: [Experiments] Experiments section (results tables/figures): quantitative values, error bars, dataset sizes, and controls for confounding factors such as model scale or prompt length are not referenced in the abstract and must be explicitly reported with statistical tests to support the 'strong correlation' claim across the three model families and six datasets.

Authors: We accept this point. While the full experiments section already contains the requested quantitative details (correlation coefficients, standard errors, dataset sizes, and controls for model scale and prompt length together with statistical tests), these were not summarized in the abstract. In the revision we will update the abstract to report the key quantitative findings, including the range of correlation strengths, mention of error bars, and reference to the statistical tests performed. The experiments section will be edited for explicit cross-references to these controls and tests. revision: yes

Circularity Check

0 steps flagged

No circularity: correlation between independent Markov and geometric measures is presented as an empirical observation

The paper defines two distinct measurements—Markov-chain state consistency for the probabilistic view and cosine similarity (or equivalent) of consecutive hidden representations for the geometric view—then reports their observed correlation across models and datasets as an empirical result. The conclusion that this manifests as a 'geometric trap' is an interpretive bridge from the correlation rather than a quantity that reduces by construction to the input definitions or to any fitted parameters. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided derivation chain; the measurements are computed separately and the link is data-driven rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that conversations are adequately modeled as Markov chains and on the interpretive step that hidden-state similarity constitutes a confining geometric trap; no explicit free parameters or independently evidenced invented entities are stated in the abstract.

axioms (1)

domain assumption: conversations can be modeled as Markov chains to quantify state consistency.
Invoked to enable the probabilistic perspective of the History-Echoes framework.

invented entities (1)

geometric trap (no independent evidence)
purpose: To explain behavioral persistence as confinement of model trajectory by gaps in latent space.
New conceptual entity introduced to interpret the observed correlation between the two measurement perspectives.

pith-pipeline@v0.9.0 · 5689 in / 1337 out tokens · 60773 ms · 2026-05-21T12:56:52.156978+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

We model the conversation as a Markov chain over a binary state space... $\mathrm{Tr}(T) = P(s_{\phi^+} \mid s_{\phi^+}) + P(s_{\phi^-} \mid s_{\phi^-})$
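A standard property of two-state Markov chains, not a result claimed by the paper, explains why this trace works as a persistence score. Write a and b for the two switching probabilities:

```latex
% Two-state chain, state order (s_{\phi^+}, s_{\phi^-});
% a = P(s_{\phi^-} \mid s_{\phi^+}),  b = P(s_{\phi^+} \mid s_{\phi^-}).
T = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix},
\qquad \mathrm{Tr}(T) = (1-a) + (1-b) = 2 - a - b .

% Rows sum to 1, so \lambda_1 = 1; the trace then fixes the second eigenvalue:
\lambda_2 = \mathrm{Tr}(T) - \lambda_1 = 1 - a - b = \mathrm{Tr}(T) - 1 .

% k-step self-transition probability, with stationary mass \pi_+ = b/(a+b):
P(X_{t+k} = s_{\phi^+} \mid X_t = s_{\phi^+}) = \pi_+ + (1 - \pi_+)\,\lambda_2^{\,k} .
```

So Tr(T) > 1 exactly when λ2 > 0, meaning the chain keeps a positive, geometrically decaying memory of its current state; Tr(T) = 1 means the next state is independent of the current one; and Tr(T) < 1 indicates alternation. The stationary-mass formula assumes a + b > 0.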
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem.

construct a two-dimensional orthonormal basis... $\theta_{\mathrm{ref}} = \theta(h'_{\phi^+}, h'_{\phi^-})$
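The quoted construction can be illustrated with Gram-Schmidt. This sketch assumes the two-dimensional basis is the plane spanned by the two reference states h′ϕ+ and h′ϕ−, and that θ(·,·) is the ordinary angle between vectors; it does not attempt to reproduce whatever transformation the prime denotes in the paper. All function names are hypothetical.

```python
import numpy as np


def plane_basis(h_pos, h_neg):
    """Orthonormal basis (e1, e2) of the plane spanned by two reference states,
    built with Gram-Schmidt. Assumption: the paper's 2-D basis is this plane."""
    h_pos = np.asarray(h_pos, dtype=float)
    h_neg = np.asarray(h_neg, dtype=float)
    e1 = h_pos / np.linalg.norm(h_pos)
    residual = h_neg - (h_neg @ e1) * e1      # part of h_neg orthogonal to e1
    e2 = residual / np.linalg.norm(residual)  # undefined if the two are collinear
    return e1, e2


def project(h, basis):
    """Coordinates of h in the 2-D plane; components outside it are dropped."""
    e1, e2 = basis
    h = np.asarray(h, dtype=float)
    return np.array([h @ e1, h @ e2])


def angle_in_plane(u, v, basis):
    """Unsigned angle in radians between the projections of u and v."""
    pu, pv = project(u, basis), project(v, basis)
    cos = (pu @ pv) / (np.linalg.norm(pu) * np.linalg.norm(pv))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Under these assumptions, θref is `angle_in_plane(h_pos, h_neg, basis)`, and a later turn's hidden state can be placed relative to it by its projected angle; because both reference states lie in the plane, θref equals their angle in the full space.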

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AMEL: Accumulated Message Effects on LLM Judgments
cs.AI 2026-05 conditional novelty 6.0

LLMs exhibit an accumulated message effect where conversation history saturated with positive or negative evaluations biases subsequent judgments, with larger shifts on uncertain items, a negativity asymmetry, and no ...
SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy
cs.CL 2026-04 unverdicted novelty 6.0

SWAY quantifies sycophancy in LLMs via shifts under linguistic pressure and a counterfactual chain-of-thought mitigation reduces it to near zero while preserving responsiveness to genuine evidence.