arxiv: 1907.02764 · v1 · pith:HP7O77ZOnew · submitted 2019-07-05 · 📊 stat.ME · stat.AP

Analyses of 'change scores' do not estimate causal effects in observational data

Pith reviewed 2026-05-25 02:18 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords change scores · causal effects · observational data · directed acyclic graphs · longitudinal data · baseline measurements · confounders · mediators

The pith

Change-score analyses do not estimate causal effects in observational data unless the baseline measurement is a competing exposure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows through directed acyclic graphs and simulations that subtracting a baseline outcome measurement from a follow-up measurement and analyzing the difference as an outcome yields misleading estimates of causal effects. This holds in observational data when the baseline acts as a confounder or mediator for the exposure-outcome relationship. A sympathetic reader would care because change scores are a common method in longitudinal studies across many fields, yet they can produce conclusions that diverge from those obtained by analyses that respect the actual causal structure. The paper states that only when the baseline functions as a competing exposure, as occurs in randomized experiments, do change-score analyses align with causal effect estimates.

Core claim

Change-score analyses do not provide meaningful causal effect estimates unless the variable representing measurements of the outcome at baseline is a competing exposure, as in a randomised experiment. Where such variables are confounders or mediators, the conclusions drawn from analyses of change scores diverge (potentially substantially) from those of DAG-informed analyses.

What carries the argument

Directed acyclic graphs (DAGs) that classify the baseline outcome measurement as competing exposure, confounder, or mediator, combined with simulations that compare regression coefficients from change-score models against coefficients from DAG-informed models.
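
The comparison described above can be sketched for the confounder scenario. This is a minimal illustration, not the paper's simulation: the path coefficients `a`, `b`, `c`, the sample size, the seed, and the helper `ols_slopes` are all assumptions chosen for the example, and the change-score analysis is assumed to regress the change on the exposure alone.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Fit OLS with an intercept and return only the slope coefficients."""
    design = np.column_stack([np.ones_like(y), *predictors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1:]

def simulate_confounder(n=200_000, a=0.5, b=0.3, c=0.5, seed=0):
    """Baseline Y0 causes both exposure X and follow-up Y1 (hypothetical values)."""
    rng = np.random.default_rng(seed)
    y0 = rng.normal(size=n)                    # baseline outcome
    x = a * y0 + rng.normal(size=n)            # exposure depends on baseline
    y1 = b * x + c * y0 + rng.normal(size=n)   # true effect of X on Y1 is b
    return x, y0, y1

x, y0, y1 = simulate_confounder()

# Change-score analysis: regress (Y1 - Y0) on X alone.
change_score_est = ols_slopes(y1 - y0, x)[0]

# DAG-informed analysis: regress Y1 on X, adjusting for the confounder Y0.
dag_informed_est = ols_slopes(y1, x, y0)[0]

print(f"true effect      : {0.3:.3f}")
print(f"change-score est.: {change_score_est:.3f}")
print(f"DAG-informed est.: {dag_informed_est:.3f}")
```

With these assumed values the adjusted regression should land near the true effect of 0.3, while the change-score regression should land near 0.1, because the implied bias is (c - 1) * a / (a^2 + 1) = -0.2.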

If this is right

  • Observational studies that seek causal effect estimates should avoid change-score analyses.
  • Alternative analytical strategies that respect the causal roles of baseline measurements should be adopted instead.
  • Change-score analyses align with causal effects only in settings such as randomized experiments where the baseline measurement is a competing exposure.
  • When the baseline measurement is a confounder or mediator, change-score results can differ substantially from DAG-informed results.
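
The one case in which the two analyses agree can be sketched as well. The example below assumes a competing-exposure structure in which the exposure is generated independently of the baseline, as under randomisation; the coefficients `b` and `c`, the sample size, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
b, c = 0.3, 0.5  # hypothetical effects of X and Y0 on Y1

y0 = rng.normal(size=n)                    # baseline outcome
x = rng.normal(size=n)                     # exposure independent of baseline
y1 = b * x + c * y0 + rng.normal(size=n)   # follow-up outcome

def first_slope(y, *predictors):
    """OLS with intercept; return the coefficient on the first predictor."""
    design = np.column_stack([np.ones_like(y), *predictors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1]

change_score_est = first_slope(y1 - y0, x)  # change score regressed on X
adjusted_est = first_slope(y1, x, y0)       # Y1 regressed on X and Y0

print(f"change-score est.: {change_score_est:.3f}")
print(f"adjusted est.    : {adjusted_est:.3f}")
```

Because X and Y0 are uncorrelated here, subtracting the baseline changes the residual variance but not the expected coefficient on X, so both estimates should sit close to `b`.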

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Many existing observational studies that reported causal claims based on change scores could be reanalyzed with DAG methods to check whether their conclusions hold.
  • The problem may be especially common in epidemiology and psychology, where change scores remain popular for longitudinal outcomes.
  • Software that detects change-score models and suggests DAG-based alternatives could reduce the use of this approach in practice.

Load-bearing premise

The three simulated scenarios capture the causal structures that baseline measurements actually take in real observational data.

What would settle it

A real observational dataset in which the baseline outcome measurement is a confounder or mediator yet the change-score regression coefficient exactly matches the total causal effect recovered by a correctly specified DAG analysis would falsify the claim.
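
On simulated data, where the total effect is known, the same check can be run directly. The sketch below assumes a mediator structure (exposure affects baseline, which affects follow-up) with hypothetical coefficients `a`, `b`, `c`, and it assumes the DAG-informed analysis for the total effect leaves the mediator unadjusted.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
a, b, c = 0.5, 0.3, 0.5   # hypothetical path coefficients
total_effect = b + a * c  # direct path plus path through the mediator Y0

x = rng.normal(size=n)                     # exposure
y0 = a * x + rng.normal(size=n)            # baseline is affected by exposure
y1 = b * x + c * y0 + rng.normal(size=n)   # follow-up outcome

def slope(y, predictor):
    """Simple-regression slope of y on a single predictor."""
    design = np.column_stack([np.ones_like(y), predictor])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1]

change_score_est = slope(y1 - y0, x)  # change-score analysis
dag_informed_est = slope(y1, x)       # total effect: mediator not adjusted for

print(f"true total effect: {total_effect:.3f}")
print(f"change-score est.: {change_score_est:.3f}")
print(f"DAG-informed est.: {dag_informed_est:.3f}")
```

Under these assumptions the change-score coefficient is expected to fall short of the total effect by `a`, so a match would require the exposure to have no effect on the baseline.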

Original abstract

Background: In longitudinal data, it is common to create 'change scores' by subtracting measurements taken at baseline from those taken at follow-up, and then to analyse the resulting 'change' as the outcome variable. In observational data, this approach can produce misleading causal effect estimates. The present article uses directed acyclic graphs (DAGs) and simple simulations to provide an accessible explanation of why change scores do not estimate causal effects in observational data.

Methods: Data were simulated to match three general scenarios where the variable representing measurements of the outcome at baseline was a 1) competing exposure, 2) confounder, or 3) mediator for the total causal effect of the exposure on the variable representing measurements of the outcome at follow-up. Regression coefficients were compared between change-score analyses and DAG-informed analyses.

Results: Change-score analyses do not provide meaningful causal effect estimates unless the variable representing measurements of the outcome at baseline is a competing exposure, as in a randomised experiment. Where such variables (i.e. baseline measurements of the outcome) are confounders or mediators, the conclusions drawn from analyses of change scores diverge (potentially substantially) from those of DAG-informed analyses.

Conclusions: Future observational studies that seek causal effect estimates should avoid analysing change scores and adopt alternative analytical strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that in observational longitudinal data, change-score analyses (subtracting baseline from follow-up outcome and regressing the difference on exposure) do not estimate causal effects of the exposure unless the baseline outcome measurement is a competing exposure (as occurs in randomized experiments). It supports this via three canonical DAGs (baseline as competing exposure, confounder, or mediator) and matching linear simulations that recover the known causal effect under DAG-informed regression but show divergence under change-score regression when baseline is a confounder or mediator.

Significance. If the result holds, the finding is significant for applied causal inference in epidemiology and related fields, where change-score methods remain common yet can produce misleading inferences. The paper's use of standard DAGs together with simulations constructed directly from the structural equations implied by each DAG provides an accessible and algebraically transparent demonstration; the divergence follows immediately from the implicit constraint that the coefficient on baseline is fixed at -1.
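
The algebra behind that constraint can be written out in a few lines. The symbols are assumptions introduced here for illustration: X is the exposure, Y_0 and Y_1 are the baseline and follow-up measurements, C is the change score, and the follow-up is assumed to follow a linear structural equation with an error e uncorrelated with X and Y_0.

```latex
\begin{align*}
C &= Y_1 - Y_0
  && \text{(baseline enters the change score with weight } -1\text{)} \\
Y_1 &= \beta X + \gamma Y_0 + e
  && \text{(assumed structural equation for follow-up)} \\
C &= \beta X + (\gamma - 1)\, Y_0 + e \\
\beta_{\mathrm{cs}}
  &= \frac{\operatorname{Cov}(C, X)}{\operatorname{Var}(X)}
   = \beta + (\gamma - 1)\,\frac{\operatorname{Cov}(X, Y_0)}{\operatorname{Var}(X)}
  && \text{(coefficient from regressing } C \text{ on } X\text{)}
\end{align*}
```

Under these assumptions the change-score coefficient equals \beta when X and Y_0 are uncorrelated, which is the competing-exposure case, or when \gamma = 1. Regressing C on X is also the same as modelling Y_1 with the coefficient on Y_0 held at +1 rather than estimated. If the baseline is instead a mediator with Y_0 = \alpha X + e_0, the ratio of covariance to variance is \alpha and the total effect is \beta + \gamma\alpha, so the change-score coefficient equals the total effect minus \alpha.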

major comments (2)
  1. [Methods] Methods (simulation design): the manuscript does not report the exact parameter values, sample sizes, or error variances used to generate the three scenarios, nor does it supply the simulation code or seed values. Without these, independent verification of the reported coefficient divergences is not possible, even though the qualitative result is an algebraic consequence of the change-score constraint.
  2. [Results] Results: the claim that conclusions diverge 'potentially substantially' is not accompanied by the actual numerical coefficient values recovered from the change-score versus DAG-informed regressions in the confounder and mediator scenarios. Supplying these values (and any sensitivity checks across parameter ranges) would make the magnitude of the discrepancy concrete rather than qualitative.
minor comments (2)
  1. [Abstract] Abstract, Conclusions: the recommendation to 'adopt alternative analytical strategies' would be strengthened by a brief pointer to one or two standard alternatives (e.g., regression adjustment for baseline or g-methods) with a supporting reference.
  2. [Throughout] Notation: the manuscript consistently refers to 'the variable representing measurements of the outcome at baseline' and 'at follow-up'; introducing compact symbols (e.g., Y0, Y1, X) early would improve readability without altering meaning.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and positive assessment of the manuscript's significance. We address each major comment below and agree that the suggested additions will improve reproducibility and clarity.

Point-by-point responses
  1. Referee: [Methods] Methods (simulation design): the manuscript does not report the exact parameter values, sample sizes, or error variances used to generate the three scenarios, nor does it supply the simulation code or seed values. Without these, independent verification of the reported coefficient divergences is not possible, even though the qualitative result is an algebraic consequence of the change-score constraint.

    Authors: We agree that the simulation parameters should be reported for full reproducibility. In the revised manuscript we will add the exact parameter values, sample sizes, and error variances for each of the three scenarios. We will also supply the simulation code (including the random seed) as supplementary material or via a public repository. While we concur with the referee that the divergence follows algebraically from the change-score constraint (i.e., fixing the baseline coefficient at -1), providing the concrete implementation details will allow independent verification as requested. revision: yes

  2. Referee: [Results] Results: the claim that conclusions diverge 'potentially substantially' is not accompanied by the actual numerical coefficient values recovered from the change-score versus DAG-informed regressions in the confounder and mediator scenarios. Supplying these values (and any sensitivity checks across parameter ranges) would make the magnitude of the discrepancy concrete rather than qualitative.

    Authors: We accept that reporting the specific numerical coefficient values will strengthen the results section. In the revision we will present the exact recovered coefficients from both the change-score and DAG-informed regressions for the confounder and mediator scenarios. We will also add sensitivity checks across a range of parameter values to quantify the magnitude of the discrepancies under different conditions. revision: yes
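
One way such a sensitivity check might look is sketched below. It is not the authors' code: it tabulates the large-sample bias of the change-score coefficient implied by simple linear models, assuming unit-variance errors by default, with `a` the hypothetical path between exposure and baseline and `c` the hypothetical effect of baseline on follow-up.

```python
def confounder_bias(a, c, var_y0=1.0, var_ex=1.0):
    """Change-score bias when Y0 -> X (strength a) and Y0 -> Y1 (strength c).

    Equals (c - 1) * Cov(X, Y0) / Var(X) under the assumed linear model.
    """
    return (c - 1.0) * a * var_y0 / (a ** 2 * var_y0 + var_ex)

def mediator_bias(a):
    """Change-score bias relative to the total effect when X -> Y0 (strength a)."""
    return -a

# Tabulate the assumed-model bias over a small, hypothetical parameter grid.
print(f"{'a':>5} {'c':>5} {'confounder':>11} {'mediator':>9}")
for a in (0.0, 0.25, 0.5, 1.0):
    for c in (0.0, 0.5, 1.0):
        print(f"{a:5.2f} {c:5.2f} {confounder_bias(a, c):11.3f} {mediator_bias(a):9.3f}")
```

In this sketch the confounder bias vanishes when `a` is 0 or `c` is 1, and the mediator bias depends only on `a`; a simulation-based check would replace these closed forms with repeated draws.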

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper's central demonstration relies on three canonical DAG structures (baseline as competing exposure, confounder, or mediator) plus linear simulations generated directly from the structural equations implied by each DAG. The divergence between change-score regression (which imposes a fixed coefficient of -1 on baseline) and DAG-informed regression is an algebraic consequence of that constraint, shown via explicit comparison of coefficients without any fitted parameters, self-referential definitions, or load-bearing self-citations. The derivation is self-contained against the external benchmark of standard causal graphical models and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the three causal structures simulated accurately represent relevant real-world cases and that regression coefficient comparisons validly indicate whether change scores recover causal effects.

axioms (1)
  • domain assumption Baseline outcome measurements can function as competing exposures, confounders, or mediators in the exposure-outcome relationship.
    The paper structures its simulations and comparisons explicitly around these three scenarios.

pith-pipeline@v0.9.0 · 5775 in / 1272 out tokens · 40193 ms · 2026-05-25T02:18:40.562849+00:00 · methodology
