arxiv: 2507.21925 · v3 · submitted 2025-07-29 · 📊 stat.ME

Marginal and conditional summary measures: transportability and compatibility across studies

Pith reviewed 2026-05-19 02:49 UTC · model grok-4.3

classification 📊 stat.ME
keywords marginal summary measures · conditional summary measures · transportability · effect modification · evidence synthesis · collapsible measures · indirect treatment comparison

The pith

Marginal and conditional summary measures do not generally coincide, so their naive pooling in evidence synthesis produces bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that marginal and conditional summary measures for treatment effects carry different interpretations and do not match in general, even for collapsible measures once effect modification is present. It walks through this for various outcome types and outcome-generating mechanisms, then traces the consequences for transporting results from one population to another. The core practical point is that evidence synthesis methods, including indirect comparisons with covariate adjustment, mix incompatible measures unless deliberate steps are taken to align them. A sympathetic reader cares because pooled estimates used for decisions can be systematically biased when this mismatch is ignored, and access to individual patient data makes alignment easier.

Core claim

Marginal and conditional summary measures do not generally coincide, have different interpretations and correspond to different decision questions. While these aspects have primarily been recognized for non-collapsible summary measures, they are equally problematic for some collapsible measures in the presence of effect modification. The paper clarifies the interpretation and properties of several marginal and conditional summary measures, considering different types of outcomes and hypothetical outcome-generating mechanisms, describes implications of the choice of summary measure for transportability, and illustrates existing summary measure incompatibility issues in the context of evidence synthesis.
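
The point about collapsible measures can be made concrete with a short worked equation. The linear outcome model, the coefficients $\beta$, the single covariate $X$, and the population label $P$ below are illustrative assumptions, not notation taken from the paper; treatment $T$ is assumed to be randomized.

```latex
\begin{align*}
\mathbb{E}[Y \mid T=t, X=x] &= \beta_0 + \beta_T\, t + \beta_X\, x + \beta_{TX}\, t x, \\
\Delta(x) &= \mathbb{E}[Y \mid T=1, X=x] - \mathbb{E}[Y \mid T=0, X=x] = \beta_T + \beta_{TX}\, x, \\
\Delta_P &= \mathbb{E}_P[\Delta(X)] = \beta_T + \beta_{TX}\, \mathbb{E}_P[X].
\end{align*}
```

Under these assumptions the mean difference is collapsible, yet whenever $\beta_{TX} \neq 0$ the marginal effect $\Delta_P$ matches the conditional effect $\Delta(x)$ only at $x = \mathbb{E}_P[X]$, and it changes whenever the covariate mean differs between populations.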

What carries the argument

The distinction between marginal and conditional summary measures and the role of effect modification by covariates in altering population-level treatment effects.

If this is right

  • Covariates not conventionally labeled effect modifiers can still modify population-level treatment effects.
  • Naive pooling of incompatible summary measures across studies introduces bias in evidence synthesis.
  • Methods for indirect treatment comparisons must align the type of summary measure used in each study.
  • Full individual patient data access simplifies checking and enforcing compatibility of summary measures.
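
The first implication can be illustrated with a small calculation. This is a minimal sketch, assuming a hypothetical logistic outcome model with one binary covariate and no treatment-covariate interaction; the coefficients and prevalences are invented for illustration and do not come from the paper.

```python
import math

# Hypothetical logistic model (assumed coefficients, no interaction term):
# logit P(Y=1 | T, X) = B0 + BT*T + BX*X
B0, BT, BX = -1.0, 1.0, 2.0


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_summaries(prevalence):
    """Standardize arm-specific risks over a population with P(X=1) = prevalence.

    Returns the marginal risk difference and the marginal odds ratio.
    """
    p1 = (1 - prevalence) * expit(B0 + BT) + prevalence * expit(B0 + BT + BX)
    p0 = (1 - prevalence) * expit(B0) + prevalence * expit(B0 + BX)
    risk_difference = p1 - p0
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return risk_difference, odds_ratio


# The conditional odds ratio is the same in every stratum of X.
conditional_or = math.exp(BT)

rd_low, or_low = marginal_summaries(0.2)    # population with few X=1 patients
rd_high, or_high = marginal_summaries(0.8)  # population with many X=1 patients

print(f"conditional OR: {conditional_or:.3f}")
print(f"P(X=1)=0.2: marginal RD={rd_low:.3f}, marginal OR={or_low:.3f}")
print(f"P(X=1)=0.8: marginal RD={rd_high:.3f}, marginal OR={or_high:.3f}")
```

In this sketch X would not conventionally be called an effect modifier on the log-odds scale, since the conditional odds ratio is constant across its levels. The marginal risk difference and the marginal odds ratio still depend on the prevalence of X, and both marginal odds ratios fall below the conditional one.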

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Meta-analyses and network meta-analyses should explicitly state the target population and the marginal-versus-conditional choice before pooling.
  • Adjustment methods for indirect comparisons could incorporate automated checks for measure compatibility before producing a pooled result.
  • The same logic applies to any setting that transports causal effects between populations, such as external-control-arm studies.
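
The idea of an automated compatibility check could look something like the sketch below. Everything here is hypothetical: the `StudyEstimate` record, its fields, and `pool_fixed_effect` are names invented for illustration, and inverse-variance fixed-effect pooling stands in for whatever synthesis model is actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyEstimate:
    """Hypothetical record describing one study-level summary."""
    study: str
    measure: str      # e.g. "log_odds_ratio"
    kind: str         # "marginal" or "conditional"
    population: str   # label of the population the estimate refers to
    estimate: float
    std_error: float


def pool_fixed_effect(estimates):
    """Inverse-variance pooling that refuses to mix incompatible summaries."""
    signatures = {(e.measure, e.kind, e.population) for e in estimates}
    if len(signatures) != 1:
        raise ValueError(f"incompatible summary measures: {sorted(signatures)}")
    weights = [1.0 / e.std_error ** 2 for e in estimates]
    weighted_sum = sum(w * e.estimate for w, e in zip(weights, estimates))
    return weighted_sum / sum(weights)


compatible_estimates = [
    StudyEstimate("A", "log_odds_ratio", "marginal", "target", 0.8, 0.1),
    StudyEstimate("B", "log_odds_ratio", "marginal", "target", 0.6, 0.1),
]
pooled = pool_fixed_effect(compatible_estimates)
print(f"pooled marginal log odds ratio: {pooled:.3f}")
```

A check of this kind only enforces that the labels agree. Whether a reported estimate really is marginal or conditional, and for which population, still has to be established from each study's analysis.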

Load-bearing premise

Outcome-generating mechanisms include effect modification by covariates that change population-level treatment effects across different studies or transportability scenarios.

What would settle it

Generate data under a known effect-modification mechanism, compute a marginal summary from one population and a conditional summary from another, pool them without alignment, and check whether the combined estimate deviates from the true target-population effect; the deviation should vanish when only compatible measures are pooled.
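
The check described above can be sketched as a small simulation. This is a simplified illustration under stated assumptions: a hypothetical linear outcome model with a treatment-covariate interaction, two randomized studies drawn from the same target population (to isolate the measure mismatch from population differences), and equal-weight averaging in place of a formal meta-analysis model. All coefficients, sample sizes, and function names are invented.

```python
import random
from statistics import mean

# Hypothetical outcome-generating mechanism (all coefficients are assumptions):
# Y = B0 + BT*T + BX*X + BTX*T*X + noise, with a binary covariate X.
B0, BT, BX, BTX = 1.0, 2.0, 1.0, 1.5
P_X = 0.5  # prevalence of X = 1 in the shared target population


def simulate_study(n, p_x, rng):
    """Simulate a 1:1 randomized trial and return (t, x, y) tuples."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < p_x else 0
        t = 1 if rng.random() < 0.5 else 0
        y = B0 + BT * t + BX * x + BTX * t * x + rng.gauss(0.0, 1.0)
        rows.append((t, x, y))
    return rows


def marginal_difference(rows):
    """Difference in arm means: a marginal (population-average) summary."""
    treated = mean(y for t, _, y in rows if t == 1)
    control = mean(y for t, _, y in rows if t == 0)
    return treated - control


def conditional_difference_at_x0(rows):
    """Difference in arm means within X = 0: a conditional summary."""
    treated = mean(y for t, x, y in rows if t == 1 and x == 0)
    control = mean(y for t, x, y in rows if t == 0 and x == 0)
    return treated - control


rng = random.Random(2025)
study_a = simulate_study(20_000, P_X, rng)
study_b = simulate_study(20_000, P_X, rng)

# True target-population average effect implied by the assumed mechanism.
true_marginal = BT + BTX * P_X

# Naive pooling: marginal summary from A averaged with a conditional summary from B.
naive = 0.5 * (marginal_difference(study_a) + conditional_difference_at_x0(study_b))
# Compatible pooling: marginal summaries from both studies.
compatible = 0.5 * (marginal_difference(study_a) + marginal_difference(study_b))

print(f"truth={true_marginal:.3f} naive={naive:.3f} compatible={compatible:.3f}")
```

Under this mechanism the conditional effect at X = 0 is BT, while the marginal effect is BT + BTX * P_X, so the naive average is pulled away from the target-population effect by roughly half their difference. The compatible average differs from the truth only by sampling error.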

Figures

Figures reproduced from arXiv: 2507.21925 by A. E. Ades, Anna Heath, Antonio Remiro-Azócar, David M. Phillippo, Gianluca Baio, Nicky J. Welton, Sofia Dias.

Figure 1: Matrices indicating whether different estimands are equivalent for the homogeneous illustrative models. The blue squares denote matching estimand values; the dots denote the diagonal, where estimands are equivalent by definition.
Figure 2: Matrices indicating whether different estimands are equivalent for the heterogeneous illustrative models. The blue squares denote matching estimand values; the dots denote the diagonal, where estimands are equivalent by definition.
Figure 3: Matrices indicating whether different estimands are equivalent for the quadratic (heterogeneous) illustrative models. The blue squares denote matching estimand values; the dots denote the diagonal, where estimands are equivalent by definition.

Original abstract

Marginal and conditional summary measures do not generally coincide, have different interpretations and correspond to different decision questions. While these aspects have primarily been recognized for non-collapsible summary measures, they are equally problematic for some collapsible measures in the presence of effect modification. We clarify the interpretation and properties of several marginal and conditional summary measures, considering different types of outcomes and hypothetical outcome-generating mechanisms. We describe implications of the choice of summary measure for transportability, highlighting that covariates not conventionally described as effect modifiers can modify population-level treatment effects. Finally, we illustrate existing summary measure incompatibility issues in the context of evidence synthesis, using the case of covariate adjustment methods for indirect treatment comparisons. Because marginal and conditional summary measures do not generally coincide, their naïve pooling in evidence synthesis can produce bias. Almost invariably, care is needed to ensure that evidence synthesis methods are combining compatible summary measures, and this may be easier to accomplish with full access to individual patient data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that marginal and conditional summary measures generally do not coincide, have distinct interpretations, and correspond to different decision questions. This distinction, long recognized for non-collapsible measures, also applies to collapsible measures under effect modification. The authors clarify properties across outcome types and outcome-generating mechanisms, discuss implications for transportability (including that non-conventional covariates can modify population-level effects), and illustrate incompatibility problems in evidence synthesis via covariate-adjusted indirect treatment comparisons, concluding that naive pooling of incompatible measures can induce bias.

Significance. If the clarifications hold, the work has moderate significance for statistical methodology in evidence synthesis and transportability. It reinforces standard results on collapsibility and effect modification with a focused illustration in indirect comparisons, which could help practitioners avoid bias when combining studies. The emphasis on compatible summary measures and the value of individual patient data is a practical contribution, though it largely synthesizes existing statistical properties rather than introducing novel derivations.

minor comments (3)
  1. [Abstract] Abstract: the claim that the issues are 'equally problematic for some collapsible measures' would be strengthened by briefly naming one such measure (e.g., risk difference) and the specific transportability scenario in which the population-level effect is modified.
  2. [Evidence synthesis illustration] Illustration of indirect comparisons: the description of bias from naïve pooling is qualitative; adding a small numerical example or sensitivity calculation showing the magnitude of incompatibility under differing covariate distributions would make the practical warning more concrete.
  3. [Throughout] Notation: ensure consistent use of symbols for marginal vs. conditional quantities throughout; a short table summarizing the interpretations for each outcome type would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their thoughtful summary of our manuscript and for recommending minor revision. The review accurately captures the paper's focus on the non-coincidence of marginal and conditional summary measures, their implications for transportability, and the risks of naive pooling in evidence synthesis. We are pleased that the practical relevance for indirect treatment comparisons and the emphasis on compatible measures and individual patient data were noted.

Circularity Check

0 steps flagged

No significant circularity; claims rest on standard collapsibility results

full rationale

The paper's core arguments derive from established statistical properties of marginal vs. conditional measures under effect modification and differing covariate distributions, using standard g-computation logic. These are not reduced to self-definitions, fitted inputs renamed as predictions, or load-bearing self-citations within the manuscript. Minor self-citations to prior methodological work exist but are not central or unverified; the derivations remain self-contained against external benchmarks like collapsibility theory.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard domain assumptions about outcome-generating mechanisms and effect modification without introducing new free parameters or invented entities.

axioms (1)
  • domain assumption: Outcome-generating mechanisms include effect modification by covariates that can alter population-level treatment effects.
    Invoked when discussing different outcome types, hypothetical mechanisms, and transportability implications.

pith-pipeline@v0.9.0 · 5716 in / 1165 out tokens · 41805 ms · 2026-05-19T02:49:34.394661+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Incorporating estimands into meta-analyses of clinical trials

    stat.AP 2025-10 conditional novelty 6.0

    A framework is proposed to integrate estimands into meta-analyses of clinical trials to identify sources of heterogeneity from intercurrent event strategies and improve the external validity of pooled estimates for he...

  2. Propensity Score Weighting to Ensure Balance in Key Subgroups or Strata: A Practical Guide

    stat.ME 2026-04 unverdicted novelty 2.0

    A guide to stratified propensity score weighting for balancing key clinical subgroups in observational studies of treatment effects.
