Arbitrated Indirect Treatment Comparisons

Weili He; Yixin Fang

arXiv: 2510.18071 · v2 · submitted 2025-10-20 · stat.ML · cs.LG · stat.ME

Pith reviewed 2026-05-18 05:28 UTC · model grok-4.3

classification: stat.ML · cs.LG · stat.ME

keywords: indirect treatment comparison · MAIC · overlap population · MAIC paradox · health technology assessment · matching-adjusted indirect comparison · treatment effect estimation

The pith

Arbitrated indirect comparisons estimate treatment effects on the overlap population to eliminate conflicting conclusions from different sponsors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Matching-adjusted indirect comparison allows estimation of treatment effects by reweighting data from one trial to match another's population summary. The MAIC paradox occurs when sponsors targeting different populations reach opposite conclusions on the same data. The proposed arbitrated methods resolve this by always targeting the overlap population, where patients fit both trials' characteristics. This creates consistent estimates that do not depend on which sponsor conducts the analysis. Readers in health policy and statistics would care because it offers a way to make indirect evidence more reliable for treatment decisions.
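To make the reweighting step concrete, the following is a minimal sketch of method-of-moments MAIC weights, reduced to a single covariate. It assumes the standard exponential-tilting form of the weights used in the MAIC literature the paper cites; the data, the function name `maic_weights`, and the Newton solver are illustrative assumptions and are not taken from the paper.

```python
import math

def maic_weights(x_ipd, target_mean, tol=1e-12, max_iter=100):
    """Method-of-moments MAIC weights for one covariate (illustrative).

    Weights have the form w_i = exp(alpha * (x_i - target_mean)), with alpha
    chosen to minimize Q(alpha) = sum_i w_i. Q is convex, and at its minimum
    the weighted IPD mean equals target_mean.
    """
    centered = [x - target_mean for x in x_ipd]
    alpha = 0.0
    for _ in range(max_iter):
        w = [math.exp(alpha * c) for c in centered]
        grad = sum(c * wi for c, wi in zip(centered, w))      # Q'(alpha)
        hess = sum(c * c * wi for c, wi in zip(centered, w))  # Q''(alpha) > 0
        step = grad / hess
        alpha -= step                                         # Newton update
        if abs(step) < tol:
            break
    return [math.exp(alpha * c) for c in centered]

# Hypothetical IPD ages and a hypothetical AgD mean age (not from the paper).
ipd_age = [45.0, 50.0, 52.0, 58.0, 61.0, 64.0, 70.0]
agd_mean_age = 60.0

weights = maic_weights(ipd_age, agd_mean_age)
weighted_mean = sum(w * x for w, x in zip(weights, ipd_age)) / sum(weights)
print(round(weighted_mean, 6))
```

A solution exists only when the AgD mean lies strictly inside the range of the IPD covariate values, which is one practical face of the positivity concern raised later in the referee report.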

Core claim

The paper claims that by focusing on the overlap population as the common target, arbitrated indirect treatment comparisons produce treatment effect estimates that are consistent across different analyses of the same data. This addresses the MAIC paradox where each sponsor implicitly chooses a different target population, leading to disagreements on relative effectiveness. The methods build on reweighting techniques but add arbitration to select the shared population.
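A small arithmetic illustration shows how two target populations can flip the sign of the same comparison. Every number below is hypothetical and chosen only to produce the flip; the setup (a binary effect modifier X, effects measured against a common comparator C, higher values favoring the first-named treatment) is an assumption for illustration and not the paper's example.

```python
# Hypothetical conditional effects versus a common comparator C,
# by level of a binary effect modifier X.
effect_A_vs_C = {0: 3.0, 1: 1.0}
effect_B_vs_C = {0: 2.0, 1: 2.0}

# Hypothetical prevalence of X = 1 in each trial population.
p_x1 = {"AC trial": 0.8, "BC trial": 0.3}

def marginal_A_vs_B(p):
    """Average the conditional A-vs-B effect over a population with P(X=1) = p."""
    diff = {x: effect_A_vs_C[x] - effect_B_vs_C[x] for x in (0, 1)}
    return (1 - p) * diff[0] + p * diff[1]

# Sponsor A holds IPD from the AC trial and reweights to the BC population.
sponsor_A_estimate = marginal_A_vs_B(p_x1["BC trial"])
# Sponsor B holds IPD from the BC trial and reweights to the AC population.
sponsor_B_estimate = marginal_A_vs_B(p_x1["AC trial"])

print(round(sponsor_A_estimate, 3), round(sponsor_B_estimate, 3))
```

Under these assumed numbers, sponsor A's estimate is positive (A looks better) and sponsor B's is negative (B looks better), even though both use the same conditional effects. Neither analysis is wrong; they answer the question for different populations.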

What carries the argument

The overlap population serves as the arbitrated common target population for consistent estimation of treatment effects in indirect comparisons.
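One established way to target an overlap population is the overlap weighting of Li, Morgan, and Zaslavsky (2018), which appears in the paper's reference list. The sketch below illustrates that idea only. It assumes IPD from both trials, a single covariate, and a logistic model for trial membership; whether and how the paper adapts this to the IPD-plus-AgD setting is not described on this page. All data and function names are hypothetical.

```python
import math

def fit_logistic(x, z, n_iter=50):
    """Newton-Raphson fit of P(z = 1 | x) = sigmoid(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector.
        g0 = sum(zi - pi for zi, pi in zip(z, p))
        g1 = sum((zi - pi) * xi for zi, pi, xi in zip(z, p, x))
        # Fisher information (2x2), inverted in closed form.
        v = [pi * (1.0 - pi) for pi in p]
        h00 = sum(v)
        h01 = sum(vi * xi for vi, xi in zip(v, x))
        h11 = sum(vi * xi * xi for vi, xi in zip(v, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical standardized covariate; z = 1 marks trial 1, z = 0 marks trial 2.
x = [0.2, 0.9, 1.1, 1.6, 2.0, 2.4, -1.0, -0.4, 0.1, 0.5, 1.0, 1.3]
z = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

b0, b1 = fit_logistic(x, z)
e = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

# Overlap weights: each subject is weighted by the estimated probability
# of belonging to the other trial.
w = [1.0 - ei if zi == 1 else ei for ei, zi in zip(e, z)]

def weighted_mean(values, weights, flags, group):
    """Weighted mean of values restricted to one trial."""
    num = sum(v * wt for v, wt, f in zip(values, weights, flags) if f == group)
    den = sum(wt for wt, f in zip(weights, flags) if f == group)
    return num / den

mean_trial1 = weighted_mean(x, w, z, 1)
mean_trial2 = weighted_mean(x, w, z, 0)
print(round(mean_trial1, 6), round(mean_trial2, 6))
```

When the membership model is a logistic regression fitted by maximum likelihood, the two weighted covariate means coincide exactly, so both trials are reweighted to the same population. That symmetry is what makes the target independent of which trial is treated as the reference.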

If this is right

Different sponsors analyzing the same trials will agree on treatment superiority when both target the overlap population.
Indirect comparison estimates become the same regardless of which trial's population is used as the reference.
Health technology assessments gain a standardized target that reduces disputes over which evidence to trust.
Existing IPD and AgD can still be used without collecting new data for the common target.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the approach to network meta-analysis could define a single overlap target across many studies for broader consistency.
Sensitivity checks on covariate definitions of overlap would show how stable the agreement remains under different choices.
Regulators could adopt overlap targeting as a default rule to make reimbursement decisions less prone to sponsor-specific results.

Load-bearing premise

The overlap population must be large enough and clinically relevant so that estimates there apply without introducing new biases from restricting the population.

What would settle it

A randomized trial conducted specifically in the overlap population that shows treatment rankings different from those predicted by the arbitrated comparisons would challenge the claim of consistent resolution.

Original abstract

Matching-adjusted indirect comparison (MAIC) has been increasingly employed in health technology assessments (HTA). By reweighting subjects from a trial with individual participant data (IPD) to match the covariate summary statistics of another trial with only aggregate data (AgD), MAIC facilitates the estimation of a treatment effect defined with respect to the AgD trial population. This manuscript introduces a new class of methods, termed arbitrated indirect treatment comparisons, designed to address the "MAIC paradox", a phenomenon highlighted by Jiang et al. (2025). The MAIC paradox arises when different sponsors, analyzing the same data, reach conflicting conclusions regarding which treatment is more effective. The underlying issue is that each sponsor implicitly targets a different population. To resolve this inconsistency, the proposed methods focus on estimating treatment effects in a common target population, specifically chosen to be the overlap population.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces arbitrated indirect comparisons that target the overlap population to eliminate sponsor-dependent MAIC results, though the approach may still run into trouble with imprecise overlap from aggregate summaries.

The main point is that this work defines a new class of arbitrated indirect treatment comparisons to fix the MAIC paradox by locking the analysis to the overlap population instead of letting each sponsor match to the other's aggregate data. That move is meant to produce the same treatment effect estimate no matter which trial supplies the individual participant data. The paper does a clear job spelling out how the paradox comes from mismatched target populations and why the overlap offers a shared, more neutral ground. It builds directly on the Jiang et al. (2025) observation and keeps the proposal inside the existing MAIC setup, which makes the framing easy to follow for people already using these methods. The focus on a defined overlap population is a reasonable attempt to preserve clinical relevance while cutting down on arbitrary sponsor choices.

The softer part is whether the method actually delivers invariance in practice. When only means and variances come from the aggregate trial, the true overlap region is hard to pin down exactly, and any moment-matching or trimming step can shift the effective population in ways that depend on which dataset is treated as the IPD source. The stress-test concern about potential bias from imperfect overlap identification looks like it needs checking against the derivations or any simulations.

This is aimed at biostatisticians who handle indirect comparisons for health technology assessments. Readers who already work with MAIC will see the practical angle right away. It deserves peer review so the details on weighting, positivity assumptions, and finite-sample behavior can get a proper look.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a class of 'arbitrated indirect treatment comparisons' to resolve the MAIC paradox, in which sponsors analyzing the same IPD and AgD trials reach conflicting conclusions about relative treatment effects because each implicitly targets a different population. The proposed methods reweight or arbitrate to a common target chosen as the overlap population, with the goal of producing treatment-effect estimates that are invariant to which sponsor performs the analysis.

Significance. If the arbitrated estimators can be shown to be consistent for the overlap-specific effect and invariant under realistic aggregate-data constraints, the work would address a documented source of inconsistency in health-technology-assessment submissions. The emphasis on a well-defined common target population is a constructive response to the paradox identified by Jiang et al. (2025), and the approach could improve reproducibility of indirect comparisons if accompanied by clear theoretical guarantees and practical implementation guidance.

major comments (3)

[§3.2, Eq. (7)] The arbitrated weighting function is defined via moment matching to the overlap region, yet the manuscript provides no formal proof that the resulting estimator remains invariant to the choice of which trial supplies the IPD when only first- and second-moment summaries are available from the AgD trial; finite-sample trimming or support mismatch can induce bias that is not quantified.
[§5] Simulation design: all reported scenarios assume the true overlap is exactly recoverable from the supplied moments and that positivity holds strictly; no results are shown for cases in which the AgD covariate support is narrower than the moment-matched region, which directly tests the skeptic's concern that new selection effects may be introduced.
[§4.1] The claim that targeting the overlap population avoids reduction in clinical relevance is asserted without a quantitative comparison of effective sample size or variance inflation relative to standard MAIC targeting the AgD population; this trade-off is central to whether the method is practically preferable.
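For readers unfamiliar with the quantity in the third comment, the effective sample size of a weighted analysis is commonly approximated by the Kish formula, (sum of weights)^2 / (sum of squared weights). The sketch below is a generic illustration with made-up weights; the function name is hypothetical and nothing here reproduces results from the manuscript.

```python
def effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical weight vectors for 100 subjects.
equal_weights = [1.0] * 100                 # no reweighting
skewed_weights = [10.0] * 10 + [0.1] * 90   # a few subjects dominate

ess_equal = effective_sample_size(equal_weights)
ess_skewed = effective_sample_size(skewed_weights)
print(round(ess_equal, 2), round(ess_skewed, 2))
```

Equal weights leave the effective sample size at the nominal sample size, while concentrated weights shrink it sharply. Reporting this quantity for both the overlap-targeted weights and standard MAIC weights is the kind of comparison the comment asks for.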

minor comments (2)

[§2] Notation: the symbol for the overlap population is introduced without an explicit set-theoretic definition in the early sections; adding one would clarify subsequent derivations.
[Figure 2] The schematic of the arbitrated procedure would be clearer if the arrows were labeled with the specific weighting or arbitration step being performed.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the theoretical foundations, robustness checks, and practical considerations of arbitrated indirect treatment comparisons. We address each major comment below and will revise the manuscript to incorporate the suggested improvements where feasible.

Referee: [§3.2, Eq. (7)]: the arbitrated weighting function is defined via moment matching to the overlap region, yet the manuscript provides no formal proof that the resulting estimator remains invariant to the choice of which trial supplies the IPD when only first- and second-moment summaries are available from the AgD trial; finite-sample trimming or support mismatch can induce bias that is not quantified.

Authors: We agree that the current manuscript does not contain a formal proof of invariance when only first- and second-moment summaries are supplied. The arbitrated weighting is constructed to target the overlap population via moment matching, which yields invariance under the assumption that the moments fully characterize the relevant distributions; with limited aggregate data this is necessarily an approximation. We will revise §3.2 to state the precise conditions under which invariance holds, add a brief discussion of potential bias arising from support mismatch or trimming, and include a short theoretical note on the resulting estimator properties. A full proof under arbitrary moment constraints is not feasible within the present framework and will be noted as a limitation.

revision: partial

Referee: [§5]: all reported scenarios assume the true overlap is exactly recoverable from the supplied moments and that positivity holds strictly; no results are shown for cases in which the AgD covariate support is narrower than the moment-matched region, which directly tests the skeptic's concern that new selection effects may be introduced.

Authors: We acknowledge that the simulation design in §5 focuses on settings where the supplied moments permit exact recovery of the overlap and strict positivity. To directly address the concern about narrower AgD support introducing additional selection effects, we will expand the simulation study to include scenarios with mismatched covariate supports. The revised §5 will report bias, variance, and coverage for these cases, thereby quantifying the robustness of the arbitrated estimators under realistic aggregate-data constraints.

revision: yes

Referee: [§4.1]: the claim that targeting the overlap population avoids reduction in clinical relevance is asserted without a quantitative comparison of effective sample size or variance inflation relative to standard MAIC targeting the AgD population; this trade-off is central to whether the method is practically preferable.

Authors: We agree that a quantitative assessment of the effective-sample-size and variance trade-off is necessary to evaluate practical preferability. In the revised manuscript we will augment §4.1 with both analytic expressions and simulation-based comparisons of effective sample size and variance inflation between the overlap-targeted arbitrated estimators and standard MAIC targeting the AgD population. This addition will make the clinical-relevance claim evidence-based rather than asserted.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation of arbitrated indirect treatment comparisons

The paper introduces a new class of methods to resolve the MAIC paradox by explicitly targeting the overlap population as a common target for treatment effect estimation. The abstract and provided context define this choice directly as the resolution to sponsor-specific population differences, without reducing any estimator or prediction to a fitted input, self-referential definition, or load-bearing self-citation. No equations are shown that equate a claimed result to its own construction, and the central premise rests on standard reweighting applied to a predefined overlap region rather than deriving that region from the target quantity itself. The derivation chain remains independent of the result being claimed.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities; no details on modeling choices or assumptions are extractable.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] Caro, J. J. and Ishak, K. J. (2010). No head-to-head trial? simulate the missing arms. Pharmacoeconomics, 28:957–967.

[2] Ding, Y., Liu, Y., and Qu, Y. (2025). Empirical simulation for complex clinical trial data in drug development. Communications in Statistics-Simulation and Computation, pages 1–10.

[3] Fang, Y. (2024). Causal Inference in Pharmaceutical Statistics. CRC Press.
Hernán, M. A. and Robins, J. M. (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
Højsgaard, S., Halekoh, U., and Yan, J. (2006). The R package geepack for generalized estimating equations. Journal of Statistical Software, 15:1–11.

[4] Imbens, G. W. and Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.

[5] Ishak, K. J., Proskorovsky, I., and Benedict, A. (2015). Simulation and matching-based approaches for indirect comparison of treatments. Pharmacoeconomics, 33(6):537–549.

[6] Jiang, Z., Liu, J., Alemayehu, D., Cappelleri, J. C., Abrahami, D., Chen, Y., and Chu, H. (2025). A critical assessment of matching-adjusted indirect comparisons in relation to target populations. Research Synthesis Methods, 16(3):569–574.

[7] Li, F., Morgan, K. L., and Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521):390–400.

[8] Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1):13–22.

[9] Phillippo, D., Ades, A. E., Dias, S., Palmer, S., Abrams, K. R., and Welton, N. J. (2016). NICE DSU Technical Support Document 18: methods for population-adjusted indirect comparisons in submissions to NICE. NICE Decision Support Unit.

[10] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55.
SAS Inc. (2023). SAS/STAT® 15.3 User's Guide. SAS Institute Inc., Cary, NC.

[11] Wu, E. Q. (2012). Matching-adjusted indirect comparisons: a new tool for timely comparative effectiveness research. Value in Health, 15(6):940–947.

[12] Signorovitch, J. E., Wu, E. Q., Yu, A. P., Gerrits, C. M., Kantor, E., Bao, Y., Gupta, S. R., and Mulani, P. M. (2010). Comparative effectiveness without head-to-head trials: a method for matching-adjusted indirect comparisons applied to psoriasis treatment with adalimumab or etanercept. Pharmacoeconomics, 28:935–945.
