Trust Me, I'm a Doctor?

Mats Stensrud; Zach Shahn

arXiv: 2605.01050 · v3 · pith:CYJBHKBY · submitted 2026-05-01 · 📊 stat.AP


Pith reviewed 2026-07-01 07:26 UTC · model grok-4.3

classification 📊 stat.AP

keywords: physician strategies · clinical trials · observational data · sharp bounds · treatment effects · monotonicity · evidence-based medicine · nested designs


The pith

Randomized and observational data yield sharp bounds on how often physicians outperform the trial's better treatment under a monotonicity assumption.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the tension between trial-average treatment effects and individual physician decisions. It considers a randomized trial nested inside an observational cohort, so outcomes are seen under treatment, control, and usual care. The authors derive sharp bounds on the share of physicians whose personal strategies beat always picking the trial's better treatment. These bounds hold only under the assumption that no physician strategy is worse than always picking the trial's worse treatment. The results indicate when the data support following physician judgment rather than rigid trial recommendations.

Core claim

Under the assumption that no physician's strategy is worse than always choosing the worse performing treatment from the trial, sharp bounds can be derived on the proportion of physicians whose personal strategies perform better than always choosing the better performing treatment from the trial, using data from a randomized trial nested within an observational cohort.

What carries the argument

Sharp bounds on the proportion of outperforming physicians, obtained from nested randomized and observational data under a monotonicity restriction on physician strategies.
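The abstract does not state the bounds themselves, so what follows is not the paper's result. It is a toy illustration, under assumptions added here, of how a floor on physician performance turns an average into an interval for the share of outperformers. Assumed: each physician j has a strategy value V_j in [0, 1] (for example, a success probability, higher is better); physicians carry equal weight, so the usual-care mean is the average of the V_j; and only three means (treatment arm, control arm, usual care) are used. The paper works with the full observed data distribution, so its sharp bounds may be narrower than these. The function `outperformer_bounds` and its `floor` parameter are hypothetical.

```python
from typing import Optional, Tuple


def outperformer_bounds(mu_treat: float, mu_control: float, mu_usual: float,
                        floor: Optional[float] = None) -> Tuple[float, float]:
    """Toy bounds on the share of physicians whose strategy beats the better trial arm.

    Hypothetical setting: every physician's strategy value V_j lies in [floor, 1],
    mu_usual is the unweighted mean of the V_j, and the trial arm means are known.
    By default floor is the worse trial arm (the monotonicity assumption);
    floor=0.0 drops that assumption.
    """
    for x in (mu_treat, mu_control, mu_usual):
        if not 0.0 <= x <= 1.0:
            raise ValueError("means must be probabilities in [0, 1]")
    worse, better = min(mu_treat, mu_control), max(mu_treat, mu_control)
    if floor is None:
        floor = worse
    if not 0.0 <= floor <= worse:
        raise ValueError("floor must lie between 0 and the worse arm")
    if mu_usual < floor:
        # Every V_j >= floor forces their mean >= floor, so the data contradict the floor.
        raise ValueError("usual-care mean is below the floor; assumption contradicted")
    if better >= 1.0:
        return 0.0, 0.0  # nobody can exceed an arm that always succeeds
    # Lower bound: outperformers sit at 1, everyone else exactly at the better arm.
    lower = max(0.0, (mu_usual - better) / (1.0 - better))
    # Upper bound (a supremum): outperformers sit just above the better arm,
    # everyone else at the floor.
    if better > floor:
        upper = min(1.0, (mu_usual - floor) / (better - floor))
    else:
        # Both arms equal the floor: anyone above it outperforms.
        upper = 1.0 if mu_usual > floor else 0.0
    return lower, upper


print(outperformer_bounds(0.6, 0.5, 0.55))             # monotonicity imposed
print(outperformer_bounds(0.6, 0.5, 0.55, floor=0.0))  # monotonicity dropped
```

With trial arms at 0.6 and 0.5 and a usual-care mean of 0.55, the toy interval is [0, 0.5]; dropping the floor (`floor=0.0`) widens the upper end to about 0.92. A usual-care mean below the worse arm raises an error, because in this toy setting the assumption then cannot hold for every physician.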

If this is right

The data can reveal how often physicians outperform the strategy suggested by the trial.
When the bounds are tight enough, they show whether relying on physician discretion is supported by the evidence.
When the bounds do not support discretion, stronger justification is required for departing from the trial-average recommendation.
The approach combines randomized and observational data from the same target population to assess individual-level performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be applied to existing trial-cohort data sets to produce numeric bounds for specific treatments.
If the monotonicity assumption is relaxed, wider or partial identification results might still be obtainable.
Policy decisions about allowing physician discretion could use these bounds as one quantitative input alongside other considerations.
The same nesting structure might be used to bound other forms of heterogeneity in treatment response.

Load-bearing premise

No physician's strategy performs worse than always choosing the worse performing treatment from the trial.

What would settle it

Observing even one physician whose outcomes under their strategy are worse than the outcomes from always choosing the trial's worse treatment would show the assumption fails and the bounds do not apply.
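Checking this in practice means estimating each physician's performance from a finite number of patients, so a single point estimate below the worse arm is weak evidence on its own. A minimal sketch, assuming hypothetical physician-level counts (successes, patients treated) from patient mixes comparable to the trial population, flags a physician only when a one-sided Wilson score upper bound (z = 1.645, roughly 95%) still falls below the worse arm's success rate. The function names, data layout, and choice of interval are illustrative and do not come from the paper.

```python
import math


def wilson_upper(successes: int, n: int, z: float = 1.645) -> float:
    """One-sided Wilson score upper bound for a binomial success probability."""
    if n == 0:
        return 1.0  # no patients: nothing can be ruled out
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return min(1.0, (centre + margin) / denom)  # min guards against rounding


def flag_monotonicity_violations(counts, worse_arm_rate, z=1.645):
    """Return ids of physicians whose upper bound lies below the worse trial arm.

    counts maps a hypothetical physician id to (successes, patients treated).
    """
    return sorted(
        doc for doc, (s, n) in counts.items()
        if wilson_upper(s, n, z) < worse_arm_rate
    )


counts = {"A": (20, 100), "B": (48, 100), "C": (0, 0)}
print(flag_monotonicity_violations(counts, worse_arm_rate=0.5))  # → ['A']
```

Physician A (20 of 100) is flagged because even the upper bound, about 0.27, is below 0.5; physician B (48 of 100) is not, since an upper bound of about 0.56 cannot rule out performance at the worse arm's level. Checking many physicians this way raises multiple-testing issues that this sketch ignores.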

Original abstract

Clinical trials usually target average treatment effects, but treatment decisions are made for individuals. This tension motivates a common criticism of evidence-based medicine: a treatment that is beneficial on average may be inappropriate for a particular patient, and skilled physicians may outperform rigid adherence to the strategy that performed best in a randomized trial. We consider how randomized and observational data from the same target population can be used to assess that possibility. Specifically, we study settings in which a randomized trial is nested within an observational cohort, so that outcomes are observed under treatment, control, and usual care. We ask what the observed data can reveal about how often physicians outperform the strategy suggested by the trial. We derive sharp bounds on the proportion of physicians whose personal strategies perform better than always choosing the better performing treatment from the trial under the assumption that no physician's strategy is worse than always choosing the worse performing treatment from the trial. These results shed light on when clinical data support relying on physician discretion over the trial-average recommendation and when stronger justification is required.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper derives sharp bounds on the share of physicians outperforming the trial-best strategy in nested RCT-observational data under a monotonicity assumption.


The core result is a set of sharp bounds on how often physicians can beat the better trial arm, derived from the observed data distribution under the assumption that no physician strategy is worse than the inferior trial arm. The nested design lets the authors observe outcomes under both trial arms and usual care, which supports the partial identification.

What stands out is the clean construction of those bounds. It directly tackles the average-versus-individual tension in evidence-based medicine without claiming point identification. The monotonicity restriction is stated explicitly and used to tighten the bounds, and the abstract indicates this specific approach is new relative to the cited work.

The main limitation is the strength of that monotonicity condition. It rules out any physician who performs worse than the inferior trial treatment, which may not hold when skill or bias varies. The paper likely notes this, but the bounds will move if the assumption is relaxed. No internal inconsistencies appear in the setup.

This is for causal inference researchers and medical statisticians working on partial identification and treatment choice. It is a focused methodological contribution that deserves referee time even if the assumption needs discussion in review.

Referee Report

0 major / 1 minor

Summary. The paper considers a nested design with a randomized trial embedded in an observational cohort, yielding data on outcomes under treatment, control, and usual care. It derives sharp bounds on the proportion of physicians whose personal treatment strategies outperform always selecting the trial's better-performing arm, under the explicit monotonicity restriction that no physician strategy is worse than always selecting the trial's worse-performing arm. The bounds are expressed directly in terms of the observed data distribution.

Significance. If the derivation holds, the result supplies a partial-identification tool for quantifying when observed data support physician discretion over rigid adherence to the trial-best strategy. The explicit conditioning on a single, interpretable monotonicity assumption and the claim of sharpness constitute a clear methodological contribution to the literature on individual-level treatment rules and evidence-based medicine.

minor comments (1)

[Abstract] The phrase 'always choosing the better performing treatment from the trial' could be clarified with a short parenthetical on how 'better' is defined when the trial reports heterogeneous effects.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading of the manuscript, the accurate summary of its contribution, and the recommendation to accept. We are pleased that the work is viewed as supplying a useful partial-identification tool under a single, interpretable monotonicity restriction.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents a partial-identification result deriving sharp bounds on the proportion of physicians outperforming the trial-best strategy, explicitly conditional on the stated monotonicity assumption (no strategy worse than the trial-worst). This is a standard bounding exercise from the observed data distribution under an external domain restriction; the assumption is not derived from the data or from self-referential equations, nor is any fitted parameter renamed as a prediction. No load-bearing self-citation, ansatz smuggling, or uniqueness theorem imported from prior author work appears in the provided abstract or strongest-claim description. The central claim remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on one explicit domain assumption about physician strategies and the structural fact that a randomized trial is nested inside an observational cohort; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption: No physician's strategy is worse than always choosing the worse performing treatment from the trial.
This monotonicity restriction is required to obtain sharp bounds and is stated explicitly in the abstract.

pith-pipeline@v0.9.1-grok · 5694 in / 1194 out tokens · 23195 ms · 2026-07-01T07:26:12.572145+00:00 · methodology

Review history (3 revisions)


Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

[1] Ole Fröbert, Bo Lagerqvist, Göran K. Olivecrona, Elmir Omerovic, Thorarinn Gudnason, Michael Maeng, Mattias Aasa, Oskar Angerås, Fredrik Calais, Mariusz Danielewicz, David Erlinge, Lars Hellsten, Ulf Jensen, Anders C. Johansson, Björn Källström, Bertil Lindahl, Johan Nilsson, Lars Robertson, Lennart Sandhall, Ingema...
work page: doi:10.1056/NEJMoa1706443

[2] M.A. Hernán and J.M. Robins. Causal Inference: What If. Chapman & Hall/CRC Monographs on Statistics & Applied Probab. CRC Press, ...
work page: doi:10.1056/NEJMoa1308789

[3] Anthony A Matthews, Issa J Dahabreh, Conor J MacDonald, Bertil Lindahl, Robin Hofmann, David Erlinge, Troels Yndigegn, Anita Berglund, Tomas Jernberg, and Miguel A Hernán. Prospective benchmarking of an observational analysis in the SWEDEHEART registry against the REDUCE-AMI randomized trial. European Journal of Epidemiology, ...
work page: doi:10.1056/NEJMoa1814052

[4] Aaron L Sarvet and Mats J Stensrud. Perspective on 'harm' in personalized medicine. American Journal of Epidemiology, 194(6):1743–1748, 2025a. Aaron L Sarvet and Mats J Stensrud. Rejoinder to "perspectives on 'harm' in personalized medicine–an alternative perspective". American Journal of Epidemiology, 194(6):1752–1755, 2025b. Amit ...
work page: doi:10.1093/aje/kwj088

[5] Zach Shahn and David Madigan. Identification and estimation of joint potential outcome distributions from a single study. arXiv preprint arXiv:2509.20506, ...
work page: arXiv
