Efficient Generative Prediction for EHR Foundation Models: The SCOPE and REACH Estimators

Bashar Ramadan; Brett K. Beaulieu-Jones; Luke Solo; Matthew B.A. McDermott; Michael C. Burkhart; William F. Parker

arxiv: 2602.03730 · v2 · pith:Q6KJXRDWnew · submitted 2026-02-03 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-16 07:18 UTC · model grok-4.3

keywords: generative EHR models, outcome prediction, Monte Carlo sampling, variance reduction, unbiased estimators, Rao-Blackwellization, SCOPE, REACH

The pith

SCOPE and REACH estimators enable unbiased clinical outcome prediction from generative EHR models with far fewer tokens than Monte Carlo sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Generative foundation models for electronic health records predict clinical outcomes by simulating future trajectories with Monte Carlo sampling, but this approach yields sparse estimate distributions, is computationally expensive, and has high sampling variance. The paper introduces the Sum of Conditional Outcome Probability Estimator (SCOPE) and Risk Estimation from Anticipated Conditional Hazards (REACH), both of which leverage next-token probability distributions. Both estimators are proven unbiased; REACH is additionally proven to reduce variance relative to Monte Carlo for any model and outcome, and to be a Rao-Blackwellization of naive importance sampling. Across 11 outcomes in MIMIC-IV and UChicago data, the estimators match 100-sample Monte Carlo accuracy with median token reductions of 2.5 to 3.4 times, and more than 80 times for the rarest outcomes, while preserving calibration. This reduces the inference budget for generative EHR models and makes them more practical for clinical use, especially for rare, high-impact events.
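To make the baseline concrete, here is a minimal Python sketch of trajectory-level Monte Carlo risk estimation. The four-token Markov chain `P`, the outcome index `OUTCOME`, and the function `mc_risk` are hypothetical stand-ins for a real EHR foundation model; none of them come from the paper.

```python
import numpy as np

# Hypothetical toy "EHR model": a first-order Markov chain over 4 tokens.
# Row i is the next-token distribution after token i. Token 3 is the outcome.
P = np.array([
    [0.70, 0.20, 0.09, 0.01],
    [0.30, 0.50, 0.18, 0.02],
    [0.20, 0.30, 0.45, 0.05],
    [0.25, 0.25, 0.25, 0.25],
])
OUTCOME = 3

def mc_risk(start, horizon, n_samples, rng):
    """Fraction of sampled trajectories in which the outcome token appears."""
    hits = 0
    tokens_used = 0
    for _ in range(n_samples):
        state = start
        for _ in range(horizon):
            state = rng.choice(len(P), p=P[state])  # sample one next token
            tokens_used += 1
            if state == OUTCOME:
                hits += 1
                break  # outcome observed, stop generating this trajectory
    return hits / n_samples, tokens_used

rng = np.random.default_rng(0)
estimate, tokens_used = mc_risk(start=0, horizon=20, n_samples=100, rng=rng)
```

With 100 samples every estimate is a multiple of 0.01, so patients whose true risks differ by less than that grid spacing can receive identical estimates. This is one way to read the "sparse estimate" limitation described above.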

Core claim

The central claim is that SCOPE and REACH are unbiased estimators that use the generative model's next-token probabilities to compute outcome risks more efficiently than full trajectory Monte Carlo sampling, with REACH providing guaranteed variance reduction via Rao-Blackwellization of any naive importance sampling scheme that preserves the non-outcome token distribution.

What carries the argument

The argument rests on the SCOPE (Sum of Conditional Outcome Probability Estimator) and REACH (Risk Estimation from Anticipated Conditional Hazards) estimators, which compute outcome probabilities by summing or anticipating conditional probabilities taken from the model's next-token distributions.
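This summary does not reproduce the estimators' formulas, so the sketch below is only one plausible reading of a hazard-based estimator in the spirit of REACH, not the authors' definition. It assumes a hypothetical toy Markov chain `P` and a hypothetical function `hazard_weighted_risk`. At each step it reads the outcome token's next-token probability (the "hazard"), accumulates hazard times survival so far, and samples the next token from the renormalized non-outcome distribution.

```python
import numpy as np

# Hypothetical toy model: row i is the next-token distribution after token i.
# Token 3 is the outcome. Nothing here is taken from the paper.
P = np.array([
    [0.70, 0.20, 0.09, 0.01],
    [0.30, 0.50, 0.18, 0.02],
    [0.20, 0.30, 0.45, 0.05],
    [0.25, 0.25, 0.25, 0.25],
])
OUTCOME = 3

def hazard_weighted_risk(start, horizon, n_samples, rng):
    """Average over trajectories of sum_t survival_{t-1} * hazard_t."""
    total = 0.0
    for _ in range(n_samples):
        state, survival, risk = start, 1.0, 0.0
        for _ in range(horizon):
            hazard = P[state, OUTCOME]      # P(outcome is the next token)
            risk += survival * hazard       # chance the outcome first occurs now
            survival *= 1.0 - hazard        # chance it has still not occurred
            proposal = P[state].copy()      # sample only non-outcome tokens,
            proposal[OUTCOME] = 0.0         # keeping their relative
            proposal /= proposal.sum()      # probabilities unchanged
            state = rng.choice(len(P), p=proposal)
        total += risk
    return total / n_samples

rng = np.random.default_rng(0)
estimate = hazard_weighted_risk(start=0, horizon=20, n_samples=500, rng=rng)
```

In this sketch each trajectory contributes a value strictly between 0 and 1 rather than a 0/1 indicator, so the estimate is not confined to multiples of 1/n.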

If this is right

Both estimators remain unbiased for any generative model and any outcome.
REACH guarantees variance reduction over Monte Carlo sampling for every model and outcome.
REACH is a Rao-Blackwellization of naive importance sampling schemes that preserve the non-outcome token distribution.
SCOPE reuses one sampled pool across arbitrary numbers of outcomes at no marginal generation cost.
Empirical accuracy matching 100-sample Monte Carlo is achieved with 2.5x to 3.4x median token reductions and over 80x for the rarest outcomes, with calibration preserved.
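The pool-reuse property can be illustrated with a small Python sketch. It is an assumption-laden reading of a "sum of conditional outcome probabilities" estimator rather than the paper's definition: the toy chain `P`, the two outcome tokens, and the functions `sample_pool` and `summed_conditional_risk` are all hypothetical. Trajectories are sampled once from the model, and each outcome is then scored on the same pool by summing its next-token probabilities up to the step at which it is first sampled.

```python
import numpy as np

# Hypothetical toy model: row i is the next-token distribution after token i.
# Tokens 3 and 4 play the role of two different clinical outcomes.
P = np.array([
    [0.70, 0.20, 0.08, 0.01, 0.01],
    [0.30, 0.50, 0.16, 0.02, 0.02],
    [0.20, 0.30, 0.42, 0.05, 0.03],
    [0.30, 0.30, 0.30, 0.05, 0.05],
    [0.30, 0.30, 0.30, 0.05, 0.05],
])

def sample_pool(start, horizon, n_samples, rng):
    """Sample full-length trajectories once, without reference to any outcome."""
    pool = np.empty((n_samples, horizon + 1), dtype=int)
    pool[:, 0] = start
    for i in range(n_samples):
        for t in range(horizon):
            pool[i, t + 1] = rng.choice(len(P), p=P[pool[i, t]])
    return pool

def summed_conditional_risk(pool, outcome):
    """Mean over the pool of the outcome's next-token probabilities, summed
    up to and including the step at which the outcome is first sampled."""
    values = []
    for traj in pool:
        total = 0.0
        for t in range(len(traj) - 1):
            total += P[traj[t], outcome]   # conditional outcome probability
            if traj[t + 1] == outcome:     # outcome sampled: stop counting
                break
        values.append(total)
    return float(np.mean(values))

rng = np.random.default_rng(0)
pool = sample_pool(start=0, horizon=20, n_samples=1000, rng=rng)
# The same pool is scored for both outcomes; no extra tokens are generated.
risks = {outcome: summed_conditional_risk(pool, outcome) for outcome in (3, 4)}
```

A single trajectory's sum can exceed 1 in principle; in this sketch only the average over the pool is interpreted as a probability.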

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same estimators could reduce sampling costs in any generative model that produces sequential token probabilities, such as time-series or language models.
For clinical systems tracking many outcomes simultaneously, SCOPE would minimize total generation cost while REACH supplies per-outcome variance control.
If next-token modeling accuracy improves, these estimators would automatically deliver larger efficiency gains without changes to the sampling procedure.
A direct test would be to measure wall-clock inference time on a fixed hardware budget when replacing Monte Carlo with REACH for rare-event screening.

Load-bearing premise

The generative model's next-token probability distributions accurately reflect the underlying data distribution and can be directly leveraged for conditional outcome probability calculations without further approximation or model-specific adjustments.

What would settle it

A comparison in which SCOPE or REACH estimates on a fixed model deviate, beyond sampling error, from the outcome frequencies obtained by running millions of Monte Carlo trajectories on that same model.
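A check of this kind might look like the following Python sketch, which runs on a hypothetical toy chain `P` rather than a real EHR model. A vectorized run of 200,000 trajectories stands in for the "millions" of reference samples, and the hazard-weighted estimator is an assumed stand-in for REACH, not the paper's definition. The names `large_mc_reference`, `hazard_values`, and `z_score` are illustrative.

```python
import numpy as np

# Hypothetical toy model; token 3 is the outcome.
P = np.array([
    [0.70, 0.20, 0.09, 0.01],
    [0.30, 0.50, 0.18, 0.02],
    [0.20, 0.30, 0.45, 0.05],
    [0.25, 0.25, 0.25, 0.25],
])
OUTCOME = 3

def large_mc_reference(start, horizon, n_samples, rng):
    """0/1 outcome indicators from plain Monte Carlo, sampled by inverse CDF."""
    cdf = np.cumsum(P, axis=1)
    states = np.full(n_samples, start)
    hit = np.zeros(n_samples, dtype=bool)
    for _ in range(horizon):
        u = rng.random(n_samples)
        nxt = (u[:, None] > cdf[states]).sum(axis=1)
        states = np.minimum(nxt, len(P) - 1)  # guard against float round-off
        hit |= states == OUTCOME
    return hit.astype(float)

def hazard_values(start, horizon, n_samples, rng):
    """Per-trajectory hazard-weighted values (assumed REACH-like estimator)."""
    values = np.empty(n_samples)
    for i in range(n_samples):
        state, survival, value = start, 1.0, 0.0
        for _ in range(horizon):
            hazard = P[state, OUTCOME]
            value += survival * hazard
            survival *= 1.0 - hazard
            proposal = P[state].copy()
            proposal[OUTCOME] = 0.0
            proposal /= proposal.sum()
            state = rng.choice(len(P), p=proposal)
        values[i] = value
    return values

def z_score(values, reference_values):
    """Difference in means, in units of the combined standard error."""
    diff = values.mean() - reference_values.mean()
    se = np.sqrt(values.var(ddof=1) / len(values)
                 + reference_values.var(ddof=1) / len(reference_values))
    return float(diff / se)

rng = np.random.default_rng(0)
reference = large_mc_reference(start=0, horizon=20, n_samples=200_000, rng=rng)
values = hazard_values(start=0, horizon=20, n_samples=500, rng=rng)
z = z_score(values, reference)
```

A `z` that stays within a few standard errors is consistent with unbiasedness on this toy model; a persistently large `z` across seeds would be the kind of deviation described above.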

Original abstract

Generative foundation models trained on tokenized electronic health record (EHR) timelines show promise for clinical outcome prediction via Monte Carlo sampling of simulated future trajectories. However, this approach suffers from three coupled limitations: sparse estimate distributions that poorly differentiate patient risk levels, extreme computational cost, and high sampling variance. We propose two new estimators that leverage next-token probability distributions underutilized by standard Monte Carlo: the Sum of Conditional Outcome Probability Estimator (SCOPE) and Risk Estimation from Anticipated Conditional Hazards (REACH). We prove both are unbiased, that REACH guarantees variance reduction over Monte Carlo for any model and outcome, and that REACH is a Rao-Blackwellization of any naive importance sampling scheme that preserves the non-outcome token distribution. Empirically, across $11$ clinically important outcomes in MIMIC-IV and the UChicago health system, SCOPE and REACH match $100$-sample Monte Carlo accuracy with median token reductions of $2.5\times$ to $3.4\times$ and reductions exceeding $80\times$ for the rarest outcomes, with calibration preserved throughout. Because SCOPE reuses a single sampled pool across an arbitrary number of outcomes at no marginal generation cost while REACH provides a per-task variance guarantee, the two estimators are complementary in deployment and together meaningfully reduce the inference budget required for generative EHR foundation models, particularly for rare, high-impact outcomes in healthcare.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces SCOPE (Sum of Conditional Outcome Probability Estimator) and REACH (Risk Estimation from Anticipated Conditional Hazards) as alternatives to Monte Carlo sampling for clinical outcome prediction with generative EHR foundation models. It claims proofs that both estimators are unbiased, that REACH guarantees variance reduction over Monte Carlo for any model and outcome via Rao-Blackwellization of importance sampling that preserves the non-outcome token distribution, and that SCOPE enables reuse of a single sample pool across outcomes. Empirical results on 11 outcomes from MIMIC-IV and UChicago datasets report that the estimators match 100-sample Monte Carlo accuracy with median token reductions of 2.5×–3.4× (exceeding 80× for rarest outcomes) while preserving calibration.

Significance. If the unbiasedness and variance-reduction claims hold, the work provides a practical, theoretically grounded reduction in inference cost for generative EHR models, particularly valuable for rare high-impact outcomes where Monte Carlo variance is prohibitive. SCOPE (cross-outcome reuse at zero marginal cost) and REACH (a per-task variance guarantee) complement each other, which is a clear strength, and the application of standard Monte Carlo and Rao-Blackwell tools to this domain is cleanly executed.

minor comments (3)

[Abstract, §4] The statement that SCOPE and REACH 'match 100-sample Monte Carlo accuracy' should specify the exact metric (e.g., AUC, Brier score, or calibration slope) and the tolerance used to declare equivalence; without this the reported token reductions are difficult to interpret.
[§3.2] The proof that REACH is a Rao-Blackwellization of naive importance sampling would benefit from an explicit statement of the conditioning sigma-algebra and the preservation of the non-outcome token marginal; a short lemma isolating this step would improve readability.
[Table 2] The per-outcome token-reduction factors are reported only as medians across models; adding interquartile ranges or per-model breakdowns would strengthen the claim that gains are consistent rather than driven by a few favorable cases.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the accurate summary of our work and for highlighting the practical value of the unbiasedness and variance-reduction properties of SCOPE and REACH. We are pleased with the recommendation for minor revision. No specific major comments were raised in the report, so we have no changes to propose at this time but are happy to incorporate any additional feedback the editor or referee may provide.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper defines SCOPE as the sum of conditional outcome probabilities and REACH as a Rao-Blackwellized conditional expectation over next-token distributions. Both unbiasedness and the variance-reduction guarantee follow directly from the definitions via standard conditional-probability identities and the Rao-Blackwell theorem; no parameter is fitted to data and then relabeled as a prediction, no self-citation supplies a load-bearing uniqueness result, and no ansatz is smuggled in. The derivation chain is therefore self-contained and does not reduce any claimed result to its own inputs by construction.
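The variance half of that argument is the standard Rao-Blackwell inequality. As a reminder, the law of total variance gives the result below. The notation is assumed here rather than taken from the paper: $\hat{\theta}$ is any unbiased sampling-based estimate and $\mathcal{G}$ is whatever information the improved estimator conditions on.

```latex
\begin{align*}
\tilde{\theta} &= \mathbb{E}[\hat{\theta} \mid \mathcal{G}], \\
\mathbb{E}[\tilde{\theta}] &= \mathbb{E}\big[\mathbb{E}[\hat{\theta} \mid \mathcal{G}]\big] = \mathbb{E}[\hat{\theta}], \\
\operatorname{Var}(\hat{\theta}) &= \mathbb{E}\big[\operatorname{Var}(\hat{\theta} \mid \mathcal{G})\big] + \operatorname{Var}(\tilde{\theta}) \;\ge\; \operatorname{Var}(\tilde{\theta}).
\end{align*}
```

Equality holds only when $\hat{\theta}$ is already determined by $\mathcal{G}$ almost surely. Which $\mathcal{G}$ the paper conditions on is not stated in this summary.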

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the domain assumption that the generative model's next-token probabilities are directly usable for estimation; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption: The generative foundation model provides accurate next-token probability distributions that can be used directly for conditional outcome calculations.
The estimators SCOPE and REACH are defined in terms of these probabilities as stated in the abstract.

pith-pipeline@v0.9.0 · 5575 in / 1271 out tokens · 47137 ms · 2026-05-16T07:18:35.675829+00:00 · methodology
