Validated Synthetic Patient Generation for Small Longitudinal Cohorts: Coagulation Dynamics Across Pregnancy

Carole McBride; Ira Bernstein; Jeffrey D. Varner; Maria Cristina Bravo; Thomas Orfeo

arXiv: 2604.07557 · v1 · submitted 2026-04-08 · cs.LG · q-bio.QM


Pith reviewed 2026-05-10 17:31 UTC · model grok-4.3


keywords: synthetic patient generation · longitudinal cohorts · coagulation dynamics · pregnancy · stochastic attention · generative models · data augmentation · small sample size


The pith

A generative method creates synthetic patients from 23 real cases that match them statistically, structurally, and in mechanistic coagulation models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents multiplicity-weighted Stochastic Attention as a way to generate new patient profiles when real longitudinal data sets are too small for standard modeling. Real patient records are stored as patterns in an energy landscape, and new samples are produced by dynamics that stay faithful to the original geometry while allowing extra copies of rare subgroups. In tests on coagulation data from 23 pregnant women across three time points and 72 features, the synthetic records passed statistical distribution checks, structural similarity measures, and an ordinary differential equation model of the clotting cascade. A practical check showed that a mechanistic model trained solely on the synthetic records predicted outcomes for held-out real patients at the same level of accuracy as a model trained on the real records. The work targets settings such as maternal health and rare-disease studies where additional real enrollment is slow or expensive.

Core claim

Multiplicity-weighted Stochastic Attention embeds real patient profiles as memory patterns in a continuous energy landscape and samples novel synthetic patients through Langevin dynamics that preserve cohort geometry. Applied to the 23-patient coagulation data set, the generated patients were statistically, structurally, and mechanistically indistinguishable from the originals, including agreement with an ODE model of the coagulation cascade. A downstream test confirmed that mechanistic models calibrated entirely on the synthetic patients predicted held-out real patient outcomes as accurately as models calibrated on the real data.
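
The summary above does not reproduce the paper's equations. As a hedged sketch only, the block below writes out one standard way such a generator can be formulated: the modern Hopfield log-sum-exp energy over K stored patterns, with the multiplicities entering as weights inside the sum, followed by an unadjusted Langevin update. The symbols (stored patterns \xi_i, multiplicities m_i, inverse temperature \beta, step size \eta) and the exact placement of the weights are assumptions for illustration, not the authors' stated notation.

```latex
% Assumed form: multiplicity-weighted modern Hopfield energy (K stored patterns)
E(x) = -\frac{1}{\beta}\,\log \sum_{i=1}^{K} m_i \exp\!\left(\beta\, \xi_i^{\top} x\right)
       + \frac{1}{2}\,\lVert x \rVert^{2}

% Its gradient is the state minus a softmax-attention readout of the patterns
\nabla E(x) = x - \sum_{i=1}^{K} p_i(x)\, \xi_i ,
\qquad
p_i(x) = \frac{m_i \exp\!\left(\beta\, \xi_i^{\top} x\right)}
              {\sum_{j=1}^{K} m_j \exp\!\left(\beta\, \xi_j^{\top} x\right)}

% Unadjusted Langevin step targeting the density proportional to exp(-beta E(x))
x_{t+1} = x_t - \eta\, \nabla E(x_t) + \sqrt{\frac{2\eta}{\beta}}\; \varepsilon_t ,
\qquad \varepsilon_t \sim \mathcal{N}(0, I)
```

Under this assumed form, setting m_i = 2 is exactly equivalent to storing pattern i twice, which is one way to read "amplification without retraining": only the weights in the sum change, not the stored patterns.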

What carries the argument

Multiplicity-weighted Stochastic Attention (SA), a Hopfield-network-based generator that stores patient profiles as memory patterns and draws new samples via Langevin dynamics, with per-pattern weights that amplify rare subgroups at inference time without retraining.
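
To make the mechanism concrete, here is a minimal sketch in plain Python, assuming the energy E(x) = -(1/beta) * log(sum_i m_i * exp(beta * xi_i . x)) + 0.5 * |x|^2. That form, the function names, and the default values of `beta`, `step`, and `n_steps` are illustrative assumptions; the source does not give the authors' implementation or hyperparameters.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(x, patterns, multiplicity, beta):
    """Attention of state x over stored patterns.

    Multiplicities enter as an additive log(m_i) bias on the scores
    (an assumption of this sketch, not a detail taken from the paper).
    """
    scores = [
        beta * sum(xi * pi for xi, pi in zip(x, p)) + math.log(m)
        for p, m in zip(patterns, multiplicity)
    ]
    return softmax(scores)


def langevin_step(x, patterns, multiplicity, beta, step, rng):
    """One unadjusted Langevin step on the assumed energy E(x)."""
    w = attention_weights(x, patterns, multiplicity, beta)
    dim = len(x)
    # Attention readout: weighted average of the stored patterns.
    retrieved = [sum(wi * p[d] for wi, p in zip(w, patterns)) for d in range(dim)]
    noise_scale = math.sqrt(2.0 * step / beta)
    # grad E(x) = x - retrieved, so the drift pulls x toward the readout.
    return [
        x[d] - step * (x[d] - retrieved[d]) + noise_scale * rng.gauss(0.0, 1.0)
        for d in range(dim)
    ]


def sample_patient(patterns, multiplicity, beta=4.0, step=0.05, n_steps=200, seed=0):
    """Run a short Langevin chain and return its final state."""
    rng = random.Random(seed)
    # Start at a stored pattern, chosen in proportion to its multiplicity.
    x = list(rng.choices(patterns, weights=multiplicity, k=1)[0])
    for _ in range(n_steps):
        x = langevin_step(x, patterns, multiplicity, beta, step, rng)
    return x


# Toy usage with two hypothetical 2-feature "patients"; the first is up-weighted.
toy_patterns = [[1.0, 0.0], [0.0, 1.0]]
toy_sample = sample_patient(toy_patterns, multiplicity=[2.0, 1.0])
print(toy_sample)
```

In this sketch, changing `multiplicity` only shifts the attention scores, so a rare subgroup can be up-weighted at sampling time without touching the stored patterns. A real application would standardize the features first and tune `beta` and `step`; neither choice is specified in the source.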

Load-bearing premise

The chosen validation tests are sufficient to establish that the synthetic patients are clinically useful and will generalize beyond this 23-patient coagulation data set.

What would settle it

A finding that a mechanistic coagulation model calibrated on the synthetic patients predicts held-out real patient outcomes with clearly lower accuracy than one calibrated on the real data would falsify the claim of equivalent downstream utility.

Figures

Figures reproduced from arXiv: 2604.07557 by Carole McBride, Ira Bernstein, Jeffrey D. Varner, Maria Cristina Bravo, Thomas Orfeo.

**Figure 1.** Physiological correlations are preserved in synthetic patients. Scatter plots of coagulation factor levels versus thrombin generation parameters across all three visits. Filled markers are real patients; open markers are SA-generated synthetic patients. Colors encode visit: blue (V1/baseline), green (V2/first trimester), orange (V3/third trimester). Real and synthetic patients occupy the same joint region…

**Figure 2.** Pregnancy-driven longitudinal trajectories are reproduced. Mean ± standard deviation bands for six key coagulation features across visits (BL = baseline, 1st = first trimester, 3rd = third trimester) for real (blue) and SA-generated synthetic (orange) patients. The characteristic pregnancy-driven increases in fibrinogen, Factor VIII, and vWF are captured, as are the stable-to-declining patterns in Factor …

**Figure 3.** Cross-visit correlation structure. Top row: full 216 × 216 Pearson correlation matrices for Real (K=23, left), SA (N=100, center), and MVN (N=100, right) populations. Each matrix is organized as a 3 × 3 grid of 72 × 72 blocks (delineated by black lines), where the diagonal blocks capture within-visit feature correlations (V1–V1, V2–V2, V3–V3) and the off-diagonal blocks capture cross-visit dependencies (…

**Figure 4.** PCA projections by visit: SA vs. MVN. Each panel shows the first two principal components (PC1: 27.7% variance, PC2: 15.4%) computed from standardized per-visit real patient data (K=23 patients per visit, 72 features). Dark markers indicate real patients; lighter markers indicate synthetic patients (N=100). Top row: SA-generated patients (circles) cluster tightly around the real data cloud at all three vis…

**Figure 5.** Condition-specific feature preservation. Grouped bar charts comparing real (dark bars) and SA-generated synthetic (light bars) patient means (± SD error bars) for eight coagulation features, shown separately for each clinical subgroup: Uncomplicated (n=18, left, blue), PCOS (n=3, center, orange), and Developed PE (n=5, right, red). Values are pooled across all three visits. Gray percentages indicate the me…

**Figure 6.** Mechanistic validation (TF-only). Top row: BZ2012 ODE-predicted TGA values (vertical axis) versus dataset TGA values (horizontal axis) for five thrombin generation parameters. The dataset values are the TGA measurements from each patient’s record, present for both real and synthetic patients. The ODE-predicted values are computed by running each patient’s coagulation factor levels through the 58-species B…

**Figure 7.** Downstream utility: synth-calibrated vs. real-calibrated mechanistic model predictions on held-out real patients. Each panel compares the BZ2012 ODE predictions for one TGA feature when the model is calibrated on real V1 patients (horizontal axis) versus synthetic V1 patients (vertical axis), evaluated on the same held-out real V2 and V3 patients. Points near the y=x line indicate that the two calibrations…

Original abstract

Small longitudinal clinical cohorts, common in maternal health, rare diseases, and early-phase trials, limit computational modeling: too few patients to train reliable models, yet too costly and slow to expand through additional enrollment. We present multiplicity-weighted Stochastic Attention (SA), a generative framework based on modern Hopfield network theory that addresses this gap. SA embeds real patient profiles as memory patterns in a continuous energy landscape and generates novel synthetic patients via Langevin dynamics that interpolate between stored patterns while preserving the geometry of the original cohort. Per-pattern multiplicity weights enable targeted amplification of rare clinical subgroups at inference time without retraining. We applied SA to a longitudinal coagulation dataset from 23 pregnant patients spanning 72 biochemical features across 3 visits (pre-pregnancy baseline, first trimester, and third trimester), including rare subgroups such as polycystic ovary syndrome and preeclampsia. Synthetic patients generated by SA were statistically, structurally, and mechanistically indistinguishable from their real counterparts across multiple independent validation tests, including an ordinary differential equation model of the coagulation cascade. A downstream utility test further showed that a mechanistic model calibrated entirely on synthetic patients predicted held-out real patient outcomes as well as one calibrated on real data. These results demonstrate that SA can produce clinically useful synthetic cohorts from very small longitudinal datasets, enabling data-augmented modeling in small-cohort settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a Hopfield-based way to generate synthetic longitudinal patient trajectories from cohorts as small as 23, with weights to boost rare subgroups, but the n=23 validation does not support the strong 'indistinguishable' claims.


The core contribution is a multiplicity-weighted stochastic attention generator, drawn from modern Hopfield networks, that embeds real patient vectors as memory patterns and samples new ones via Langevin dynamics. The weighting lets you amplify subgroups like preeclampsia at generation time without retraining. They test it on 72 coagulation features from 23 pregnant patients at three visits and add an ODE model of the cascade plus a downstream prediction check where a model fit only on synthetics performs comparably on held-out real cases. That multi-layer validation setup is the part that actually works and is worth noting for anyone facing similar data limits in maternal health or rare-disease modeling. The approach is new in this exact combination and setting, and the mechanistic ODE cross-check is a concrete strength over purely statistical generators.

The main weakness is exactly the one the stress-test flags. With only 23 patients and three time points, standard tests for distributional match, correlation structure, ODE parameter recovery, and predictive parity have low power to detect moderate differences. Not rejecting the null therefore does not establish indistinguishability; it is equally consistent with undetected bias, especially once the rare-subgroup amplification reduces effective sample size further. Equivalence testing or explicit power calculations would have been the right move here, and their absence leaves the central claim under-supported.

The paper is aimed at computational modelers who need to augment tiny longitudinal clinical datasets rather than at clinicians or large-scale ML practitioners. A reader already working on energy-based or attention-based generators for time-series medical data will get the most out of it as a worked example, while others will mainly see the sample-size limitation.

It is coherent on its own terms and engages the relevant literature, so it deserves a serious referee even though the statistical claims will need tightening.

Referee Report

2 major / 2 minor

Summary. The paper introduces multiplicity-weighted Stochastic Attention (SA), a generative model grounded in modern Hopfield network theory that embeds real patient profiles as memory patterns and uses Langevin dynamics to generate novel synthetic longitudinal profiles while preserving cohort geometry. Per-pattern multiplicity weights allow amplification of rare subgroups at inference without retraining. Applied to a 23-patient longitudinal coagulation dataset (72 features, 3 visits: pre-pregnancy, first trimester, third trimester) including subgroups like PCOS and preeclampsia, the authors report that synthetics are statistically, structurally, and mechanistically indistinguishable from real data across validation layers including an independent ODE model of the coagulation cascade. A downstream utility experiment shows a mechanistic model calibrated solely on synthetics predicts held-out real outcomes comparably to one calibrated on real data.

Significance. If the indistinguishability and utility claims hold, the work would be significant for enabling reliable computational modeling in small-cohort domains such as maternal health and rare diseases, where data scarcity currently limits mechanistic and predictive modeling. The multi-layered validation strategy (statistical, structural, ODE mechanistic, and downstream predictive) and the ability to target rare subgroups via multiplicity weights without retraining represent strengths over purely statistical augmentation methods. The approach appears parameter-light, with only per-pattern multiplicity weights as free parameters.

major comments (2)

[Validation experiments / Results] The central claims of statistical, structural, and mechanistic indistinguishability (abstract and validation results) rest on hypothesis tests and comparisons performed with only 23 real patients (3 time points each). Standard tests for feature distributions, longitudinal correlations, and ODE parameter recovery have low power at this scale; failure to reject the null is consistent with both true fidelity and undetected moderate differences, especially in rare-subgroup amplification and trajectory dynamics. Equivalence testing or explicit power analysis is required to support the strong wording.
[Downstream utility experiment] Downstream utility test (abstract): the claim that a mechanistic model calibrated on synthetics 'predicted held-out real patient outcomes as well as' one calibrated on real data lacks reported held-out set size, performance metrics with confidence intervals, and a statistical test for non-inferiority. With small n, observed equivalence may reflect low power rather than true interchangeability.
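
The low-power concern in the major comments can be illustrated with a rough calculation. The sketch below uses a normal (z-test) approximation for a two-sided, two-sample comparison of means with 23 real and 100 synthetic patients, the group sizes quoted in the Figure 3 and Figure 4 captions. The standardized effect sizes and the function name `two_sample_power` are illustrative assumptions, and a t-based calculation would give slightly lower power at this sample size.

```python
from statistics import NormalDist


def two_sample_power(effect_size, n_real, n_synth, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the true mean difference in units of the common
    standard deviation (Cohen's d). Normal approximation only.
    """
    z = NormalDist()
    # Standard error of the difference in means, in SD units.
    se = (1.0 / n_real + 1.0 / n_synth) ** 0.5
    shift = effect_size / se
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Illustrative effect sizes: small, moderate, large.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d:.1f}: approximate power = {two_sample_power(d, 23, 100):.2f}")
```

Under this approximation a moderate difference (d = 0.5) in a single feature would be detected a little under 60% of the time, which is why a non-significant result alone says little about equivalence.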

minor comments (2)

[Abstract] The phrase 'multiple independent validation tests' would benefit from naming the exact statistical, structural, and ODE metrics used.
[Methods] Notation for the energy landscape and Langevin dynamics steps could be clarified with a short pseudocode or equation reference in the methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We have carefully addressed each major comment below, providing clarifications and making revisions to the manuscript where appropriate to strengthen the statistical rigor and reporting.


Referee: [Validation experiments / Results] The central claims of statistical, structural, and mechanistic indistinguishability (abstract and validation results) rest on hypothesis tests and comparisons performed with only 23 real patients (3 time points each). Standard tests for feature distributions, longitudinal correlations, and ODE parameter recovery have low power at this scale; failure to reject the null is consistent with both true fidelity and undetected moderate differences, especially in rare-subgroup amplification and trajectory dynamics. Equivalence testing or explicit power analysis is required to support the strong wording.

Authors: We appreciate the referee's emphasis on the limited statistical power with n=23. This is an inherent constraint of the small-cohort setting our method targets. In the revised manuscript, we have added a post-hoc power analysis for the key hypothesis tests (now in Supplementary Materials) and incorporated equivalence testing via two one-sided tests (TOST) for distributional features, correlations, and ODE parameter recovery, using clinically motivated equivalence margins. We have also moderated the language in the abstract and Results from 'indistinguishable' to 'statistically consistent with' the real data. While we agree that no single test can be conclusive at this scale, the convergent evidence from statistical, structural, mechanistic (ODE), and predictive validations provides stronger support than isolated p-values alone. The multiplicity weighting and geometry-preserving generation further differentiate the approach in this low-n regime.
revision: partial

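
For readers unfamiliar with TOST, the following is a minimal sketch of the two one-sided tests for equivalence of means, using a Welch-style standard error and a normal approximation. The function name, the toy measurements, and the margin are hypothetical; the rebuttal does not state which margins or test statistics were used, and a t-distribution would be the more careful choice for samples this small.

```python
from statistics import NormalDist, mean, variance


def tost_mean_equivalence(real, synth, margin, alpha=0.05):
    """TOST for |mean(synth) - mean(real)| < margin (normal approximation).

    Returns (difference, p_value, equivalent). Equivalence is declared
    only if BOTH one-sided null hypotheses are rejected.
    """
    diff = mean(synth) - mean(real)
    # Welch-style standard error: no equal-variance assumption.
    se = (variance(real) / len(real) + variance(synth) / len(synth)) ** 0.5
    z = NormalDist()
    p_lower = 1.0 - z.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se)        # H0: diff >= +margin
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Hypothetical values for one feature (not data from the paper).
real_values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0]
synth_values = [1.05, 1.15, 0.95, 1.0, 1.1, 0.9, 1.2, 1.0]

diff, p_value, equivalent = tost_mean_equivalence(real_values, synth_values, margin=0.5)
print(f"difference = {diff:.4f}, TOST p = {p_value:.3g}, equivalent = {equivalent}")
```

The key design point is that the burden of proof is reversed: a small sample makes it harder, not easier, to conclude equivalence, so the result depends heavily on how the margin is justified.
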
Referee: [Downstream utility experiment] Downstream utility test (abstract): the claim that a mechanistic model calibrated on synthetics 'predicted held-out real patient outcomes as well as' one calibrated on real data lacks reported held-out set size, performance metrics with confidence intervals, and a statistical test for non-inferiority. With small n, observed equivalence may reflect low power rather than true interchangeability.

Authors: We agree that fuller reporting is needed. The revised manuscript now explicitly states the held-out set size, reports the relevant performance metrics with 95% confidence intervals, and includes a non-inferiority test (with pre-specified margin) comparing the synthetic-calibrated model to the real-data model. These details appear in the main Results and a new supplementary table. The updated presentation supports the utility claim while acknowledging the small-sample context; the consistency with the other validation layers helps address concerns about low power.
revision: yes
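
One way a non-inferiority check of this kind could be run is a paired bootstrap on per-patient prediction errors, since both calibrations are evaluated on the same held-out patients. The sketch below is an assumption-laden illustration: the function name, the error values, and the margin are hypothetical, and the rebuttal does not say which non-inferiority procedure the authors used.

```python
import random
from statistics import mean


def bootstrap_noninferiority(err_synth, err_real, margin, n_boot=5000, alpha=0.05, seed=0):
    """Paired bootstrap test that synth-calibrated errors are not worse by `margin`.

    err_synth[i] and err_real[i] are errors for the same held-out patient.
    Returns (mean_difference, upper_bound, non_inferior), where upper_bound
    is a one-sided (1 - alpha) percentile bound on the mean paired difference.
    """
    if len(err_synth) != len(err_real):
        raise ValueError("errors must be paired per held-out patient")
    rng = random.Random(seed)
    diffs = [s - r for s, r in zip(err_synth, err_real)]
    n = len(diffs)
    # Resample patients with replacement and recompute the mean difference.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    upper = boot_means[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return mean(diffs), upper, upper < margin


# Hypothetical absolute errors for six held-out patients (not paper data).
errors_real_calibrated = [0.10, 0.22, 0.15, 0.30, 0.12, 0.18]
errors_synth_calibrated = [0.12, 0.20, 0.17, 0.29, 0.15, 0.18]

result = bootstrap_noninferiority(errors_synth_calibrated, errors_real_calibrated, margin=0.05)
print(result)
```

With very few held-out patients the bootstrap bound is itself unstable, so the held-out set size matters as much as the test chosen.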

Circularity Check

0 steps flagged

No significant circularity; validations are external and independent of generative process


The paper derives a generative method (multiplicity-weighted Stochastic Attention) from modern Hopfield network theory, embeds real patient profiles as memory patterns, and samples new patients via Langevin dynamics. All load-bearing claims of indistinguishability and downstream utility rest on separate external tests: statistical and structural comparisons to real data, mechanistic match to an independent ODE coagulation model, and predictive performance of a mechanistic model trained on synthetics versus real data when evaluated on held-out real outcomes. None of these reduce by construction to quantities defined inside the generative equations or to fitted parameters renamed as predictions. No self-citation chains, uniqueness theorems, or ansatzes are invoked to force the central results; the validations remain falsifiable against the held-out real cohort and are therefore self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim depends on the assumption that Langevin sampling in the Hopfield energy landscape preserves clinically relevant geometry and that the listed validation tests are adequate proxies for real-world utility; no new physical entities are introduced.

free parameters (1)

per-pattern multiplicity weights
Weights chosen to amplify rare subgroups such as PCOS and preeclampsia; selection criteria not specified in abstract.

axioms (2)

domain assumption: Modern Hopfield network theory supplies a continuous energy landscape in which patient profiles can be embedded as stable memory patterns.
Framework is explicitly based on this theory, as stated in the abstract.
domain assumption: Langevin dynamics can generate interpolations between stored patterns that preserve the original cohort geometry.
Core generative mechanism, described without further justification in the abstract.


