FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data
Pith reviewed 2026-05-08 03:20 UTC · model grok-4.3
The pith
Process-boundary governance provides safety guarantees for agentic real-world evidence generation independent of model choice
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FastOMOP separates governance, observability and orchestration into infrastructure layers that sit outside pluggable agent teams. Governance applies deterministic validation at process boundaries, independent of agent reasoning, so that no compromised or hallucinating agent can bypass safety controls. Agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure. Validation across three OMOP CDM datasets using a natural-language-to-SQL team yielded reliability scores of 0.84-0.94 with perfect adversarial and out-of-scope block rates, which the authors present as evidence that the safety guarantees hold regardless of which models power the agents.
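The claim that governance sits outside the agent can be sketched as a wrapper that accepts any text generator and applies the same deterministic check to whatever it returns. This is a minimal illustration under stated assumptions, not FastOMOP's code: `governed_call`, `read_only_check`, the two stub "models" and the in-memory `AUDIT_LOG` are all hypothetical, and the audit record only gestures at the observability layer.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

Generator = Callable[[str], str]        # any model or agent: question -> candidate SQL
Validator = Callable[[str], list[str]]  # candidate -> violations (empty list means pass)

AUDIT_LOG: list[dict] = []  # hypothetical in-memory stand-in for an observability sink


def read_only_check(sql: str) -> list[str]:
    """Toy deterministic predicate: exactly one statement, and it must be a SELECT."""
    body = sql.strip().rstrip(";")
    violations = []
    if ";" in body:
        violations.append("multiple statements")
    if not body.lower().startswith("select"):
        violations.append("not a SELECT")
    return violations


def governed_call(model_name: str, generate: Generator,
                  validate: Validator, question: str) -> Optional[str]:
    """Run any generator, then gate its output; the check never inspects the model itself."""
    candidate = generate(question)
    violations = validate(candidate)
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "question": question,
        "candidate": candidate,
        "violations": violations,
    })
    return None if violations else candidate


# Two stub "models": the gate treats them identically.
def model_a(question: str) -> str:
    return "SELECT COUNT(*) FROM person"


def model_b(question: str) -> str:
    return "DELETE FROM person"


ok = governed_call("model-a", model_a, read_only_check, "How many patients are there?")
blocked = governed_call("model-b", model_b, read_only_check, "How many patients are there?")
```

Swapping `model_a` for any other callable changes nothing about which outputs are blocked, which is the sense in which such a guarantee is model-agnostic. The same sketch also shows the limit: the guarantee is only as strong as the `validate` function.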
What carries the argument
The process-boundary governance layer that applies deterministic validation rules independent of any agent reasoning or model
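As a concrete, deliberately simplified instance of such a rule set at the NL-to-SQL boundary, the sketch below accepts only a single read-only query over an allowlist of OMOP CDM tables. The allowlist, the forbidden-keyword list and the regex-based table extraction are assumptions for illustration. Regex extraction is crude and fails closed on some valid queries (CTE names, `EXTRACT(YEAR FROM ...)`), so a production validator would more plausibly parse the query with a SQL parser such as sqlglot.

```python
import re
from dataclasses import dataclass, field

# Hypothetical allowlist: a few standard OMOP CDM clinical and vocabulary tables.
ALLOWED_TABLES = frozenset({
    "person", "condition_occurrence", "drug_exposure", "measurement",
    "visit_occurrence", "observation_period", "concept",
})

# Keywords that would mutate data or schema; any occurrence blocks the query.
FORBIDDEN_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|merge|copy)\b",
    re.IGNORECASE,
)
# Table names following FROM or JOIN, optionally schema-qualified (schema.table).
TABLE_REF = re.compile(r"\b(?:from|join)\s+(?:\w+\.)?(\w+)", re.IGNORECASE)


@dataclass
class Verdict:
    allowed: bool
    reasons: list[str] = field(default_factory=list)


def validate_sql(sql: str) -> Verdict:
    """Deterministic checks on a candidate query; never consults the agent that wrote it."""
    reasons = []
    body = sql.strip().rstrip(";")
    if ";" in body:
        reasons.append("multiple statements")
    if not re.match(r"(select|with)\b", body, re.IGNORECASE):
        reasons.append("not a read-only SELECT")
    if FORBIDDEN_KEYWORDS.search(body):
        reasons.append("forbidden keyword")
    for table in TABLE_REF.findall(body):
        if table.lower() not in ALLOWED_TABLES:
            reasons.append(f"table not in allowlist: {table}")
    return Verdict(allowed=not reasons, reasons=reasons)
```

For example, `validate_sql("SELECT COUNT(*) FROM person")` passes, while `validate_sql("SELECT * FROM note")` is blocked because `note` is outside the assumed allowlist.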
If this is right
- Agent teams for phenotyping, study design, and statistical analysis inherit safety guarantees through controlled tool exposure.
- Reliability scores of 0.84-0.94 are achieved on synthetic, MIMIC-IV, and real NHS OMOP CDM datasets.
- Perfect block rates are obtained for both adversarial and out-of-scope queries.
- The reliability gap in RWE deployment is architectural rather than a matter of model capability.
- FastOMOP supplies a governed architecture that supports progressive automation of RWE generation.
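Inheritance through controlled tool exposure can be pictured as a dispatcher that owns a per-team allowlist: an agent can only request a tool by name, and the infrastructure decides whether that team may use it. This is a minimal sketch; the team names, tool names and stub tool bodies are hypothetical, chosen to mirror the phenotyping example in the authors' rebuttal (a phenotyping agent may fetch cohort definitions but not execute arbitrary SQL).

```python
from typing import Callable


# Hypothetical tool stubs; in a real deployment these would query the OMOP database.
def get_cohort_definition(cohort_id: int) -> str:
    return f"cohort definition #{cohort_id}"


def execute_sql(sql: str) -> str:
    return f"executed: {sql}"


TOOLS: dict[str, Callable[..., str]] = {
    "get_cohort_definition": get_cohort_definition,
    "execute_sql": execute_sql,
}

# The allowlist lives in infrastructure, outside any agent's prompt or reasoning.
TEAM_TOOLS: dict[str, frozenset[str]] = {
    "phenotyping": frozenset({"get_cohort_definition"}),
    "nl_to_sql": frozenset({"execute_sql"}),
}


class ToolNotExposed(PermissionError):
    """Raised when a team requests a tool outside its allowlist."""


def call_tool(team: str, tool: str, *args: object) -> str:
    allowed = TEAM_TOOLS.get(team, frozenset())  # unknown teams get no tools at all
    if tool not in allowed:
        raise ToolNotExposed(f"team {team!r} may not call {tool!r}")
    return TOOLS[tool](*args)
```

Here `call_tool("phenotyping", "get_cohort_definition", 42)` succeeds, while `call_tool("phenotyping", "execute_sql", "SELECT 1")` raises `ToolNotExposed` however the agent phrases its request. In a fuller design the exposed `execute_sql` would itself sit behind a boundary validator.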
Where Pith is reading between the lines
- The same boundary-enforcement pattern could be applied to multi-agent systems in domains other than clinical data analysis.
- Auditable boundary logs may simplify regulatory review of automated evidence pipelines.
- Future deployments could measure whether the architecture continues to block failures when agent teams grow more complex or operate over longer time horizons.
Load-bearing premise
Deterministic validation rules applied at the process boundary are sufficient to catch all emergent unsafe behaviors, coordination failures, and hallucinations from agent teams across the full RWE lifecycle.
What would settle it
A full-lifecycle test in which an agent team produces unsafe or hallucinated RWE output that still passes every deterministic boundary validation and reaches the final evidence artifact.
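A miniature of the kind of counter-example such a test would look for: a plausible but wrong query (counting condition records rather than patients) that a purely syntactic boundary check cannot distinguish from the intended one. The `passes_boundary` check, the toy table contents and concept ID 9999 are hypothetical placeholders, not FastOMOP's validators or real OMOP concepts; FastOMOP may apply semantic checks not described here, so this illustrates the shape of the test rather than settling it.

```python
import re
import sqlite3


def passes_boundary(sql: str) -> bool:
    """Toy deterministic check: a single SELECT over two allowlisted tables."""
    body = sql.strip().rstrip(";")
    tables = re.findall(r"\bfrom\s+(\w+)", body, re.IGNORECASE)
    return (";" not in body
            and body.lower().startswith("select")
            and all(t.lower() in {"person", "condition_occurrence"} for t in tables))


# Tiny in-memory stand-in for an OMOP table: patient 1 has the condition recorded twice.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER)")
con.executemany("INSERT INTO condition_occurrence VALUES (?, ?)",
                [(1, 9999), (1, 9999), (2, 9999)])

# Question posed to the agent: how many patients have condition 9999?
intended = ("SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
            "WHERE condition_concept_id = 9999")
hallucinated = ("SELECT COUNT(*) FROM condition_occurrence "
                "WHERE condition_concept_id = 9999")

results = {}
for label, sql in [("intended", intended), ("hallucinated", hallucinated)]:
    assert passes_boundary(sql)  # both queries clear the deterministic gate
    results[label] = con.execute(sql).fetchone()[0]
```

Both queries pass the gate yet return different answers, which is exactly the failure mode the load-bearing premise has to rule out.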
Original abstract
The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, has enabled the harmonisation of electronic health record data for nearly one billion patients in 83 countries. Yet generating real-world evidence (RWE) from these repositories remains a manual process requiring clinical, epidemiological and technical expertise. LLMs and multi-agent systems have shown promise for clinical tasks, but RWE automation exposes a fundamental challenge: agentic systems introduce emergent behaviours, coordination failures and safety risks that existing approaches fail to govern. No infrastructure exists to ensure that agentic RWE generation is flexible, safe and auditable across the lifecycle. We introduce FastOMOP, an open-source multi-agent architecture that addresses this gap by separating three infrastructure layers (governance, observability and orchestration) from pluggable agent teams. Governance is enforced at the process boundary through deterministic validation independent of agent reasoning, ensuring that no compromised or hallucinating agent can bypass safety controls. Agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure. We validated FastOMOP using a natural-language-to-SQL agent team across three OMOP CDM datasets: synthetic data from Synthea, MIMIC-IV and a real-world NHS dataset from Lancashire Teaching Hospitals (IDRIL). FastOMOP achieved reliability scores of 0.84-0.94 with perfect adversarial and out-of-scope block rates, demonstrating that process-boundary governance delivers safety guarantees independent of model choice. These results indicate that the reliability gap in RWE deployment is architectural rather than a matter of model capability, and establish FastOMOP as a governed architecture for progressive RWE automation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FastOMOP, an open-source multi-agent architecture for automated real-world evidence (RWE) generation from OMOP CDM data. It separates governance, observability, and orchestration layers from pluggable agent teams, enforcing deterministic validation at process boundaries to deliver safety guarantees independent of underlying model choice. Validation is reported for a natural-language-to-SQL agent team on three OMOP datasets (Synthea, MIMIC-IV, and an NHS Lancashire dataset), with reliability scores of 0.84-0.94 and 100% block rates on adversarial and out-of-scope queries.
Significance. If the process-boundary governance approach generalizes, the work would provide a concrete architectural path to reliable agentic RWE systems on the OMOP CDM, which already covers nearly one billion patients. The concrete reliability numbers, perfect block rates on three datasets, and open-source release are strengths that enable reproducibility and practical deployment testing.
major comments (2)
- [Abstract and Experiments] The central claim that 'process-boundary governance delivers safety guarantees independent of model choice' across the full RWE lifecycle rests on validation of only a single NL-to-SQL agent team; no experiments ablate the underlying LLM, and no results are reported for phenotyping, study design, or statistical analysis agent teams.
- [Architecture] The manuscript states that 'agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure,' but supplies no concrete deterministic validation predicates, rule definitions, or empirical results for those stages, leaving the generality claim unsupported.
minor comments (2)
- [Abstract] 'Perfect adversarial and out-of-scope block rates' are reported without describing how the adversarial test cases were constructed or what baselines (e.g., unguarded agents) were used for comparison.
- [Experiments] Reliability scores of 0.84-0.94 are given without accompanying statistical significance tests, confidence intervals, or error analysis broken down by failure mode.
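The request for intervals matters even for a perfect block rate. If all n adversarial prompts are blocked, the exact one-sided (Clopper-Pearson) upper bound on the true miss rate at level alpha solves (1 - p)^n = alpha, giving p = 1 - alpha^(1/n), which is close to the familiar "rule of three" approximation 3/n at 95%. The test-set sizes are not stated here, so the values of n below are hypothetical.

```python
import math


def zero_miss_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper bound on a miss rate after 0 misses in n trials.

    Solves (1 - p) ** n == alpha for p: the largest miss rate still consistent,
    at level alpha, with having observed no misses at all.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - alpha ** (1.0 / n)


def rule_of_three(n: int) -> float:
    """Common approximation to the 95% bound above."""
    return 3.0 / n


for n in (20, 100, 500):  # hypothetical test-set sizes, not the paper's
    print(n, round(zero_miss_upper_bound(n), 4), round(rule_of_three(n), 4))
```

With 100 prompts all blocked, the data are still consistent with a true miss rate of roughly 3%, which is why a reported "100%" block rate is more informative alongside its n.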
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to improve clarity and support for our claims.
Point-by-point responses
- Referee: [Abstract and Experiments] The central claim that 'process-boundary governance delivers safety guarantees independent of model choice' across the full RWE lifecycle rests on validation of only a single NL-to-SQL agent team; no experiments ablate the underlying LLM, and no results are reported for phenotyping, study design, or statistical analysis agent teams.
Authors: The NL-to-SQL agent team was selected for validation because it forms the critical data-access boundary where query errors pose the greatest safety risk in RWE generation. The independence from model choice follows directly from the design: deterministic validators operate on structured outputs (e.g., SQL syntax, scope checks) rather than model internals, so the same governance layer blocks invalid actions regardless of which LLM generates the candidate output. We did not perform LLM ablations because the validator's correctness is model-agnostic by construction. We agree the abstract overstates coverage of the full lifecycle and will revise it to specify that empirical results are for the NL-to-SQL team while the architectural guarantees extend to other teams via controlled tool exposure. A brief discussion of this design rationale will be added to the Experiments section. revision: partial
- Referee: [Architecture] The manuscript states that 'agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure,' but supplies no concrete deterministic validation predicates, rule definitions, or empirical results for those stages, leaving the generality claim unsupported.
Authors: We acknowledge that the current text describes inheritance at a high level without concrete predicate examples. Controlled tool exposure restricts each agent team to a narrow, pre-approved function set (e.g., a phenotyping agent may only invoke OHDSI-standard cohort definitions and cannot execute arbitrary SQL). The governance layer then applies deterministic checks on inputs and outputs at every boundary. We will add a dedicated subsection with explicit rule examples: for phenotyping, validation requires that all concept sets match published OHDSI definitions and that cohort size thresholds are met; for study design, parameter ranges are checked against ethical and statistical constraints. These additions will be supported by pseudocode and will clarify how the same boundary mechanism applies across stages. Empirical results remain limited to the NL-to-SQL team in the present work. revision: yes
- Empirical validation results for phenotyping, study design, and statistical analysis agent teams are not available in the current manuscript, as experiments focused on the NL-to-SQL component; full multi-team empirical evaluation is planned as future work.
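The rule examples promised in the response to the Architecture comment can be sketched as plain predicates over structured agent outputs. Everything here is hypothetical: the reference concept-set library, the placeholder concept IDs, the cohort-size threshold and the study-design bounds are illustrative values, not OHDSI definitions or the authors' actual rules.

```python
from dataclasses import dataclass

# Hypothetical reference library standing in for published OHDSI concept sets.
# The concept IDs are placeholders, not real OMOP concept IDs.
REFERENCE_CONCEPT_SETS: dict[str, frozenset[int]] = {
    "type_2_diabetes": frozenset({1001, 1002, 1003}),
}
MIN_COHORT_SIZE = 100  # hypothetical minimum cohort size


@dataclass(frozen=True)
class PhenotypeOutput:
    name: str
    concept_ids: frozenset[int]
    cohort_size: int


@dataclass(frozen=True)
class StudyDesign:
    washout_days: int
    tar_start_days: int  # time-at-risk start, days relative to index
    tar_end_days: int    # time-at-risk end, days relative to index


def check_phenotype(p: PhenotypeOutput) -> list[str]:
    """Concept set must match the published definition exactly; cohort must be large enough."""
    problems = []
    reference = REFERENCE_CONCEPT_SETS.get(p.name)
    if reference is None:
        problems.append(f"no published concept set named {p.name!r}")
    elif p.concept_ids != reference:
        problems.append("concept set differs from published definition")
    if p.cohort_size < MIN_COHORT_SIZE:
        problems.append(f"cohort size {p.cohort_size} below {MIN_COHORT_SIZE}")
    return problems


def check_design(d: StudyDesign) -> list[str]:
    """Range checks on design parameters (bounds are illustrative assumptions)."""
    problems = []
    if not 0 <= d.washout_days <= 3650:
        problems.append("washout outside [0, 3650] days")
    if d.tar_start_days < 0 or d.tar_end_days <= d.tar_start_days:
        problems.append("time-at-risk window is empty or starts before index")
    return problems
```

Because both checks read only the structured output, they can run at the boundary in the same way for any agent team, which is the mechanism the rebuttal describes; whether such predicates are sufficient is what the missing multi-team experiments would need to show.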
Circularity Check
No circularity; empirical architecture validation with no self-referential derivations
Full rationale
The paper introduces an architecture (FastOMOP) separating governance/observability/orchestration layers from pluggable agents, then reports empirical results from a single NL-to-SQL agent team on three datasets achieving 0.84-0.94 reliability and 100% block rates. No equations, fitted parameters, or first-principles derivations appear; the central claim rests on these reported measurements rather than any reduction to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The derivation chain is self-contained as a system description plus benchmark results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: OMOP CDM enables harmonization of EHR data across institutions and countries.
invented entities (1)
- FastOMOP multi-agent architecture with process-boundary governance (no independent evidence)