arxiv: 2604.24572 · v1 · submitted 2026-04-27 · 💻 cs.AI · cs.MA

FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data

Pith reviewed 2026-05-08 03:20 UTC · model grok-4.3

classification 💻 cs.AI cs.MA
keywords OMOP CDM · multi-agent systems · real-world evidence · process-boundary governance · agentic systems · safety validation · electronic health records · RWE automation

The pith

Process-boundary governance provides safety guarantees for agentic real-world evidence generation independent of model choice

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FastOMOP, a multi-agent architecture that automates generation of real-world evidence from OMOP CDM health data repositories while addressing safety risks that arise when agents coordinate or reason. It separates infrastructure layers for governance, observability, and orchestration from the agent teams themselves, enforcing deterministic validation rules at every process boundary. These rules operate independently of any agent's internal reasoning, preventing hallucinations, coordination failures, or unsafe outputs from reaching downstream steps. Testing with a natural-language-to-SQL team on synthetic, MIMIC-IV, and real NHS datasets produced reliability scores of 0.84-0.94 and perfect rates for blocking adversarial or out-of-scope actions. The work claims this shows the reliability gap in automated RWE stems from missing architectural controls rather than limitations in the AI models used.

Core claim

FastOMOP separates governance, observability and orchestration into infrastructure layers that sit outside pluggable agent teams. Governance applies deterministic validation at process boundaries independent of agent reasoning, so that no compromised or hallucinating agent can bypass safety controls. Agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure. Validation across three OMOP CDM datasets using a natural-language-to-SQL team yielded reliability scores of 0.84-0.94 with perfect adversarial and out-of-scope block rates, establishing that safety guarantees hold regardless of which models power the agents.

What carries the argument

The process-boundary governance layer that applies deterministic validation rules independent of any agent reasoning or model
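
A minimal sketch of what such a boundary check could look like, assuming a natural-language-to-SQL team whose only output is a SQL string. The table allowlist, the function name `validate_sql` and the rule set are hypothetical and are not taken from the FastOMOP codebase; the regular expressions are a rough stand-in for a real SQL parser. The point is only that the verdict depends on the candidate output, never on the agent or model that produced it.

```python
import re
from dataclasses import dataclass

# Hypothetical allowlist of OMOP CDM tables this team may read (an assumption).
ALLOWED_TABLES = {"person", "condition_occurrence", "drug_exposure", "concept"}

# Keywords that would mutate data or schema.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant)\b", re.IGNORECASE
)
# Table names following FROM or JOIN (a deliberately simple pattern).
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-z_][a-z0-9_.]*)", re.IGNORECASE)


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


def validate_sql(candidate: str) -> Verdict:
    """Deterministic gate: the same input always yields the same verdict."""
    sql = candidate.strip().rstrip(";")
    if ";" in sql:
        return Verdict(False, "multiple statements")
    if not re.match(r"(?i)(select|with)\b", sql):
        return Verdict(False, "not a read-only query")
    if FORBIDDEN.search(sql):
        return Verdict(False, "mutating keyword present")
    # Drop any schema prefix (cdm.person -> person) before the scope check.
    tables = {t.split(".")[-1].lower() for t in TABLE_REF.findall(sql)}
    if not tables:
        return Verdict(False, "no table referenced")
    unknown = tables - ALLOWED_TABLES
    if unknown:
        return Verdict(False, f"table outside scope: {sorted(unknown)}")
    return Verdict(True, "ok")
```

The sketch errs on the side of blocking: a CTE alias or a string literal containing a word such as "update" would be rejected. That is acceptable for an illustration of a fail-closed gate, but a production rule set would need a proper parser.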

If this is right

  • Agent teams for phenotyping, study design, and statistical analysis inherit safety guarantees through controlled tool exposure.
  • Reliability scores of 0.84-0.94 are achieved on synthetic, MIMIC-IV, and real NHS OMOP CDM datasets.
  • Perfect block rates are obtained for both adversarial and out-of-scope queries.
  • The reliability gap in RWE deployment is architectural rather than a matter of model capability.
  • FastOMOP supplies a governed architecture that supports progressive automation of RWE generation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same boundary-enforcement pattern could be applied to multi-agent systems in domains other than clinical data analysis.
  • Auditable boundary logs may simplify regulatory review of automated evidence pipelines.
  • Future deployments could measure whether the architecture continues to block failures when agent teams grow more complex or operate over longer time horizons.

Load-bearing premise

Deterministic validation rules applied at the process boundary are sufficient to catch all emergent unsafe behaviors, coordination failures, and hallucinations from agent teams across the full RWE lifecycle.

What would settle it

A full-lifecycle test in which an agent team produces unsafe or hallucinated RWE output that still passes every deterministic boundary validation and reaches the final evidence artifact.
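
The test described above can be phrased as a search for "escapes": outputs a human reviewer labels unsafe that nevertheless pass every boundary validator. The sketch below is illustrative only; `Case`, `find_escapes` and the toy validators are hypothetical names, and real validators would be the governance layer's own rules.

```python
from typing import Callable, Iterable, NamedTuple


class Case(NamedTuple):
    output: str   # candidate artifact produced by an agent team
    unsafe: bool  # ground-truth label assigned by a human reviewer


Validator = Callable[[str], bool]  # returns True when the output passes


def find_escapes(cases: Iterable[Case], validators: list[Validator]) -> list[Case]:
    """Return cases labelled unsafe that still pass every validator."""
    return [c for c in cases if c.unsafe and all(v(c.output) for v in validators)]


# Toy validators and cases, invented for illustration.
toy_validators = [
    lambda s: "drop" not in s.lower(),  # blocks destructive statements
    lambda s: len(s) < 200,             # blocks oversized outputs
]
toy_cases = [
    Case("SELECT 1", unsafe=False),
    Case("DROP TABLE person", unsafe=True),
    Case("SELECT name FROM person", unsafe=True),  # unsafe, yet passes both rules
]
escapes = find_escapes(toy_cases, toy_validators)
```

A single non-empty result from such a harness, run over the full lifecycle rather than one team, would count against the load-bearing premise.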

Original abstract

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, enabled the harmonisation of electronic health records data of nearly one billion patients in 83 countries. Yet generating real-world evidence (RWE) from these repositories remains a manual process requiring clinical, epidemiological and technical expertise. LLMs and multi-agent systems have shown promise for clinical tasks, but RWE automation exposes a fundamental challenge: agentic systems introduce emergent behaviours, coordination failures and safety risks that existing approaches fail to govern. No infrastructure exists to ensure agentic RWE generation is flexible, safe and auditable across the lifecycle. We introduce FastOMOP, an open-source multi-agent architecture that addresses this gap by separating three infrastructure layers, governance, observability and orchestration, from pluggable agent-teams. Governance is enforced at the process boundary through deterministic validation independent of agent reasoning, ensuring no compromised or hallucinating agent can bypass safety controls. Agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure. We validated FastOMOP using a natural-language-to-SQL agent team across three OMOP CDM datasets: synthetic data from Synthea, MIMIC-IV and a real-world NHS dataset from Lancashire Teaching Hospitals (IDRIL). FastOMOP achieved reliability scores of 0.84-0.94 with perfect adversarial and out-of-scope block rates, demonstrating process-boundary governance delivers safety guarantees independent of model choice. These results indicate that the reliability gap in RWE deployment is architectural rather than model capability, and establish FastOMOP as a governed architecture for progressive RWE automation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes FastOMOP, an open-source multi-agent architecture for automated real-world evidence (RWE) generation from OMOP CDM data. It separates governance, observability, and orchestration layers from pluggable agent teams, enforcing deterministic validation at process boundaries to deliver safety guarantees independent of underlying model choice. Validation is reported for a natural-language-to-SQL agent team on three OMOP datasets (Synthea, MIMIC-IV, and an NHS Lancashire dataset), with reliability scores of 0.84-0.94 and 100% block rates on adversarial and out-of-scope queries.

Significance. If the process-boundary governance approach generalizes, the work would provide a concrete architectural path to reliable agentic RWE systems on the OMOP CDM, which already covers nearly one billion patients. The concrete reliability numbers, perfect block rates on three datasets, and open-source release are strengths that enable reproducibility and practical deployment testing.

major comments (2)
  1. [Abstract and Experiments] The central claim that 'process-boundary governance delivers safety guarantees independent of model choice' across the full RWE lifecycle rests on validation of only a single NL-to-SQL agent team; no experiments ablate the underlying LLM, and no results are reported for phenotyping, study design, or statistical analysis agent teams.
  2. [Architecture] The manuscript states that 'agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure,' but supplies no concrete deterministic validation predicates, rule definitions, or empirical results for those stages, leaving the generality claim unsupported.
minor comments (2)
  1. [Abstract] 'Perfect adversarial and out-of-scope block rates' are reported without describing how the adversarial test cases were constructed or what baselines (e.g., unguarded agents) were used for comparison.
  2. [Experiments] Reliability scores of 0.84-0.94 are given without accompanying statistical significance tests, confidence intervals, or error analysis broken down by failure mode.
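
To make the request for confidence intervals concrete, here is a small sketch of the Wilson score interval for a binomial proportion, applied to a perfect block rate. The test-set sizes are hypothetical, since this summary does not state how many adversarial cases were used; the helper name `wilson_interval` is also an assumption.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical sizes: the number of adversarial cases is not given in this summary.
for n in (20, 50, 200):
    low, high = wilson_interval(n, n)
    print(f"{n}/{n} blocked -> 95% CI [{low:.3f}, {high:.3f}]")
```

When every case is blocked, the lower bound reduces to n / (n + z²), which is about 0.84 for n = 20. A "perfect" rate on a small test set is therefore compatible with a non-trivial true failure rate, which is why the size and construction of the adversarial set matter.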

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to improve clarity and support for our claims.

  1. Referee: [Abstract and Experiments] The central claim that 'process-boundary governance delivers safety guarantees independent of model choice' across the full RWE lifecycle rests on validation of only a single NL-to-SQL agent team; no experiments ablate the underlying LLM, and no results are reported for phenotyping, study design, or statistical analysis agent teams.

    Authors: The NL-to-SQL agent team was selected for validation because it forms the critical data-access boundary where query errors pose the greatest safety risk in RWE generation. The independence from model choice follows directly from the design: deterministic validators operate on structured outputs (e.g., SQL syntax, scope checks) rather than model internals, so the same governance layer blocks invalid actions regardless of which LLM generates the candidate output. We did not perform LLM ablations because the validator's correctness is model-agnostic by construction. We agree the abstract overstates coverage of the full lifecycle and will revise it to specify that empirical results are for the NL-to-SQL team while the architectural guarantees extend to other teams via controlled tool exposure. A brief discussion of this design rationale will be added to the Experiments section. revision: partial

  2. Referee: [Architecture] The manuscript states that 'agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure,' but supplies no concrete deterministic validation predicates, rule definitions, or empirical results for those stages, leaving the generality claim unsupported.

    Authors: We acknowledge that the current text describes inheritance at a high level without concrete predicate examples. Controlled tool exposure restricts each agent team to a narrow, pre-approved function set (e.g., a phenotyping agent may only invoke OHDSI-standard cohort definitions and cannot execute arbitrary SQL). The governance layer then applies deterministic checks on inputs and outputs at every boundary. We will add a dedicated subsection with explicit rule examples: for phenotyping, validation requires that all concept sets match published OHDSI definitions and that cohort size thresholds are met; for study design, parameter ranges are checked against ethical and statistical constraints. These additions will be supported by pseudocode and will clarify how the same boundary mechanism applies across stages. Empirical results remain limited to the NL-to-SQL team in the present work. revision: yes
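
The two mechanisms named in the second response, controlled tool exposure and a deterministic phenotyping rule, could be sketched roughly as follows. Everything here is hypothetical: `ToolGateway`, `validate_cohort`, the concept identifiers (placeholders, not real OMOP concept IDs) and the minimum cohort size are assumptions for illustration, not the authors' implementation or promised pseudocode.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ToolNotExposed(PermissionError):
    """Raised when a team calls a tool outside its approved set."""


@dataclass
class ToolGateway:
    registry: dict[str, Callable[..., Any]]  # every tool the platform offers
    exposure: dict[str, frozenset[str]]      # team name -> approved tool names

    def call(self, team: str, tool: str, **kwargs: Any) -> Any:
        # The check runs outside the agent, so the agent cannot argue past it.
        if tool not in self.exposure.get(team, frozenset()):
            raise ToolNotExposed(f"{team} may not call {tool}")
        return self.registry[tool](**kwargs)


# Placeholder published concept sets and minimum cohort size (assumptions).
PUBLISHED_CONCEPT_SETS = {"example_phenotype": frozenset({1001, 1002})}
MIN_COHORT_SIZE = 10


def validate_cohort(name: str, concept_ids: set[int], cohort_size: int) -> list[str]:
    """Return rule violations; an empty list means the cohort passes."""
    errors = []
    if PUBLISHED_CONCEPT_SETS.get(name) != frozenset(concept_ids):
        errors.append("concept set does not match a published definition")
    if cohort_size < MIN_COHORT_SIZE:
        errors.append("cohort below minimum size")
    return errors


gateway = ToolGateway(
    registry={
        "build_cohort": lambda name: f"cohort:{name}",
        "run_sql": lambda sql: sql,
    },
    exposure={"phenotyping": frozenset({"build_cohort"})},
)
```

In this sketch the phenotyping team can call `build_cohort` but a call to `run_sql` raises `ToolNotExposed`, and any team missing from `exposure` gets no tools at all, so the default is closed.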

standing simulated objections not resolved
  • Empirical validation results for phenotyping, study design, and statistical analysis agent teams are not available in the current manuscript, as experiments focused on the NL-to-SQL component; full multi-team empirical evaluation is planned as future work.

Circularity Check

0 steps flagged

No circularity; empirical architecture validation with no self-referential derivations

The paper introduces an architecture (FastOMOP) separating governance/observability/orchestration layers from pluggable agents, then reports empirical results from a single NL-to-SQL agent team on three datasets achieving 0.84-0.94 reliability and 100% block rates. No equations, fitted parameters, or first-principles derivations appear; the central claim rests on these reported measurements rather than any reduction to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The derivation chain is self-contained as a system description plus benchmark results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that OMOP CDM provides sufficient standardization for agentic workflows and introduces the FastOMOP architecture as a new engineered system without additional fitted parameters or physical entities.

axioms (1)
  • [domain assumption] OMOP CDM enables harmonization of EHR data across institutions and countries
    Invoked as the foundational data model for all agent teams and validation.
invented entities (1)
  • FastOMOP multi-agent architecture with process-boundary governance [no independent evidence]
    purpose: To enforce safety and auditability independent of agent reasoning
    Newly proposed system whose safety guarantees are demonstrated empirically in the paper.

pith-pipeline@v0.9.0 · 5630 in / 1285 out tokens · 33315 ms · 2026-05-08T03:20:02.849961+00:00 · methodology
