Memisis: Orchestrating and Evaluating Synthetic Data for Tabular Health Datasets

Aadi Sharma; Amir M. Rahmani; Arshia Harish Puthran; Ian Harris; Mahdi Bagheri; Muhjaazee Love; Nitish Nagesh; Pengbao Zhou

arxiv: 2605.17758 · v1 · pith:3NCPVYDQnew · submitted 2026-05-18 · 💻 cs.LG

Nitish Nagesh, Mahdi Bagheri, Arshia Harish Puthran, Pengbao Zhou, Muhjaazee Love, Aadi Sharma, Ian Harris, Amir M. Rahmani

Pith reviewed 2026-05-20 12:07 UTC · model grok-4.3

keywords synthetic data · healthcare · tabular datasets · large language models · data generation · evaluation metrics · privacy · fairness

The pith

Memisis uses a language model agent to orchestrate synthetic data generation and evaluation for tabular health datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Memisis as a tool that brings together existing synthetic data generators, large language models, and evaluation metrics into one workflow for health data. It aims to let users describe what kind of synthetic data they need rather than adjusting technical settings themselves. This matters because creating good synthetic data that protects privacy while keeping useful patterns is hard, especially for medical research and decisions. The demonstration uses a schizophrenia dataset and shows similar results from three different generators.

Core claim

Memisis orchestrates and evaluates synthetic data for tabular health datasets by using an interactive agent driven by a local large language model. Users express their goals for the synthetic data, and the agent handles selecting among synthesizers like CTGAN, TVAE, and GaussianCopula, setting parameters such as training size and epochs, generating the data, and running evaluations for privacy, utility, and fairness. This creates a unified process instead of separate steps for generation and checking.
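
The goal-to-plan step described above can be illustrated with a minimal sketch. The paper's agent is driven by a local LLM; the keyword rules below are only a stand-in for that model. `SynthesisPlan`, `plan_from_goal`, and the selection rules are hypothetical and not taken from Memisis. Only the three synthesizer names and the three evaluation dimensions come from the source.

```python
from dataclasses import dataclass

# From the paper: the three synthesizers and the three evaluation dimensions.
SYNTHESIZERS = ("CTGAN", "TVAE", "GaussianCopula")
EVALUATIONS = ("privacy", "utility", "fairness")


@dataclass(frozen=True)
class SynthesisPlan:
    """Hypothetical container for what the agent decides."""
    synthesizer: str
    evaluations: tuple


def plan_from_goal(goal: str) -> SynthesisPlan:
    """Map a free-text goal to a plan.

    Assumption: simple keyword matching stands in for the local LLM.
    """
    text = goal.lower()
    # An explicit request for a synthesizer by name wins.
    for name in SYNTHESIZERS:
        if name.lower() in text:
            return SynthesisPlan(name, EVALUATIONS)
    # Hypothetical rule: use the copula model when a quick baseline is asked for.
    if "quick" in text or "baseline" in text:
        return SynthesisPlan("GaussianCopula", EVALUATIONS)
    # Hypothetical default when the goal gives no other hint.
    return SynthesisPlan("CTGAN", EVALUATIONS)
```

Calling `plan_from_goal("I need a quick baseline for my cohort")` selects GaussianCopula under these illustrative rules. A real agent would replace the keyword matching with a model call, which is exactly the part whose reliability the review below questions.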

What carries the argument

An interactive agent powered by a local language model that interprets user goals to select, configure, and evaluate synthetic data tools.

If this is right

Users specify goals for synthetic data instead of tuning individual parameters.
The system runs necessary evaluations for privacy, utility, and fairness automatically.
Control is retained over training size, epochs, and number of synthetic samples.
Comparable performance across synthesizers is observed in the schizophrenia dataset example.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Non-experts in data synthesis could more easily produce suitable synthetic health datasets for their needs.
Keeping the language model local helps avoid sending health-related instructions to external services.
Similar orchestration ideas might apply to synthetic data tasks in other fields like finance or social sciences.

Load-bearing premise

The local language model agent can accurately figure out the user's intentions and choose the correct synthesizers and settings without errors or added biases.

What would settle it

Running Memisis with a clear user goal on a test dataset and finding that the produced synthetic data scores much worse on utility or fairness than data made by directly using the synthesizers with expert settings.

Figures

Figures reproduced from arXiv: 2605.17758 by Aadi Sharma, Amir M. Rahmani, Arshia Harish Puthran, Ian Harris, Mahdi Bagheri, Muhjaazee Love, Nitish Nagesh, Pengbao Zhou.

**Figure 1.** Memisis separates synthesis from evaluation so scoring cannot influence generation. The supervisor routes the user’s goal to a generator subgraph (CTGAN, TVAE, or GaussianCopula via SDV) and a separate evaluator subgraph (SDMetrics quality + Fairlearn FPR by group). After evaluation the supervisor compares the composite synth_score against repository-derived thresholds and either reports results or issues …
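
The threshold comparison in the Figure 1 caption can be sketched as follows. The product form of the score follows the passage quoted elsewhere on this page (`synth_score = quality×fairness_mult`); how `fairness_mult` is derived, what the threshold values are, and whether the comparison is inclusive are not stated in the material here, so they are left to the caller or marked as assumptions. The caption is truncated, so the action taken when the threshold is not met is deliberately not modelled.

```python
def synth_score(quality: float, fairness_mult: float) -> float:
    """Composite score in the product form quoted from the paper."""
    return quality * fairness_mult


def meets_threshold(quality: float, fairness_mult: float, threshold: float) -> bool:
    """Return True when the composite score reaches the threshold.

    Assumption: the comparison is inclusive (>=). The threshold is supplied
    by the caller; the paper derives its thresholds from a repository.
    """
    return synth_score(quality, fairness_mult) >= threshold
```

Because the score is a product, a high quality value can still fall below the threshold when the fairness multiplier is small.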

**Figure 2.** Memisis deployment stack. Users (researchers, data owners) interact via Streamlit. All interfaces share one FastAPI service. LangGraph agents (ReACT and multi-agent supervisor) sit alongside the different metrics. Llama3.2 is the Large Language Model (LLM) under consideration. Model checkpoints are stored as appropriate. fidelity (0.91) does not translate to a superior composite score when the downstream c…

Original abstract

Synthetic data is widely used in healthcare to create datasets that are similar to original data but without the privacy concerns. Generating and evaluating synthetic data across privacy, utility and fairness is crucial for facilitating high quality data availability for downstream prediction tasks and clinical decision making. We present Memisis, a tool that orchestrates and evaluates synthetic data by leveraging existing synthetic data tools, the power of large language models and state-of-the-art evaluation metrics. Our tool creates a unified workflow for data generation, validation and evaluation. Users have control over the training size, training epochs and the number of synthetic rows to sample. Instead of knobs to tune synthetic data, the interactive agent allows users to specify their synthetic data generation goals and the tool will orchestrate the workflow by leveraging existing tools while performing the requisite evaluation. For the demo, we use an open source schizophrenia dataset with protected attributes related to race and gender, three different synthesizers and a local language model to orchestrate the workflow. We observe that CTGAN, TVAE and GaussianCopula have comparable performance across fairness and utility metrics. The workflow allows users flexibility and control over the data generation and evaluation process.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Memisis is a straightforward integration of a local LLM agent with off-the-shelf synthesizers for tabular health data, but the agent’s reliability is untested beyond one demo.

The paper’s main contribution is Memisis, a workflow that lets users state goals in natural language and have a local LLM pick among CTGAN, TVAE, and GaussianCopula, set the training size and epochs, generate rows, and run standard privacy, utility, and fairness checks on a schizophrenia dataset with race and gender attributes. The demo reports that the three generators produce comparable scores, which aligns with what is already known about these methods on similar data.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Memisis, a tool that orchestrates and evaluates synthetic data for tabular health datasets. It integrates existing synthesizers (CTGAN, TVAE, GaussianCopula), a local LLM-powered interactive agent to interpret user-specified goals, and state-of-the-art metrics for privacy, utility, and fairness. Users control training size, epochs, and synthetic row count. A single demo on an open schizophrenia dataset with race and gender attributes reports comparable performance across the three synthesizers on fairness and utility metrics, claiming a unified workflow without manual knob tuning.

Significance. If the LLM agent's goal interpretation and synthesizer selection prove reliable without introducing unmeasured biases, Memisis would offer a practical, accessible framework for synthetic health data generation that lowers barriers for downstream clinical tasks. The explicit leverage of pre-existing open tools and metrics is a strength that supports reproducibility and reduces reinvention.

major comments (2)

[Abstract] Abstract: the central claim that the interactive agent 'accurately interpret[s] user goals and reliably select[s] and configure[s] existing synthesizers' without new biases rests on an untested assumption; no accuracy, consistency, or bias metrics for the agent itself, nor any comparison against expert or exhaustive baselines, are reported.
[Demo] Demo description: the observation that CTGAN, TVAE, and GaussianCopula 'have comparable performance across fairness and utility metrics' is presented as a single high-level result without statistical tests, error analysis, multiple runs, or validation beyond the observation, limiting support for the evaluation component of the unified workflow.

minor comments (2)

The manuscript would benefit from an explicit description of the prompt templates or decision logic used by the local LLM agent to map user goals to synthesizer configurations.
Clarify the exact privacy metrics employed and how they are computed within the evaluation pipeline, as this is central to health-data applications.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive recognition of Memisis as a practical framework. We address each major comment below with honest revisions where the manuscript requires strengthening.

Referee: [Abstract] Abstract: the central claim that the interactive agent 'accurately interpret[s] user goals and reliably select[s] and configure[s] existing synthesizers' without new biases rests on an untested assumption; no accuracy, consistency, or bias metrics for the agent itself, nor any comparison against expert or exhaustive baselines, are reported.

Authors: We agree that the manuscript does not report quantitative metrics for the LLM agent's accuracy, consistency, or potential biases in goal interpretation and synthesizer selection. The presented work emphasizes the orchestration workflow and a single-dataset demonstration rather than a dedicated agent evaluation study. We will revise the abstract to qualify or remove the phrasing implying reliable selection without new biases. We will also add explicit discussion of the agent as an interface layer whose performance is not yet benchmarked, along with a clear statement of this limitation and planned future comparisons to expert baselines. These changes will appear in the revised manuscript.

Revision: yes

Referee: [Demo] Demo description: the observation that CTGAN, TVAE, and GaussianCopula 'have comparable performance across fairness and utility metrics' is presented as a single high-level result without statistical tests, error analysis, multiple runs, or validation beyond the observation, limiting support for the evaluation component of the unified workflow.

Authors: We acknowledge that the demo presents a single high-level observation without statistical tests, error analysis, or multiple runs. The section was intended to illustrate the end-to-end workflow on an open schizophrenia dataset rather than to serve as a comprehensive benchmark. We will revise the demo section to include results aggregated over multiple independent runs, report means and standard deviations for the fairness and utility metrics, and add basic statistical comparisons (e.g., paired tests where applicable) to provide stronger support for the evaluation claims. These enhancements will be incorporated in the next version.

Revision: yes
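
The multi-run reporting promised in this response could look roughly like the sketch below, which uses only the Python standard library. `summarize` and `paired_t_statistic` are hypothetical helper names; the sketch assumes each synthesizer is scored once per run and that runs are paired (for example, the same seed or train split for both synthesizers). It computes only the t statistic and leaves the p-value lookup to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of one metric over repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs")
    return mean(scores), stdev(scores)


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two synthesizers scored on the same runs."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired comparison needs equal-length score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired runs")
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("differences are constant; t statistic is undefined")
    # t = mean difference divided by its standard error
    return mean(diffs) / (spread / sqrt(len(diffs)))
```

A statistic near zero relative to the t distribution with `len(diffs) - 1` degrees of freedom would be consistent with the "comparable performance" observation, while a large magnitude would call it into question.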

Circularity Check

0 steps flagged

No circularity; tool composes external components without self-referential reductions.

The manuscript presents Memisis as an orchestration layer over pre-existing synthesizers (CTGAN, TVAE, GaussianCopula), a local LLM agent, and standard privacy/utility/fairness metrics. No equations, fitted parameters renamed as predictions, or derivation chains appear. The single demo observation of comparable metrics on one dataset is an empirical report, not a quantity forced by construction or by self-citation. The central workflow claim is a composition of independent open-source tools and does not reduce to any input defined inside the paper itself.

Axiom & Free-Parameter Ledger

3 free parameters · 1 axiom · 1 invented entity

The contribution centers on a new orchestration layer rather than new mathematics or data; it depends on user-specified controls and the effectiveness of prior synthesizers without introducing fitted constants or new entities with independent evidence.

free parameters (3)

training size
User-controlled amount of real data used to train the synthesizers.
training epochs
User-controlled number of training iterations for the generators.
number of synthetic rows
User-controlled quantity of output synthetic samples.
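
The three user-controlled settings listed above can be captured in a small validated container. This is a minimal sketch: `UserControls`, its field names, and the validation rules are assumptions for illustration, not the interface Memisis exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserControls:
    """Hypothetical holder for the three user-controlled settings."""
    training_size: int        # rows of real data used to train the synthesizer
    training_epochs: int      # training iterations for the generator
    num_synthetic_rows: int   # synthetic rows to sample

    def validate(self, available_rows: int) -> None:
        """Raise ValueError if any setting is out of range (assumed rules)."""
        if not 0 < self.training_size <= available_rows:
            raise ValueError("training_size must be between 1 and available_rows")
        if self.training_epochs <= 0:
            raise ValueError("training_epochs must be positive")
        if self.num_synthetic_rows <= 0:
            raise ValueError("num_synthetic_rows must be positive")
```

Validating before any synthesizer is trained keeps an out-of-range request from reaching a long-running training job.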

axioms (1)

Domain assumption: Existing synthesizers (CTGAN, TVAE, GaussianCopula) and standard fairness/utility metrics are appropriate for protected health attributes such as race and gender.
Invoked when claiming comparable performance in the demo without new validation of these components.
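
The Figure 1 caption names false positive rate (FPR) by group as the fairness measure, computed there via Fairlearn. A plain-Python sketch of what that quantity is, without Fairlearn, is given below. The function name, the binary 0/1 label encoding, and the choice to omit groups with no negative examples are assumptions for illustration.

```python
from collections import defaultdict


def false_positive_rate_by_group(y_true, y_pred, groups):
    """Return {group: FPR}, where FPR = false positives / actual negatives.

    Assumptions: labels are binary (0 = negative, 1 = positive), and groups
    without any actual negatives are left out of the result.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            negatives[group] += 1
            if pred == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}
```

A large gap between the per-group rates would indicate that a downstream classifier trained on the synthetic data wrongly flags one protected group more often than another.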

invented entities (1)

Memisis interactive agent (no independent evidence)
purpose: To translate natural-language user goals into synthesizer selection and workflow execution via LLM.
New component presented as the core of the orchestration system.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "The orchestrator refines the responses based on user inputs... synth_score = quality × fairness_mult"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1] Argilla. synthetic-data-generator, 2024. URL https://github.com/argilla-io/synthetic-data-generator. Apache-2.0; Gradio app and distilabel-based pipelines.

[2] Sarah Bird, Miro Dudík, Richard Edgar, Brandon Horn, Roman Lutz, Vanessa Milan, Mehrnoosh Sameki, Hanna Wallach, and Kathleen Walker. Fairlearn: A toolkit for assessing and improving fairness in AI. Technical Report MSR-TR-2020-32, Microsoft, May 2020. URL https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improvin...

[3] Bernd Bischl, Giuseppe Casalicchio, Taniya Das, Matthias Feurer, Sebastian Fischer, Pieter Gijsbers, Subhaditya Mukherjee, Andreas C Müller, László Németh, Luis Oala, et al. OpenML: Insights from 10 years and more than a thousand papers. Patterns, 6(7), 2025.

[4] Thomas Callender, Anders Boyd, Robert Davis, Silas Ruhrberg Estevez, Juan M. Lavista Ferres, and Mihaela van der Schaar. Synthcraft: An AI partner for synthetic data generation to support data access and augmentation in healthcare. Technical report, Microsoft Research,

[5] URL https://www.microsoft.com/en-us/research/publication/synthcraft-an-ai-partner-for-synthetic-data-generation-to-support-data-access-and-augmentation-in-healthcare/

[6] Synthetic Data Metrics. DataCebo, Inc., 03 2026. URL https://docs.sdv.dev/sdmetrics/. Version 0.28.0.

[7] Alexandra Ebert. Why bias in AI is a problem and why business leaders should care (fairness series part 1), May 2020. URL https://mostly.ai/blog/why-bias-in-ai-is-a-problem. MOSTLY AI Blog. Accessed 2026-03-27.

[8] Michael A Gara, William A Vega, Stephan Arndt, Michael Escamilla, David E Fleck, William B Lawson, Ira Lesser, Harold W Neighbors, Daniel R Wilson, Lesley M Arnold, et al. Influence of patient race and ethnicity on clinical assessment in patients with affective disorders. Archives of general psychiatry, 69(6):593–600, 2012.

[9] Michael A Gara, Shula Minsky, Steven M Silverstein, Theresa Miskimen, and Stephen M Strakowski. A naturalistic study of racial disparities in diagnoses at an outpatient behavioral health clinic. Psychiatric Services, 70(2):130–134, 2019.

[10] Michael Giuffrè and David L. Shung. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. npj Digital Medicine, 6(1):186, 2023. doi: 10.1038/s41746-023-00927-3.

[11] Alon Gorenshtein, Mahmud Omar, Benjamin S. Glicksberg, Girish N. Nadkarni, and Eyal Klang. AI agents in clinical medicine: A systematic review. medRxiv, 2025. doi: 10.1101/2025.08.22.25334232. Preprint; also available via PMCID PMC12407621.

[12] IBM. Generate synthetic data (IBM watsonx data platform), 2024. URL https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/get-started-generate-data.html?context=wx. Accessed 2026-03-27.

[13] MOSTLY AI. Synthetic data for AI/ML development, 2021. URL https://mostly.ai/use-case/synthetic-data-for-analytics-ai-training. Use case overview. Accessed 2026-03-27.

[14] Nitish Nagesh, Salar Shakibhamedan, Mahdi Bagheri, Ziyu Wang, Nima TaheriNejad, Axel Jantsch, and Amir M. Rahmani. FairTabGen: High-fidelity and fair synthetic health data generation from limited samples, 2025. URL https://arxiv.org/abs/2508.11810.

[15] Nitish Nagesh, Ziyu Wang, and Amir M. Rahmani. FairCauseSyn: Towards causally fair LLM-augmented synthetic data generation, 2025. URL https://arxiv.org/abs/2506.19082. Accepted to IEEE EMBC 2025.

[16] Charles M Olbert, Arundati Nagendra, and Benjamin Buck. Meta-analysis of black vs. white racial disparity in schizophrenia diagnosis in the United States: Do structured assessments attenuate racial disparities? Journal of abnormal psychology, 127(1):104, 2018.

[17] Neha Patki, Roy Wedge, and Kalyan Veeramachaneni. The synthetic data vault. In IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 399–410, Oct 2016. doi: 10.1109/DSAA.2016.49.

[18] Tonic AI. Tonic documentation: synthetic data platform, 2024. URL https://docs.tonic.ai/. Accessed 2026-03-27.

[19] Tanay Varshney and Chintan Patel. Creating synthetic data using Llama 3.1 405B, July 2024. URL https://developer.nvidia.com/blog/creating-synthetic-data-using-llama-3-1-405b/. NVIDIA Technical Blog. Accessed 2026-03-27.

[20] Jason Walonoski, Mark Kramer, Joseph Nichols, Andre Quina, Chris Moesel, Dylan Hall, Carlton Duffett, Kudakwashe Dube, Thomas Gallagher, and Scott McLachlan. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. Journal of the American Medical Informatics Association, 25(3): ...
