Memisis: Orchestrating and Evaluating Synthetic Data for Tabular Health Datasets
Pith reviewed 2026-05-20 12:07 UTC · model grok-4.3
The pith
Memisis uses a language model agent to orchestrate synthetic data generation and evaluation for tabular health datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Memisis orchestrates and evaluates synthetic data for tabular health datasets by using an interactive agent driven by a local large language model. Users express their goals for the synthetic data, and the agent handles selecting among synthesizers like CTGAN, TVAE, and GaussianCopula, setting parameters such as training size and epochs, generating the data, and running evaluations for privacy, utility, and fairness. This creates a unified process instead of separate steps for generation and checking.
What carries the argument
An interactive agent powered by a local language model that interprets user goals to select, configure, and evaluate synthetic data tools.
If this is right
- Users specify goals for synthetic data instead of tuning individual parameters.
- The system runs necessary evaluations for privacy, utility, and fairness automatically.
- Control is retained over training size, epochs, and number of synthetic samples.
- Comparable performance across synthesizers is observed in the schizophrenia dataset example.
Where Pith is reading between the lines
- Non-experts in data synthesis could more easily produce suitable synthetic health datasets for their needs.
- Keeping the language model local helps avoid sending health-related instructions to external services.
- Similar orchestration ideas might apply to synthetic data tasks in other fields like finance or social sciences.
Load-bearing premise
The local language model agent can accurately figure out the user's intentions and choose the correct synthesizers and settings without errors or added biases.
What would settle it
Running Memisis with a clear user goal on a test dataset and finding that the produced synthetic data scores much worse on utility or fairness than data made by directly using the synthesizers with expert settings.
Figures
read the original abstract
Synthetic data is widely used in healthcare to create datasets that are similar to original data but without the privacy concerns. Generating and evaluating synthetic data across privacy, utility and fairness is crucial for facilitating high quality data availability for downstream prediction tasks and clinical decision making. We present Memisis, a tool that orchestrates and evaluates synthetic data by leveraging existing synthetic data tools, the power of large language models and state-of-the-art evaluation metrics. Our tool creates a unified workflow for data generation, validation and evaluation. Users have control over the training size, training epochs and the number of synthetic rows to sample. Instead of knobs to tune synthetic data, the interactive agent allows users to specify their synthetic data generation goals and the tool will orchestrate the workflow by leveraging existing tools while performing the requisite evaluation. For the demo, we use an open source schizophrenia dataset with protected attributes related to race and gender, three different synthesizers and a local language model to orchestrate the workflow. We observe that CTGAN, TVAE and GaussianCopula have comparable performance across fairness and utility metrics. The workflow allows users flexibility and control over the data generation and evaluation process.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Memisis, a tool that orchestrates and evaluates synthetic data for tabular health datasets. It integrates existing synthesizers (CTGAN, TVAE, GaussianCopula), a local LLM-powered interactive agent to interpret user-specified goals, and state-of-the-art metrics for privacy, utility, and fairness. Users control training size, epochs, and synthetic row count. A single demo on an open schizophrenia dataset with race and gender attributes reports comparable performance across the three synthesizers on fairness and utility metrics, claiming a unified workflow without manual knob tuning.
Significance. If the LLM agent's goal interpretation and synthesizer selection prove reliable without introducing unmeasured biases, Memisis would offer a practical, accessible framework for synthetic health data generation that lowers barriers for downstream clinical tasks. The explicit leverage of pre-existing open tools and metrics is a strength that supports reproducibility and reduces reinvention.
major comments (2)
- [Abstract] Abstract: the central claim that the interactive agent 'accurately interpret[s] user goals and reliably select[s] and configure[s] existing synthesizers' without new biases rests on an untested assumption; no accuracy, consistency, or bias metrics for the agent itself, nor any comparison against expert or exhaustive baselines, are reported.
- [Demo] Demo description: the observation that CTGAN, TVAE, and GaussianCopula 'have comparable performance across fairness and utility metrics' is presented as a single high-level result without statistical tests, error analysis, multiple runs, or validation beyond the observation, limiting support for the evaluation component of the unified workflow.
minor comments (2)
- The manuscript would benefit from an explicit description of the prompt templates or decision logic used by the local LLM agent to map user goals to synthesizer configurations.
- Clarify the exact privacy metrics employed and how they are computed within the evaluation pipeline, as this is central to health-data applications.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive recognition of Memisis as a practical framework. We address each major comment below with honest revisions where the manuscript requires strengthening.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that the interactive agent 'accurately interpret[s] user goals and reliably select[s] and configure[s] existing synthesizers' without new biases rests on an untested assumption; no accuracy, consistency, or bias metrics for the agent itself, nor any comparison against expert or exhaustive baselines, are reported.
Authors: We agree that the manuscript does not report quantitative metrics for the LLM agent's accuracy, consistency, or potential biases in goal interpretation and synthesizer selection. The presented work emphasizes the orchestration workflow and a single-dataset demonstration rather than a dedicated agent evaluation study. We will revise the abstract to qualify or remove the phrasing implying reliable selection without new biases. We will also add explicit discussion of the agent as an interface layer whose performance is not yet benchmarked, along with a clear statement of this limitation and planned future comparisons to expert baselines. These changes will appear in the revised manuscript. revision: yes
-
Referee: [Demo] Demo description: the observation that CTGAN, TVAE, and GaussianCopula 'have comparable performance across fairness and utility metrics' is presented as a single high-level result without statistical tests, error analysis, multiple runs, or validation beyond the observation, limiting support for the evaluation component of the unified workflow.
Authors: We acknowledge that the demo presents a single high-level observation without statistical tests, error analysis, or multiple runs. The section was intended to illustrate the end-to-end workflow on an open schizophrenia dataset rather than to serve as a comprehensive benchmark. We will revise the demo section to include results aggregated over multiple independent runs, report means and standard deviations for the fairness and utility metrics, and add basic statistical comparisons (e.g., paired tests where applicable) to provide stronger support for the evaluation claims. These enhancements will be incorporated in the next version. revision: yes
Circularity Check
No circularity; tool composes external components without self-referential reductions.
full rationale
The manuscript presents Memisis as an orchestration layer over pre-existing synthesizers (CTGAN, TVAE, GaussianCopula), a local LLM agent, and standard privacy/utility/fairness metrics. No equations, fitted parameters renamed as predictions, or derivation chains appear. The single demo observation of comparable metrics on one dataset is an empirical report, not a quantity forced by construction or by self-citation. The central workflow claim is a composition of independent open-source tools and does not reduce to any input defined inside the paper itself.
Axiom & Free-Parameter Ledger
free parameters (3)
- training size
- training epochs
- number of synthetic rows
axioms (1)
- domain assumption Existing synthesizers (CTGAN, TVAE, GaussianCopula) and standard fairness/utility metrics are appropriate for protected health attributes such as race and gender.
invented entities (1)
-
Memisis interactive agent
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The orchestrator refines the responses based on user inputs... synth_score = quality×fairness_mult
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
synthetic-data-generator, 2024
Argilla. synthetic-data-generator, 2024. URL https://github.com/argilla-io/syntheti c-data-generator. Apache-2.0; Gradio app and distilabel-based pipelines
work page 2024
-
[2]
Fairlearn: A toolkit for assessing and improving fairness in AI
Sarah Bird, Miro Dudík, Richard Edgar, Brandon Horn, Roman Lutz, Vanessa Milan, Mehrnoosh Sameki, Hanna Wallach, and Kathleen Walker. Fairlearn: A toolkit for assessing and improving fairness in AI. Technical Report MSR-TR-2020-32, Microsoft, May 2020. URL https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for -assessing-and-improvin...
work page 2020
-
[3]
Openml: Insights from 10 years and more than a thousand papers.Patterns, 6(7), 2025
Bernd Bischl, Giuseppe Casalicchio, Taniya Das, Matthias Feurer, Sebastian Fischer, Pieter Gijsbers, Subhaditya Mukherjee, Andreas C Müller, László Németh, Luis Oala, et al. Openml: Insights from 10 years and more than a thousand papers.Patterns, 6(7), 2025
work page 2025
-
[4]
Lavista Ferres, and Mihaela van der Schaar
Thomas Callender, Anders Boyd, Robert Davis, Silas Ruhrberg Estevez, Juan M. Lavista Ferres, and Mihaela van der Schaar. Synthcraft: An AI partner for synthetic data generation to support data access and augmentation in healthcare. Technical report, Microsoft Research,
-
[5]
URL https://www.microsoft.com/en-us/research/publication/synthcraft-an-a i-partner-for-synthetic-data-generation-to-support-data-access-and-augmentat ion-in-healthcare/
-
[6]
Synthetic Data Metrics. DataCebo, Inc., 03 2026. URLhttps://docs.sdv.dev/sdmetrics/. Version 0.28.0
work page 2026
-
[7]
Why bias in AI is a problem and why business leaders should care (fairness series part 1), May 2020
Alexandra Ebert. Why bias in AI is a problem and why business leaders should care (fairness series part 1), May 2020. URLhttps://mostly.ai/blog/why-bias-in-ai-is-a-problem . MOSTLY AI Blog. Accessed 2026-03-27
work page 2020
-
[8]
Michael A Gara, William A Vega, Stephan Arndt, Michael Escamilla, David E Fleck, William B Lawson, Ira Lesser, Harold W Neighbors, Daniel R Wilson, Lesley M Arnold, et al. Influence of patient race and ethnicity on clinical assessment in patients with affective disorders.Archives of general psychiatry, 69(6):593–600, 2012
work page 2012
-
[9]
Michael A Gara, Shula Minsky, Steven M Silverstein, Theresa Miskimen, and Stephen M Strakowski. A naturalistic study of racial disparities in diagnoses at an outpatient behavioral health clinic.Psychiatric Services, 70(2):130–134, 2019. 5
work page 2019
-
[10]
Michael Giuffrè and David L. Shung. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy.npj Digital Medicine, 6(1):186, 2023. doi: 10.1038/s41746 -023-00927-3
-
[11]
June 17, 2025.doi:10.1101/2025
Alon Gorenshtein, Mahmud Omar, Benjamin S. Glicksberg, Girish N. Nadkarni, and Eyal Klang. AI agents in clinical medicine: A systematic review.medRxiv, 2025. doi: 10.1101/2025 .08.22.25334232. Preprint; also available via PMCID PMC12407621
-
[12]
Generate synthetic data (IBM watsonx data platform), 2024
IBM. Generate synthetic data (IBM watsonx data platform), 2024. URLhttps://dataplat form.cloud.ibm.com/docs/content/wsj/getting-started/get-started-generate-data. html?context=wx. Accessed 2026-03-27
work page 2024
-
[13]
Synthetic data for AI/ML development, 2021
MOSTLY AI. Synthetic data for AI/ML development, 2021. URLhttps://mostly.ai/ use-case/synthetic-data-for-analytics-ai-training . Use case overview. Accessed 2026-03-27
work page 2021
- [14]
- [15]
-
[16]
Charles M Olbert, Arundati Nagendra, and Benjamin Buck. Meta-analysis of black vs. white racial disparity in schizophrenia diagnosis in the united states: Do structured assessments attenuate racial disparities?Journal of abnormal psychology, 127(1):104, 2018
work page 2018
-
[17]
Neha Patki, Roy Wedge, and Kalyan Veeramachaneni. The synthetic data vault. InIEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 399–410, Oct 2016. doi: 10.1109/DSAA.2016.49
-
[18]
Tonic documentation: synthetic data platform, 2024
Tonic AI. Tonic documentation: synthetic data platform, 2024. URLhttps://docs.tonic.ai/. Accessed 2026-03-27
work page 2024
-
[19]
Creating synthetic data using Llama 3.1 405B, July 2024
Tanay Varshney and Chintan Patel. Creating synthetic data using Llama 3.1 405B, July 2024. URL https://developer.nvidia.com/blog/creating-synthetic-data-using-llama-3 -1-405b/. NVIDIA Technical Blog. Accessed 2026-03-27
work page 2024
-
[20]
Jason Walonoski, Mark Kramer, Joseph Nichols, Andre Quina, Chris Moesel, Dylan Hall, Carlton Duffett, Kudakwashe Dube, Thomas Gallagher, and Scott McLachlan. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record.Journal of the American Medical Informatics Association, 25(3): ...
work page 2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.