CausalSynth: Generating Structurally Sound Synthetic Data

Jiahao Sun; Thomas Lukasiewicz; Wei Dai; Zehua Cheng

arXiv: 2605.17528 · v1 · pith:ATEWO5Z2new · submitted 2026-05-17 · 💻 cs.LG · cs.AI · cs.CL


Pith reviewed 2026-05-20 14:12 UTC · model grok-4.3

classification 💻 cs.LG cs.AI cs.CL

keywords synthetic data generation, causal structures, large language models, structural causal models, conditional independence, interventional data, data augmentation


The pith

CausalSynth generates causally valid synthetic data by decoupling structure generation from LLM realization and using iterative verification to correct violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CausalSynth to generate synthetic data that respects the causal mechanisms of a target domain rather than merely appearing realistic. It first uses a structural causal model to produce variable assignments that obey the independencies implied by a given DAG through ancestral sampling. An LLM then translates each assignment into rich observations such as clinical notes, while an iterative verification step extracts violations and returns targeted fixes to the model. This setup matters for applications that need reliable causal synthetic data, for instance when training or evaluating models on interventional questions without access to real records, whose use is restricted by privacy or ethics constraints.

Core claim

CausalSynth decouples causal structure generation from semantic realization. A Structural Causal Model generates, via ancestral sampling, causal skeletons that satisfy the Global Markov Property. An LLM acts as a constrained realizer that maps each skeleton to high-dimensional observations. An Iterative Consistency Verification module detects structural violations through deterministic extraction and feeds targeted corrections back to the LLM, forming a closed-loop refinement. The framework identifies the Semantic Backdoor problem, in which LLMs override imposed causal facts with pre-training priors, and shows that the iterative mechanism reduces the resulting selection bias relative to standard rejection sampling.
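Phase I can be pictured with a minimal Python sketch of ancestral sampling. The following are illustrative assumptions, not the paper's SCM:

- the four-node graph, loosely modeled on part of the ASIA network;
- its probabilities;
- the names `PARENTS`, `EQUATIONS`, and `ancestral_sample`.

The sketch visits nodes in topological order and draws each one from its parents' realized values, so the joint distribution factorizes according to the DAG by construction. The exogenous noise is returned alongside the skeleton because counterfactual generation reuses it.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical toy DAG (node -> parents), loosely modeled on part of ASIA.
PARENTS = {
    "Smoking": [],
    "Cancer": ["Smoking"],
    "Bronchitis": ["Smoking"],
    "Dyspnea": ["Cancer", "Bronchitis"],
}

# Illustrative structural equations: V := f(parents, U) with U ~ Uniform(0, 1).
EQUATIONS = {
    "Smoking": lambda pa, u: int(u < 0.5),
    "Cancer": lambda pa, u: int(u < (0.10 if pa["Smoking"] else 0.01)),
    "Bronchitis": lambda pa, u: int(u < (0.60 if pa["Smoking"] else 0.30)),
    "Dyspnea": lambda pa, u: int(u < (0.90 if pa["Cancer"] or pa["Bronchitis"] else 0.10)),
}


def ancestral_sample(rng: random.Random) -> tuple[dict, dict]:
    """Draw one causal skeleton by sampling each node after all of its parents."""
    values, noise = {}, {}
    # TopologicalSorter takes node -> predecessors, so parents are yielded first.
    for node in TopologicalSorter(PARENTS).static_order():
        noise[node] = rng.random()
        parent_values = {p: values[p] for p in PARENTS[node]}
        values[node] = EQUATIONS[node](parent_values, noise[node])
    return values, noise


rng = random.Random(0)
skeleton, noise = ancestral_sample(rng)
print(skeleton)  # a dict of 0/1 assignments, one per node
```

Each skeleton is fully determined by its noise draw. Storing the noise is what later makes counterfactual queries on the same record possible.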

What carries the argument

The Iterative Consistency Verification module, which performs deterministic extraction of structural violations from LLM outputs and feeds targeted corrections back to the LLM to close the refinement loop.
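One pass of the verification loop can be sketched in Python. Everything in it is hypothetical:

- the `Variable: VALUE` line format that `extract` parses;
- the wording produced by `correction_prompt`;
- `mock_realizer`, a stand-in for the LLM. It reproduces a Semantic Backdoor by writing Dyspnea: YES for any cancer patient unless feedback explicitly names Dyspnea.

A real extractor for free-text clinical notes would need to be far more robust than a regular expression.

```python
import re

# Hypothetical Phase I skeleton for one patient.
SKELETON = {"Smoking": "YES", "Cancer": "YES", "Dyspnea": "NO"}


def extract(text: str, variables) -> dict:
    """Deterministic extraction: read 'Variable: VALUE' lines (an assumed output format)."""
    found = {}
    for var in variables:
        match = re.search(rf"^{re.escape(var)}:\s*(\w+)\s*$", text, flags=re.MULTILINE)
        if match:
            found[var] = match.group(1).upper()
    return found


def find_violations(skeleton: dict, extracted: dict) -> dict:
    """A violation is a skeleton variable that is missing or realized with another value."""
    return {var: val for var, val in skeleton.items() if extracted.get(var) != val}


def correction_prompt(violations: dict) -> str:
    """Targeted feedback that names only the violated variables (hypothetical wording)."""
    return "\n".join(
        f"{var}: {val} (you MUST include this exact value)" for var, val in violations.items()
    )


def mock_realizer(skeleton: dict, feedback: str = "") -> str:
    """Stand-in for the LLM: its 'prior' overrides Dyspnea for cancer patients unless corrected."""
    values = dict(skeleton)
    if values.get("Cancer") == "YES" and "Dyspnea" not in feedback:
        values["Dyspnea"] = "YES"
    return "\n".join(f"{var}: {val}" for var, val in values.items())


def realize_with_verification(skeleton: dict, realizer, max_rounds: int = 3):
    """Closed loop: realize, extract, diff against the skeleton, feed back corrections."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        text = realizer(skeleton, feedback)
        violations = find_violations(skeleton, extract(text, skeleton))
        if not violations:
            return text, round_no
        feedback = correction_prompt(violations)
    return None, max_rounds  # still unresolved after max_rounds


text, rounds = realize_with_verification(SKELETON, mock_realizer)
print(rounds)  # 2: the first draft violates Dyspnea, the corrected draft does not
```

The feedback lists only the violated variables rather than the whole skeleton. That keeps each correction targeted, which is the property the module is described as relying on.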

If this is right

Preserves conditional independencies with false-positive rates near the nominal α=0.05 level across ASIA, ALARM, and MIMIC-Struct benchmarks.
Achieves realizability rates above 96 percent with 70B-parameter LLM backbones.
Enables principled interventional and counterfactual data generation through noise retention and graph mutilation.
Reduces selection bias arising from the Semantic Backdoor relative to standard rejection sampling.
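Interventional and counterfactual generation can be illustrated on a toy additive-noise SCM. The variables, coefficients, and function names below are assumptions made for illustration.

- An intervention do(BP = 120) is implemented by graph mutilation: BP's structural equation is overwritten with a constant.
- A counterfactual additionally retains the exogenous noise that produced the factual record. Every variable that is not a descendant of BP therefore keeps its factual value.

```python
import random
from typing import Optional

# Hypothetical additive-noise SCM: Age -> BP -> Stroke and Age -> Stroke.
PARENTS = {"Age": [], "BP": ["Age"], "Stroke": ["Age", "BP"]}
ORDER = ["Age", "BP", "Stroke"]  # a topological order of the DAG


def equation(node: str, pa: dict, u: float) -> float:
    """Illustrative structural equations V := f(parents) + U."""
    if node == "Age":
        return 50.0 + u
    if node == "BP":
        return 0.8 * pa["Age"] + u
    return 0.02 * pa["Age"] + 0.05 * pa["BP"] + u


def sample_noise(rng: random.Random) -> dict:
    return {"Age": rng.gauss(0, 10), "BP": rng.gauss(0, 5), "Stroke": rng.gauss(0, 1)}


def evaluate(noise: dict, do: Optional[dict] = None) -> dict:
    """Run the SCM on fixed noise; nodes in `do` get a constant instead of their equation."""
    do = do or {}
    values = {}
    for node in ORDER:
        if node in do:  # graph mutilation: incoming edges of an intervened node are cut
            values[node] = do[node]
        else:
            values[node] = equation(node, {p: values[p] for p in PARENTS[node]}, noise[node])
    return values


rng = random.Random(1)
u = sample_noise(rng)
factual = evaluate(u)                                           # observational record
counterfactual = evaluate(u, do={"BP": 120.0})                  # same noise, mutilated graph
interventional = evaluate(sample_noise(rng), do={"BP": 120.0})  # fresh noise, mutilated graph
```

Repeated `interventional` draws approximate P(Stroke | do(BP = 120)) for the population. The `counterfactual` answers the same question for this one record.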

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same skeleton-plus-verification pattern could be reused to create large synthetic datasets for causal discovery algorithms in domains such as economics or genomics.
If the verification loop remains efficient at scale, it might allow privacy-preserving training corpora for causal reasoning models that would otherwise require restricted real-world records.
Testing the method on DAGs that are themselves estimated from data rather than supplied in advance would reveal whether the approach tolerates uncertainty in the underlying structure.

Load-bearing premise

The Iterative Consistency Verification module can reliably detect structural violations through deterministic extraction and reduce selection bias by feeding corrections back to the LLM without introducing new unmeasured distortions.

What would settle it

If false-positive rates for conditional-independence tests on the ALARM benchmark rise well above the nominal 0.05 level or if realizability falls below 90 percent under 70B-parameter backbones, the claim that the framework reliably produces causally sound data would be undermined.
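A false-positive rate near α can be estimated with a small simulation:

1. Draw data in which an independence implied by the DAG holds.
2. Test that independence.
3. Repeat, and count how often it is rejected.

This sketch swaps in two assumptions the paper does not state: linear Gaussian data from a chain X → Z → Y, and a Fisher z test on the partial correlation. ASIA and ALARM are discrete networks, for which a chi-square or G-test would be the natural choice. Only the counting logic carries over.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def fisher_z_pvalue(x, y, z=None):
    """Two-sided p-value for X indep. Y (optionally given one variable Z) via Fisher's z."""
    r, k = pearson(x, y), 0
    if z is not None:
        rxz, ryz = pearson(x, z), pearson(y, z)
        r = (r - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))  # partial correlation
        k = 1  # size of the conditioning set
    stat = math.atanh(r) * math.sqrt(len(x) - k - 3)
    return math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))


def chain_sample(rng, n):
    """X -> Z -> Y with Gaussian noise, so X indep. Y | Z holds by construction."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [xi + rng.gauss(0, 1) for xi in x]
    y = [zi + rng.gauss(0, 1) for zi in z]
    return x, y, z


def false_positive_rate(reps=400, n=200, alpha=0.05, seed=0):
    """Fraction of replications in which a true conditional independence is rejected."""
    rng = random.Random(seed)
    rejections = sum(fisher_z_pvalue(*chain_sample(rng, n)) < alpha for _ in range(reps))
    return rejections / reps


print(false_positive_rate())  # should land close to 0.05
```

An estimate far above 0.05 would mean the generated data violate an implied independence. That is the failure mode this criterion is designed to detect.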

Figures

Figures reproduced from arXiv: 2605.17528 by Jiahao Sun, Thomas Lukasiewicz, Wei Dai, Zehua Cheng.

**Figure 1.** The CausalSynth Generation Pipeline. The framework operates in three main phases. In Phase I, a causal skeleton …

**Figure 2.** Adjacency-matrix difference (Learned − Oracle) for ALARM. Errors are spatially localized; the global topology is preserved.

Original abstract

Large Language Models (LLMs) generate realistic synthetic data but offer no guarantee that their outputs respect the causal mechanisms governing the target domain. We introduce CausalSynth, a framework that decouples causal structure generation from semantic realization, yielding synthetic data that is both causally valid and linguistically rich. The framework operates in three phases. First, a Structural Causal Model (SCM), a tuple of structural equations defined over a directed acyclic graph (DAG), uses ancestral sampling to generate causal skeletons, i.e., variable assignments that satisfy the Global Markov Property of the governing DAG. Second, an LLM acts as a constrained realizer, a conditional translator that maps each skeleton to a high-dimensional observation such as a clinical note or a transaction log. Third, an Iterative Consistency Verification module detects structural violations through deterministic extraction and feeds targeted corrections back to the LLM, forming a closed-loop refinement process. We identify the Semantic Backdoor problem (the systematic tendency of LLMs to override imposed causal facts with pre-training priors) and prove that our iterative mechanism reduces the resulting selection bias relative to standard rejection sampling. On three causal benchmarks (ASIA, ALARM, and MIMIC-Struct), CausalSynth preserved conditional independencies with false-positive rates near the nominal α=0.05 level and achieved realizability rates above 96% with 70B-parameter LLM backbones. The framework additionally supports principled interventional and counterfactual generation through noise retention and graph mutilation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

CausalSynth assembles SCM sampling, LLM realization, and an iterative correction loop to tackle semantic backdoor bias in synthetic data, but the extraction step in verification needs more concrete evidence to support the independence claims.


The main point is a three-phase setup. It first draws causally consistent assignments from a structural causal model via ancestral sampling. An LLM then turns those assignments into rich outputs such as clinical notes. Finally, an iterative check extracts variables and relations deterministically to spot violations and prompt fixes. The authors also flag the Semantic Backdoor, in which LLMs fall back on training priors, and claim the loop cuts selection bias relative to plain rejection sampling. The assembly is new even if the pieces are familiar from prior work on SCMs and prompting.

The benchmark numbers on ASIA, ALARM, and MIMIC-Struct are the strongest part. False-positive rates for conditional independencies sit near the nominal 0.05 level, and realizability clears 96 percent with 70B models, which suggests the outputs stay usable for downstream causal tasks. The paper also notes support for interventions and counterfactuals through graph mutilation and noise retention, which is a practical plus.

The soft spots sit in the verification module. Deterministic extraction from free-form LLM text, especially high-dimensional clinical notes, is unlikely to be complete or error-free without explicit constraints or auxiliary tools, and missed violations would inflate the reported independence preservation. The abstract mentions a proof of bias reduction but gives no quantitative metrics, no extraction-accuracy ablations, and no checks on whether corrections add new distortions. These are real gaps rather than minor omissions.

The work targets causal inference researchers who need private or scarce data for testing methods or training models in domains like medicine. A reader looking for a concrete pipeline to generate structurally sound synthetic records would find the description and results useful.
It deserves a serious referee because the central idea is coherent, the claims are falsifiable on the cited benchmarks, and the empirical protocol can be checked once the extraction details are filled in. I would send it to review with requests for clearer implementation specs on the correction loop and additional diagnostics.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CausalSynth, a framework that decouples causal structure generation from semantic realization for producing synthetic data. It has three components:

- a Structural Causal Model (SCM) over a DAG that generates, via ancestral sampling, causal skeletons satisfying the Global Markov Property;
- an LLM acting as a constrained realizer that maps skeletons to high-dimensional outputs such as clinical notes;
- an Iterative Consistency Verification module that performs deterministic extraction to detect violations and feeds targeted corrections back to the LLM in a closed loop.

The authors identify the Semantic Backdoor problem (LLMs overriding causal facts with pre-training priors) and claim their mechanism reduces selection bias relative to rejection sampling. On the ASIA, ALARM, and MIMIC-Struct benchmarks, the method preserves conditional independencies with false-positive rates near the nominal α=0.05 and achieves realizability rates above 96% with 70B-parameter LLMs. It also supports interventional and counterfactual generation via noise retention and graph mutilation.

Significance. If the empirical claims hold under rigorous validation, the work could be significant for generating causally consistent synthetic data in domains like healthcare and causal discovery, where preserving conditional independencies matters for downstream inference. The modular separation of SCM-based skeletons from LLM realization, the explicit treatment of the Semantic Backdoor, and the closed-loop refinement mechanism are conceptually useful contributions. The reported results on standard causal benchmarks provide an initial evaluation point, and the support for interventions/counterfactuals via graph operations is a practical strength.

major comments (2)

[Iterative Consistency Verification module] As described in the abstract and methods, the manuscript supplies no implementation details on the deterministic extraction process, the exact correction prompts, or whether false-positive rates incorporate multiple-testing corrections. This is load-bearing for the central claim. Preservation of conditional independencies at α=0.05 on MIMIC-Struct (free-text clinical notes) depends on reliable detection of structural violations, and any incompleteness in extraction would inflate apparent success rates.
[Abstract and empirical results] The claim that the iterative mechanism 'proves' bias reduction versus rejection sampling is asserted without quantitative bias metrics, ablation studies on extraction accuracy, or evidence that corrections reduce Semantic Backdoor effects without introducing new unmeasured distortions. This directly affects the assertion that the closed-loop approach outperforms standard rejection sampling.

minor comments (2)

Add a concrete example of skeleton-to-realization mapping and one full iteration of the verification loop to improve clarity of the three-phase pipeline.
Clarify how realizability rate is operationalized (e.g., exact criteria for a valid high-dimensional observation) and report per-benchmark breakdowns rather than aggregate figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify key areas for improving the clarity and empirical support of our framework. We address each major comment below and describe the revisions planned for the next version of the manuscript.


Referee: [Iterative Consistency Verification module] As described in the abstract and methods, the manuscript supplies no implementation details on the deterministic extraction process, the exact correction prompts, or whether false-positive rates incorporate multiple-testing corrections. This is load-bearing for the central claim. Preservation of conditional independencies at α=0.05 on MIMIC-Struct (free-text clinical notes) depends on reliable detection of structural violations, and any incompleteness in extraction would inflate apparent success rates.

Authors: We agree that the current manuscript does not provide sufficient implementation details on the Iterative Consistency Verification module. In the revised version we will add a dedicated subsection in the Methods that specifies the deterministic extraction rules for detecting violations of the Global Markov Property, the exact correction prompt templates passed back to the LLM, and the statistical procedure used to compute false-positive rates (including whether any multiple-testing correction such as Bonferroni was applied). We will also report the accuracy of the extraction step on a validation subset of MIMIC-Struct to address concerns about potential incompleteness. revision: yes
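Whether a multiple-testing correction was applied changes what "near 0.05" means. Uncorrected per-test decisions should reject a true independence about α of the time. Under Bonferroni, the per-test rate would instead fall to roughly α/m.

The sketch below shows two standard corrections, assuming the p-values of all tested implied independencies are collected in one list. Neither is confirmed to be what the paper uses.

- Bonferroni rejects a hypothesis when p ≤ α/m.
- Holm's step-down procedure controls the same family-wise error rate with more power.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (0-based) with alpha / (m - k)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvalues[i] > alpha / (m - k):
            break  # stop at the first non-rejection; all larger p-values are retained
        reject[i] = True
    return reject


pvals = [0.01, 0.04, 0.02, 0.005]
print(bonferroni(pvals))  # [True, False, False, True]
print(holm(pvals))        # [True, True, True, True]
```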
Referee: [Abstract and empirical results] The claim that the iterative mechanism 'proves' bias reduction versus rejection sampling is asserted without quantitative bias metrics, ablation studies on extraction accuracy, or evidence that corrections reduce Semantic Backdoor effects without introducing new unmeasured distortions. This directly affects the assertion that the closed-loop approach outperforms standard rejection sampling.

Authors: The manuscript contains a theoretical argument that the iterative correction loop reduces selection bias relative to rejection sampling by retaining and repairing samples instead of discarding them. We acknowledge, however, that the current version lacks explicit quantitative bias metrics and dedicated ablation studies. In the revision we will add an ablation comparing the iterative method against rejection sampling on the ASIA and ALARM benchmarks, reporting direct measures of Semantic Backdoor incidence before and after correction as well as additional causal-consistency metrics to check for new distortions. We maintain that the reported conditional-independence preservation and realizability rates provide supporting evidence, but we will strengthen the empirical section with the requested quantitative comparisons. revision: partial
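The intuition behind retaining and repairing samples can be illustrated with a toy simulation. It is not the paper's Theorem 2, and all probabilities are hypothetical.

Suppose the realizer's fidelity depends on the skeleton. A non-smoker with cancer contradicts the model's prior, so an uncorrected draft keeps that fact only 40% of the time. Rejection sampling then conditions on successful realization and under-represents those skeletons. Repairing each skeleton until it is realized faithfully instead keeps the accepted distribution equal to the SCM's.

The sketch assumes corrected drafts are exact. That is the "no new distortions" premise the referee asks the authors to test.

```python
import random

P_SMOKE = 0.3
P_CANCER = {1: 0.20, 0: 0.10}  # hypothetical P(Cancer = 1 | Smoking)


def draw_skeleton(rng):
    s = int(rng.random() < P_SMOKE)
    c = int(rng.random() < P_CANCER[s])
    return s, c


def realized_faithfully(s, c, rng, corrected=False):
    """Mock realizer: only the prior-contradicting case (non-smoker with cancer) can fail."""
    if s == 0 and c == 1:
        return rng.random() < (0.8 if corrected else 0.4)
    return True


def estimate_p_cancer_given_nonsmoker(method, n=20000, max_rounds=10, seed=0):
    """Estimate P(Cancer=1 | Smoking=0) from the samples each method accepts."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n):
        s, c = draw_skeleton(rng)
        ok = realized_faithfully(s, c, rng)
        rounds = 1
        while method == "repair" and not ok and rounds < max_rounds:
            ok = realized_faithfully(s, c, rng, corrected=True)  # targeted correction
            rounds += 1
        if ok:  # rejection sampling simply discards every failed draft
            accepted.append((s, c))
    nonsmokers = [c for s, c in accepted if s == 0]
    return sum(nonsmokers) / len(nonsmokers)


print("SCM value:", P_CANCER[0])
print("rejection:", estimate_p_cancer_given_nonsmoker("rejection"))  # roughly 0.04
print("repair:   ", estimate_p_cancer_given_nonsmoker("repair"))     # roughly 0.10
```

Under rejection, the expected estimate is 0.1 × 0.4 / (0.9 + 0.04) ≈ 0.043. The bias comes entirely from realizability depending on the skeleton's values.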

Circularity Check

0 steps flagged

No significant circularity; empirical results are direct measurements


The paper reports empirical performance on fixed external benchmarks (ASIA, ALARM, MIMIC-Struct) as direct measurements of conditional independence preservation (FPR near α=0.05) and realizability (>96%). These quantities are not derived from fitted parameters or self-referential predictions. The Iterative Consistency Verification module and claimed proof of bias reduction versus rejection sampling are described procedurally without equations that reduce the reported rates to the inputs by construction. No self-citation chain or ansatz is invoked as load-bearing for the central claims. The derivation chain remains self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the unproven assumption that an LLM can be turned into a reliable constrained realizer and that deterministic extraction from generated text accurately recovers the implied causal graph.

axioms (2)

domain assumption: LLMs can be constrained to respect externally supplied causal facts when given targeted corrections
Invoked in the description of the Iterative Consistency Verification module and the claim that it reduces selection bias.
standard math: Ancestral sampling from an SCM produces variable assignments that satisfy the Global Markov Property
Stated as the first phase of the framework.
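The independencies implied by the Global Markov Property are the d-separation statements of the DAG. These are the statements a conditional-independence test should fail to reject.

A minimal sketch of the standard moralization criterion follows:

1. Restrict the graph to the ancestors of the variables involved.
2. Marry co-parents.
3. Drop edge directions.
4. Delete the conditioning set.
5. Check whether any path remains.

The function names and example graphs are illustrative. This is the textbook criterion, not code from the paper.

```python
def ancestors(parents: dict, nodes: set) -> set:
    """The given nodes plus all of their ancestors in the DAG."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents: dict, xs: set, ys: set, zs: set) -> bool:
    """True if xs and ys are d-separated given zs (moralized ancestral graph criterion)."""
    keep = ancestors(parents, xs | ys | zs)
    adj = {v: set() for v in keep}
    for v in keep:
        pa = list(parents[v])  # parents of an ancestor are ancestors, so all are in keep
        for p in pa:  # undirected version of each edge p -> v
            adj[v].add(p)
            adj[p].add(v)
        for i, a in enumerate(pa):  # moralize: marry every pair of co-parents
            for b in pa[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    # Delete the conditioning set, then search for any remaining path from xs to ys.
    frontier = [x for x in xs if x not in zs]
    seen = set(frontier)
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False
        for w in adj[v] - zs - seen:
            seen.add(w)
            frontier.append(w)
    return True


chain = {"A": [], "B": ["A"], "C": ["B"]}          # A -> B -> C
collider = {"A": [], "B": [], "C": ["A", "B"]}     # A -> C <- B
print(d_separated(chain, {"A"}, {"C"}, {"B"}))     # True
print(d_separated(collider, {"A"}, {"B"}, {"C"}))  # False: conditioning on a collider
```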

pith-pipeline@v0.9.0 · 5797 in / 1379 out tokens · 36567 ms · 2026-05-20T14:12:49.901429+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery / orbit embedding · unclear

Phase I constructs the causal skeleton v by drawing samples from the joint distribution implied by the SCM, P_M(V). ... ancestral sampling ... produces samples whose joint distribution factorizes according to G by construction.
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel / Jcost uniqueness · unclear

We identify the Semantic Backdoor problem ... and prove that our iterative mechanism reduces the resulting selection bias relative to standard rejection sampling (Theorem 2).

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 6 internal anchors

[1]

Ingo A Beinlich, Henri Jacques Suermondt, R Martin Chavez, and Gregory F Cooper. 1989. The ALARM monitoring system: A case study with two proba- bilistic inference techniques for belief networks. InAIME 89: Second European Conference on Artificial Intelligence in Medicine, London, August 29th–31st 1989. Proceedings. Springer, 247–256

work page 1989
[2]

Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 610–623

work page 2021
[3]

2006.Pattern Recognition and Machine Learning

Christopher M Bishop. 2006.Pattern Recognition and Machine Learning. Springer

work page 2006
[4]

Vadim Borisov, Kathrin Sessler, Tobias Leemann, Martin Pawelczyk, and Gjergji Kasneci. 2023. Language Models are Realistic Tabular Data Generators. InInter- national Conference on Learning Representations

work page 2023
[5]

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al . 2020. Language Models are Few-Shot Learners. InAdvances in Neural Information Processing Systems, Vol. 33. 1877–1901

work page 2020
[6]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating Large Language Models Trained on Code.arXiv preprint arXiv:2107.03374(2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[7]

David Maxwell Chickering. 2002. Optimal Structure Identification With Greedy Search. InJournal of Machine Learning Research, Vol. 3. 507–554

work page 2002
[8]

Tomas Geffner, Javier Antoran, Adam Foster, Wenbo Gong, Chao Ma, Emre Kiciman, Amit Sharma, Angus Lamb, Martin Kukla, Nick Pawlowski, Miltiadis Allamanis, and Cheng Zhang. 2022. Deep End-to-end Causal Inference. InWork- shop on Causal Representation Learning at NeurIPS

work page 2022
[9]

Saibo Geng, Martin Josifoski, Maxime Peyrard, and Robert West. 2023. Grammar-Constrained Decoding for Structured NLP Generation.arXiv preprint arXiv:2305.13971(2023)

work page arXiv 2023
[10]

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. InAdvances in Neural Information Processing Systems, Vol. 27

work page 2014
[11]

Pengcheng He, Jianfeng Gao, and Weizhu Chen. 2021. DeBERTaV3: Improv- ing DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing. arXiv:2111.09543 [cs.CL]

work page internal anchor Pith review Pith/arXiv arXiv 2021
[12]

Or Honovich, Roee Aharoni, Jonathan Herzig, Hagai Taitelbaum, Doron Kuk- liansy, Vered Cohen, Thomas Scialom, Idan Szpektor, Avinatan Hassidim, and Yossi Matias. 2022. TRUE: Re-evaluating Factual Consistency Evaluation. In Proceedings of NAACL-HLT

work page 2022
[13]

Maximilian Ilse, Patrick Forré, Max Welling, and Joris M Mooij. 2022. Combining Interventional and Observational Data Using Causal Reductions. InAdvances in Approximate Bayesian Inference (AABI)

work page 2022
[14]

Adrián Javaloy, Pablo Sánchez-Martín, and Isabel Valera. 2023. Causal Normaliz- ing Flows: From Theory to Practice. InAdvances in Neural Information Processing Systems, Vol. 36

work page 2023
[15]

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation.Comput. Surveys55, 12 (2023), 1–38

work page 2023
[16]

Zhijing Jin, Yuen Chen, Felix Leber, Luigi Gresele, Ojasv Kamath, Bernhard Schölkopf, et al. 2024. CLadder: Assessing Causal Reasoning in Language Models. Advances in Neural Information Processing Systems36 (2024)

work page 2024
[17]

Alistair E W Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database.Scientific Data3 (2016), 160035

work page 2016
[18]

James Jordon, Lukasz Szpruch, Florimond Houssiau, Tom Sherborne, et al. 2022. Synthetic data–what, why and how?arXiv preprint arXiv:2205.03257(2022)

work page arXiv 2022
[19]

Diviyan Kalainathan, Olivier Goudet, and Ritik Dutta. 2020. Causal discovery toolbox: Uncovering causal relationships in python.Journal of Machine Learning Research21, 37 (2020), 1–5

work page 2020
[20]

Emre Kıcıman, Robert Ness, Amit Sharma, and Chenhao Tan. 2023. Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. arXiv preprint arXiv:2305.00050(2023)

work page arXiv 2023
[21]

Diederik P Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114(2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[22]

Murat Kocaoglu, Christopher Snyder, Alexandros G Dimakis, and Sriram Vish- wanath. 2018. CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training. InInternational Conference on Learning Representations

work page 2018
[23]

2009.Probabilistic Graphical Models: Principles and Techniques

Daphne Koller and Nir Friedman. 2009.Probabilistic Graphical Models: Principles and Techniques. MIT Press

work page 2009
[24]

Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, and Artem Babenko. 2023. TabDDPM: Modelling Tabular Data with Diffusion Models. InInternational Con- ference on Machine Learning. 17564–17579

work page 2023
[25]

Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfac- tual Fairness. InAdvances in Neural Information Processing Systems, Vol. 30

work page 2017
[26]

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E Gonzalez, Hao Zhang, and Ion Stoica. 2023. vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention. InProceedings of the 29th Symposium on Operating Systems Principles. 611–626

work page 2023
[27]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al

work page
[28]

In Advances in Neural Information Processing Systems, Vol

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems, Vol. 33. 9459–9474

work page
[29]

Gary Marcus. 2018. Deep Learning: A Critical Appraisal.arXiv preprint arXiv:1801.00631(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[30]

Joshua Maynez, Shashi Narayan, Bernd Bohnet, and Ryan McDonald. 2020. On Faithfulness and Factuality in Abstractive Summarization. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 1906–1919

work page 2020
[31]

2021.Synthetic Data for Deep Learning

Sergey I Nikolenko. 2021.Synthetic Data for Deep Learning. Springer

work page 2021
[32]

OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]

work page internal anchor Pith review Pith/arXiv arXiv 2023
[33]

Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu

work page
[34]

IEEE Transactions on Knowledge and Data Engineering(2024)

Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Transactions on Knowledge and Data Engineering(2024)

work page 2024
[35]

Nick Pawlowski, Daniel Coelho de Castro, and Ben Glocker. 2020. Deep Structural Causal Models for Tractable Counterfactual Inference. InAdvances in Neural Information Processing Systems, Vol. 33. 857–869

work page 2020
[36]

2009.Causality

Judea Pearl. 2009.Causality. Cambridge university press

work page 2009
[37]

Judea Pearl et al. 2000. Models, reasoning and inference.Cambridge, UK: Cam- bridgeUniversityPress19, 2 (2000), 3

work page 2000
[38]

2018.The Book of Why: The New Science of Cause and Effect

Judea Pearl and Dana Mackenzie. 2018.The Book of Why: The New Science of Cause and Effect. Basic Books

work page 2018
[39]

2017.Elements of Causal Inference: Foundations and Learning Algorithms

Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2017.Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press

work page 2017
[40]

Donald B Rubin. 1993. Statistical Disclosure Limitation.Journal of Official Statistics9, 2 (1993), 461–468

work page 1993
[41]

Pablo Sánchez-Martín, Miriam Rateike, and Isabel Valera. 2022. VACA: Designing Variational Graph Autoencoders for Causal Queries. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 8159–8168

work page 2022
[42]

Peter Schulam and Suchi Saria. 2017. Reliable Decision Support using Counter- factual Models. InAdvances in Neural Information Processing Systems, Vol. 30

work page 2017
[43]

2000.Causation, Prediction, and Search(2 ed.)

Peter Spirtes, Clark N Glymour, and Richard Scheines. 2000.Causation, Prediction, and Search(2 ed.). MIT Press

work page 2000
[44]

Thomas Verma and Judea Pearl. 1990. Equivalence and Synthesis of Causal Models. InProceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence. 255–270

work page 1990
[45]

Veniamin Veselovsky et al. 2023. Generating Synthetic Data with Large Language Models: A Survey.arXiv preprint arXiv:2308.07338(2023)

work page arXiv 2023
[46]

Alexander Wan, Eric Wallace, Sheng Shen, and Dan Klein. 2023. Poisoning Language Models During Instruction Tuning.International Conference on Machine Learning(2023)

work page 2023
[47]

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. InInternational Conference on Learning Representations

work page 2023
[48]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. InAdvances in Neural Information Processing Systems, Vol. 35. 24824–24837

work page 2022
[49]

Liang Wendong, Armin Kekic, Mohamed Bouhamidi, and Bernhard Schölkopf

work page
[50]

Causal Composition of Synthetic Data via Structural Augmentation.arXiv preprint arXiv:2401.13218(2024)

work page arXiv 2024
[51]

Brandon T Willard and Rémi Louf. 2023. Efficient Guided Generation for Large Language Models.arXiv preprint arXiv:2307.09702(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[52]

Jiashu Xu, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, and Muhao Chen. 2024. Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models. InProceedings of NAACL-HLT

work page 2024
[53]

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni

work page
[54]

InAdvances in Neural Information Processing Systems, Vol

Modeling Tabular Data using Conditional GAN. InAdvances in Neural Information Processing Systems, Vol. 32

work page
[55]

Tony Z Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Cal- ibrate Before Use: Improving Few-shot Performance of Language Models. In International Conference on Machine Learning. 12697–12706

work page 2021
[56]

[C3] Blood Pressure: HIGH — You MUST include this exact value

Xun Zheng, Bryon Aragam, Pradeep K Ravikumar, and Eric P Xing. 2018. DAGs with NO TEARS: Continuous Optimization for Structure Learning. InAdvances in Neural Information Processing Systems, Vol. 31. CasualSynth: Generating Structurally Sound Synthetic Data Conference’17, July 2017, Washington, DC, USA A Problem Formulation Generating synthetic data for hi...

work page 2018
[57]

same patient

≤𝐻 𝑏 (𝜖) +𝜖log(|V 𝑖 | −1). The joint bound follows from the chain rule: 𝐻( V | ˆV)= Í𝑁 𝑖=1 𝐻(𝑉 𝑖 | ˆ𝑉𝑖, ˆ𝑉1, . . . , ˆ𝑉𝑖−1 ) ≤ Í𝑁 𝑖=1 𝐻(𝑉 𝑖 | ˆ𝑉𝑖 ), where the inequality uses the fact that conditioning reduces en- tropy.□ Corollary 2 (Ideal Extractor).When 𝜖= 0, the conditional entropy 𝐻(𝑉 𝑖 | ˆ𝑉𝑖 )= 0for all 𝑖, and consequently 𝐻( V | ˆV)= 0. The realize...

work page

[1] [1]

Ingo A Beinlich, Henri Jacques Suermondt, R Martin Chavez, and Gregory F Cooper. 1989. The ALARM monitoring system: A case study with two proba- bilistic inference techniques for belief networks. InAIME 89: Second European Conference on Artificial Intelligence in Medicine, London, August 29th–31st 1989. Proceedings. Springer, 247–256

work page 1989

[2] [2]

Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 610–623

work page 2021

[3] [3]

2006.Pattern Recognition and Machine Learning

Christopher M Bishop. 2006.Pattern Recognition and Machine Learning. Springer

work page 2006

[4] [4]

Vadim Borisov, Kathrin Sessler, Tobias Leemann, Martin Pawelczyk, and Gjergji Kasneci. 2023. Language Models are Realistic Tabular Data Generators. InInter- national Conference on Learning Representations

work page 2023

[5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, Vol. 33. 1877–1901.

[6] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374 (2021).

[7] David Maxwell Chickering. 2002. Optimal Structure Identification With Greedy Search. Journal of Machine Learning Research 3 (2002), 507–554.

[8] Tomas Geffner, Javier Antoran, Adam Foster, Wenbo Gong, Chao Ma, Emre Kiciman, Amit Sharma, Angus Lamb, Martin Kukla, Nick Pawlowski, Miltiadis Allamanis, and Cheng Zhang. 2022. Deep End-to-end Causal Inference. In Workshop on Causal Representation Learning at NeurIPS.

[9] Saibo Geng, Martin Josifoski, Maxime Peyrard, and Robert West. 2023. Grammar-Constrained Decoding for Structured NLP Generation. arXiv preprint arXiv:2305.13971 (2023).

[10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems, Vol. 27.

[11] Pengcheng He, Jianfeng Gao, and Weizhu Chen. 2021. DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing. arXiv:2111.09543 [cs.CL].

[12] Or Honovich, Roee Aharoni, Jonathan Herzig, Hagai Taitelbaum, Doron Kukliansy, Vered Cohen, Thomas Scialom, Idan Szpektor, Avinatan Hassidim, and Yossi Matias. 2022. TRUE: Re-evaluating Factual Consistency Evaluation. In Proceedings of NAACL-HLT.

[13] Maximilian Ilse, Patrick Forré, Max Welling, and Joris M Mooij. 2022. Combining Interventional and Observational Data Using Causal Reductions. In Advances in Approximate Bayesian Inference (AABI).

[14] Adrián Javaloy, Pablo Sánchez-Martín, and Isabel Valera. 2023. Causal Normalizing Flows: From Theory to Practice. In Advances in Neural Information Processing Systems, Vol. 36.

[15] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation. Comput. Surveys 55, 12 (2023), 1–38.

[16] Zhijing Jin, Yuen Chen, Felix Leber, Luigi Gresele, Ojasv Kamath, Bernhard Schölkopf, et al. 2024. CLadder: Assessing Causal Reasoning in Language Models. Advances in Neural Information Processing Systems 36 (2024).

[17] Alistair E W Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data 3 (2016), 160035.

[18] James Jordon, Lukasz Szpruch, Florimond Houssiau, Tom Sherborne, et al. 2022. Synthetic data–what, why and how? arXiv preprint arXiv:2205.03257 (2022).

[19] Diviyan Kalainathan, Olivier Goudet, and Ritik Dutta. 2020. Causal discovery toolbox: Uncovering causal relationships in python. Journal of Machine Learning Research 21, 37 (2020), 1–5.

[20] Emre Kıcıman, Robert Ness, Amit Sharma, and Chenhao Tan. 2023. Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. arXiv preprint arXiv:2305.00050 (2023).

[21] Diederik P Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114 (2014).

[22] Murat Kocaoglu, Christopher Snyder, Alexandros G Dimakis, and Sriram Vishwanath. 2018. CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training. In International Conference on Learning Representations.

[23] Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.

[24] Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, and Artem Babenko. 2023. TabDDPM: Modelling Tabular Data with Diffusion Models. In International Conference on Machine Learning. 17564–17579.

[25] Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual Fairness. In Advances in Neural Information Processing Systems, Vol. 30.

[26] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E Gonzalez, Hao Zhang, and Ion Stoica. 2023. vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles. 611–626.

[27] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems, Vol. 33. 9459–9474.

[28] Gary Marcus. 2018. Deep Learning: A Critical Appraisal. arXiv preprint arXiv:1801.00631 (2018).

[29] Joshua Maynez, Shashi Narayan, Bernd Bohnet, and Ryan McDonald. 2020. On Faithfulness and Factuality in Abstractive Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 1906–1919.

[30] Sergey I Nikolenko. 2021. Synthetic Data for Deep Learning. Springer.

[31] OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL].

[32] Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu. 2024. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Transactions on Knowledge and Data Engineering (2024).

[33] Nick Pawlowski, Daniel Coelho de Castro, and Ben Glocker. 2020. Deep Structural Causal Models for Tractable Counterfactual Inference. In Advances in Neural Information Processing Systems, Vol. 33. 857–869.

[34] Judea Pearl. 2009. Causality. Cambridge University Press.

[35] Judea Pearl et al. 2000. Models, reasoning and inference. Cambridge, UK: Cambridge University Press 19, 2 (2000), 3.

[36] Judea Pearl and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.

[37] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.

[38] Donald B Rubin. 1993. Statistical Disclosure Limitation. Journal of Official Statistics 9, 2 (1993), 461–468.

[39] Pablo Sánchez-Martín, Miriam Rateike, and Isabel Valera. 2022. VACA: Designing Variational Graph Autoencoders for Causal Queries. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 8159–8168.

[40] Peter Schulam and Suchi Saria. 2017. Reliable Decision Support using Counterfactual Models. In Advances in Neural Information Processing Systems, Vol. 30.

[41] Peter Spirtes, Clark N Glymour, and Richard Scheines. 2000. Causation, Prediction, and Search (2 ed.). MIT Press.

[42] Thomas Verma and Judea Pearl. 1990. Equivalence and Synthesis of Causal Models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence. 255–270.

[43] Veniamin Veselovsky et al. 2023. Generating Synthetic Data with Large Language Models: A Survey. arXiv preprint arXiv:2308.07338 (2023).

[44] Alexander Wan, Eric Wallace, Sheng Shen, and Dan Klein. 2023. Poisoning Language Models During Instruction Tuning. International Conference on Machine Learning (2023).

[45] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. In International Conference on Learning Representations.

[46] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems, Vol. 35. 24824–24837.

[47] Liang Wendong, Armin Kekic, Mohamed Bouhamidi, and Bernhard Schölkopf. 2024. Causal Composition of Synthetic Data via Structural Augmentation. arXiv preprint arXiv:2401.13218 (2024).

[48] Brandon T Willard and Rémi Louf. 2023. Efficient Guided Generation for Large Language Models. arXiv preprint arXiv:2307.09702 (2023).

[49] Jiashu Xu, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, and Muhao Chen. 2024. Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models. In Proceedings of NAACL-HLT.

[50] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. 2019. Modeling Tabular Data using Conditional GAN. In Advances in Neural Information Processing Systems, Vol. 32.

[51] Tony Z Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate Before Use: Improving Few-shot Performance of Language Models. In International Conference on Machine Learning. 12697–12706.

[52] Xun Zheng, Bryon Aragam, Pradeep K Ravikumar, and Eric P Xing. 2018. DAGs with NO TEARS: Continuous Optimization for Structure Learning. In Advances in Neural Information Processing Systems, Vol. 31.

A Problem Formulation

Generating synthetic data for hi...


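$\ldots \le H_b(\epsilon) + \epsilon \log(|\mathcal{V}_i| - 1)$. The joint bound follows from the chain rule:

$$H(\mathbf{V} \mid \hat{\mathbf{V}}) = \sum_{i=1}^{N} H(V_i \mid \hat{\mathbf{V}}, V_1, \ldots, V_{i-1}) \le \sum_{i=1}^{N} H(V_i \mid \hat{V}_i),$$

where the inequality uses the fact that conditioning reduces entropy: each term conditions on a superset of $\hat{V}_i$, so dropping the extra variables can only increase it. □

Corollary 2 (Ideal Extractor). When $\epsilon = 0$, the conditional entropy $H(V_i \mid \hat{V}_i) = 0$ for all $i$, and consequently $H(\mathbf{V} \mid \hat{\mathbf{V}}) = 0$. The realize...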
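The per-variable Fano bound, its summed joint form, and the ideal-extractor corollary can be checked numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes each $V_i$ is categorical with at least two values, that $\epsilon$ is the extractor's error probability $P(\hat{V}_i \neq V_i)$, and that entropies are measured in bits (the bound has the same form in any log base). The helper names `fano_bound`, `joint_bound`, `symmetric_extractor_joint`, and `conditional_entropy`, and the symmetric-error toy extractor, are hypothetical illustrations.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits, using the convention 0 * log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_bound(eps: float, alphabet_size: int) -> float:
    """Fano upper bound on H(V_i | V_hat_i) when P(V_hat_i != V_i) = eps."""
    if alphabet_size < 2:
        raise ValueError("V_i needs at least two possible values")
    return binary_entropy(eps) + eps * math.log2(alphabet_size - 1)


def joint_bound(per_variable) -> float:
    """Sum of per-variable Fano bounds, an upper bound on H(V | V_hat).

    per_variable is a list of (eps_i, |V_i|) pairs (hypothetical format).
    """
    return sum(fano_bound(eps, k) for eps, k in per_variable)


def symmetric_extractor_joint(eps: float, k: int) -> dict:
    """Hypothetical toy extractor: V_i is uniform over k values; the extracted
    value is correct w.p. 1 - eps, otherwise uniformly one of the k - 1 wrong values.
    Returns the joint pmf {(v, v_hat): probability}."""
    joint = {}
    for v in range(k):
        for v_hat in range(k):
            p_cond = (1.0 - eps) if v == v_hat else eps / (k - 1)
            joint[(v, v_hat)] = p_cond / k
    return joint


def conditional_entropy(joint: dict) -> float:
    """Exact H(V | V_hat) in bits from a joint pmf {(v, v_hat): probability}."""
    marginal = {}
    for (_, v_hat), p in joint.items():
        marginal[v_hat] = marginal.get(v_hat, 0.0) + p
    return -sum(p * math.log2(p / marginal[v_hat])
                for (_, v_hat), p in joint.items() if p > 0)


if __name__ == "__main__":
    eps, k = 0.1, 4
    exact = conditional_entropy(symmetric_extractor_joint(eps, k))
    print(f"exact H(V_i | V_hat_i) = {exact:.4f} bits")
    print(f"Fano bound             = {fano_bound(eps, k):.4f} bits")
    # Ideal extractor (Corollary 2): eps = 0 for every variable gives a zero bound.
    print(f"joint bound, eps = 0   = {joint_bound([(0.0, 4), (0.0, 3)]):.4f} bits")
```

For this symmetric toy extractor Fano's inequality is tight, so with $\epsilon = 0.1$ and four values per variable both printed quantities come out at about 0.6275 bits, while setting $\epsilon = 0$ for every variable drives the joint bound to zero, as Corollary 2 states.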
