pith. machine review for the scientific record.

arxiv: 2605.11527 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.CR · cs.DB

Recognition: no theorem link

FERMI: Exploiting Relations for Membership Inference Against Tabular Diffusion Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:16 UTC · model grok-4.3

classification 💻 cs.LG · cs.CR · cs.DB
keywords membership inference · tabular diffusion models · relational data · privacy attacks · data synthesis · synthetic data · white-box attack · black-box attack

The pith

A relational feature-mapping attack called FERMI raises membership inference success rates against tabular diffusion models when parent-table information is available only during attack training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that membership inference attacks on tabular diffusion models become substantially stronger once they incorporate auxiliary signals from parent tables in relational datasets. Existing attacks treat each table in isolation and therefore miss leakage that travels through relations, even when the attacker sees only the target record at test time. If the gains hold, they indicate that current privacy evaluations of synthetic tabular data underestimate risk for the multi-table structures common in real sensitive records. The work focuses on a practical attacker model where relational data aids attack-model training but is withheld at inference.

Core claim

We propose FERMI (FEature-mapping for Relational Membership Inference), which resolves this gap by enriching single-table features with relational membership signal. Across three tabular diffusion architectures and three real-world relational datasets, FERMI consistently improves attack performance over single-table baselines, with TPR@0.1FPR rising by up to 53% over the single-table baseline in the white-box setting and 22% in the black-box setting.
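For readers unfamiliar with the headline metric: TPR@0.1FPR is the attack's true-positive rate at the decision threshold where its false-positive rate is capped at 10%, the low-FPR regime privacy audits care about. A minimal, self-contained sketch of computing it from attack scores (toy data, not the paper's):

```python
import numpy as np

def tpr_at_fpr(scores, labels, target_fpr=0.1):
    """TPR at the best threshold whose FPR stays <= target_fpr.
    scores: higher means "more likely a member"; labels: True = member."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_tpr = 0.0
    for t in np.unique(scores):          # every observed score is a candidate threshold
        pred = scores >= t
        fpr = pred[~labels].mean()
        if fpr <= target_fpr:
            best_tpr = max(best_tpr, pred[labels].mean())
    return best_tpr

# Toy example: member scores drawn slightly higher than non-member scores.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 1.0, 500),   # members
                         rng.normal(0.0, 1.0, 500)])  # non-members
labels = np.concatenate([np.ones(500, bool), np.zeros(500, bool)])
low_fpr_tpr = tpr_at_fpr(scores, labels, target_fpr=0.1)
```

A "53% rise" in this quantity means the enriched attack identifies substantially more true members while keeping false accusations at the same low rate.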

What carries the argument

FERMI, a feature-mapping step that injects relational membership signals extracted from parent tables into the single-table feature vector used by the attack classifier.
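The review names this mechanism only abstractly, and the paper's exact mapping is not disclosed at this level. As a purely illustrative sketch (the foreign-key join, the mean/std/count aggregates, and the concatenation scheme are assumptions, not the authors' construction), relational enrichment of a child record might look like:

```python
import numpy as np
import pandas as pd

def enrich_with_parent(child_row, parent, fk, numeric_cols):
    """Append simple parent-table aggregates (mean, std, match count) to a
    child record's own features. Illustrative only: the paper's actual
    relational membership signal is not specified here."""
    matches = parent[parent[fk] == child_row[fk]]
    if len(matches):
        agg = np.concatenate([
            matches[numeric_cols].mean().to_numpy(),
            matches[numeric_cols].std(ddof=0).to_numpy(),
            [float(len(matches))],
        ])
    else:  # orphan record: zero-pad so dimensionality stays fixed
        agg = np.zeros(2 * len(numeric_cols) + 1)
    own = child_row.drop(labels=[fk]).to_numpy(dtype=float)
    return np.concatenate([own, agg])

# Toy relational pair: accounts (parent) and one transaction (child).
parent = pd.DataFrame({"account_id": [1, 1, 2],
                       "balance": [100.0, 120.0, 50.0]})
child = pd.Series({"account_id": 1, "amount": 9.5})
vec = enrich_with_parent(child, parent, "account_id", ["balance"])
# vec = [own amount, parent balance mean, parent balance std, match count]
```

The attack classifier is then trained on such enriched vectors; at inference time, when parent tables are unavailable, the relational slots are filled by the learned feature mapping rather than by a live join.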

If this is right

  • Single-table membership inference baselines systematically underestimate privacy leakage for relational tabular data.
  • Tabular diffusion models trained on linked tables release more membership information than models trained on isolated tables.
  • Attackers can obtain higher true-positive rates at fixed low false-positive rates by training on parent-table features even when those features are unavailable at inference time.
  • Privacy assessments of synthetic data generators must incorporate multi-table structure to be reliable.
  • Defenses for diffusion models on relational data need to address leakage that propagates through parent-child links.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Privacy audits of synthetic relational datasets should routinely include parent-table information during attack calibration.
  • The same relational enrichment technique could be tested on other generative models such as GANs or VAEs to measure whether the leakage pattern is architecture-specific.
  • If the relational signal proves stable across domains, regulators might require relational privacy evaluations before releasing synthetic versions of linked government or health records.

Load-bearing premise

That the observed performance gains reflect genuine additional membership leakage carried by relations rather than artifacts of the selected datasets or attack hyperparameters.

What would settle it

Repeating the white-box and black-box experiments on new relational datasets where parent-table attributes are uncorrelated with membership status in the target table, and checking whether the TPR@0.1FPR gap shrinks to near zero.

Figures

Figures reproduced from arXiv: 2605.11527 by Abtin Mahyar, Masoumeh Shafieinejad, Xi He, Yuhan Liu.

Figure 1: Overview of FERMI. The victim trains per-table diffusion models (top). The adversary …
Figure 2: Mean diffusion loss per timestep for members and non-members across the three feature …
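Figure 2's separating signal, per-timestep denoising loss, is the standard white-box feature for diffusion membership inference. A hedged sketch of how such losses are computed under a DDPM forward process (`predict_noise` stands in for the trained model; the schedule and timesteps are illustrative, not the paper's):

```python
import numpy as np

def ddpm_loss_per_timestep(x0, predict_noise, alphas_cumprod, timesteps, seed=0):
    """Per-timestep denoising loss for one record: noise x0 to level t,
    ask the model to predict the injected noise, and record the MSE.
    Consistently lower loss across timesteps is the classic white-box
    membership signal that Figure 2 visualizes."""
    rng = np.random.default_rng(seed)
    losses = []
    for t in timesteps:
        eps = rng.normal(size=x0.shape)
        a = alphas_cumprod[t]
        x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
        losses.append(float(np.mean((predict_noise(x_t, t) - eps) ** 2)))
    return np.array(losses)

# Sanity check with a perfect "oracle" model and x0 = 0: the injected
# noise can be recovered exactly, so every per-timestep loss is zero.
alphas = np.linspace(0.99, 0.01, 100)
x0 = np.zeros(5)
oracle = lambda x_t, t: x_t / np.sqrt(1.0 - alphas[t])
losses = ddpm_loss_per_timestep(x0, oracle, alphas, [10, 50, 90])
```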
Original abstract

Diffusion models are the leading approach for tabular data synthesis and are increasingly used to share sensitive records. Whether they actually protect privacy has become a pressing question. Membership inference attacks are the standard tool for this purpose, yet existing attacks assume a single-table setting and ignore the multi-relational structure of real sensitive data. A core challenge in assessing privacy risks from membership inference attacks in multi-table settings is how to leverage auxiliary information from relations associated with the target table, such as its parent tables. Particularly, we study a practical setting in which such auxiliary information is available only when training the attack model. At inference time, the attacker observes only the attribute values of the target record from the target table. We propose FERMI (FEature-mapping for Relational Membership Inference), which resolves this gap by enriching single-table features with relational membership signal. Across three tabular diffusion architectures and three real-world relational datasets, FERMI consistently improves attack performance over single-table baselines, with TPR@$0.1$FPR rising by up to 53% over the single-table baseline in the white-box setting and 22% in the black-box setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces FERMI, a feature-mapping technique for membership inference attacks (MIA) on tabular diffusion models in multi-relational settings. It enriches single-table attack features with signals derived from parent tables, but restricts the attacker to target-table attributes only at inference time. Experiments across three diffusion architectures and three real-world relational datasets report consistent gains over single-table baselines, with TPR@0.1FPR improving by up to 53% (white-box) and 22% (black-box).

Significance. If the reported gains can be shown to arise from genuine relational leakage rather than feature-count artifacts, the work would meaningfully extend MIA analysis to realistic multi-table data synthesis, where diffusion models are used for privacy-sensitive sharing. This would strengthen the case for relation-aware privacy defenses in tabular generative modeling.

major comments (2)
  1. [§5] §5 (Experimental Evaluation): The headline TPR@0.1FPR lifts (up to 53%/22%) rest on comparing a relational-enriched attack model against a single-table baseline that necessarily uses fewer features. No ablation is reported that holds total input dimensionality, feature scale, and training dynamics fixed while randomizing or nullifying the relational component (e.g., via shuffled parent-table rows or noise padding). Without this control, the performance difference cannot be attributed to diffusion-model leakage through relations rather than the expanded feature space.
  2. [§4] §4 (FERMI Method): The precise construction of the 'relational membership signal' (what statistics, embeddings, or aggregations are extracted from parent tables and how they are mapped) is described at a high level only. Reproducibility requires the exact feature definitions, aggregation functions, and any hyperparameters used to combine them with single-table features.
minor comments (2)
  1. [§5] The manuscript does not report the number of independent runs, standard deviations, or statistical significance tests for the TPR@0.1FPR differences, making it difficult to assess whether the gains are robust.
  2. [§3] Notation for the threat model (white-box vs. black-box access to the diffusion model) and the exact point at which auxiliary relational data becomes unavailable should be stated more explicitly in the problem formulation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thorough review and valuable suggestions. The comments highlight important aspects for strengthening the experimental validation and methodological details. We respond to each major comment point-by-point and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [§5] §5 (Experimental Evaluation): The headline TPR@0.1FPR lifts (up to 53%/22%) rest on comparing a relational-enriched attack model against a single-table baseline that necessarily uses fewer features. No ablation is reported that holds total input dimensionality, feature scale, and training dynamics fixed while randomizing or nullifying the relational component (e.g., via shuffled parent-table rows or noise padding). Without this control, the performance difference cannot be attributed to diffusion-model leakage through relations rather than the expanded feature space.

    Authors: We agree that controlling for feature dimensionality is important to isolate the contribution of relational signals. In the revised version, we will include an ablation study that matches the input dimensionality of the FERMI attack by padding the single-table baseline with either random noise features or shuffled relational features (which nullify any meaningful relational leakage). We will report the results of this control experiment to demonstrate that the performance gains are indeed due to genuine relational information leakage from the diffusion models rather than mere feature count. This addresses the concern directly. revision: yes

  2. Referee: [§4] §4 (FERMI Method): The precise construction of the 'relational membership signal' (what statistics, embeddings, or aggregations are extracted from parent tables and how they are mapped) is described at a high level only. Reproducibility requires the exact feature definitions, aggregation functions, and any hyperparameters used to combine them with single-table features.

    Authors: We acknowledge that Section 4 provides a high-level overview of the relational membership signal construction. To improve reproducibility, we will expand this section in the revised manuscript with precise definitions: specifically, we will detail the aggregation functions (e.g., mean, variance, count of matching records in parent tables), any embedding techniques used for categorical relations, and the exact hyperparameters for feature combination (such as concatenation weights or normalization). Additionally, we will include pseudocode for the feature mapping process and make the implementation details available in the supplementary material or code repository. revision: yes
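The dimensionality-matched control promised in response 1 can be sketched concretely. A hypothetical harness (array names and the three modes are illustrative, not from the paper) that keeps attack-input dimensionality fixed while nullifying the relational link:

```python
import numpy as np

def make_ablation_inputs(X_single, X_rel, mode, seed=0):
    """Dimension-matched attack inputs for the feature-count control.
    'fermi'    : real relational features appended (the attack under test).
    'shuffled' : relational rows permuted, breaking each record's link to
                 its own parent while preserving marginal statistics.
    'noise'    : relational block replaced by Gaussian padding.
    All three yield the same input dimensionality, so any remaining gap
    over 'shuffled'/'noise' is attributable to genuine relational leakage."""
    rng = np.random.default_rng(seed)
    if mode == "fermi":
        rel = X_rel
    elif mode == "shuffled":
        rel = X_rel[rng.permutation(len(X_rel))]
    elif mode == "noise":
        rel = rng.normal(size=X_rel.shape)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return np.concatenate([X_single, rel], axis=1)

# Illustrative shapes: 4 records, 3 single-table + 2 relational features.
X_single = np.arange(12.0).reshape(4, 3)
X_rel = np.arange(8.0).reshape(4, 2)
inputs = {m: make_ablation_inputs(X_single, X_rel, m)
          for m in ("fermi", "shuffled", "noise")}
```

Training the same attack classifier on each variant and comparing TPR@0.1FPR would separate relational leakage from feature-count effects.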

Circularity Check

0 steps flagged

No circularity: purely empirical attack evaluation

Full rationale

The paper proposes FERMI as an empirical membership inference method that augments single-table features with relational signals from parent tables, then reports observed TPR@0.1FPR lifts on three diffusion architectures and three datasets. No derivation chain, first-principles result, or mathematical prediction is claimed; the central results are experimental comparisons to single-table baselines. No self-citation is load-bearing, no parameter is fitted and then renamed as a prediction, and no ansatz or uniqueness theorem is invoked. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not disclose any explicit free parameters, axioms, or invented entities; the method is described as a feature-enrichment step whose internal hyperparameters are not detailed.

pith-pipeline@v0.9.0 · 5509 in / 1058 out tokens · 28747 ms · 2026-05-13T01:16:07.551130+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 2 internal anchors

  1. [1]

    How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models

    Ahmed Alaa, Boris Van Breugel, Evgeny S Saveliev, and Mihaela Van Der Schaar. How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models. In International conference on machine learning, pages 290–306. PMLR, 2022

  2. [2]

    Privacy and confidentiality

    Stefan Bender, Ron S Jarmin, Frauke Kreuter, and Julia Lane. Privacy and confidentiality. In Big data and social science, pages 313–331. Chapman and Hall/CRC, 2020

  3. [3]

    Guide to the financial data set

    Petr Berka et al. Guide to the financial data set. In The ECML/PKDD, 2000

  4. [4]

    Scalable membership inference attacks via quantile regression

    Martin Bertran, Shuai Tang, Aaron Roth, Michael Kearns, Jamie H Morgenstern, and Steven Z Wu. Scalable membership inference attacks via quantile regression. Advances in Neural Information Processing Systems, 36:314–330, 2023

  5. [5]

    Membership inference attacks from first principles

    Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914. IEEE, 2022

  6. [6]

    Extracting training data from diffusion models

    Nicolas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramer, Borja Balle, Daphne Ippolito, and Eric Wallace. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5253–5270, 2023

  7. [7]

    Are diffusion models vulnerable to membership inference attacks?

    Jinhao Duan, Fei Kong, Shiqi Wang, Xiaoshuang Shi, and Kaidi Xu. Are diffusion models vulnerable to membership inference attacks? In International Conference on Machine Learning, pages 8717–8730. PMLR, 2023

  8. [8]

    Harnessing the power of synthetic data in healthcare: innovation, application, and privacy

    Mauro Giuffrè and Dennis L Shung. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digital Medicine, 6(1):186, 2023

  9. [9]

    Comprehensive analysis of privacy in black-box and white-box inference attacks against generative adversarial network

    Trung Ha, Tran Khanh Dang, and Nhan Nguyen-Tan. Comprehensive analysis of privacy in black-box and white-box inference attacks against generative adversarial network. In International Conference on Future Data and Security Engineering, pages 323–337. Springer, 2021

  10. [10]

    LOGAN: Membership inference attacks against generative models

    Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. LOGAN: Membership inference attacks against generative models. arXiv preprint arXiv:1705.07663, 2017

  11. [11]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  12. [12]

    Membership inference of diffusion models

    Hailong Hu and Jun Pang. Membership inference of diffusion models. arXiv preprint arXiv:2301.09956, 2023

  13. [13]

    Relational data generation with graph neural networks and latent diffusion models

    Valter Hudovernik. Relational data generation with graph neural networks and latent diffusion models. In NeurIPS 2024 Third Table Representation Learning Workshop, 2024

  14. [14]

    Evaluating differentially private machine learning in practice

    Bargav Jayaraman and David Evans. Evaluating differentially private machine learning in practice. In 28th USENIX Security Symposium (USENIX Security 19), pages 1895–1912, 2019

  15. [15]

    TabDDPM: Modelling tabular data with diffusion models

    Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, and Artem Babenko. TabDDPM: Modelling tabular data with diffusion models. In International Conference on Machine Learning, pages 17564–17579. PMLR, 2023

  16. [16]

    Practical bayes-optimal membership inference attacks

    Marcus Lassila, Johan Östman, Khac-Hoang Ngo, et al. Practical bayes-optimal membership inference attacks. arXiv preprint arXiv:2505.24089, 2025

  17. [17]

    Goggle: Generative modelling for tabular data by learning relational structure

    Tennison Liu, Zhaozhi Qian, Jeroen Berrevoets, and Mihaela van der Schaar. Goggle: Generative modelling for tabular data by learning relational structure. In The Eleventh International Conference on Learning Representations, 2023

  18. [18]

    New money: A systematic review of synthetic data generation for finance

    James Meldrum, Basem Suleiman, Fethi Rabhi, and Muhammad Johan Alibasa. New money: A systematic review of synthetic data generation for finance. arXiv preprint arXiv:2510.26076, 2025

  19. [19]

    Synthetic data generation: a privacy-preserving approach to accelerate rare disease research

    Jorge M Mendes, Aziz Barbar, and Marwa Refaie. Synthetic data generation: a privacy-preserving approach to accelerate rare disease research. Frontiers in Digital Health, 7:1563991, 2025

  20. [20]

    ClavaDDPM: Multi-relational data synthesis with cluster-guided diffusion models

    Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Stephanie Hazlewood, and Xi He. ClavaDDPM: Multi-relational data synthesis with cluster-guided diffusion models. Advances in Neural Information Processing Systems, 37:83521–83547, 2024

  21. [21]

    International microdata: Version 7.3 [dataset]

    Integrated Public Use Microdata Series. International microdata: Version 7.3 [dataset]. Minneapolis Population Center: IPUMS, 2020

  22. [22]

    MIDST challenge at SaTML 2025: Membership inference over diffusion-models-based synthetic tabular data

    Masoumeh Shafieinejad, Xi He, Mahshid Alinoori, John Jewell, Sana Ayromlou, Wei Pang, Veronica Chatrath, Garui Sharma, and Deval Pandya. MIDST challenge at SaTML 2025: Membership inference over diffusion-models-based synthetic tabular data. arXiv preprint arXiv:2603.19185, 2026

  23. [23]

    TabDiff: a mixed-type diffusion model for tabular data generation

    Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, and Jure Leskovec. TabDiff: a mixed-type diffusion model for tabular data generation. arXiv preprint arXiv:2410.20626, 2024

  24. [24]

    Membership inference attacks against machine learning models

    Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017

  25. [25]

    Domain adaptation: challenges, methods, datasets, and applications

    Peeyush Singhal, Rahee Walambe, Sheela Ramanna, and Ketan Kotecha. Domain adaptation: challenges, methods, datasets, and applications. IEEE Access, 11:6973–7020, 2023

  26. [26]

    Instacart market basket analysis

    Jeremy Stanley, M. Risdal, Sharath Rao, and W. Cukierski. Instacart market basket analysis, 2017

  27. [27]

    Correlation alignment for unsupervised domain adaptation

    Baochen Sun, Jiashi Feng, and Kate Saenko. Correlation alignment for unsupervised domain adaptation. In Domain Adaptation in Computer Vision Applications, pages 153–171. Springer, 2017

  28. [28]

    The HIPAA privacy rule and the EU GDPR: illustrative comparisons

    Stacey A Tovino. The HIPAA privacy rule and the EU GDPR: illustrative comparisons. Seton Hall L. Rev., 47:973, 2016

  29. [29]

    Membership inference attacks against synthetic data through overfitting detection

    Boris Van Breugel, Hao Sun, Zhaozhi Qian, and Mihaela van der Schaar. Membership inference attacks against synthetic data through overfitting detection. arXiv preprint arXiv:2302.12580, 2023

  30. [30]

    Ensembling membership inference attacks against tabular generative models

    Joshua Ward, Yuxuan Yang, Chi-Hua Wang, and Guang Cheng. Ensembling membership inference attacks against tabular generative models. In Proceedings of the 18th ACM Workshop on Artificial Intelligence and Security, pages 182–193, 2025

  31. [31]

    Finding Connections: Membership Inference Attacks for the Multi-Table Synthetic Data Setting

    Joshua Ward, Chi-Hua Wang, and Guang Cheng. Finding connections: Membership inference attacks for the multi-table synthetic data setting. arXiv preprint arXiv:2602.07126, 2026

  32. [32]

    Winning the MIDST challenge: New membership inference attacks on diffusion models for tabular data synthesis

    Xiaoyu Wu, Yifei Pang, Terrance Liu, and Steven Wu. Winning the MIDST challenge: New membership inference attacks on diffusion models for tabular data synthesis. arXiv preprint arXiv:2503.12008, 2025

  33. [33]

    Mixed-type tabular data synthesis with score-based diffusion in latent space

    Hengrui Zhang, Jiani Zhang, Balasubramaniam Srinivasan, Zhengyuan Shen, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, and George Karypis. Mixed-type tabular data synthesis with score-based diffusion in latent space. arXiv preprint arXiv:2310.09656, 2023

  34. [34]

    Membership inference attacks against synthetic health data

    Ziqi Zhang, Chao Yan, and Bradley A Malin. Membership inference attacks against synthetic health data. Journal of Biomedical Informatics, 125:103977, 2022
