pith. machine review for the scientific record.

arxiv: 2605.11527 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.CR · cs.DB

Recognition: no theorem link

FERMI: Exploiting Relations for Membership Inference Against Tabular Diffusion Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:16 UTC · model grok-4.3

classification 💻 cs.LG · cs.CR · cs.DB
keywords membership inference · tabular diffusion models · relational data · privacy attacks · data synthesis · synthetic data · white-box attack · black-box attack

The pith

A relational feature-mapping attack called FERMI raises membership inference success rates against tabular diffusion models when parent-table information is available only during attack training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that membership inference attacks on tabular diffusion models become substantially stronger once they incorporate auxiliary signals from parent tables in relational datasets. Existing attacks treat each table in isolation and therefore miss leakage that travels through relations, even when the attacker sees only the target record at test time. If the gains hold, they indicate that current privacy evaluations of synthetic tabular data underestimate risk for the multi-table structures common in real sensitive records. The work focuses on a practical attacker model where relational data aids attack-model training but is withheld at inference.

Core claim

We propose FERMI (FEature-mapping for Relational Membership Inference), which resolves this gap by enriching single-table features with relational membership signal. Across three tabular diffusion architectures and three real-world relational datasets, FERMI consistently improves attack performance over single-table baselines, with TPR@0.1FPR rising by up to 53% over the single-table baseline in the white-box setting and 22% in the black-box setting.
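For readers unfamiliar with the headline metric: TPR@0.1FPR is the attack's true-positive rate at the decision threshold where its false-positive rate is capped at 10%, the low-FPR regime privacy audits care about. A minimal, self-contained sketch of computing it from attack scores (toy data, not the paper's):

```python
import numpy as np

def tpr_at_fpr(scores, labels, target_fpr=0.1):
    """TPR at the best threshold whose FPR stays <= target_fpr.
    scores: higher means "more likely a member"; labels: True = member."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_tpr = 0.0
    for t in np.unique(scores):          # every observed score is a candidate threshold
        pred = scores >= t
        fpr = pred[~labels].mean()
        if fpr <= target_fpr:
            best_tpr = max(best_tpr, pred[labels].mean())
    return best_tpr

# Toy example: member scores drawn slightly higher than non-member scores.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 1.0, 500),   # members
                         rng.normal(0.0, 1.0, 500)])  # non-members
labels = np.concatenate([np.ones(500, bool), np.zeros(500, bool)])
low_fpr_tpr = tpr_at_fpr(scores, labels, target_fpr=0.1)
```

A "53% rise" in this quantity means the enriched attack identifies substantially more true members while keeping false accusations at the same low rate.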

What carries the argument

FERMI, a feature-mapping step that injects relational membership signals extracted from parent tables into the single-table feature vector used by the attack classifier.
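The review names this mechanism only abstractly, and the paper's exact mapping is not disclosed at this level. As a purely illustrative sketch (the foreign-key join, the mean/std/count aggregates, and the concatenation scheme are assumptions, not the authors' construction), relational enrichment of a child record might look like:

```python
import numpy as np
import pandas as pd

def enrich_with_parent(child_row, parent, fk, numeric_cols):
    """Append simple parent-table aggregates (mean, std, match count) to a
    child record's own features. Illustrative only: the paper's actual
    relational membership signal is not specified here."""
    matches = parent[parent[fk] == child_row[fk]]
    if len(matches):
        agg = np.concatenate([
            matches[numeric_cols].mean().to_numpy(),
            matches[numeric_cols].std(ddof=0).to_numpy(),
            [float(len(matches))],
        ])
    else:  # orphan record: zero-pad so dimensionality stays fixed
        agg = np.zeros(2 * len(numeric_cols) + 1)
    own = child_row.drop(labels=[fk]).to_numpy(dtype=float)
    return np.concatenate([own, agg])

# Toy relational pair: accounts (parent) and one transaction (child).
parent = pd.DataFrame({"account_id": [1, 1, 2],
                       "balance": [100.0, 120.0, 50.0]})
child = pd.Series({"account_id": 1, "amount": 9.5})
vec = enrich_with_parent(child, parent, "account_id", ["balance"])
# vec = [own amount, parent balance mean, parent balance std, match count]
```

The attack classifier is then trained on such enriched vectors; at inference time, when parent tables are unavailable, the relational slots are filled by the learned feature mapping rather than by a live join.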

If this is right

  • Single-table membership inference baselines systematically underestimate privacy leakage for relational tabular data.
  • Tabular diffusion models trained on linked tables release more membership information than models trained on isolated tables.
  • Attackers can obtain higher true-positive rates at fixed low false-positive rates by training on parent-table features even when those features are unavailable at inference time.
  • Privacy assessments of synthetic data generators must incorporate multi-table structure to be reliable.
  • Defenses for diffusion models on relational data need to address leakage that propagates through parent-child links.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Privacy audits of synthetic relational datasets should routinely include parent-table information during attack calibration.
  • The same relational enrichment technique could be tested on other generative models such as GANs or VAEs to measure whether the leakage pattern is architecture-specific.
  • If the relational signal proves stable across domains, regulators might require relational privacy evaluations before releasing synthetic versions of linked government or health records.

Load-bearing premise

That the observed performance gains reflect genuine additional membership leakage carried by relations rather than artifacts of the selected datasets or attack hyperparameters.

What would settle it

Repeating the white-box and black-box experiments on new relational datasets where parent-table attributes are uncorrelated with membership status in the target table, and checking whether the TPR@0.1FPR gap shrinks to near zero.

Figures

Figures reproduced from arXiv: 2605.11527 by Abtin Mahyar, Masoumeh Shafieinejad, Xi He, Yuhan Liu.

Figure 1: Overview of FERMI. The victim trains per-table diffusion models (top). The adversary …
Figure 2: Mean diffusion loss per timestep for members and non-members across the three feature …
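Figure 2's separating signal, per-timestep denoising loss, is the standard white-box feature for diffusion membership inference. A hedged sketch of how such losses are computed under a DDPM forward process (`predict_noise` stands in for the trained model; the schedule and timesteps are illustrative, not the paper's):

```python
import numpy as np

def ddpm_loss_per_timestep(x0, predict_noise, alphas_cumprod, timesteps, seed=0):
    """Per-timestep denoising loss for one record: noise x0 to level t,
    ask the model to predict the injected noise, and record the MSE.
    Consistently lower loss across timesteps is the classic white-box
    membership signal that Figure 2 visualizes."""
    rng = np.random.default_rng(seed)
    losses = []
    for t in timesteps:
        eps = rng.normal(size=x0.shape)
        a = alphas_cumprod[t]
        x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
        losses.append(float(np.mean((predict_noise(x_t, t) - eps) ** 2)))
    return np.array(losses)

# Sanity check with a perfect "oracle" model and x0 = 0: the injected
# noise can be recovered exactly, so every per-timestep loss is zero.
alphas = np.linspace(0.99, 0.01, 100)
x0 = np.zeros(5)
oracle = lambda x_t, t: x_t / np.sqrt(1.0 - alphas[t])
losses = ddpm_loss_per_timestep(x0, oracle, alphas, [10, 50, 90])
```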
Original abstract

Diffusion models are the leading approach for tabular data synthesis and are increasingly used to share sensitive records. Whether they actually protect privacy has become a pressing question. Membership inference attacks are the standard tool for this purpose, yet existing attacks assume a single-table setting and ignore the multi-relational structure of real sensitive data. A core challenge in assessing privacy risks from membership inference attacks in multi-table settings is how to leverage auxiliary information from relations associated with the target table, such as its parent tables. Particularly, we study a practical setting in which such auxiliary information is available only when training the attack model. At inference time, the attacker observes only the attribute values of the target record from the target table. We propose FERMI (FEature-mapping for Relational Membership Inference), which resolves this gap by enriching single-table features with relational membership signal. Across three tabular diffusion architectures and three real-world relational datasets, FERMI consistently improves attack performance over single-table baselines, with TPR@$0.1$FPR rising by up to 53% over the single-table baseline in the white-box setting and 22% in the black-box setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces FERMI, a feature-mapping technique for membership inference attacks (MIA) on tabular diffusion models in multi-relational settings. It enriches single-table attack features with signals derived from parent tables, but restricts the attacker to target-table attributes only at inference time. Experiments across three diffusion architectures and three real-world relational datasets report consistent gains over single-table baselines, with TPR@0.1FPR improving by up to 53% (white-box) and 22% (black-box).

Significance. If the reported gains can be shown to arise from genuine relational leakage rather than feature-count artifacts, the work would meaningfully extend MIA analysis to realistic multi-table data synthesis, where diffusion models are used for privacy-sensitive sharing. This would strengthen the case for relation-aware privacy defenses in tabular generative modeling.

major comments (2)
  1. [§5] §5 (Experimental Evaluation): The headline TPR@0.1FPR lifts (up to 53%/22%) rest on comparing a relational-enriched attack model against a single-table baseline that necessarily uses fewer features. No ablation is reported that holds total input dimensionality, feature scale, and training dynamics fixed while randomizing or nullifying the relational component (e.g., via shuffled parent-table rows or noise padding). Without this control, the performance difference cannot be attributed to diffusion-model leakage through relations rather than the expanded feature space.
  2. [§4] §4 (FERMI Method): The precise construction of the 'relational membership signal' (what statistics, embeddings, or aggregations are extracted from parent tables and how they are mapped) is described at a high level only. Reproducibility requires the exact feature definitions, aggregation functions, and any hyperparameters used to combine them with single-table features.
minor comments (2)
  1. [§5] The manuscript does not report the number of independent runs, standard deviations, or statistical significance tests for the TPR@0.1FPR differences, making it difficult to assess whether the gains are robust.
  2. [§3] Notation for the threat model (white-box vs. black-box access to the diffusion model) and the exact point at which auxiliary relational data becomes unavailable should be stated more explicitly in the problem formulation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thorough review and valuable suggestions. The comments highlight important aspects for strengthening the experimental validation and methodological details. We respond to each major comment point-by-point and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [§5] §5 (Experimental Evaluation): The headline TPR@0.1FPR lifts (up to 53%/22%) rest on comparing a relational-enriched attack model against a single-table baseline that necessarily uses fewer features. No ablation is reported that holds total input dimensionality, feature scale, and training dynamics fixed while randomizing or nullifying the relational component (e.g., via shuffled parent-table rows or noise padding). Without this control, the performance difference cannot be attributed to diffusion-model leakage through relations rather than the expanded feature space.

    Authors: We agree that controlling for feature dimensionality is important to isolate the contribution of relational signals. In the revised version, we will include an ablation study that matches the input dimensionality of the FERMI attack by padding the single-table baseline with either random noise features or shuffled relational features (which nullify any meaningful relational leakage). We will report the results of this control experiment to demonstrate that the performance gains are indeed due to genuine relational information leakage from the diffusion models rather than mere feature count. This addresses the concern directly. revision: yes

  2. Referee: [§4] §4 (FERMI Method): The precise construction of the 'relational membership signal' (what statistics, embeddings, or aggregations are extracted from parent tables and how they are mapped) is described at a high level only. Reproducibility requires the exact feature definitions, aggregation functions, and any hyperparameters used to combine them with single-table features.

    Authors: We acknowledge that Section 4 provides a high-level overview of the relational membership signal construction. To improve reproducibility, we will expand this section in the revised manuscript with precise definitions: specifically, we will detail the aggregation functions (e.g., mean, variance, count of matching records in parent tables), any embedding techniques used for categorical relations, and the exact hyperparameters for feature combination (such as concatenation weights or normalization). Additionally, we will include pseudocode for the feature mapping process and make the implementation details available in the supplementary material or code repository. revision: yes
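The dimensionality-matched control promised in response 1 can be sketched concretely. A hypothetical harness (array names and the three modes are illustrative, not from the paper) that keeps attack-input dimensionality fixed while nullifying the relational link:

```python
import numpy as np

def make_ablation_inputs(X_single, X_rel, mode, seed=0):
    """Dimension-matched attack inputs for the feature-count control.
    'fermi'    : real relational features appended (the attack under test).
    'shuffled' : relational rows permuted, breaking each record's link to
                 its own parent while preserving marginal statistics.
    'noise'    : relational block replaced by Gaussian padding.
    All three yield the same input dimensionality, so any remaining gap
    over 'shuffled'/'noise' is attributable to genuine relational leakage."""
    rng = np.random.default_rng(seed)
    if mode == "fermi":
        rel = X_rel
    elif mode == "shuffled":
        rel = X_rel[rng.permutation(len(X_rel))]
    elif mode == "noise":
        rel = rng.normal(size=X_rel.shape)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return np.concatenate([X_single, rel], axis=1)

# Illustrative shapes: 4 records, 3 single-table + 2 relational features.
X_single = np.arange(12.0).reshape(4, 3)
X_rel = np.arange(8.0).reshape(4, 2)
inputs = {m: make_ablation_inputs(X_single, X_rel, m)
          for m in ("fermi", "shuffled", "noise")}
```

Training the same attack classifier on each variant and comparing TPR@0.1FPR would separate relational leakage from feature-count effects.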

Circularity Check

0 steps flagged

No circularity: purely empirical attack evaluation

Full rationale

The paper proposes FERMI as an empirical membership inference method that augments single-table features with relational signals from parent tables, then reports observed TPR@0.1FPR lifts on three diffusion architectures and three datasets. No derivation chain, first-principles result, or mathematical prediction is claimed; the central results are experimental comparisons to single-table baselines. No self-citation is load-bearing, no parameter is fitted and then renamed as a prediction, and no ansatz or uniqueness theorem is invoked. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not disclose any explicit free parameters, axioms, or invented entities; the method is described as a feature-enrichment step whose internal hyperparameters are not detailed.

pith-pipeline@v0.9.0 · 5509 in / 1058 out tokens · 28747 ms · 2026-05-13T01:16:07.551130+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 2 internal anchors

  1. [1]

    How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models

    Ahmed Alaa, Boris Van Breugel, Evgeny S Saveliev, and Mihaela Van Der Schaar. How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models. In International conference on machine learning, pages 290–306. PMLR, 2022

  2. [2]

    Privacy and confidentiality

    Stefan Bender, Ron S Jarmin, Frauke Kreuter, and Julia Lane. Privacy and confidentiality. In Big data and social science, pages 313–331. Chapman and Hall/CRC, 2020

  3. [3]

    Guide to the financial data set

    Petr Berka et al. Guide to the financial data set. In The ECML/PKDD, 2000

  4. [4]

    Scalable membership inference attacks via quantile regression

    Martin Bertran, Shuai Tang, Aaron Roth, Michael Kearns, Jamie H Morgenstern, and Steven Z Wu. Scalable membership inference attacks via quantile regression. Advances in Neural Information Processing Systems, 36:314–330, 2023

  5. [5]

    Membership inference attacks from first principles

    Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914. IEEE, 2022

  6. [6]

    Extracting training data from diffusion models

    Nicolas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramer, Borja Balle, Daphne Ippolito, and Eric Wallace. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5253–5270, 2023

  7. [7]

    Are diffusion models vulnerable to membership inference attacks?

    Jinhao Duan, Fei Kong, Shiqi Wang, Xiaoshuang Shi, and Kaidi Xu. Are diffusion models vulnerable to membership inference attacks? In International Conference on Machine Learning, pages 8717–8730. PMLR, 2023

  8. [8]

    Harnessing the power of synthetic data in healthcare: innovation, application, and privacy

    Mauro Giuffrè and Dennis L Shung. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digital Medicine, 6(1):186, 2023

  9. [9]

    Comprehensive analysis of privacy in black-box and white-box inference attacks against generative adversarial network

    Trung Ha, Tran Khanh Dang, and Nhan Nguyen-Tan. Comprehensive analysis of privacy in black-box and white-box inference attacks against generative adversarial network. In International Conference on Future Data and Security Engineering, pages 323–337. Springer, 2021

  10. [10]

    LOGAN: Membership inference attacks against generative models

    Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. LOGAN: Membership inference attacks against generative models. arXiv preprint arXiv:1705.07663, 2017

  11. [11]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  12. [12]

    Membership inference of diffusion models

    Hailong Hu and Jun Pang. Membership inference of diffusion models. arXiv preprint arXiv:2301.09956, 2023

  13. [13]

    Relational data generation with graph neural networks and latent diffusion models

    Valter Hudovernik. Relational data generation with graph neural networks and latent diffusion models. In NeurIPS 2024 Third Table Representation Learning Workshop, 2024

  14. [14]

    Evaluating differentially private machine learning in practice

    Bargav Jayaraman and David Evans. Evaluating differentially private machine learning in practice. In 28th USENIX Security Symposium (USENIX Security 19), pages 1895–1912, 2019

  15. [15]

    TabDDPM: Modelling tabular data with diffusion models

    Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, and Artem Babenko. TabDDPM: Modelling tabular data with diffusion models. In International Conference on Machine Learning, pages 17564–17579. PMLR, 2023

  16. [16]

    Practical bayes-optimal membership inference attacks

    Marcus Lassila, Johan Östman, Khac-Hoang Ngo, et al. Practical bayes-optimal membership inference attacks. arXiv preprint arXiv:2505.24089, 2025

  17. [17]

    Goggle: Generative modelling for tabular data by learning relational structure

    Tennison Liu, Zhaozhi Qian, Jeroen Berrevoets, and Mihaela van der Schaar. Goggle: Generative modelling for tabular data by learning relational structure. In The Eleventh International Conference on Learning Representations, 2023

  18. [18]

    New money: A systematic review of synthetic data generation for finance

    James Meldrum, Basem Suleiman, Fethi Rabhi, and Muhammad Johan Alibasa. New money: A systematic review of synthetic data generation for finance. arXiv preprint arXiv:2510.26076, 2025

  19. [19]

    Synthetic data generation: a privacy-preserving approach to accelerate rare disease research

    Jorge M Mendes, Aziz Barbar, and Marwa Refaie. Synthetic data generation: a privacy-preserving approach to accelerate rare disease research. Frontiers in Digital Health, 7:1563991, 2025

  20. [20]

    ClavaDDPM: Multi-relational data synthesis with cluster-guided diffusion models

    Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Stephanie Hazlewood, and Xi He. ClavaDDPM: Multi-relational data synthesis with cluster-guided diffusion models. Advances in Neural Information Processing Systems, 37:83521–83547, 2024

  21. [21]

    International microdata: Version 7.3 [dataset]

    Integrated Public Use Microdata Series. International microdata: Version 7.3 [dataset]. Minneapolis Population Center: IPUMS, 2020

  22. [22]

    MIDST challenge at SaTML 2025: Membership inference over diffusion-models-based synthetic tabular data

    Masoumeh Shafieinejad, Xi He, Mahshid Alinoori, John Jewell, Sana Ayromlou, Wei Pang, Veronica Chatrath, Garui Sharma, and Deval Pandya. MIDST challenge at SaTML 2025: Membership inference over diffusion-models-based synthetic tabular data. arXiv preprint arXiv:2603.19185, 2026

  23. [23]

    TabDiff: a mixed-type diffusion model for tabular data generation

    Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, and Jure Leskovec. TabDiff: a mixed-type diffusion model for tabular data generation. arXiv preprint arXiv:2410.20626, 2024

  24. [24]

    Membership inference attacks against machine learning models

    Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017

  25. [25]

    Domain adaptation: challenges, methods, datasets, and applications

    Peeyush Singhal, Rahee Walambe, Sheela Ramanna, and Ketan Kotecha. Domain adaptation: challenges, methods, datasets, and applications. IEEE Access, 11:6973–7020, 2023

  26. [26]

    Instacart market basket analysis

    Jeremy Stanley, M. Risdal, Sharath Rao, and W. Cukierski. Instacart market basket analysis, 2017

  27. [27]

    Correlation alignment for unsupervised domain adaptation

    Baochen Sun, Jiashi Feng, and Kate Saenko. Correlation alignment for unsupervised domain adaptation. In Domain Adaptation in Computer Vision Applications, pages 153–171. Springer, 2017

  28. [28]

    The HIPAA privacy rule and the EU GDPR: illustrative comparisons

    Stacey A Tovino. The HIPAA privacy rule and the EU GDPR: illustrative comparisons. Seton Hall L. Rev., 47:973, 2016

  29. [29]

    Membership inference attacks against synthetic data through overfitting detection

    Boris Van Breugel, Hao Sun, Zhaozhi Qian, and Mihaela van der Schaar. Membership inference attacks against synthetic data through overfitting detection. arXiv preprint arXiv:2302.12580, 2023

  30. [30]

    Ensembling membership inference attacks against tabular generative models

    Joshua Ward, Yuxuan Yang, Chi-Hua Wang, and Guang Cheng. Ensembling membership inference attacks against tabular generative models. In Proceedings of the 18th ACM Workshop on Artificial Intelligence and Security, pages 182–193, 2025

  31. [31]

    Finding Connections: Membership Inference Attacks for the Multi-Table Synthetic Data Setting

    Joshua Ward, Chi-Hua Wang, and Guang Cheng. Finding connections: Membership inference attacks for the multi-table synthetic data setting. arXiv preprint arXiv:2602.07126, 2026

  32. [32]

    Winning the MIDST challenge: New membership inference attacks on diffusion models for tabular data synthesis

    Xiaoyu Wu, Yifei Pang, Terrance Liu, and Steven Wu. Winning the MIDST challenge: New membership inference attacks on diffusion models for tabular data synthesis. arXiv preprint arXiv:2503.12008, 2025

  33. [33]

    Mixed-type tabular data synthesis with score-based diffusion in latent space

    Hengrui Zhang, Jiani Zhang, Balasubramaniam Srinivasan, Zhengyuan Shen, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, and George Karypis. Mixed-type tabular data synthesis with score-based diffusion in latent space. arXiv preprint arXiv:2310.09656, 2023

  34. [34]

    Membership inference attacks against synthetic health data

    Ziqi Zhang, Chao Yan, and Bradley A Malin. Membership inference attacks against synthetic health data. Journal of Biomedical Informatics, 125:103977, 2022
