arxiv: 2502.02514 · v5 · submitted 2025-02-04 · 💻 cs.CV · cs.LG

Privacy Attacks on Image AutoRegressive Models

Pith reviewed 2026-05-23 03:27 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords privacy attacks · membership inference · image autoregressive models · diffusion models · data extraction · dataset inference · privacy-utility trade-off

The pith

Image autoregressive models expose training data to membership inference attacks at rates far higher than comparable diffusion models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that image autoregressive models suffer substantially higher privacy leakage than diffusion models despite matching generation quality and speed. It introduces a new membership inference attack that identifies training images with a true positive rate of 94.57 percent at one percent false positive rate, compared to 6.38 percent on diffusion models. The same attack enables dataset inference from only four samples and permits extraction of hundreds of training images. These results point to an empirical privacy-utility trade-off where architectural choices that improve generation also increase vulnerability to privacy attacks.

Core claim

Image autoregressive models are empirically significantly more vulnerable to privacy attacks than diffusion models that achieve similar performance, as shown by a novel membership inference attack reaching 94.57 percent true positive rate at one percent false positive rate on IARs versus 6.38 percent on DMs, dataset inference succeeding with four samples instead of two hundred, and extraction of 698 training points from one IAR variant.
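
The headline metric, true positive rate at a fixed false positive rate, can be made concrete with a short sketch. It assumes each image has already been given a scalar attack score where higher means "more likely a training member"; the function name `tpr_at_fpr` and the tie-handling rule are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Return (TPR, threshold) at a threshold that flags at most target_fpr of non-members.

    Assumption: higher score means "more likely a training member".
    """
    member_scores = np.asarray(member_scores, dtype=float)
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    # Sort non-member scores from highest to lowest.
    sorted_non = np.sort(nonmember_scores)[::-1]
    # Number of false positives we are allowed (small epsilon guards float rounding).
    k = int(np.floor(target_fpr * len(sorted_non) + 1e-9))
    # Flag a sample only if its score is strictly above the (k+1)-th largest
    # non-member score, so at most k non-members are flagged.
    threshold = sorted_non[k] if k < len(sorted_non) else -np.inf
    tpr = float(np.mean(member_scores > threshold))
    return tpr, float(threshold)
```

Under this reading, the reported 94.57 percent means that a threshold which wrongly flags about one in a hundred non-training images still catches roughly 95 in a hundred training images.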

What carries the argument

A novel membership inference attack that exploits the autoregressive prediction process to detect whether an image was part of the training set.
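
To illustrate the kind of signal an autoregressive model exposes, here is a minimal sketch of a generic likelihood-based membership score computed from per-token logits. This is a simple baseline and not the paper's novel attack; the helper names `token_log_probs` and `membership_score`, and the assumption that the logits for an image arrive as an (N, V) array, are ours.

```python
import numpy as np

def token_log_probs(logits, tokens):
    """Log-probability assigned to each ground-truth token.

    logits: (N, V) array of per-position logits over a vocabulary of size V.
    tokens: (N,) integer array of the image's actual token ids.
    """
    logits = np.asarray(logits, dtype=float)
    tokens = np.asarray(tokens, dtype=int)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability of the true token at each position.
    return log_probs[np.arange(len(tokens)), tokens]

def membership_score(logits, tokens):
    """Mean per-token log-likelihood; higher suggests the image was seen in training."""
    return float(token_log_probs(logits, tokens).mean())
```

Each of the N positions contributes its own term to the score, which matches the intuition that an autoregressive model offers many separate signals per image.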

If this is right

  • Practitioners selecting between generation paradigms must weigh the documented speed and quality gains of IARs against their measured increase in membership inference success.
  • Dataset owners can use the four-sample dataset inference method to audit whether an IAR was trained on their private collection.
  • Model developers can apply the extraction procedure to quantify how many training images an IAR has memorized.
  • Future IAR training recipes must incorporate privacy defenses if the observed leakage rates are to be reduced.
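
The dataset inference audit mentioned above can be sketched as a hypothesis test: given membership scores for the suspect samples P and for confirmed non-members U, ask whether P scores higher than chance would allow. The sketch below uses a one-sided permutation test as a stand-in for whatever statistical test the paper applies; the function name `dataset_inference_pvalue` and its parameters are hypothetical.

```python
import numpy as np

def dataset_inference_pvalue(suspect_scores, nonmember_scores, n_perm=10000, seed=0):
    """One-sided permutation test on the difference of mean scores.

    A small p-value suggests the suspect set P scores higher than the
    non-member set U, i.e. P was plausibly part of the training data.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(suspect_scores, dtype=float)
    u = np.asarray(nonmember_scores, dtype=float)
    observed = p.mean() - u.mean()
    pooled = np.concatenate([p, u])
    count = 0
    for _ in range(n_perm):
        # Shuffle the labels and recompute the mean difference.
        perm = rng.permutation(pooled)
        diff = perm[:len(p)].mean() - perm[len(p):].mean()
        if diff >= observed:
            count += 1
    # Add-one smoothing keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

With only four suspect samples, such a test can reach significance only when the per-sample scores separate members from non-members very cleanly, which is what the reported four-sample result implies for IARs.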

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the privacy gap persists across model scales, regulatory pressure may favor diffusion models for any deployment involving personal images.
  • The autoregressive token-by-token structure may inherently require more memorization of local statistics than the global denoising process in diffusion models.
  • A follow-up experiment could test whether adding explicit privacy regularization during IAR training closes the attack gap without harming FID scores.

Load-bearing premise

The membership inference attack and diffusion model baselines are implemented and evaluated under comparable training conditions and model scales.

What would settle it

A replication study that trains an image autoregressive model and a diffusion model on identical data and compute, then applies the reported attack and finds the true positive rate gap disappears.

Figures

Figures reproduced from arXiv: 2502.02514 by Adam Dziedzic, Antoni Kowalczuk, Franziska Boenisch, Jan Dubiński.

Figure 1
Figure 1: Privacy-utility and generation speed-performance trade-off for IARs compared to DMs. 1) IARs achieve better and faster image generation, but reveal more information to potential training data identification attacks. 2) In particular, large IAR models are most vulnerable. 3) In case of large IARs, even the identification of individual training samples (MIAs) has a high success rate. 4) MAR models are more p…
Figure 2
Figure 2: DI success for IARs vs DMs. We report the generative quality expressed with the FID score vs the number of suspect samples P required to carry out DI.
Figure 3
Figure 3: Extracted Training Samples. We note that IARs can reconstruct verbatim images from their training data. The first row shows the original training samples and the second one presents the extracted images.
Figure 4
Figure 4: Dataset Inference for IARs Procedural Steps. The process consists of four main steps: (1) Data Preparation: Prepare the data to verify whether the (suspected) member samples P were used to train the IAR. The (confirmed) nonmember samples U, from the same distribution as P, serve as the validation set. (2) Feature Extraction: Run each individual MIA on all inputs from {P, U} to extract membership features for a…
Figure 5
Figure 5: Comprehensive comparison of the trade-offs between IARs and DMs.
Figure 6
Figure 6: Privacy-utility trade-off of our mitigation strategy. We show that successfully defending VAR and RAR against MIA and DI requires adding noise that severely harms the performance. Interestingly, we are able to limit the extent of memorization for VAR, and fully defend MAR against MIA and DI.
Figure 7
Figure 7: Image extracted from VAR-d30 without prefix. (Left) memorized image, (right) generated image.
Figure 8
Figure 8: Images extracted from both VAR-d30 and RAR-XXL.
Figure 9
Figure 9: An image extracted from both VAR-d30 and MAR-H.
Figure 10
Figure 10: Prefix length and the number of extracted samples (x-axis: prefix length i; y-axis: number of memorized samples; models: RAR-XXL, VAR-d30). We show that with an increase of the prefix length, the success of our extraction method increases.
Figure 11
Figure 11: Distance function d and the SSCD score. We show that d correlates with the final memorization score. This result makes our candidate selection process sound, and reduces the cost of extracting memorized samples.
Figure 12
Figure 12: Non-cherry-picked extracted images. Odd columns from the left correspond to the original image, even to extracted. From left, the images are for VAR-d30, RAR-XXL, and MAR-H.
Original abstract

Image AutoRegressive generation has emerged as a new powerful paradigm with image autoregressive models (IARs) matching state-of-the-art diffusion models (DMs) in image quality (FID: 1.48 vs. 1.58) while allowing for a higher generation speed. However, the privacy risks associated with IARs remain unexplored, raising concerns regarding their responsible deployment. To address this gap, we conduct a comprehensive privacy analysis of IARs, comparing their privacy risks to the ones of DMs as reference points. Concretely, we develop a novel membership inference attack (MIA) that achieves a remarkably high success rate in detecting training images (with True Positive Rate at False Positive Rate = 1% of 94.57% vs. 6.38% for DMs with comparable attacks). We leverage our novel MIA to provide dataset inference (DI) for IARs, and show that it requires as few as 4 samples to detect dataset membership (compared to 200 for DI in DMs), confirming a higher information leakage in IARs. Finally, we are able to extract hundreds of training data points from an IAR (e.g., 698 from VAR-d30). Our results suggest a fundamental privacy-utility trade-off: while IARs excel in image generation quality and speed, they are empirically significantly more vulnerable to privacy attacks compared to DMs that achieve similar performance. We release the code at https://github.com/sprintml/privacy_attacks_against_iars for reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that image autoregressive models (IARs) matching diffusion models (DMs) in generation quality (FID 1.48 vs 1.58) and speed are empirically far more vulnerable to privacy attacks, with a novel membership inference attack (MIA) achieving 94.57% TPR at 1% FPR on IARs versus 6.38% on DMs with comparable attacks, dataset inference succeeding with only 4 samples versus 200 for DMs, and extraction of hundreds of training points (e.g., 698 from VAR-d30). It concludes this indicates a fundamental privacy-utility trade-off and releases code for reproducibility.

Significance. If the IAR-DM comparisons hold under matched training regimes, the work would be significant for identifying substantially higher leakage in a competitive new generative paradigm, with direct implications for responsible deployment; the code release is a positive factor supporting reproducibility.

major comments (2)
  1. [Abstract] The central claim of a 'fundamental privacy-utility trade-off' caused by autoregressive vs diffusion inductive bias requires that the reported MIA gap (94.57% vs 6.38% TPR@1% FPR) and DI gap (4 vs 200 samples) be attributable to architecture rather than unmatched factors. The text states only that DM baselines use 'comparable attacks' and that IARs achieve similar FID, without establishing that the DMs were trained at matching scale (e.g., VAR-d30 parameter count), on identical data, with the same optimizer and schedule.
  2. [Abstract] The novel MIA and its evaluation lack any description of attack construction, feature extraction, threshold selection, or statistical testing, so it is impossible to determine whether the high success rates are robust or influenced by post-hoc choices. This is load-bearing because the headline numerical results rest on this attack.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address the two major comments point by point below.

  1. Referee: [Abstract] The central claim of a 'fundamental privacy-utility trade-off' caused by autoregressive vs diffusion inductive bias requires that the reported MIA gap (94.57% vs 6.38% TPR@1% FPR) and DI gap (4 vs 200 samples) be attributable to architecture rather than unmatched factors. The text states only that DM baselines use 'comparable attacks' and that IARs achieve similar FID, without establishing that the DMs were trained at matching scale (e.g., VAR-d30 parameter count), on identical data, with the same optimizer and schedule.

    Authors: We acknowledge that the models compared are drawn from the published literature rather than retrained under identical conditions (data, optimizer, schedule, and exact parameter count). Our intent was to evaluate representative state-of-the-art IARs and DMs that achieve comparable FID scores using the attacks described in the paper. We agree that explicitly matched training regimes would provide stronger evidence for an inductive-bias-driven trade-off. In the revision we will expand the model description section to list the precise training details, data, and hyperparameters of every model used, and we will qualify the trade-off claim as an empirical observation on current published models rather than a strictly controlled causal statement. revision: partial

  2. Referee: [Abstract] The novel MIA and its evaluation lack any description of attack construction, feature extraction, threshold selection, or statistical testing, so it is impossible to determine whether the high success rates are robust or influenced by post-hoc choices. This is load-bearing because the headline numerical results rest on this attack.

    Authors: The construction of the novel MIA (including the autoregressive-specific likelihood features, the per-token probability aggregation, the threshold selection procedure via validation-set calibration, and the statistical testing via bootstrap confidence intervals) is fully detailed in Section 3.2 and evaluated in Section 4.1. We will add a one-sentence summary of the attack methodology to the abstract and ensure the key design choices are cross-referenced from the abstract to Section 3 in the revised manuscript. revision: yes
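
The response above names threshold calibration on a validation set and bootstrap confidence intervals. For readers unfamiliar with that combination, a minimal sketch follows; it is one plausible reading and not the authors' procedure. The function names `calibrate_threshold` and `bootstrap_tpr_ci`, the quantile-based calibration, and the percentile bootstrap are all assumptions.

```python
import numpy as np

def calibrate_threshold(val_nonmember_scores, target_fpr=0.01):
    """Pick the threshold as the (1 - target_fpr) quantile of held-out non-member scores."""
    scores = np.asarray(val_nonmember_scores, dtype=float)
    return float(np.quantile(scores, 1.0 - target_fpr))

def bootstrap_tpr_ci(member_scores, threshold, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap interval for the TPR at a fixed threshold."""
    rng = np.random.default_rng(seed)
    # 1.0 where a member is correctly flagged, 0.0 otherwise.
    hits = (np.asarray(member_scores, dtype=float) > threshold).astype(float)
    # Resample members with replacement, n_boot times.
    idx = rng.integers(0, len(hits), size=(n_boot, len(hits)))
    tprs = hits[idx].mean(axis=1)
    lo, hi = np.quantile(tprs, [alpha / 2, 1 - alpha / 2])
    return float(hits.mean()), float(lo), float(hi)
```

Fixing the threshold on held-out non-members before looking at the members is what guards against the post-hoc tuning the referee is worried about.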

Circularity Check

0 steps flagged

No circularity: purely empirical attack evaluation

The paper reports results from implementing a novel membership inference attack and dataset inference on IARs versus DM baselines, with direct experimental metrics (TPR@FPR, sample counts for DI, extracted data points). No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the derivation chain. The central claims rest on measured attack success rates under stated conditions rather than any reduction to inputs by construction, so the evaluation is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical machine learning security paper; contains no mathematical derivations, free parameters in equations, axioms, or invented entities. Central claims rest entirely on experimental attack success rates.

pith-pipeline@v0.9.0 · 5831 in / 1045 out tokens · 26449 ms · 2026-05-23T03:27:28.974169+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    Dao, Q., Phung, H., Nguyen, B., and Tran, A

    PMLR, 2020. Dao, Q., Phung, H., Nguyen, B., and Tran, A. Flow matching in latent space. arXiv preprint arXiv:2307.08698, 2023. Das, D., Zhang, J., and Tramèr, F. Blind baselines beat membership inference attacks for foundation models. arXiv preprint arXiv:2406.16201, 2024. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. Imagenet: ...

  2. [2]

    Gao, S., Zhou, P., Cheng, M.-M., and Yan, S

    URL http://www.dspace.cam.ac.uk/handle/1810/3486. Gao, S., Zhou, P., Cheng, M.-M., and Yan, S. Masked diffusion transformer is a strong image synthesizer, 2023. Han, J., Liu, J., Jiang, Y., Yan, B., Zhang, Y., Yuan, Z., Peng, B., and Liu, X. Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis, 2024. URL https://arxiv.o...

  3. [3]

    Scalable Diffusion Models with Transformers

    URL https://aclanthology.org/2023.findings-acl.719. Nasr, M., Hayes, J., Steinke, T., Balle, B., Tramèr, F., Jagielski, M., Carlini, N., and Terzis, A. Tight auditing of differentially private machine learning. In 32nd USENIX Security Symposium (USENIX Security 23), pp. 1631–1648, 2023. Peebles, W. a...

  4. [4]

    Shokri, R., Stronati, M., Song, C., and Shmatikov, V

    URL https://openreview.net/forum?id=zWqr3MQuNs. Shokri, R., Stronati, M., Song, C., and Shmatikov, V. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18, Los Alamitos, CA, USA, may

  5. [5]

    doi: 10.1109/SP.2017

    IEEE Computer Society. doi: 10.1109/SP.2017

  6. [6]

    org/10.1109/SP.2017.41

    URL https://doi.ieeecomputersociety.org/10.1109/SP.2017.41. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., and Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, 2015. Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Lea...

  7. [7]

    We note that IARs inherently expose the full information about p(x) at the output (per-token logits, see Equation (1))

    Access to p(x) boosts MIA (Zarifzadeh et al., 2024). We note that IARs inherently expose the full information about p(x) at the output (per-token logits, see Equation (1)). In contrast, DMs do not, as they learn to transform N(0, I) into the data distribution q(x) through an iterative denoising process. This difference is expressed with varying MIA designs for DMs...

  8. [8]

    For each training sample passed through the IAR, the model "sees" N different sequences to predict

    AutoRegressive training exposes IARs to more data per update. For each training sample passed through the IAR, the model "sees" N different sequences to predict. Conversely, a DM only "sees" a single, noisy image. This influences two factors: a) training time of the model: DMs need to be trained twice as long as IARs, on average. b) privacy leakage—...

  9. [9]

    Previous works (Maini et al., 2024; Dubiński et al., 2025) aggregate signal from many MIAs to yield a stronger attack

    Multiple independent signals amplify leakage. Previous works (Maini et al., 2024; Dubiński et al., 2025) aggregate signal from many MIAs to yield a stronger attack. Notably, each token predicted by IARs leaks unique information from the model, as it is generated from a (slightly) different prefix. Thus, per-token losses/logits that IAR-specific MIAs use,...

  10. [10]

    Training duration is the factor that most increases vulnerability to MIA and DI for DMs

  11. [11]

    Model size influences leakage more for IARs than for DMs

  12. [12]

    It also correlates with MIA performance

    The "Is IAR" factor plays the most significant role for DI performance. It also correlates with MIA performance. Our results show that while these two factors (model size and training duration) influence the performance of our attacks against the models, the results strengthen our notion that IARs tend to leak more privacy than DMs due to their inherent char...

  13. [13]

    For the DMs we evaluate, training duration varies between 0.21B and 1.79B samples seen, whereas the IARs are trained on between 0.26B and 0.51B samples

    Training duration, expressed as the number of data points seen during training, e.g., RAR-B sees 400 × 1.27M ≈ 0.5B samples. For the DMs we evaluate, training duration varies between 0.21B and 1.79B samples seen, whereas the IARs are trained on between 0.26B and 0.51B samples

  14. [14]

    DMs minimize Equation (3), while IARs minimize Equation (2)

    Training objectives. DMs minimize Equation (3), while IARs minimize Equation (2). Importantly, DMs minimize the expected error over timesteps and data, which on average requires roughly twice the training duration of IARs to achieve comparable FID

  15. [15]

    IARs benefit from scaling laws (Kaplan et al., 2020), and that allows them to be scaled up to sizes greater than DMs, before their performance plateaus

    Model sizes. IARs benefit from scaling laws (Kaplan et al., 2020), which allows them to be scaled up to sizes greater than DMs before their performance plateaus. DMs do not scale as well: their performance gains diminish faster as size increases. In effect, the biggest IARs we evaluate, VAR-d30 and RAR-XXL, are on average 2-3 times bigger ...

  16. [16]

    All models incorporate an encoder-decoder network for training and inference, e.g., VQ-VAE (Esser et al., 2020)

    Two stage architectures. All models incorporate an encoder-decoder network for training and inference, e.g., VQ-VAE (Esser et al., 2020). Importantly, these encoders differ between models. VAR's next-scale prediction paradigm requires training of a specialized encoder that understands how to process residual token maps, used during encoding an image to...

  17. [17]

    LDM, being earlier work, instead employs the UNet architecture (Ronneberger et al., 2015)

    as their diffusion backbones. LDM, being earlier work, instead employs the UNet architecture (Ronneberger et al., 2015). We refer the reader to the original publications for more details about their architectures and training strategies. LDM (Latent Diffusion Model) by Rombach et al. (2022) first proposed running diffusion in a learned latent space rather t...