Privacy Attacks on Image AutoRegressive Models
Pith reviewed 2026-05-23 03:27 UTC · model grok-4.3
The pith
Image autoregressive models expose training data to membership inference attacks at rates far higher than comparable diffusion models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Image autoregressive models are empirically significantly more vulnerable to privacy attacks than diffusion models that achieve similar performance, as shown by a novel membership inference attack reaching 94.57 percent true positive rate at one percent false positive rate on IARs versus 6.38 percent on DMs, dataset inference succeeding with four samples instead of two hundred, and extraction of 698 training points from one IAR variant.
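The headline metric, true positive rate at a fixed one percent false positive rate, can be made concrete with a short sketch. This is not the paper's evaluation code. The function name `tpr_at_fpr`, the score convention (a higher score means "more likely a member"), and the strict `>` decision rule are assumptions chosen for illustration.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Return (TPR, threshold) at the loosest threshold whose FPR <= target_fpr.

    Assumed convention: higher score = more likely a training member, and a
    sample is flagged as a member when score > threshold.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    neg = sorted(nonmember_scores, reverse=True)
    # Number of non-members we may flag while staying within the FPR budget.
    # The small epsilon guards against float rounding such as 0.29 * 100.
    allowed_fp = int(target_fpr * len(neg) + 1e-9)
    # Place the threshold at the (allowed_fp + 1)-th largest non-member score,
    # so at most allowed_fp non-members lie strictly above it.
    threshold = neg[allowed_fp] if allowed_fp < len(neg) else float("-inf")
    true_positives = sum(1 for s in member_scores if s > threshold)
    return true_positives / len(member_scores), threshold
```

Fixing the false positive rate first and reading off the true positive rate second is what makes the 94.57 percent and 6.38 percent figures comparable: both describe how many training images are caught when only one in a hundred non-training images is wrongly flagged.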
What carries the argument
A novel membership inference attack that exploits the autoregressive prediction process to detect whether an image was part of the training set.
If this is right
- Practitioners selecting between generation paradigms must weigh the documented speed and quality gains of IARs against their measured increase in membership inference success.
- Dataset owners can use the four-sample dataset inference method to audit whether an IAR was trained on their private collection.
- Model developers can apply the extraction procedure to quantify how many training images an IAR has memorized.
- Future IAR training recipes must incorporate privacy defenses if the observed leakage rates are to be reduced.
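The dataset inference idea in the second bullet can be sketched as a hypothesis test: if a handful of suspect images score consistently higher on a membership signal than known non-training images, the null hypothesis "this collection was not used for training" is rejected. The page does not describe the paper's actual test, so the one-sided permutation test below, the function name `dataset_inference_pvalue`, and its parameters are illustrative assumptions, not the authors' procedure.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def dataset_inference_pvalue(suspect_scores, reference_scores,
                             n_perm=10000, seed=0):
    """One-sided Monte Carlo permutation test on the difference in means.

    suspect_scores:   membership scores of the samples being audited.
    reference_scores: scores of samples known NOT to be in the training set.
    A small p-value suggests the suspect samples score higher than chance.
    """
    rng = random.Random(seed)
    observed = _mean(suspect_scores) - _mean(reference_scores)
    pooled = list(suspect_scores) + list(reference_scores)
    k = len(suspect_scores)
    hits = 0
    for _ in range(n_perm):
        # Randomly relabel which k scores count as "suspect".
        rng.shuffle(pooled)
        if _mean(pooled[:k]) - _mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

With only four suspect samples the test has little room for noise, which is why a reported success at that sample count points to a very strong per-sample signal.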
Where Pith is reading between the lines
- If the privacy gap persists across model scales, regulatory pressure may favor diffusion models for any deployment involving personal images.
- The autoregressive token-by-token structure may inherently require more memorization of local statistics than the global denoising process in diffusion models.
- A follow-up experiment could test whether adding explicit privacy regularization during IAR training closes the attack gap without harming FID scores.
Load-bearing premise
The membership inference attack and diffusion model baselines are implemented and evaluated under comparable training conditions and model scales.
What would settle it
A replication study that trains an image autoregressive model and a diffusion model on identical data and compute, then applies the reported attack and finds the true positive rate gap disappears.
Original abstract
Image AutoRegressive generation has emerged as a new powerful paradigm with image autoregressive models (IARs) matching state-of-the-art diffusion models (DMs) in image quality (FID: 1.48 vs. 1.58) while allowing for a higher generation speed. However, the privacy risks associated with IARs remain unexplored, raising concerns regarding their responsible deployment. To address this gap, we conduct a comprehensive privacy analysis of IARs, comparing their privacy risks to the ones of DMs as reference points. Concretely, we develop a novel membership inference attack (MIA) that achieves a remarkably high success rate in detecting training images (with True Positive Rate at False Positive Rate = 1% of 94.57% vs. 6.38% for DMs with comparable attacks). We leverage our novel MIA to provide dataset inference (DI) for IARs, and show that it requires as few as 4 samples to detect dataset membership (compared to 200 for DI in DMs), confirming a higher information leakage in IARs. Finally, we are able to extract hundreds of training data points from an IAR (e.g., 698 from VAR-d30). Our results suggest a fundamental privacy-utility trade-off: while IARs excel in image generation quality and speed, they are empirically significantly more vulnerable to privacy attacks compared to DMs that achieve similar performance. We release the code at https://github.com/sprintml/privacy_attacks_against_iars for reproducibility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that image autoregressive models (IARs) matching diffusion models (DMs) in generation quality (FID 1.48 vs 1.58) and speed are empirically far more vulnerable to privacy attacks, with a novel membership inference attack (MIA) achieving 94.57% TPR at 1% FPR on IARs versus 6.38% on DMs with comparable attacks, dataset inference succeeding with only 4 samples versus 200 for DMs, and extraction of hundreds of training points (e.g., 698 from VAR-d30). It concludes this indicates a fundamental privacy-utility trade-off and releases code for reproducibility.
Significance. If the IAR-DM comparisons hold under matched training regimes, the work would be significant for identifying substantially higher leakage in a competitive new generative paradigm, with direct implications for responsible deployment; the code release is a positive factor supporting reproducibility.
major comments (2)
- [Abstract] Abstract: the central claim of a 'fundamental privacy-utility trade-off' caused by autoregressive vs diffusion inductive bias requires that the reported MIA gap (94.57% vs 6.38% TPR@1% FPR) and DI gap (4 vs 200 samples) be attributable to architecture rather than unmatched factors; the text states only that DM baselines use 'comparable attacks' and IARs achieve similar FID, without establishing that DMs were trained at matching scale (e.g., VAR-d30 parameter count), on identical data, with the same optimizer and schedule.
- [Abstract] Abstract: the novel MIA and its evaluation lack any description of attack construction, feature extraction, threshold selection, or statistical testing, so it is impossible to determine whether the high success rates are robust or influenced by post-hoc choices; this is load-bearing because the headline numerical results rest on this attack.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address the two major comments point by point below.
-
Referee: [Abstract] Abstract: the central claim of a 'fundamental privacy-utility trade-off' caused by autoregressive vs diffusion inductive bias requires that the reported MIA gap (94.57% vs 6.38% TPR@1% FPR) and DI gap (4 vs 200 samples) be attributable to architecture rather than unmatched factors; the text states only that DM baselines use 'comparable attacks' and IARs achieve similar FID, without establishing that DMs were trained at matching scale (e.g., VAR-d30 parameter count), on identical data, with the same optimizer and schedule.
Authors: We acknowledge that the models compared are drawn from the published literature rather than retrained under identical conditions (data, optimizer, schedule, and exact parameter count). Our intent was to evaluate representative state-of-the-art IARs and DMs that achieve comparable FID scores using the attacks described in the paper. We agree that explicitly matched training regimes would provide stronger evidence for an inductive-bias-driven trade-off. In the revision we will expand the model description section to list the precise training details, data, and hyperparameters of every model used, and we will qualify the trade-off claim as an empirical observation on current published models rather than a strictly controlled causal statement. revision: partial
-
Referee: [Abstract] Abstract: the novel MIA and its evaluation lack any description of attack construction, feature extraction, threshold selection, or statistical testing, so it is impossible to determine whether the high success rates are robust or influenced by post-hoc choices; this is load-bearing because the headline numerical results rest on this attack.
Authors: The construction of the novel MIA (including the autoregressive-specific likelihood features, the per-token probability aggregation, the threshold selection procedure via validation-set calibration, and the statistical testing via bootstrap confidence intervals) is fully detailed in Section 3.2 and evaluated in Section 4.1. We will add a one-sentence summary of the attack methodology to the abstract and ensure the key design choices are cross-referenced from the abstract to Section 3 in the revised manuscript. revision: yes
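The simulated rebuttal mentions bootstrap confidence intervals without showing them on this page. A minimal sketch of a percentile bootstrap follows, assuming the quantity of interest is the fraction of training images flagged at a fixed threshold. The names `bootstrap_ci` and `mean`, the resample count, and the 95 percent level are assumptions for illustration and are not taken from the paper.

```python
import random


def bootstrap_ci(values, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(values)."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        # Resample n items with replacement and recompute the statistic.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def mean(xs):
    return sum(xs) / len(xs)


# Usage: 1 marks a training image flagged at the chosen threshold, 0 a miss.
flags = [1] * 90 + [0] * 10
low, high = bootstrap_ci(flags, mean)
```

An interval of this kind would let a reader judge whether a gap as large as the one reported could plausibly arise from sampling noise in the evaluation set.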
Circularity Check
No circularity: purely empirical attack evaluation
The paper reports results from implementing a novel membership inference attack and dataset inference on IARs versus DM baselines, with direct experimental metrics (TPR@FPR, sample counts for DI, extracted data points). No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the derivation chain. The central claims rest on measured attack success rates under stated conditions rather than any reduction to inputs by construction, so the evaluation is self-contained.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, theorem reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
We develop a novel membership inference attack (MIA) that achieves a remarkably high success rate in detecting training images (with True Positive Rate at False Positive Rate = 1% of 94.57% vs. 6.38% for DMs with comparable attacks).
-
IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
We leverage our novel MIA to provide dataset inference (DI) for IARs, and show that it requires as few as 4 samples to detect dataset membership (compared to 200 for DI in DMs)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Dao, Q., Phung, H., Nguyen, B., and Tran, A
PMLR, 2020. Dao, Q., Phung, H., Nguyen, B., and Tran, A. Flow matching in latent space. arXiv preprint arXiv:2307.08698, 2023. Das, D., Zhang, J., and Tramèr, F. Blind baselines beat membership inference attacks for foundation models. arXiv preprint arXiv:2406.16201, 2024. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. Imagenet: ...
-
[2]
Gao, S., Zhou, P., Cheng, M.-M., and Yan, S
URL http://www.dspace.cam.ac.uk/handle/1810/3486. Gao, S., Zhou, P., Cheng, M.-M., and Yan, S. Masked diffusion transformer is a strong image synthesizer, 2023. Han, J., Liu, J., Jiang, Y., Yan, B., Zhang, Y., Yuan, Z., Peng, B., and Liu, X. Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis, 2024. URL https://arxiv.o...
-
[3]
Scalable Diffusion Models with Transformers
URL https://aclanthology.org/2023.findings-acl.719. Nasr, M., Hayes, J., Steinke, T., Balle, B., Tramèr, F., Jagielski, M., Carlini, N., and Terzis, A. Tight auditing of differentially private machine learning. In 32nd USENIX Security Symposium (USENIX Security 23), pp. 1631–1648, 2023. Peebles, W. a...
-
[4]
Shokri, R., Stronati, M., Song, C., and Shmatikov, V
URL https://openreview.net/forum?id=zWqr3MQuNs. Shokri, R., Stronati, M., Song, C., and Shmatikov, V. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18, Los Alamitos, CA, USA, may
- [5]
-
[6]
URL https://doi.ieeecomputersociety.org/10.1109/SP.2017.41. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., and Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, 2015. Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Lea...
-
[7]
Access to p(x) boosts MIA (Zarifzadeh et al., 2024). We note that IARs inherently expose the full information about p(x) at the output (per-token logits, see Equation (1)). In contrast, DMs do not, as they learn to transform N(0, I) into the data distribution q(x) through an iterative denoising process. This difference is expressed with varying MIA designs for DMs...
-
[8]
For each training sample passed through the IAR, the model "sees" N different sequences to predict
AutoRegressive training exposes IARs to more data per update. For each training sample passed through the IAR, the model "sees" N different sequences to predict. Conversely, a DM only "sees" a single, noisy image. This influences two factors: a) training time of the model: DMs need to be trained twice as long as IARs, on average; b) privacy leakage ...
-
[9]
Multiple independent signals amplify leakage. Previous works (Maini et al., 2024; Dubiński et al., 2025) aggregate signals from many MIAs to yield a stronger attack. Notably, each token predicted by IARs leaks unique information from the model, as it is generated from a (slightly) different prefix. Thus, per-token losses/logits that IAR-specific MIAs use,...
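The point that IARs expose p(x) through per-token logits can be illustrated with a small sketch that turns logits into per-token log-probabilities and sums them into a sequence log-likelihood. Equation (1) is not reproduced on this page, so the standard autoregressive factorization is assumed here. The function names and the plain-list input format are hypothetical, and the paper's attack may aggregate the per-token signals differently.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def per_token_log_probs(logits_per_step, token_ids):
    """Log-probability the model assigned to each observed token.

    logits_per_step[i] holds the logits produced from the prefix before
    position i; token_ids[i] is the token actually present at position i.
    """
    if len(logits_per_step) != len(token_ids):
        raise ValueError("one logit vector is needed per token")
    return [log_softmax(step)[tok]
            for step, tok in zip(logits_per_step, token_ids)]


def sequence_log_likelihood(logits_per_step, token_ids):
    """log p(x) as the sum of per-token terms (assumed factorization)."""
    return sum(per_token_log_probs(logits_per_step, token_ids))
```

Each entry returned by `per_token_log_probs` is one of the "multiple independent signals" mentioned above: a sample the model memorized would tend to receive unusually high values at many positions, and those values can be combined into a single membership score.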
-
[10]
Training duration is the factor that most increases the vulnerability of DMs to MIA and DI
-
[11]
Model size influences leakage more for IARs than for DMs
-
[12]
It also correlates with MIA performance
The "Is IAR" factor plays the most significant role for the DI performance. It also correlates with MIA performance. Our results show that while these two factors, model size and training duration, influence the performance of our attacks against the models, the results strengthen our notion that IARs tend to leak more privacy than DMs due to their inherent char...
-
[13]
Training duration, expressed as the number of data points seen during training, e.g., RAR-B sees 400 × 1.27M ≈ 0.5B samples. For the DMs we evaluate, training duration varies between 0.21B and 1.79B samples seen, whereas IARs are trained on between 0.26B and 0.51B samples
-
[14]
DMs minimize Equation (3), while IARs minimize Equation (2)
Training objectives. DMs minimize Equation (3), while IARs minimize Equation (2). Importantly, DMs minimize the expected error over timesteps and data, which necessitates a training duration twice as long for DMs as for IARs (on average) to achieve comparable FID
-
[15]
Model sizes. IARs benefit from scaling laws (Kaplan et al., 2020), which allows them to be scaled up to sizes greater than DMs before their performance plateaus. DMs do not scale as well: the performance gains diminish faster as size increases. In effect, the biggest IARs we evaluate, VAR-d30 and RAR-XXL, are on average 2-3 times bigger ...
-
[16]
Two-stage architectures. All models incorporate an encoder-decoder network for training and inference, e.g., VQ-VAE (Esser et al., 2020). Importantly, these encoders differ between models. VAR's next-scale prediction paradigm requires training of a specialized encoder that understands how to process residual token maps, used during encoding an image to...
-
[17]
LDM, being earlier work, instead employs the UNet architecture (Ronneberger et al., 2015)
as their diffusion backbones. LDM, being earlier work, instead employs the UNet architecture (Ronneberger et al., 2015). We refer the reader to the original publications for more details about their architectures and training strategies. LDM (Latent Diffusion Model) by Rombach et al. (2022) first proposed running diffusion in a learned latent space rather t...