Recognition: 2 theorem links
SMI: Statistical Membership Inference for Reliable Unlearned Model Auditing
Pith reviewed 2026-05-16 08:55 UTC · model grok-4.3
The pith
Unlearned samples sit in different feature-space positions than true non-members, so membership inference audits systematically overestimate forgetting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that unlearned samples occupy fundamentally different positions in the feature space than non-member samples, making this alignment bias unavoidable and unobservable, which leads to systematically optimistic evaluations of unlearning performance. We reformulate auditing as estimating the non-member mixture proportion in the unlearned feature distribution and supply bootstrap ranges for quantified reliability.
What carries the argument
Statistical Membership Inference (SMI), which models the unlearned feature distribution as a mixture of member and non-member components and estimates the non-member proportion without shadow-model training.
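The mixture formulation above can be sketched with a simple first-moment-matching estimator. This is an illustrative sketch, not the paper's implementation: the Gaussian feature model, the function name, and the projection onto the line between the two component means are all our assumptions.

```python
import numpy as np

def estimate_nonmember_proportion(feat_unlearned, feat_member, feat_nonmember):
    """Estimate the non-member mixture weight alpha by matching first moments.

    Models the unlearned feature mean as
        mu_f = alpha * mu_nonmember + (1 - alpha) * mu_member
    and solves for alpha by projecting mu_f onto the line between the
    two component means. Hypothetical sketch, not the paper's estimator.
    """
    mu_f = feat_unlearned.mean(axis=0)
    mu_m = feat_member.mean(axis=0)
    mu_n = feat_nonmember.mean(axis=0)
    d = mu_n - mu_m                          # direction between component means
    alpha = float(np.dot(mu_f - mu_m, d) / np.dot(d, d))
    return float(np.clip(alpha, 0.0, 1.0))   # proportions live in [0, 1]

# Synthetic check: "unlearned" features where 70% look like non-members.
rng = np.random.default_rng(0)
members = rng.normal(0.0, 1.0, size=(2000, 8))
nonmembers = rng.normal(2.0, 1.0, size=(2000, 8))
unlearned = np.vstack([rng.normal(2.0, 1.0, size=(1400, 8)),
                       rng.normal(0.0, 1.0, size=(600, 8))])
print(estimate_nonmember_proportion(unlearned, members, nonmembers))
```

No shadow models or attack queries appear anywhere in this sketch; only feature vectors and sample means, which is the structural point the section makes.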
If this is right
- Auditing requires no additional shadow-model training or repeated attack queries.
- Each audit result comes with explicit numerical bounds on its reliability.
- The same statistical procedure can be applied to any model from which feature vectors can be extracted.
- Overly optimistic forgetting rates reported by prior MIA-based audits can be corrected retroactively.
Where Pith is reading between the lines
- The observed feature-space separation may also appear in other post-training edits such as fine-tuning or pruning, suggesting SMI-style checks could be useful there.
- Practitioners could use the bootstrap ranges to decide minimum numbers of test samples needed for audits that meet a target reliability threshold.
- If the mixture-model assumption holds across domains, similar proportion-estimation techniques might detect other hidden data effects such as poisoning or backdoors.
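The bootstrap-range idea in the bullets above can be illustrated with a percentile bootstrap on one-dimensional feature scores; since the interval width shrinks roughly as 1/sqrt(n), a practitioner could invert the observed width to choose a minimum audit sample size. A minimal sketch assuming known component means; all names and the 1-D score model are illustrative, not the paper's API.

```python
import numpy as np

def bootstrap_interval(scores, mu_member, mu_nonmember,
                       n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the non-member proportion.

    The proportion is estimated by matching the sample mean of the
    unlearned scores to a two-point mixture of the known component
    means. Illustrative sketch, not the paper's exact procedure.
    """
    rng = np.random.default_rng(seed)
    n = len(scores)

    def alpha_of(s):
        return float(np.clip((s.mean() - mu_member)
                             / (mu_nonmember - mu_member), 0.0, 1.0))

    boots = [alpha_of(rng.choice(scores, size=n, replace=True))
             for _ in range(n_boot)]
    lo, hi = np.quantile(boots, [(1 - level) / 2, (1 + level) / 2])
    return alpha_of(scores), float(lo), float(hi)

# Synthetic audit: 60% of "unlearned" scores come from the non-member component.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(1.0, 0.5, 600),
                         rng.normal(0.0, 0.5, 400)])
est, lo, hi = bootstrap_interval(scores, mu_member=0.0, mu_nonmember=1.0)
print(f"estimate={est:.3f}, 95% range=({lo:.3f}, {hi:.3f})")
```

Doubling the audit set should roughly shrink the printed range by a factor of sqrt(2), which is the lever a reliability-threshold calculation would use.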
Load-bearing premise
The feature vectors produced by the unlearned model can be treated as draws from a two-component mixture whose components correspond to forgotten members and true non-members.
What would settle it
Apply SMI to a controlled unlearned model whose exact count of removed training samples is known in advance and check whether the estimated non-member proportion and its bootstrap interval recover the ground-truth fraction at the expected rate.
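The controlled experiment described above can be mocked up end to end: draw scores from a mixture with a known non-member fraction, estimate the fraction, and check how often a 95% bootstrap interval covers the ground truth. This is a synthetic sketch under Gaussian assumptions, not the paper's benchmark; component means and all parameters are made up.

```python
import numpy as np

def coverage_check(true_alpha=0.7, n=400, trials=100, n_boot=300, seed=2):
    """Monte-Carlo version of the settling experiment described above.

    Each trial draws 1-D scores from a two-component mixture with a known
    non-member fraction. With component means fixed at 0 (member) and 1
    (non-member), the moment-matching estimate of the fraction is just the
    sample mean, so we bootstrap that and record 95%-interval coverage.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        k = rng.binomial(n, true_alpha)              # non-member draws this trial
        scores = np.concatenate([rng.normal(1.0, 0.4, k),
                                 rng.normal(0.0, 0.4, n - k)])
        boots = [float(np.clip(rng.choice(scores, n, replace=True).mean(), 0, 1))
                 for _ in range(n_boot)]
        lo, hi = np.quantile(boots, [0.025, 0.975])
        hits += lo <= true_alpha <= hi
    return hits / trials

print(coverage_check())  # should land near the nominal 0.95 coverage
```

If the real SMI intervals behaved like this under a known ground-truth forget set, that would be the confirmation the section asks for; systematic under-coverage would falsify the reliability claim.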
read the original abstract
Machine unlearning (MU) is essential for enforcing the right to be forgotten in machine learning systems. A key challenge of MU is how to reliably audit whether a model has truly forgotten specified training data. Membership Inference Attacks (MIAs) are widely used for unlearned model auditing, where samples that evade membership detection are regarded as successfully forgotten. We show this assumption is fundamentally flawed: failed membership inference does not imply true forgetting. We prove that unlearned samples occupy fundamentally different positions in the feature space than non-member samples, making this alignment bias unavoidable and unobservable, which leads to systematically optimistic evaluations of unlearning performance. Meanwhile, training shadow models for MIA incurs substantial computational overhead. To address both limitations, we propose Statistical Membership Inference (SMI), a training-free auditing framework that reformulates auditing as estimating the non-member mixture proportion in the unlearned feature distribution. Beyond estimating the forgetting rate, SMI also provides bootstrap reference ranges for quantified auditing reliability. Extensive experiments show that SMI consistently outperforms all MIA-based baselines, with no shadow model training required. Overall, SMI establishes a principled and efficient alternative to MIA-based auditing methods, with both theoretical guarantees and strong empirical performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that standard membership inference attacks (MIAs) for auditing machine unlearning are fundamentally flawed because unlearned samples occupy distinct positions in feature space from true non-members, producing systematically optimistic evaluations. It introduces Statistical Membership Inference (SMI), a training-free method that reformulates auditing as estimating the non-member mixture weight in the unlearned model's feature distribution (modeled as a two-component mixture) and supplies bootstrap reference ranges for reliability. Experiments reportedly show SMI outperforming MIA baselines without shadow-model training.
Significance. If the mixture estimation is robust, SMI would supply an efficient, shadow-model-free auditing procedure with built-in uncertainty quantification, addressing both the computational cost and bias issues of existing MIA-based audits. The bootstrap ranges constitute a concrete strength for practical deployment.
major comments (2)
- [Proof of separation and SMI mixture model] The proof that unlearned samples occupy fundamentally different positions in feature space than non-member samples (abstract and theoretical section) directly conflicts with the two-component mixture assumption used to derive the non-member proportion estimator in the SMI formulation. If the unlearned component is separable from both retained members and true non-members, the model is misspecified; standard estimators (EM, moment matching) will misattribute mass and bias the forgetting-rate estimate. No identifiability conditions (minimum separation, parametric form) are supplied to guarantee recovery when a third mode is present.
- [SMI derivation and guarantees] The central claim that SMI provides 'theoretical guarantees' for reliable auditing rests on the mixture being exactly two components; the separation result undermines this without additional derivation showing that the estimator remains consistent or that the third component can be absorbed without bias.
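The misspecification worry in the major comments is easy to demonstrate numerically: if the unlearned samples form a third mode between the two assumed components, a two-component moment estimator silently splits their mass. A toy illustration; the 1-D score model and component means are our assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(3)
mu_member, mu_nonmember = 0.0, 1.0   # assumed component means on a 1-D score

# Hypothetical "unlearned" scores forming a THIRD mode midway between the
# two assumed components; none are distributed like true non-members.
third_mode = rng.normal(0.5, 0.1, 5000)

# Two-component moment estimator: alpha = (mean - mu_member) / (mu_n - mu_m).
alpha_hat = float((third_mode.mean() - mu_member) / (mu_nonmember - mu_member))
print(alpha_hat)  # close to 0.5: half the mass is misattributed to non-members
```

The estimator reports a forgetting rate near 50% even though zero samples actually match the non-member distribution, which is exactly the identifiability gap the referee flags.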
minor comments (1)
- [Abstract] The abstract states 'theoretical guarantees' but the manuscript does not explicitly list the assumptions (e.g., Gaussianity, separation thresholds) under which the bootstrap ranges are valid.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. The comments raise an important question about the relationship between our separation result and the two-component mixture model in SMI. We address each point below and indicate the revisions we will make.
read point-by-point responses
- Referee: [Proof of separation and SMI mixture model] The proof that unlearned samples occupy fundamentally different positions in feature space than non-member samples (abstract and theoretical section) directly conflicts with the two-component mixture assumption used to derive the non-member proportion estimator in the SMI formulation. If the unlearned component is separable from both retained members and true non-members, the model is misspecified; standard estimators (EM, moment matching) will misattribute mass and bias the forgetting-rate estimate. No identifiability conditions (minimum separation, parametric form) are supplied to guarantee recovery when a third mode is present.
Authors: The separation result is derived to explain the systematic bias of MIA-based auditing: unlearned samples do not lie in the same region of feature space as true non-members, so membership inference on unlearned samples cannot be interpreted as evidence of forgetting. SMI instead fits a two-component mixture (retained-member features versus true non-member features) to the feature distribution produced by the unlearned model and then estimates the non-member weight on held-out data that includes the forget set. We acknowledge that a distinct third mode for unlearned samples can produce misspecification. In the revised manuscript we will (i) state the modeling assumption explicitly, (ii) supply a sufficient condition for identifiability (bounded total-variation distance between the unlearned and non-member components), and (iii) add a robustness experiment showing that the bootstrap estimator remains stable under moderate separation observed in our datasets. (Revision: partial)
- Referee: [SMI derivation and guarantees] The central claim that SMI provides 'theoretical guarantees' for reliable auditing rests on the mixture being exactly two components; the separation result undermines this without additional derivation showing that the estimator remains consistent or that the third component can be absorbed without bias.
Authors: The theoretical guarantees claimed in the paper are for the bootstrap reference ranges that quantify auditing reliability, not for exact recovery of the mixing proportion under arbitrary misspecification. We will add a short derivation in the revised version showing that, when the unlearned component lies within a fixed total-variation ball around the non-member component, the moment-matching estimator remains consistent for the non-member weight and the bootstrap intervals retain their coverage properties. This clarifies the scope of the guarantees while preserving the practical utility of the method. (Revision: yes)
Circularity Check
No circularity: SMI is an independent statistical estimator applied to observed features; it is not derived from the unlearning process itself or from self-referential inputs.
full rationale
The paper's derivation consists of (1) a proof that unlearned samples occupy distinct feature-space positions from non-members (creating MIA bias) and (2) reformulation of auditing as direct estimation of the non-member mixture weight in the observed unlearned-model feature distribution, with bootstrap ranges for reliability. Neither step reduces to its own inputs by construction: the proof is a geometric claim about positions, and SMI applies standard mixture proportion estimation (EM or moment methods) to the empirical feature distribution without fitting parameters that are then renamed as predictions. No self-citation is load-bearing for the central claim, no ansatz is smuggled, and no uniqueness theorem is imported from prior author work. The two-component mixture assumption may be misspecified if a third separable mode exists, but that is an identifiability concern, not a circular reduction. The method is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Unlearned samples occupy positions in feature space that are statistically distinguishable from non-member samples.
- Domain assumption: The observed feature distribution after unlearning is a mixture whose non-member proportion can be estimated by standard statistical procedures.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tagged: unclear)
  Unclear: the relation between the paper passage and the cited Recognition theorem.
D_f = α D_v + (1−α) D_t ... min_α R2(α) := ||Σ_f − (α Σ_v + (1−α) Σ_t + (α−α²) ΔΔᵀ)||_F² (Eq. 6); MMD objective min_α ||μ_f(k) − α μ_v(k) − (1−α) μ_t(k)||_H² (Eq. 8)
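The covariance-matching objective excerpted here (Eq. 6 as quoted) can be sketched as a one-dimensional grid search over α. The bracketing of the mixture terms, the outer-product reading of the Δ term, and all variable names below are our reconstruction, not the paper's code.

```python
import numpy as np

def fit_alpha_cov(Sigma_f, Sigma_v, Sigma_t, delta, grid=1001):
    """Grid-search the quoted covariance-matching objective:
        R2(a) = ||Sigma_f - (a*Sigma_v + (1-a)*Sigma_t
                             + (a - a**2) * outer(delta, delta))||_F^2
    Sketch of Eq. (6) as excerpted; assumed reconstruction, not the paper's code.
    """
    D = np.outer(delta, delta)
    alphas = np.linspace(0.0, 1.0, grid)
    resid = [np.linalg.norm(Sigma_f - (a * Sigma_v + (1 - a) * Sigma_t
                                       + (a - a * a) * D), ord="fro")
             for a in alphas]
    return float(alphas[int(np.argmin(resid))])

# Synthetic check: build Sigma_f exactly from the mixture formula at a = 0.3
# and confirm the grid search recovers it.
d = 4
Sigma_v, Sigma_t = np.eye(d), 2.0 * np.eye(d)
delta = np.full(d, 0.5)
a_true = 0.3
Sigma_f = (a_true * Sigma_v + (1 - a_true) * Sigma_t
           + (a_true - a_true**2) * np.outer(delta, delta))
print(fit_alpha_cov(Sigma_f, Sigma_v, Sigma_t, delta))  # recovers a value near 0.3
```

The (α − α²)ΔΔᵀ term is the cross-component spread a two-part mixture adds to its covariance, which is why the objective is quartic rather than linear in α.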
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction (tagged: unclear)
  Unclear: the relation between the paper passage and the cited Recognition theorem.
R_D(Q) ≤ R_S(Q) + sqrt((2/m) (χ²(Q∥P)+1) log(1/δ)) (Thm 2.1); the auditing bound adds sqrt((1/2) D∞(D_t∥D_f)) (Cor. 2.2)
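Plugging illustrative numbers into the excerpted bound shows how its terms trade off against sample size and failure probability. The input values below are made up for illustration; the paper defines R_S, χ², and D∞ precisely.

```python
import math

def audit_bound(rs, m, chi2, delta, d_inf=0.0):
    """Evaluate the bound as excerpted above (Thm 2.1 plus the Cor. 2.2 term):
        R_D <= R_S + sqrt((2/m) * (chi2 + 1) * log(1/delta)) + sqrt(d_inf / 2)
    All inputs here are illustrative placeholders.
    """
    return rs + math.sqrt((2.0 / m) * (chi2 + 1.0) * math.log(1.0 / delta)) \
              + math.sqrt(d_inf / 2.0)

# Example: empirical risk 0.10, m = 10_000 samples, chi-square divergence 3,
# failure probability delta = 0.05, and a D-infinity term of 0.02.
print(audit_bound(0.10, 10_000, 3.0, 0.05, d_inf=0.02))
```

The concentration term decays as 1/sqrt(m), but the D∞ term does not shrink with more samples, so distribution shift between D_t and D_f sets a floor on auditing error in this reading.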
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)