pith. machine review for the scientific record.

arxiv: 2602.01150 · v2 · submitted 2026-02-01 · 💻 cs.LG · cs.AI · cs.CR · cs.CV · math.OC

Recognition: 2 theorem links · Lean Theorem

SMI: Statistical Membership Inference for Reliable Unlearned Model Auditing

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 08:55 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CR · cs.CV · math.OC
keywords machine unlearning · membership inference · model auditing · statistical estimation · forgetting verification · feature distribution · mixture modeling
0 comments

The pith

Unlearned samples sit in different feature-space positions than true non-members, so membership inference audits systematically overestimate forgetting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard membership inference attacks fail as audits for machine unlearning because samples removed from training occupy distinct locations in the model's feature space compared with data the model never encountered. This separation creates an unavoidable bias that makes forgetting appear more complete than it actually is. The authors introduce Statistical Membership Inference, a training-free method that estimates the proportion of non-member samples inside the unlearned model's feature distribution. The method also supplies bootstrap reference ranges that quantify how reliable each audit result is. Experiments demonstrate that the statistical estimator outperforms attack-based baselines while eliminating the cost of training shadow models.

Core claim

We prove that unlearned samples occupy fundamentally different positions in the feature space than non-member samples, making this alignment bias unavoidable and unobservable, which leads to systematically optimistic evaluations of unlearning performance. We reformulate auditing as estimating the non-member mixture proportion in the unlearned feature distribution and supply bootstrap ranges for quantified reliability.

What carries the argument

Statistical Membership Inference (SMI), which models the unlearned feature distribution as a mixture of member and non-member components and estimates the non-member proportion without shadow-model training.
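
The abstract does not spell out SMI's estimator, so what follows is an editorial sketch only: a two-component Gaussian EM on one-dimensional feature scores that returns the weight of the component treated as non-member. The function name, score inputs, and initialization are our assumptions, not the paper's.

  # Editorial sketch, not the paper's method: two-component Gaussian EM on
  # 1D feature scores from the unlearned model. Component 1 (initialized on
  # the upper half of the scores) is treated as the non-member mode; in
  # practice one would anchor it with held-out non-member data.
  import numpy as np

  def em_nonmember_weight(audit_scores, n_iter=200):
      x = np.asarray(audit_scores, dtype=float)
      lo, hi = x[x <= np.median(x)], x[x > np.median(x)]
      mu = np.array([lo.mean(), hi.mean()])
      var = np.array([lo.var() + 1e-6, hi.var() + 1e-6])
      pi = 0.5  # mixing weight of the putative non-member component
      for _ in range(n_iter):
          # E-step: responsibility of the non-member component for each score.
          p1 = pi * np.exp(-0.5 * (x - mu[1]) ** 2 / var[1]) / np.sqrt(var[1])
          p0 = (1 - pi) * np.exp(-0.5 * (x - mu[0]) ** 2 / var[0]) / np.sqrt(var[0])
          r = p1 / (p0 + p1 + 1e-12)
          # M-step: re-estimate the weight, means, and variances.
          pi = r.mean()
          mu[1] = (r * x).sum() / (r.sum() + 1e-12)
          mu[0] = ((1 - r) * x).sum() / ((1 - r).sum() + 1e-12)
          var[1] = (r * (x - mu[1]) ** 2).sum() / (r.sum() + 1e-12) + 1e-6
          var[0] = ((1 - r) * (x - mu[0]) ** 2).sum() / ((1 - r).sum() + 1e-12) + 1e-6
      return float(pi)  # estimated non-member proportion, read as a forgetting rate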

If this is right

  • Auditing requires no additional shadow-model training or repeated attack queries.
  • Each audit result comes with explicit numerical bounds on its reliability (a bootstrap sketch follows this list).
  • The same statistical procedure can be applied to any model from which feature vectors can be extracted.
  • Overly optimistic forgetting rates reported by prior MIA-based audits can be corrected retroactively.
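
A minimal sketch of such bounds via a percentile bootstrap, reusing em_nonmember_weight from the sketch above; the 95% level and resample count are illustrative choices, not taken from the paper.

  # Editorial sketch: percentile-bootstrap reference range for the estimated
  # non-member proportion. Reuses em_nonmember_weight from the earlier sketch.
  import numpy as np

  def bootstrap_range(audit_scores, n_boot=500, alpha=0.05, seed=0):
      rng = np.random.default_rng(seed)
      x = np.asarray(audit_scores, dtype=float)
      estimates = [
          em_nonmember_weight(rng.choice(x, size=x.size, replace=True))
          for _ in range(n_boot)
      ]
      lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
      return float(lo), float(hi)  # reference range for the audit result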

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed feature-space separation may also appear in other post-training edits such as fine-tuning or pruning, suggesting SMI-style checks could be useful there.
  • Practitioners could use the bootstrap ranges to decide minimum numbers of test samples needed for audits that meet a target reliability threshold.
  • If the mixture-model assumption holds across domains, similar proportion-estimation techniques might detect other hidden data effects such as poisoning or backdoors.

Load-bearing premise

The feature vectors produced by the unlearned model can be treated as draws from a two-component mixture whose components correspond to forgotten members and true non-members.
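
In symbols (our notation, not the paper's), the premise says the unlearned model's feature scores follow

  % Editorial notation: z is a feature vector from the unlearned model and
  % pi is the non-member proportion that SMI estimates as the forgetting rate.
  p(z) \;=\; (1 - \pi)\, p_{\mathrm{member}}(z) \;+\; \pi\, p_{\mathrm{nonmember}}(z)

The referee's misspecification worry, below, is precisely that a third component for forgotten members may be missing from this equation.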

What would settle it

Apply SMI to a controlled unlearned model whose exact count of removed training samples is known in advance and check whether the estimated non-member proportion and its bootstrap interval recover the ground-truth fraction at the expected rate.
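
A hedged sketch of that controlled check, reusing the em_nonmember_weight and bootstrap_range sketches above; the Gaussian score model, the separation of 2.0, and the 30% ground-truth fraction are illustrative assumptions.

  # Editorial sketch: synthesize scores with a known non-member fraction and
  # check whether the estimate and its bootstrap range recover the truth.
  import numpy as np

  rng = np.random.default_rng(1)
  true_pi, n = 0.3, 2000
  member_scores = rng.normal(0.0, 1.0, int((1 - true_pi) * n))
  nonmember_scores = rng.normal(2.0, 1.0, int(true_pi * n))
  scores = np.concatenate([member_scores, nonmember_scores])

  pi_hat = em_nonmember_weight(scores)
  lo, hi = bootstrap_range(scores, n_boot=200)
  print(f"truth={true_pi:.2f} estimate={pi_hat:.2f} range=({lo:.2f}, {hi:.2f})")
  # Repeating across many seeds and counting how often the range covers
  # true_pi yields an empirical coverage rate to compare with the nominal 95%.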

read the original abstract

Machine unlearning (MU) is essential for enforcing the right to be forgotten in machine learning systems. A key challenge of MU is how to reliably audit whether a model has truly forgotten specified training data. Membership Inference Attacks (MIAs) are widely used for unlearned model auditing, where samples that evade membership detection are regarded as successfully forgotten. We show this assumption is fundamentally flawed: failed membership inference does not imply true forgetting. We prove that unlearned samples occupy fundamentally different positions in the feature space than non-member samples, making this alignment bias unavoidable and unobservable, which leads to systematically optimistic evaluations of unlearning performance. Meanwhile, training shadow models for MIA incurs substantial computational overhead. To address both limitations, we propose Statistical Membership Inference (SMI), a training-free auditing framework that reformulates auditing as estimating the non-member mixture proportion in the unlearned feature distribution. Beyond estimating the forgetting rate, SMI also provides bootstrap reference ranges for quantified auditing reliability. Extensive experiments show that SMI consistently outperforms all MIA-based baselines, with no shadow model training required. Overall, SMI establishes a principled and efficient alternative to MIA-based auditing methods, with both theoretical guarantees and strong empirical performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that standard membership inference attacks (MIAs) for auditing machine unlearning are fundamentally flawed because unlearned samples occupy distinct positions in feature space from true non-members, producing systematically optimistic evaluations. It introduces Statistical Membership Inference (SMI), a training-free method that reformulates auditing as estimating the non-member mixture weight in the unlearned model's feature distribution (modeled as a two-component mixture) and supplies bootstrap reference ranges for reliability. Experiments reportedly show SMI outperforming MIA baselines without shadow-model training.

Significance. If the mixture estimation is robust, SMI would supply an efficient, shadow-model-free auditing procedure with built-in uncertainty quantification, addressing both the computational cost and bias issues of existing MIA-based audits. The bootstrap ranges constitute a concrete strength for practical deployment.

major comments (2)
  1. [Proof of separation and SMI mixture model] The proof that unlearned samples occupy fundamentally different positions in feature space than non-member samples (abstract and theoretical section) directly conflicts with the two-component mixture assumption used to derive the non-member proportion estimator in the SMI formulation. If the unlearned component is separable from both retained members and true non-members, the model is misspecified; standard estimators (EM, moment matching) will misattribute mass and bias the forgetting-rate estimate. No identifiability conditions (minimum separation, parametric form) are supplied to guarantee recovery when a third mode is present. (An editorial simulation of this failure mode appears after the minor comments below.)
  2. [SMI derivation and guarantees] The central claim that SMI provides 'theoretical guarantees' for reliable auditing rests on the mixture being exactly two components; the separation result undermines this without additional derivation showing that the estimator remains consistent or that the third component can be absorbed without bias.
minor comments (1)
  1. [Abstract] The abstract states 'theoretical guarantees' but the manuscript does not explicitly list the assumptions (e.g., Gaussianity, separation thresholds) under which the bootstrap ranges are valid.
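
An editorial aside, not part of the report: major comment 1 can be made concrete with a small simulation in which a distinct unlearned mode sits between members and non-members; all parameters below are illustrative, and em_nonmember_weight comes from the sketch earlier on this page.

  # Editorial simulation of the misspecification in major comment 1: a third
  # "unlearned" mode between members and non-members inflates a two-component
  # estimate of the non-member fraction.
  import numpy as np

  rng = np.random.default_rng(2)
  n = 3000
  members = rng.normal(0.0, 1.0, n)         # retained members
  unlearned = rng.normal(1.5, 1.0, n // 2)  # distinct unlearned mode
  nonmembers = rng.normal(3.0, 1.0, n)      # true non-members
  scores = np.concatenate([members, unlearned, nonmembers])

  print("true non-member fraction:", n / scores.size)  # 0.4
  print("two-component estimate:", em_nonmember_weight(scores))
  # The two-component fit typically absorbs part of the unlearned mode into
  # the non-member component, overstating the forgetting rate.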

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. The comments raise an important question about the relationship between our separation result and the two-component mixture model in SMI. We address each point below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Proof of separation and SMI mixture model] The proof that unlearned samples occupy fundamentally different positions in feature space than non-member samples (abstract and theoretical section) directly conflicts with the two-component mixture assumption used to derive the non-member proportion estimator in the SMI formulation. If the unlearned component is separable from both retained members and true non-members, the model is misspecified; standard estimators (EM, moment matching) will misattribute mass and bias the forgetting-rate estimate. No identifiability conditions (minimum separation, parametric form) are supplied to guarantee recovery when a third mode is present.

    Authors: The separation result is derived to explain the systematic bias of MIA-based auditing: unlearned samples do not lie in the same region of feature space as true non-members, so membership inference on unlearned samples cannot be interpreted as evidence of forgetting. SMI instead fits a two-component mixture (retained-member features versus true non-member features) to the feature distribution produced by the unlearned model and then estimates the non-member weight on held-out data that includes the forget set. We acknowledge that a distinct third mode for unlearned samples can produce misspecification. In the revised manuscript we will (i) state the modeling assumption explicitly, (ii) supply a sufficient condition for identifiability (bounded total-variation distance between the unlearned and non-member components), and (iii) add a robustness experiment showing that the bootstrap estimator remains stable under moderate separation observed in our datasets. (Editorial note: sketches of a TV-distance check and a moment-matching estimator appear after these responses.) revision: partial

  2. Referee: [SMI derivation and guarantees] The central claim that SMI provides 'theoretical guarantees' for reliable auditing rests on the mixture being exactly two components; the separation result undermines this without additional derivation showing that the estimator remains consistent or that the third component can be absorbed without bias.

    Authors: The theoretical guarantees claimed in the paper are for the bootstrap reference ranges that quantify auditing reliability, not for exact recovery of the mixing proportion under arbitrary misspecification. We will add a short derivation in the revised version showing that, when the unlearned component lies within a fixed total-variation ball around the non-member component, the moment-matching estimator remains consistent for the non-member weight and the bootstrap intervals retain their coverage properties. This clarifies the scope of the guarantees while preserving the practical utility of the method. revision: yes
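
Editorial sketch of the identifiability check promised in response 1: a plug-in estimate of the total-variation distance between the unlearned and non-member score distributions. The shared-histogram estimator and bin count are our choices, not the authors'.

  # Editorial sketch: plug-in total-variation distance between two empirical
  # 1D score distributions via a shared histogram. Bin count is an assumption.
  import numpy as np

  def tv_distance(a, b, bins=50):
      edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
      pa, _ = np.histogram(a, bins=edges)
      pb, _ = np.histogram(b, bins=edges)
      pa = pa / pa.sum()
      pb = pb / pb.sum()
      return 0.5 * float(np.abs(pa - pb).sum())
  # A small tv_distance(unlearned_scores, nonmember_scores) is the regime in
  # which the rebuttal claims the two-component model remains identifiable.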
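
And a sketch of one standard moment-matching estimator of the kind response 2 refers to; the anchor means from held-out member and non-member data are our assumption, and the paper's exact construction may differ.

  # Editorial sketch: moment-matching estimate of the non-member weight from
  # first moments. member_mean and nonmember_mean would come from held-out
  # anchor data, an assumption not stated in the abstract.
  import numpy as np

  def moment_match_weight(audit_scores, member_mean, nonmember_mean):
      m = float(np.mean(audit_scores))
      pi = (m - member_mean) / (nonmember_mean - member_mean)
      return float(np.clip(pi, 0.0, 1.0))  # clip to a valid proportion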

Circularity Check

0 steps flagged

No circularity: SMI is an independent statistical estimator on observed features, not derived from the unlearning process or from self-referential inputs

full rationale

The paper's derivation consists of (1) a proof that unlearned samples occupy distinct feature-space positions from non-members (creating MIA bias) and (2) reformulation of auditing as direct estimation of the non-member mixture weight in the observed unlearned-model feature distribution, with bootstrap ranges for reliability. Neither step reduces to its own inputs by construction: the proof is a geometric claim about positions, and SMI applies standard mixture proportion estimation (EM or moment methods) to the empirical feature distribution without fitting parameters that are then renamed as predictions. No self-citation is load-bearing for the central claim, no ansatz is smuggled, and no uniqueness theorem is imported from prior author work. The two-component mixture assumption may be misspecified if a third separable mode exists, but that is an identifiability concern, not a circular reduction. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard statistical mixture modeling assumptions applied to model features; no free parameters or new entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Unlearned samples occupy positions in feature space that are statistically distinguishable from non-member samples
    Invoked to establish the unavoidable bias in membership inference and to justify the mixture model.
  • domain assumption The observed feature distribution after unlearning is a mixture whose non-member proportion can be estimated by standard statistical procedures
    Required for the core estimation step and bootstrap reliability ranges.

pith-pipeline@v0.9.0 · 5544 in / 1319 out tokens · 28633 ms · 2026-05-16T08:55:13.138941+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.