Adaptive Power Iteration Method for Differentially Private PCA

Alina Ene; Huy Le Nguyen; Ta Duy Nguyen

arxiv: 2602.11454 · v3 · pith:5MHNS2R2new · submitted 2026-02-12 · 💻 cs.DS · cs.LG

Adaptive Power Iteration Method for Differentially Private PCA

Ta Duy Nguyen , Alina Ene , Huy Le Nguyen This is my paper

Pith reviewed 2026-05-21 14:05 UTC · model grok-4.3

classification 💻 cs.DS cs.LG

keywords differentially private PCApower iterationlow coherenceadaptive filteringtop singular vectorrow differential privacy

0 comments

The pith

A coherence-adaptive filter in power iteration gives better error bounds for differentially private top singular vector computation on low-coherence matrices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an algorithm for privately approximating the top singular vector of a matrix under row-differential privacy. It improves on standard worst-case error bounds precisely when the input matrix has low coherence, a structural property that holds for many practical datasets such as i.i.d. samples. The improvement comes from a new filtering step inside the power iteration that automatically adjusts its behavior to the coherence parameter. A reader would care because this moves private PCA from purely worst-case analysis toward guarantees that match the structure actually present in real data.

Core claim

We give a novel algorithm that achieves beyond-worst-case guarantees for input matrices with low coherence. We introduce a new filtering technique which adapts to this coherence parameter.

What carries the argument

The coherence-adaptive filtering technique inside the power iteration, which scales noise or truncation thresholds according to the matrix coherence to tighten error bounds beyond the worst-case analysis.

Load-bearing premise

The input matrix has low coherence so that the adaptive filter can meaningfully reduce error relative to standard worst-case bounds.

What would settle it

Run the algorithm on a matrix with high coherence and check whether the observed error remains no better than the worst-case bound that non-adaptive methods already achieve.

read the original abstract

We study $\left(\epsilon,\delta\right)$-differentially private algorithms for the problem of approximately computing the top singular vector of a matrix $A\in\mathbb{R}^{n\times d}$ where each row of $A$ is a data point in $\mathbb{R}^{d}$. Following Dwork-Talwar-Thakurta-Zhang (STOC 2014), we consider the privacy model where neighboring inputs differ by one single row. We give a novel algorithm that achieves beyond-worst-case guarantees for input matrices with low coherence, which is a structural property of matrices in many applications, including but not limited to i.i.d. data. Our algorithm contributes to the extensive literature on private power iteration methods, where we introduce a new filtering technique which adapts to this coherence parameter. Our work departs from and complements the work by Hardt-Roth (STOC 2013) which achieves beyond-worst-case guarantees for the more restrictive privacy model where neighboring inputs differ in one single entry by at most 1.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an adaptive power iteration algorithm for approximating the top singular vector of an n-by-d matrix under row-neighboring (ε,δ)-differential privacy. The central contribution is a novel filtering technique that adapts to the matrix coherence parameter to obtain improved, beyond-worst-case utility guarantees for low-coherence inputs (common in i.i.d. data settings), complementing prior entry-neighboring results such as Hardt-Roth (STOC 2013).

Significance. If the privacy and utility claims hold, the work would meaningfully advance differentially private PCA by moving beyond worst-case analysis in the row-neighboring model, which is the more natural setting for many applications. The coherence-adaptive filtering idea is a targeted technical contribution that could translate to better practical performance on structured data while preserving the simplicity of power iteration.

major comments (2)

[§4.2 and Theorem 5.1] §4.2 (Filtering Step) and Theorem 5.1 (Privacy): the filtering procedure adapts its threshold or projection to the coherence parameter μ, yet the sensitivity analysis is written only for the non-adaptive case. If μ is estimated from the private matrix rather than supplied as public input, the row-neighboring sensitivity of the filter must be re-derived; otherwise the end-to-end privacy guarantee does not cover the adaptive version that is required for the claimed utility improvement.
[Theorem 6.3] Theorem 6.3 (Utility Bound): the error term that improves with low coherence is derived under the assumption that the filter exactly matches the true μ. When μ must be approximated privately, an additional error term appears; the current proof does not quantify how this extra term interacts with the coherence-dependent improvement, leaving the “beyond-worst-case” claim unsupported for unknown coherence.

minor comments (2)

[§2 and Algorithm 1] Notation for the coherence parameter μ is introduced in §2 but used without re-statement in the algorithm pseudocode; adding a one-line reminder would improve readability.
[§7] The experimental section reports results only on synthetic low-coherence matrices; a brief discussion of how coherence is measured on real data would strengthen the practical relevance claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and valuable comments on our manuscript. The points raised about the privacy analysis for the adaptive filter and the utility bound under approximate coherence are important to address. We respond to each major comment below and will incorporate the necessary clarifications and extensions in the revised version.

read point-by-point responses

Referee: [§4.2 and Theorem 5.1] §4.2 (Filtering Step) and Theorem 5.1 (Privacy): the filtering procedure adapts its threshold or projection to the coherence parameter μ, yet the sensitivity analysis is written only for the non-adaptive case. If μ is estimated from the private matrix rather than supplied as public input, the row-neighboring sensitivity of the filter must be re-derived; otherwise the end-to-end privacy guarantee does not cover the adaptive version that is required for the claimed utility improvement.

Authors: We thank the referee for highlighting this subtlety in the privacy analysis. In the manuscript, the coherence parameter μ is introduced as a structural property of the input matrix for which the algorithm provides improved guarantees; the filtering step in §4.2 is described with respect to this parameter. We agree that if μ must be estimated from the private data to achieve a fully adaptive algorithm, the sensitivity analysis in Theorem 5.1 requires re-derivation to account for the additional sensitivity of the estimation step. In the revision we will add a private estimation procedure for μ (allocating a small portion of the privacy budget) and update the end-to-end privacy proof accordingly, ensuring the (ε,δ) guarantee holds for the adaptive version. revision: yes
Referee: [Theorem 6.3] Theorem 6.3 (Utility Bound): the error term that improves with low coherence is derived under the assumption that the filter exactly matches the true μ. When μ must be approximated privately, an additional error term appears; the current proof does not quantify how this extra term interacts with the coherence-dependent improvement, leaving the “beyond-worst-case” claim unsupported for unknown coherence.

Authors: We acknowledge that Theorem 6.3 currently derives the coherence-dependent error improvement under the assumption of exact knowledge of μ. When μ is privately approximated, an additive error term will appear. In the revision we will extend the proof of Theorem 6.3 to incorporate this term explicitly. We will show that, for a sufficiently accurate private estimate of μ (achievable with modest privacy budget), the leading coherence-adaptive improvement remains intact and the overall bound continues to be strictly better than the worst-case guarantee for low-coherence matrices. This will substantiate the beyond-worst-case claim even when μ is not known in advance. revision: yes

Circularity Check

0 steps flagged

No circularity: novel filtering technique presented as independent algorithmic contribution

full rationale

The abstract and description introduce a new filtering technique that adapts to the coherence parameter to achieve beyond-worst-case guarantees under row-neighboring DP, departing from prior work by Hardt-Roth. No equations, derivations, or load-bearing steps are exhibited in the provided text that reduce by construction to fitted inputs, self-definitions, or self-citation chains. Cited prior results (Dwork et al., Hardt-Roth) are external and non-overlapping in authorship. The central claim therefore remains self-contained as an algorithmic advance rather than a renaming or implicit fit of the target quantity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone.

pith-pipeline@v0.9.0 · 5703 in / 900 out tokens · 32303 ms · 2026-05-21T14:05:24.321244+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our algorithm is based on a filtering technique which adapts to the coherence parameter of the input matrix... uses the sparse vector technique to find the suitable threshold in each iteration.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 1.1... sin²(v1,x) ≤ B with B = O((min{4n/σ₁²,d} Υ² / (ε² σ₁² κ²) + ... )

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1]

For allt≤T, j∈[d]: g(t), V:,j ≤2 q log T n β

work page
[2]

D g(t), V:,j E ≥2 s log T n β # ≤Pr

For allt, k≤Tand for all rowsaofA: D g(t), V:,−1Σ2k −1 V ⊤a −1 E ≤2σ 2k 2 q log T n β . Proof.By Lemma 2.1, and the fact thatVis a unitary matrix, we have for eacht≤T, j∈[d] Pr " D g(t), V:,j E ≥2 s log T n β # ≤Pr " D g(t), V:,j E ≥ ∥V:,j∥2 s 2 log 2T n β # ≤ β 2T n ≤ β 2T d. For eacht, k≤Tand each rowaofA, since V:,−1Σ2k −1 V ⊤a −1 2 ≤σ 2k 2 V ⊤a 2 ≤σ 2...

work page
[3]

For allt≤T, j∈[d]: g(t), Vj ≤2 q log 2T n β

work page
[4]

3.P (t) := b∈B: b, y(t) ≥θ (t) ≤ 8(log(Tlogn)+2 log 8 β ) ε = c2 ε

For allt, k≤Tand for all rowsbofB: g(t), V ⊤ −1Σ2k −1b−1 ≤2σ 2k 2 q log 2T n β . 3.P (t) := b∈B: b, y(t) ≥θ (t) ≤ 8(log(Tlogn)+2 log 8 β ) ε = c2 ε . To start, by Remark A.1 θ(t) ≤ 1 n4T + 2 max b D y(t), b E .(8) By Lemma A.3, for all rowsbofBand for allj∈[d],, we have D N(t) −1y(t),Σ 2k −1b−1 E ≤ c2 ε σ2k 2 max b∈B D y(t), b E ;(9) N(t) 1 y(t) ≤ c2 ε σ1...

work page
[5]

From Lemma 2.2, with probability at least1−β, for allt≤T, 2 q log 2 δ ε V ⊤ Headg(t)θ(t) 2 2 ≤ 4 log 2 δ ε2 θ(t) 2 5|Head|log 2T β .(16)

work page
[6]

From Lemma 2.1, D Σ2 Heady(t) Head −N (t) Heady(t), V ⊤ Headg(t) E ∼ N 0, Σ2 Heady(t) Head −N (t) Heady(t) 2 2 , so with probability at least1−β, for allt≤T, D Σ2 Heady(t) Head −N (t) Heady(t), V ⊤ Headg(t) E ≤ Σ2 Heady(t) Head −N (t) Heady(t) 2 s 2 log 2T β .(17)

work page
[7]

From Lemma A.5, with probability1−β, we can bound for allt≤T max b∈B D y(t), b E ≤M(t,0) +σ 1Υm(t) ≤s2m(t) +α t 2(M(0) −s 2m(0) + T α2 K(1 +s 2) εn4T ) +σ 1Υm(t) =(s2 +σ 1Υ)m(t) + M(0) −s 2m(0) + T α2 K(1 +s 2) εn4T αt 2 ≤(s2 +σ 1Υ)m(t) +Cα t 2. Using the update rule, we have y(t+1) Head 2 2 = Σ2 Heady(t) Head + 2 q log 2 δ ε V ⊤ Headg(t)θ(t) −N (t) Heady...

work page
[8]

From Lemma 2.2, with probability at least1−β, for allt≤T, 2 q log 2 δ ε V ⊤ Tailg(t)θ(t) 2 2 ≤ 4 log 2 δ ε2 θ(t) 2 5|Tail|log 2T β .(21)

work page
[9]

From Lemma 2.1, D Σ2 Taily(t) Tail −N (t) Taily(t), V ⊤ Tailg(t) E ∼ N 0, Σ2 Taily(t) Tail −N (t) Taily(t) 2 2 , so with proba- bility at least1−β, for allt≤T, D Σ2 Taily(t) Tail −N (t) Taily(t), V ⊤ Tailg(t) E ≤ Σ2 Taily(t) Tail −N (t) Taily(t) 2 s 2 log 2T β .(22)

work page
[10]

Letj= min(4n/σ 2 1, d) + 1and assume thatj≤d

With probability1−β, we can bound for allt≤T max b∈B D y(t), b E ≤M(t,0) +σ 1Υm(t) ≤(s 2 +σ 1Υ)m(t) +Cα t 2. Letj= min(4n/σ 2 1, d) + 1and assume thatj≤d. As before, we can bound y(t+1) Tail 2 2 = Σ2 Taily(t) Tail + 2 q log 2 δ ε V ⊤ Tailg(t)θ(t) −N (t) Taily(t) 2 2 = Σ2 Taily(t) Tail −N (t) Taily(t) 2 + 2 q log 2 δ ε V ⊤ Tailg(t)θ(t) 2 2 + 2 q log 2 δ ε ...

work page
[11]

Therefore, from Lemma A.10, with probability at least3 4 −6β, y(T) 2 2 = y(T) 1 2 + y(T) −1 2 2 ≤ m(T) 2 + Rm(T) + 6KαT 2 (1 + 8T d) 2 = 1 + R+α T 2 ·6K(1 + 8T d) 1 m(T) 2! m(T) 2 ≤  1 + R+ 6000K(1 + 8T d) α2 α1 T!2  m(T) 2 . Thus, y(T) 1 2 y(T) 2 2 ≥ 1 1 + R+ 6000K(1 + 8T d) 1 + κ 2 −T 2 We then obtain the following results as consequences: sin2 y(T)...

work page
[12]

Thus, with probability at least1− β 2, we havePn i=1 Xi =Pn i=1 gig⊤ i − Σ 2 =A ⊤A−n Σ 2 . To show (26), we note that, since thegi are iid samples fromN(0, Σ 2 ), we have nX i=1 E gig⊤ i − Σ 2 2 2 =n· Eg∼N(0, Σ 2 ) gg ⊤ − Σ 2 2 2 ≤n· σ2 1 dX i=1 σ2 i ! , where the inequality follows from Lemma B.1. Notethatthematrices Xi aresymmetric. Additionally, wehave...

work page
[13]

X i x2 i ≤ 1 t2 x2 1 # Furthermore, we have Pr

By combining with (25), we obtain that, with probability at least1−β, we have A⊤A−n Σ 2 2 ≤G= max (s 2Mlog 2d β ,6Rlog 2d β ) = max   O   vuutn· σ2 1 dX i=1 σ2 i ! log d β   , O dX i=1 σ2 i ! log n β log d β !   Next, we recall Wedin’s theorem. Theorem B.2(Wedin’s theorem, [Wed72]).Let bΣ = Σ + E. Let σi be the i-th largest singular value ofΣ. L...

work page
[14]

Since κ≥ 1 3 κ, the second condition is satisfied

We setT = 3K2 1 κ. Since κ≥ 1 3 κ, the second condition is satisfied. Finally, consider the third condition. Sinceσ2 1 ≥ 1 2 nσ2 1 andΥ≤K 4 · 1√n, we have 4 K εσ1 Υ + K2 εσ2 1 ≤4 √ 2KK4 εσ1n + 2K2 4 εσ2 1n ! ≤8 max ( √ 2KK4 εσ1n , 2K2 4 εσ2 1n ) Thus, to satisfy the third condition, it suffices to ensure thatn≥dand 1 3 κ≥8 max ( √ 2KK4 εσ1n , 2K2 4 εσ2 1n...

work page

[1] [1]

For allt≤T, j∈[d]: g(t), V:,j ≤2 q log T n β

work page

[2] [2]

D g(t), V:,j E ≥2 s log T n β # ≤Pr

For allt, k≤Tand for all rowsaofA: D g(t), V:,−1Σ2k −1 V ⊤a −1 E ≤2σ 2k 2 q log T n β . Proof.By Lemma 2.1, and the fact thatVis a unitary matrix, we have for eacht≤T, j∈[d] Pr " D g(t), V:,j E ≥2 s log T n β # ≤Pr " D g(t), V:,j E ≥ ∥V:,j∥2 s 2 log 2T n β # ≤ β 2T n ≤ β 2T d. For eacht, k≤Tand each rowaofA, since V:,−1Σ2k −1 V ⊤a −1 2 ≤σ 2k 2 V ⊤a 2 ≤σ 2...

work page

[3] [3]

For allt≤T, j∈[d]: g(t), Vj ≤2 q log 2T n β

work page

[4] [4]

3.P (t) := b∈B: b, y(t) ≥θ (t) ≤ 8(log(Tlogn)+2 log 8 β ) ε = c2 ε

For allt, k≤Tand for all rowsbofB: g(t), V ⊤ −1Σ2k −1b−1 ≤2σ 2k 2 q log 2T n β . 3.P (t) := b∈B: b, y(t) ≥θ (t) ≤ 8(log(Tlogn)+2 log 8 β ) ε = c2 ε . To start, by Remark A.1 θ(t) ≤ 1 n4T + 2 max b D y(t), b E .(8) By Lemma A.3, for all rowsbofBand for allj∈[d],, we have D N(t) −1y(t),Σ 2k −1b−1 E ≤ c2 ε σ2k 2 max b∈B D y(t), b E ;(9) N(t) 1 y(t) ≤ c2 ε σ1...

work page

[5] [5]

From Lemma 2.2, with probability at least1−β, for allt≤T, 2 q log 2 δ ε V ⊤ Headg(t)θ(t) 2 2 ≤ 4 log 2 δ ε2 θ(t) 2 5|Head|log 2T β .(16)

work page

[6] [6]

From Lemma 2.1, D Σ2 Heady(t) Head −N (t) Heady(t), V ⊤ Headg(t) E ∼ N 0, Σ2 Heady(t) Head −N (t) Heady(t) 2 2 , so with probability at least1−β, for allt≤T, D Σ2 Heady(t) Head −N (t) Heady(t), V ⊤ Headg(t) E ≤ Σ2 Heady(t) Head −N (t) Heady(t) 2 s 2 log 2T β .(17)

work page

[7] [7]

From Lemma A.5, with probability1−β, we can bound for allt≤T max b∈B D y(t), b E ≤M(t,0) +σ 1Υm(t) ≤s2m(t) +α t 2(M(0) −s 2m(0) + T α2 K(1 +s 2) εn4T ) +σ 1Υm(t) =(s2 +σ 1Υ)m(t) + M(0) −s 2m(0) + T α2 K(1 +s 2) εn4T αt 2 ≤(s2 +σ 1Υ)m(t) +Cα t 2. Using the update rule, we have y(t+1) Head 2 2 = Σ2 Heady(t) Head + 2 q log 2 δ ε V ⊤ Headg(t)θ(t) −N (t) Heady...

work page

[8] [8]

From Lemma 2.2, with probability at least1−β, for allt≤T, 2 q log 2 δ ε V ⊤ Tailg(t)θ(t) 2 2 ≤ 4 log 2 δ ε2 θ(t) 2 5|Tail|log 2T β .(21)

work page

[9] [9]

From Lemma 2.1, D Σ2 Taily(t) Tail −N (t) Taily(t), V ⊤ Tailg(t) E ∼ N 0, Σ2 Taily(t) Tail −N (t) Taily(t) 2 2 , so with proba- bility at least1−β, for allt≤T, D Σ2 Taily(t) Tail −N (t) Taily(t), V ⊤ Tailg(t) E ≤ Σ2 Taily(t) Tail −N (t) Taily(t) 2 s 2 log 2T β .(22)

work page

[10] [10]

Letj= min(4n/σ 2 1, d) + 1and assume thatj≤d

With probability1−β, we can bound for allt≤T max b∈B D y(t), b E ≤M(t,0) +σ 1Υm(t) ≤(s 2 +σ 1Υ)m(t) +Cα t 2. Letj= min(4n/σ 2 1, d) + 1and assume thatj≤d. As before, we can bound y(t+1) Tail 2 2 = Σ2 Taily(t) Tail + 2 q log 2 δ ε V ⊤ Tailg(t)θ(t) −N (t) Taily(t) 2 2 = Σ2 Taily(t) Tail −N (t) Taily(t) 2 + 2 q log 2 δ ε V ⊤ Tailg(t)θ(t) 2 2 + 2 q log 2 δ ε ...

work page

[11] [11]

Therefore, from Lemma A.10, with probability at least3 4 −6β, y(T) 2 2 = y(T) 1 2 + y(T) −1 2 2 ≤ m(T) 2 + Rm(T) + 6KαT 2 (1 + 8T d) 2 = 1 + R+α T 2 ·6K(1 + 8T d) 1 m(T) 2! m(T) 2 ≤  1 + R+ 6000K(1 + 8T d) α2 α1 T!2  m(T) 2 . Thus, y(T) 1 2 y(T) 2 2 ≥ 1 1 + R+ 6000K(1 + 8T d) 1 + κ 2 −T 2 We then obtain the following results as consequences: sin2 y(T)...

work page

[12] [12]

Thus, with probability at least1− β 2, we havePn i=1 Xi =Pn i=1 gig⊤ i − Σ 2 =A ⊤A−n Σ 2 . To show (26), we note that, since thegi are iid samples fromN(0, Σ 2 ), we have nX i=1 E gig⊤ i − Σ 2 2 2 =n· Eg∼N(0, Σ 2 ) gg ⊤ − Σ 2 2 2 ≤n· σ2 1 dX i=1 σ2 i ! , where the inequality follows from Lemma B.1. Notethatthematrices Xi aresymmetric. Additionally, wehave...

work page

[13] [13]

X i x2 i ≤ 1 t2 x2 1 # Furthermore, we have Pr

By combining with (25), we obtain that, with probability at least1−β, we have A⊤A−n Σ 2 2 ≤G= max (s 2Mlog 2d β ,6Rlog 2d β ) = max   O   vuutn· σ2 1 dX i=1 σ2 i ! log d β   , O dX i=1 σ2 i ! log n β log d β !   Next, we recall Wedin’s theorem. Theorem B.2(Wedin’s theorem, [Wed72]).Let bΣ = Σ + E. Let σi be the i-th largest singular value ofΣ. L...

work page

[14] [14]

Since κ≥ 1 3 κ, the second condition is satisfied

We setT = 3K2 1 κ. Since κ≥ 1 3 κ, the second condition is satisfied. Finally, consider the third condition. Sinceσ2 1 ≥ 1 2 nσ2 1 andΥ≤K 4 · 1√n, we have 4 K εσ1 Υ + K2 εσ2 1 ≤4 √ 2KK4 εσ1n + 2K2 4 εσ2 1n ! ≤8 max ( √ 2KK4 εσ1n , 2K2 4 εσ2 1n ) Thus, to satisfy the third condition, it suffices to ensure thatn≥dand 1 3 κ≥8 max ( √ 2KK4 εσ1n , 2K2 4 εσ2 1n...

work page