Adaptive Power Iteration Method for Differentially Private PCA
Pith reviewed 2026-05-21 14:05 UTC · model grok-4.3
The pith
A coherence-adaptive filter in power iteration gives better error bounds for differentially private top singular vector computation on low-coherence matrices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give a novel algorithm that achieves beyond-worst-case guarantees for input matrices with low coherence. We introduce a new filtering technique which adapts to this coherence parameter.
What carries the argument
The coherence-adaptive filtering technique inside the power iteration, which scales noise or truncation thresholds according to the matrix coherence to tighten error bounds beyond the worst-case analysis.
Load-bearing premise
The input matrix has low coherence so that the adaptive filter can meaningfully reduce error relative to standard worst-case bounds.
What would settle it
Run the algorithm on a matrix with high coherence and check whether the observed error remains no better than the worst-case bound that non-adaptive methods already achieve.
read the original abstract
We study $\left(\epsilon,\delta\right)$-differentially private algorithms for the problem of approximately computing the top singular vector of a matrix $A\in\mathbb{R}^{n\times d}$ where each row of $A$ is a data point in $\mathbb{R}^{d}$. Following Dwork-Talwar-Thakurta-Zhang (STOC 2014), we consider the privacy model where neighboring inputs differ by one single row. We give a novel algorithm that achieves beyond-worst-case guarantees for input matrices with low coherence, which is a structural property of matrices in many applications, including but not limited to i.i.d. data. Our algorithm contributes to the extensive literature on private power iteration methods, where we introduce a new filtering technique which adapts to this coherence parameter. Our work departs from and complements the work by Hardt-Roth (STOC 2013) which achieves beyond-worst-case guarantees for the more restrictive privacy model where neighboring inputs differ in one single entry by at most 1.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an adaptive power iteration algorithm for approximating the top singular vector of an n-by-d matrix under row-neighboring (ε,δ)-differential privacy. The central contribution is a novel filtering technique that adapts to the matrix coherence parameter to obtain improved, beyond-worst-case utility guarantees for low-coherence inputs (common in i.i.d. data settings), complementing prior entry-neighboring results such as Hardt-Roth (STOC 2013).
Significance. If the privacy and utility claims hold, the work would meaningfully advance differentially private PCA by moving beyond worst-case analysis in the row-neighboring model, which is the more natural setting for many applications. The coherence-adaptive filtering idea is a targeted technical contribution that could translate to better practical performance on structured data while preserving the simplicity of power iteration.
major comments (2)
- [§4.2 and Theorem 5.1] §4.2 (Filtering Step) and Theorem 5.1 (Privacy): the filtering procedure adapts its threshold or projection to the coherence parameter μ, yet the sensitivity analysis is written only for the non-adaptive case. If μ is estimated from the private matrix rather than supplied as public input, the row-neighboring sensitivity of the filter must be re-derived; otherwise the end-to-end privacy guarantee does not cover the adaptive version that is required for the claimed utility improvement.
- [Theorem 6.3] Theorem 6.3 (Utility Bound): the error term that improves with low coherence is derived under the assumption that the filter exactly matches the true μ. When μ must be approximated privately, an additional error term appears; the current proof does not quantify how this extra term interacts with the coherence-dependent improvement, leaving the “beyond-worst-case” claim unsupported for unknown coherence.
minor comments (2)
- [§2 and Algorithm 1] Notation for the coherence parameter μ is introduced in §2 but used without re-statement in the algorithm pseudocode; adding a one-line reminder would improve readability.
- [§7] The experimental section reports results only on synthetic low-coherence matrices; a brief discussion of how coherence is measured on real data would strengthen the practical relevance claim.
Simulated Author's Rebuttal
We thank the referee for the careful reading and valuable comments on our manuscript. The points raised about the privacy analysis for the adaptive filter and the utility bound under approximate coherence are important to address. We respond to each major comment below and will incorporate the necessary clarifications and extensions in the revised version.
read point-by-point responses
-
Referee: [§4.2 and Theorem 5.1] §4.2 (Filtering Step) and Theorem 5.1 (Privacy): the filtering procedure adapts its threshold or projection to the coherence parameter μ, yet the sensitivity analysis is written only for the non-adaptive case. If μ is estimated from the private matrix rather than supplied as public input, the row-neighboring sensitivity of the filter must be re-derived; otherwise the end-to-end privacy guarantee does not cover the adaptive version that is required for the claimed utility improvement.
Authors: We thank the referee for highlighting this subtlety in the privacy analysis. In the manuscript, the coherence parameter μ is introduced as a structural property of the input matrix for which the algorithm provides improved guarantees; the filtering step in §4.2 is described with respect to this parameter. We agree that if μ must be estimated from the private data to achieve a fully adaptive algorithm, the sensitivity analysis in Theorem 5.1 requires re-derivation to account for the additional sensitivity of the estimation step. In the revision we will add a private estimation procedure for μ (allocating a small portion of the privacy budget) and update the end-to-end privacy proof accordingly, ensuring the (ε,δ) guarantee holds for the adaptive version. revision: yes
-
Referee: [Theorem 6.3] Theorem 6.3 (Utility Bound): the error term that improves with low coherence is derived under the assumption that the filter exactly matches the true μ. When μ must be approximated privately, an additional error term appears; the current proof does not quantify how this extra term interacts with the coherence-dependent improvement, leaving the “beyond-worst-case” claim unsupported for unknown coherence.
Authors: We acknowledge that Theorem 6.3 currently derives the coherence-dependent error improvement under the assumption of exact knowledge of μ. When μ is privately approximated, an additive error term will appear. In the revision we will extend the proof of Theorem 6.3 to incorporate this term explicitly. We will show that, for a sufficiently accurate private estimate of μ (achievable with modest privacy budget), the leading coherence-adaptive improvement remains intact and the overall bound continues to be strictly better than the worst-case guarantee for low-coherence matrices. This will substantiate the beyond-worst-case claim even when μ is not known in advance. revision: yes
Circularity Check
No circularity: novel filtering technique presented as independent algorithmic contribution
full rationale
The abstract and description introduce a new filtering technique that adapts to the coherence parameter to achieve beyond-worst-case guarantees under row-neighboring DP, departing from prior work by Hardt-Roth. No equations, derivations, or load-bearing steps are exhibited in the provided text that reduce by construction to fitted inputs, self-definitions, or self-citation chains. Cited prior results (Dwork et al., Hardt-Roth) are external and non-overlapping in authorship. The central claim therefore remains self-contained as an algorithmic advance rather than a renaming or implicit fit of the target quantity.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our algorithm is based on a filtering technique which adapts to the coherence parameter of the input matrix... uses the sparse vector technique to find the suitable threshold in each iteration.
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 1.1... sin²(v1,x) ≤ B with B = O((min{4n/σ₁²,d} Υ² / (ε² σ₁² κ²) + ... )
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
For allt≤T, j∈[d]: g(t), V:,j ≤2 q log T n β
-
[2]
D g(t), V:,j E ≥2 s log T n β # ≤Pr
For allt, k≤Tand for all rowsaofA: D g(t), V:,−1Σ2k −1 V ⊤a −1 E ≤2σ 2k 2 q log T n β . Proof.By Lemma 2.1, and the fact thatVis a unitary matrix, we have for eacht≤T, j∈[d] Pr " D g(t), V:,j E ≥2 s log T n β # ≤Pr " D g(t), V:,j E ≥ ∥V:,j∥2 s 2 log 2T n β # ≤ β 2T n ≤ β 2T d. For eacht, k≤Tand each rowaofA, since V:,−1Σ2k −1 V ⊤a −1 2 ≤σ 2k 2 V ⊤a 2 ≤σ 2...
-
[3]
For allt≤T, j∈[d]: g(t), Vj ≤2 q log 2T n β
-
[4]
3.P (t) := b∈B: b, y(t) ≥θ (t) ≤ 8(log(Tlogn)+2 log 8 β ) ε = c2 ε
For allt, k≤Tand for all rowsbofB: g(t), V ⊤ −1Σ2k −1b−1 ≤2σ 2k 2 q log 2T n β . 3.P (t) := b∈B: b, y(t) ≥θ (t) ≤ 8(log(Tlogn)+2 log 8 β ) ε = c2 ε . To start, by Remark A.1 θ(t) ≤ 1 n4T + 2 max b D y(t), b E .(8) By Lemma A.3, for all rowsbofBand for allj∈[d],, we have D N(t) −1y(t),Σ 2k −1b−1 E ≤ c2 ε σ2k 2 max b∈B D y(t), b E ;(9) N(t) 1 y(t) ≤ c2 ε σ1...
-
[5]
From Lemma 2.2, with probability at least1−β, for allt≤T, 2 q log 2 δ ε V ⊤ Headg(t)θ(t) 2 2 ≤ 4 log 2 δ ε2 θ(t) 2 5|Head|log 2T β .(16)
-
[6]
From Lemma 2.1, D Σ2 Heady(t) Head −N (t) Heady(t), V ⊤ Headg(t) E ∼ N 0, Σ2 Heady(t) Head −N (t) Heady(t) 2 2 , so with probability at least1−β, for allt≤T, D Σ2 Heady(t) Head −N (t) Heady(t), V ⊤ Headg(t) E ≤ Σ2 Heady(t) Head −N (t) Heady(t) 2 s 2 log 2T β .(17)
-
[7]
From Lemma A.5, with probability1−β, we can bound for allt≤T max b∈B D y(t), b E ≤M(t,0) +σ 1Υm(t) ≤s2m(t) +α t 2(M(0) −s 2m(0) + T α2 K(1 +s 2) εn4T ) +σ 1Υm(t) =(s2 +σ 1Υ)m(t) + M(0) −s 2m(0) + T α2 K(1 +s 2) εn4T αt 2 ≤(s2 +σ 1Υ)m(t) +Cα t 2. Using the update rule, we have y(t+1) Head 2 2 = Σ2 Heady(t) Head + 2 q log 2 δ ε V ⊤ Headg(t)θ(t) −N (t) Heady...
-
[8]
From Lemma 2.2, with probability at least1−β, for allt≤T, 2 q log 2 δ ε V ⊤ Tailg(t)θ(t) 2 2 ≤ 4 log 2 δ ε2 θ(t) 2 5|Tail|log 2T β .(21)
-
[9]
From Lemma 2.1, D Σ2 Taily(t) Tail −N (t) Taily(t), V ⊤ Tailg(t) E ∼ N 0, Σ2 Taily(t) Tail −N (t) Taily(t) 2 2 , so with proba- bility at least1−β, for allt≤T, D Σ2 Taily(t) Tail −N (t) Taily(t), V ⊤ Tailg(t) E ≤ Σ2 Taily(t) Tail −N (t) Taily(t) 2 s 2 log 2T β .(22)
-
[10]
Letj= min(4n/σ 2 1, d) + 1and assume thatj≤d
With probability1−β, we can bound for allt≤T max b∈B D y(t), b E ≤M(t,0) +σ 1Υm(t) ≤(s 2 +σ 1Υ)m(t) +Cα t 2. Letj= min(4n/σ 2 1, d) + 1and assume thatj≤d. As before, we can bound y(t+1) Tail 2 2 = Σ2 Taily(t) Tail + 2 q log 2 δ ε V ⊤ Tailg(t)θ(t) −N (t) Taily(t) 2 2 = Σ2 Taily(t) Tail −N (t) Taily(t) 2 + 2 q log 2 δ ε V ⊤ Tailg(t)θ(t) 2 2 + 2 q log 2 δ ε ...
-
[11]
Therefore, from Lemma A.10, with probability at least3 4 −6β, y(T) 2 2 = y(T) 1 2 + y(T) −1 2 2 ≤ m(T) 2 + Rm(T) + 6KαT 2 (1 + 8T d) 2 = 1 + R+α T 2 ·6K(1 + 8T d) 1 m(T) 2! m(T) 2 ≤ 1 + R+ 6000K(1 + 8T d) α2 α1 T!2 m(T) 2 . Thus, y(T) 1 2 y(T) 2 2 ≥ 1 1 + R+ 6000K(1 + 8T d) 1 + κ 2 −T 2 We then obtain the following results as consequences: sin2 y(T)...
-
[12]
Thus, with probability at least1− β 2, we havePn i=1 Xi =Pn i=1 gig⊤ i − Σ 2 =A ⊤A−n Σ 2 . To show (26), we note that, since thegi are iid samples fromN(0, Σ 2 ), we have nX i=1 E gig⊤ i − Σ 2 2 2 =n· Eg∼N(0, Σ 2 ) gg ⊤ − Σ 2 2 2 ≤n· σ2 1 dX i=1 σ2 i ! , where the inequality follows from Lemma B.1. Notethatthematrices Xi aresymmetric. Additionally, wehave...
-
[13]
X i x2 i ≤ 1 t2 x2 1 # Furthermore, we have Pr
By combining with (25), we obtain that, with probability at least1−β, we have A⊤A−n Σ 2 2 ≤G= max (s 2Mlog 2d β ,6Rlog 2d β ) = max O vuutn· σ2 1 dX i=1 σ2 i ! log d β , O dX i=1 σ2 i ! log n β log d β ! Next, we recall Wedin’s theorem. Theorem B.2(Wedin’s theorem, [Wed72]).Let bΣ = Σ + E. Let σi be the i-th largest singular value ofΣ. L...
-
[14]
Since κ≥ 1 3 κ, the second condition is satisfied
We setT = 3K2 1 κ. Since κ≥ 1 3 κ, the second condition is satisfied. Finally, consider the third condition. Sinceσ2 1 ≥ 1 2 nσ2 1 andΥ≤K 4 · 1√n, we have 4 K εσ1 Υ + K2 εσ2 1 ≤4 √ 2KK4 εσ1n + 2K2 4 εσ2 1n ! ≤8 max ( √ 2KK4 εσ1n , 2K2 4 εσ2 1n ) Thus, to satisfy the third condition, it suffices to ensure thatn≥dand 1 3 κ≥8 max ( √ 2KK4 εσ1n , 2K2 4 εσ2 1n...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.