Federated Transfer Learning with Differential Privacy
Pith reviewed 2026-05-24 03:27 UTC · model grok-4.3
The pith
Introduces federated differential privacy as an intermediate model between local and central DP and analyzes minimax rates for four statistical tasks under heterogeneity and privacy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy.
Load-bearing premise
The paper assumes that minimax rates for the four listed statistical problems can be derived while simultaneously incorporating both data heterogeneity across sites and the federated differential privacy constraint without a trusted server (abstract).
Original abstract
Federated learning has emerged as a powerful framework for analysing distributed data, yet two challenges remain pivotal: heterogeneity across sites and privacy of local data. In this paper, we address both challenges within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of federated differential privacy, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy model, we study four statistical problems: univariate mean estimation, low-dimensional linear regression, high-dimensional linear regression, and M-estimation. By investigating the minimax rates and quantifying the cost of privacy, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses account for data heterogeneity and privacy, highlighting the fundamental costs associated with each factor and the benefits of knowledge transfer in federated learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript formulates federated differential privacy (no trusted server) in a transfer-learning setting with heterogeneous source and target datasets. It derives minimax rates for univariate mean estimation, low-dimensional linear regression, high-dimensional linear regression, and M-estimation under this privacy model, claims these rates lie strictly between the local-DP and central-DP rates, and quantifies the separate costs of privacy and heterogeneity together with the benefit of knowledge transfer.
Significance. If the minimax derivations are correct and the intermediate positioning holds after accounting for heterogeneity, the work would supply a new, realistic privacy model for federated settings and explicit rate characterizations that separate privacy cost from heterogeneity cost across four canonical problems. The explicit treatment of transfer across heterogeneous sites is a strength.
major comments (3)
- [Abstract, §3] The central claim that federated DP rates are strictly between local and central DP requires explicit side-by-side statements of the three rates (local, federated, central) for each of the four problems; without these comparisons the intermediate positioning cannot be verified from the stated results.
- [§4] Heterogeneity modeling: the minimax formulation must incorporate a concrete heterogeneity parameter (e.g., bounded mean shift or total-variation distance between source and target distributions) that appears in the rate expressions; the current treatment leaves the precise interaction between heterogeneity and the federated privacy constraint unspecified, which is load-bearing for separating the two costs.
- [§5–§8] Proofs of minimax rates: the abstract asserts rigorous derivations, yet the provided text does not contain the full proofs or the precise local-randomization mechanism that aggregates noise without a server; verification that the rates are indeed strictly better than local DP while respecting the no-trusted-server constraint is therefore impossible.
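To make the first major comment concrete, here is a back-of-the-envelope comparison of the kind of side-by-side statement being requested. It is not taken from the paper and does not state its minimax rates. It assumes K sites with n observations each, data in [0, 1], a common mean (no heterogeneity), pure ε-DP, and naive Laplace mechanisms; every one of these choices is an assumption made here for illustration. Using the fact that a Laplace variable with scale b has variance 2b², the privacy-noise variance of the averaged estimator is:

```latex
\begin{align*}
\text{central (one noise draw on the pooled mean, sensitivity } 1/(Kn)\text{):}\quad
  & \frac{2}{K^{2} n^{2} \epsilon^{2}}, \\
\text{site-level (noise on each site mean, sensitivity } 1/n\text{, then average over } K\text{):}\quad
  & \frac{2}{K n^{2} \epsilon^{2}}, \\
\text{local (noise on every observation, sensitivity } 1\text{, then average over } Kn\text{):}\quad
  & \frac{2}{K n \epsilon^{2}}.
\end{align*}
```

In all three cases the sampling variance is at most 1/(4Kn), so only the noise term differs. For K > 1 and n > 1 the site-level term sits strictly between the other two, which is the qualitative ordering the manuscript claims; whether the paper's federated rates match this naive calculation cannot be determined from the text reviewed here.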
minor comments (2)
- Notation for the privacy parameters (ε, δ) and the heterogeneity radius should be introduced once and used consistently across all four problem sections.
- Figure captions should state the precise values of n, m, d, K, and the heterogeneity parameter used in each plotted curve.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly to improve clarity and verifiability.
- Referee: [Abstract, §3] The central claim that federated DP rates are strictly between local and central DP requires explicit side-by-side statements of the three rates (local, federated, central) for each of the four problems; without these comparisons the intermediate positioning cannot be verified from the stated results.
  Authors: We agree that explicit comparisons are needed for verification. In the revision we will add a table (or explicit statements) in the abstract and Section 3 listing the minimax rates under local DP, federated DP, and central DP for univariate mean estimation, low-dimensional linear regression, high-dimensional linear regression, and M-estimation, confirming the strict intermediate positioning.
  Revision: yes
- Referee: [§4] Heterogeneity modeling: the minimax formulation must incorporate a concrete heterogeneity parameter (e.g., bounded mean shift or total-variation distance between source and target distributions) that appears in the rate expressions; the current treatment leaves the precise interaction between heterogeneity and the federated privacy constraint unspecified, which is load-bearing for separating the two costs.
  Authors: We will revise Section 4 to introduce an explicit heterogeneity parameter (e.g., a bound on mean shift or total-variation distance between source and target distributions) that appears directly in the minimax rate expressions. This will make the interaction with the federated privacy constraint precise and allow clear separation of privacy and heterogeneity costs.
  Revision: yes
- Referee: [§5–§8] Proofs of minimax rates: the abstract asserts rigorous derivations, yet the provided text does not contain the full proofs or the precise local-randomization mechanism that aggregates noise without a server; verification that the rates are indeed strictly better than local DP while respecting the no-trusted-server constraint is therefore impossible.
  Authors: The full proofs and the local-randomization mechanism (each site adds noise locally; aggregation occurs without a trusted server) are contained in the appendix of the arXiv version. We will add prominent references to the appendix in the main text (Sections 3 and 5–8) and include a concise description of the mechanism in Section 3. The derived rates are strictly better than local DP because transfer learning permits controlled information sharing under the federated constraint.
  Revision: yes
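The rebuttal describes the mechanism only in one phrase: each site adds noise locally, and aggregation needs no trusted server. The sketch below is a minimal, hypothetical illustration of that idea for univariate mean estimation, set against central and local baselines. It is not the paper's algorithm: the function names, the Laplace mechanism, pure ε-DP, data bounded in [0, 1], and the absence of heterogeneity are all assumptions made here.

```python
import numpy as np


def central_dp_mean(x, eps, rng):
    """Trusted server sees all K*n records; one Laplace draw on the pooled mean."""
    total = x.size  # sensitivity of the pooled mean is 1/total for data in [0, 1]
    return x.mean() + rng.laplace(scale=1.0 / (total * eps))


def site_level_dp_mean(x, eps, rng):
    """Each site privatises its own mean before release; anyone may average them."""
    k_sites, n = x.shape  # sensitivity of each site mean is 1/n
    noisy_site_means = x.mean(axis=1) + rng.laplace(scale=1.0 / (n * eps), size=k_sites)
    return noisy_site_means.mean()


def local_dp_mean(x, eps, rng):
    """Every record is privatised individually (sensitivity 1) before leaving its owner."""
    return (x + rng.laplace(scale=1.0 / eps, size=x.shape)).mean()


def mse(estimator, K, n, eps, reps=500, seed=0):
    """Monte Carlo mean squared error for Uniform[0, 1] data with true mean 0.5."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(reps):
        x = rng.uniform(0.0, 1.0, size=(K, n))  # K sites, n records each (assumed homogeneous)
        errors.append((estimator(x, eps, rng) - 0.5) ** 2)
    return float(np.mean(errors))


if __name__ == "__main__":
    for name, est in [("central", central_dp_mean),
                      ("site-level", site_level_dp_mean),
                      ("local", local_dp_mean)]:
        print(name, mse(est, K=10, n=50, eps=0.5))
```

Under these assumptions the site-level estimator's error should fall between the central and local ones, because its noise is averaged over K sites rather than over K·n records or a single pooled draw. This only mirrors the claimed ordering in a toy setting; it says nothing about whether the paper's rates are minimax optimal.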
Circularity Check
Minimax derivations under federated DP are independent of inputs
The paper defines federated differential privacy as a new privacy model without a trusted server, then derives minimax rates for four statistical problems (univariate mean estimation, low-dimensional and high-dimensional linear regression, M-estimation) while incorporating heterogeneity. These steps rely on standard information-theoretic lower bounds and upper-bound constructions that quantify privacy cost separately from heterogeneity; no equation reduces a claimed prediction to a fitted parameter by construction, no uniqueness theorem is imported via self-citation, and no ansatz is smuggled in. The intermediate positioning between local and central DP follows directly from comparing the derived rates rather than from re-labeling inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Data heterogeneity across sites can be modeled while preserving privacy guarantees without a trusted server.
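One way this assumption could be made explicit, in line with the referee's request for a concrete heterogeneity parameter, is a bounded-mean-shift class for the univariate mean problem. The notation below (target index 0, sources 1 to K, radius h, and the class of admissible private procedures written as a calligraphic M) is a hypothetical sketch and is not claimed to match the paper's definitions:

```latex
\[
\Theta(h) = \Bigl\{ (\mu^{(0)}, \mu^{(1)}, \dots, \mu^{(K)}) :
  \max_{k \in [K]} \bigl| \mu^{(k)} - \mu^{(0)} \bigr| \le h \Bigr\},
\qquad
\mathcal{R}(h, \epsilon, \delta)
  = \inf_{\hat\mu \in \mathcal{M}_{\epsilon,\delta}}
    \sup_{\theta \in \Theta(h)}
    \mathbb{E}_{\theta} \bigl( \hat\mu - \mu^{(0)} \bigr)^{2}.
\]
```

With such a class, any weighted average of the site means has bias at most h for the target mean, so h would enter a rate expression separately from the privacy parameters, which is the separation of costs the report asks for.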
invented entities (1)
- federated differential privacy: no independent evidence
Forward citations
Cited by 1 Pith paper
- General Lower Bounds for Differentially Private Federated Learning with Arbitrary Public-Transcript Interactions
  Derives a federated van Trees lower bound under total clientwise sample-level zCDP for parameter estimation with squared ℓ2 loss in federated learning protocols with arbitrary public-transcript interactions.