Optimal Cox regression under federated differential privacy: coefficients and cumulative hazards
Pith reviewed 2026-05-18 21:34 UTC · model grok-4.3
The pith
Cox regression coefficients achieve matching minimax rates under federated differential privacy with server-level phase transitions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We derive minimax lower bounds together with upper bounds that match up to poly-logarithmic factors for the regression coefficients, thereby revealing server-level phase transitions between private and non-private regimes. We also consider a relaxed differential privacy framework with partially public information. For cumulative hazard estimation, we propose a private tree-based version of the Breslow estimator for nonparametric integral estimation under FDP. As a by-product, this leads to a private survival function estimator that attains a nearly minimax optimal rate.
What carries the argument
Minimax lower and upper bounds on estimation error for Cox regression coefficients under federated differential privacy with heterogeneous per-server sample sizes and privacy budgets, paired with a tree-based private Breslow estimator for the cumulative baseline hazard.
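As a rough illustration of the second ingredient, the sketch below computes standard (non-private) Breslow increments on a time grid and then releases their running sums through a generic binary-tree mechanism with Laplace noise. This is a minimal sketch under stated assumptions, not the paper's estimator: the function names, the grid, and the `sensitivity` parameter are hypothetical, and the code simply assumes that changing one individual alters a single grid cell by at most `sensitivity`. Breslow risk-set denominators do not satisfy that assumption without further work, and the source does not say how the paper handles it.

```python
import math
import random


def breslow_increments(times, events, risk_scores, grid):
    """Non-private Breslow increments per grid cell (grid[k], grid[k+1]].

    Each observed event at time t_i contributes
    1 / sum_{j : t_j >= t_i} exp(risk_scores[j]) to the cell containing t_i.
    """
    increments = [0.0] * (len(grid) - 1)
    for t_i, d_i in zip(times, events):
        if not d_i:
            continue  # censored observations add no jump
        denom = sum(math.exp(lp) for t_j, lp in zip(times, risk_scores) if t_j >= t_i)
        for k in range(len(grid) - 1):
            if grid[k] < t_i <= grid[k + 1]:
                increments[k] += 1.0 / denom
                break
    return increments


def laplace(scale, rng):
    """Laplace(0, scale) draw as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def tree_prefix_sums(values, epsilon, sensitivity, rng):
    """Noisy running sums via a binary tree over the cells.

    ASSUMPTION: one individual affects one leaf by at most `sensitivity`,
    so it touches one node per level; noise scale is levels * sensitivity / epsilon.
    """
    m = len(values)
    if m == 0 or m & (m - 1):
        raise ValueError("number of cells must be a power of two")
    levels = m.bit_length()  # log2(m) + 1 levels, leaves included
    scale = levels * sensitivity / epsilon
    # tree[h][i] is the noisy sum of values[i * 2**h : (i + 1) * 2**h]
    tree = []
    for h in range(levels):
        width = 1 << h
        tree.append([sum(values[i:i + width]) + laplace(scale, rng)
                     for i in range(0, m, width)])
    prefix = []
    for k in range(1, m + 1):
        total, start = 0.0, 0
        for h in reversed(range(levels)):  # greedy dyadic decomposition of [0, k)
            width = 1 << h
            if start + width <= k:
                total += tree[h][start // width]
                start += width
        prefix.append(total)
    return prefix
```

Each running sum combines at most one noisy node per level, so its noise variance grows with the logarithm of the number of cells rather than linearly, which is the usual reason for preferring a tree to noising every cell and summing.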
Load-bearing premise
The standard Cox proportional hazards model holds with the usual regularity conditions on the data-generating process and the federated differential privacy model accurately captures the heterogeneous per-server sample sizes and privacy budgets.
What would settle it
Vary the number of servers while holding total sample size and privacy budgets fixed, then check whether the observed estimation error rates for the coefficients transition exactly as predicted between the private and non-private regimes.
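The data side of such a check could be set up along the following lines. This is only a sketch under assumptions that are not taken from the paper: an exponential baseline hazard, standard normal covariates, independent exponential censoring, and an even split across servers. The function names and parameters are hypothetical, and the private estimator itself (for example the one in the FDPCox package) is not reproduced here.

```python
import math
import random


def simulate_cox(n, beta, baseline_rate, censor_rate, rng):
    """Draw n subjects from a Cox model with constant baseline hazard.

    With baseline hazard `baseline_rate`, the event time of a subject with
    linear predictor lp is exponential with rate baseline_rate * exp(lp).
    Returns a list of (observed_time, event_indicator, covariates).
    """
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in beta]
        lp = sum(b * x for b, x in zip(beta, z))
        event_time = rng.expovariate(baseline_rate * math.exp(lp))
        censor_time = rng.expovariate(censor_rate)
        data.append((min(event_time, censor_time), int(event_time <= censor_time), z))
    return data


def split_across_servers(data, num_servers):
    """Partition the data into num_servers shards of near-equal size.

    The total sample size is unchanged, so only the number of servers varies.
    """
    base, extra = divmod(len(data), num_servers)
    shards, start = [], 0
    for j in range(num_servers):
        size = base + (1 if j < extra else 0)
        shards.append(data[start:start + size])
        start += size
    return shards
```

Calling `split_across_servers(data, s)` for s in, say, 1, 4, 16 and 64 on the same simulated data set, with each server's privacy budget held fixed, gives the inputs on which the estimation error would be compared against the predicted rates.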
Original abstract
We study two foundational problems in distributed survival analysis under federated differential privacy (FDP): estimation of the Cox regression coefficients and of the cumulative baseline hazard functions, allowing for heterogeneous per-server sample sizes and privacy budgets. To quantify the fundamental cost of privacy, we derive minimax lower bounds together with upper bounds that match up to poly-logarithmic factors for the regression coefficients, thereby revealing server-level phase transitions between private and non-private regimes. We also consider a relaxed differential privacy framework with partially public information. Our analysis shows that the role of public covariates depends strongly on the privacy model. For cumulative hazard estimation, we propose a private tree-based version of the Breslow estimator for nonparametric integral estimation under FDP. As a by-product, this leads to a private survival function estimator that attains a nearly minimax optimal rate. Numerical experiments, including a real-data application, support the theoretical findings. The proposed methods are implemented in an accompanying R package FDPCox.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies estimation of Cox regression coefficients and cumulative baseline hazards in a federated differential privacy (FDP) model that allows heterogeneous per-server sample sizes n_j and privacy budgets ε_j. It derives minimax lower bounds for the coefficients via information-theoretic arguments and constructs matching upper bounds (up to poly-log factors) using private score equations with noise addition, thereby identifying server-level phase transitions between private and non-private regimes. A relaxed FDP model with partially public covariates is analyzed, and a tree-based private Breslow estimator is proposed for the cumulative hazard, yielding a nearly minimax-optimal private survival function estimator. Theoretical results are supported by simulations, a real-data example, and an R package FDPCox.
Significance. If the matching bounds hold, the work provides the first rigorous quantification of privacy costs in federated survival analysis, with explicit phase-transition thresholds that depend on the heterogeneity of n_j and ε_j. The extension to public information and the nonparametric hazard estimator broaden applicability to medical data settings. The reproducible implementation via the R package is a clear strength that supports adoption and verification.
minor comments (3)
- [Abstract] Abstract: the statement that upper bounds match lower bounds 'up to poly-logarithmic factors' would be more informative if the precise order of the logarithmic terms (e.g., log(n) or log(1/ε)) were indicated.
- [§2] §2 (model): the composition rule used to aggregate heterogeneous per-server privacy budgets ε_j into the global FDP guarantee should be stated explicitly, as different composition theorems could affect the phase-transition thresholds.
- [Numerical experiments] Numerical experiments: the figures illustrating the phase transitions would benefit from an additional panel or table that varies the degree of heterogeneity in (n_j, ε_j) to make the server-level effect visually clearer.
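For readers who want the object behind the "private score equations with noise addition" mentioned in the referee summary, the block below writes out the standard Cox partial-likelihood score for server j (no ties), followed by a generic noisy release. The first display is the textbook score. The second is only an assumed schematic form: the symbol W_j is hypothetical, and the paper's actual truncation and noise calibration are not given in this page.

```latex
% Standard per-server Cox partial-likelihood score (no ties);
% (T_{ji}, \delta_{ji}, Z_{ji}) are the observed time, event indicator, and covariates.
U_j(\beta) = \sum_{i=1}^{n_j} \delta_{ji}
  \left\{ Z_{ji}
    - \frac{\sum_{k=1}^{n_j} \mathbf{1}(T_{jk} \ge T_{ji})\, Z_{jk} \exp(\beta^\top Z_{jk})}
           {\sum_{k=1}^{n_j} \mathbf{1}(T_{jk} \ge T_{ji})\, \exp(\beta^\top Z_{jk})}
  \right\}

% Assumed schematic release: W_j is zero-mean noise whose scale is
% calibrated to the server's budget \varepsilon_j (hypothetical notation).
\widetilde{U}_j(\beta) = U_j(\beta) + W_j
```

Because each risk-set ratio depends on every subject still at risk, one individual can influence many summands, which is why the sensitivity analysis behind the noise scale is the delicate part of any such construction.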
Simulated Author's Rebuttal
Thank you for the positive referee report and the recommendation for minor revision. We appreciate the recognition of our contributions: the minimax lower and upper bounds for Cox coefficients under federated DP, the identification of server-level phase transitions, the analysis of relaxed FDP with public covariates, and the tree-based private Breslow estimator for cumulative hazards. We also note the acknowledgment of the simulations, the real-data example, and the R package FDPCox. Since no major comments were raised, we will make the minor revisions suggested.
Circularity Check
No significant circularity
Full rationale
The derivation relies on standard information-theoretic packing arguments for minimax lower bounds on Cox coefficients under heterogeneous federated DP, matched by upper bounds constructed from private score equations plus calibrated noise. Heterogeneity in per-server n_j and epsilon_j is explicitly parameterized in both directions, and phase-transition thresholds arise directly from comparing privacy-adjusted effective sample size to the non-private regime. No load-bearing step reduces by definition or self-citation to the target result; the analysis is self-contained against external benchmarks with no evident self-definitional, fitted-input, or ansatz-smuggling patterns.
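The comparison of a privacy-adjusted effective sample size with the non-private one can be illustrated with a toy calculation. The rule below is not the paper's threshold: it borrows the familiar central-DP rate for d-dimensional mean estimation, d/n + d^2/(n^2 ε^2) up to logarithmic factors, and treats each server separately. All names are hypothetical, and logarithmic factors and any δ parameter are ignored.

```python
def server_regime(n, epsilon, d):
    """Classify one server using the heuristic rate d/n + d^2 / (n^2 * epsilon^2).

    ASSUMPTION: illustrative rule only, not the thresholds derived in the paper.
    The privacy term dominates when n * epsilon^2 < d; the effective sample
    size is the n_eff solving d / n_eff = max(d / n, d^2 / (n^2 * epsilon^2)).
    """
    regime = "private" if n * epsilon ** 2 < d else "non-private"
    n_eff = min(float(n), n ** 2 * epsilon ** 2 / d)
    return regime, n_eff


def classify_servers(sample_sizes, budgets, d):
    """Apply server_regime to heterogeneous (n_j, epsilon_j) pairs."""
    return [server_regime(n, eps, d) for n, eps in zip(sample_sizes, budgets)]
```

Under this heuristic a server with n = 100, ε = 0.1 and d = 10 behaves like roughly 10 non-private observations, while a server with n = 1000 and ε = 1 loses nothing at the level of rates, so a mix of such servers would straddle the transition.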
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Cox proportional hazards model with standard regularity conditions
- domain assumption: Federated differential privacy with heterogeneous per-server sample sizes and privacy budgets
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
We derive minimax lower bounds together with upper bounds that match up to poly-logarithmic factors for the regression coefficients, thereby revealing server-level phase transitions between private and non-private regimes.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- General Lower Bounds for Differentially Private Federated Learning with Arbitrary Public-Transcript Interactions
  Derives a federated van Trees lower bound under total clientwise sample-level zCDP for parameter estimation with squared l2 loss in federated learning protocols with arbitrary public-transcript interactions.
- Benchmarking the Utility of Privacy-Preserving Cox Regression Under Data-Driven Clipping Bounds: A Multi-Dataset Simulation Study
  At typical differential privacy levels, Cox models lose significance for about 90% of covariates and drop to random predictive performance, with usable results requiring much weaker privacy.