Optimal Cox regression under federated differential privacy: coefficients and cumulative hazards

Elly K. H. Hung; Yi Yu

arXiv: 2508.19640 · v2 · submitted 2025-08-27 · 🧮 math.ST · stat.ME · stat.TH


Pith reviewed 2026-05-18 21:34 UTC · model grok-4.3

classification 🧮 math.ST · stat.ME · stat.TH

keywords Cox regression · federated differential privacy · minimax bounds · survival analysis · Breslow estimator · cumulative hazard · phase transitions


The pith

Cox regression coefficients achieve matching minimax rates under federated differential privacy with server-level phase transitions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives matching minimax lower and upper bounds for estimating regression coefficients in the Cox model under federated differential privacy with varying server sample sizes and privacy budgets. These bounds reveal phase transitions where the rates behave differently in private versus non-private regimes depending on the number of servers. It also introduces a tree-based private Breslow estimator for the cumulative baseline hazard that yields a nearly optimal private survival function estimator. A relaxed privacy model with some public information is analyzed, showing that the benefit of public covariates varies with the privacy setup. These findings clarify the privacy-utility tradeoff in distributed survival data analysis.

Core claim

We derive minimax lower bounds together with upper bounds that match up to poly-logarithmic factors for the regression coefficients, thereby revealing server-level phase transitions between private and non-private regimes. We also consider a relaxed differential privacy framework with partially public information. For cumulative hazard estimation, we propose a private tree-based version of the Breslow estimator for nonparametric integral estimation under FDP. As a by-product, this leads to a private survival function estimator that attains a nearly minimax optimal rate.

What carries the argument

Minimax lower and upper bounds on estimation error for Cox regression coefficients under federated differential privacy with heterogeneous per-server sample sizes and privacy budgets, paired with a tree-based private Breslow estimator for the cumulative baseline hazard.
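For orientation, the classical non-private Breslow estimator that the private version builds on can be sketched as follows. This is a minimal illustration, not the paper's method: the function names `breslow_cumulative_hazard` and `baseline_survival` are hypothetical, the coefficient vector `beta` is assumed to have been estimated already, and no privacy noise is added.

```python
import numpy as np

def breslow_cumulative_hazard(times, events, covariates, beta):
    """Non-private Breslow estimate of the cumulative baseline hazard.

    Returns the distinct event times and the estimate evaluated at each.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    # Relative risk exp(beta^T z_i) for every subject.
    risk = np.exp(np.asarray(covariates, dtype=float) @ np.asarray(beta, dtype=float))

    event_times = np.unique(times[events])
    increments = np.empty_like(event_times)
    for k, t in enumerate(event_times):
        deaths = np.sum(events & (times == t))   # events observed at time t
        at_risk = np.sum(risk[times >= t])       # risk-set sum of exp(beta^T z)
        increments[k] = deaths / at_risk
    return event_times, np.cumsum(increments)

def baseline_survival(cum_hazard):
    """Baseline survival function S_0(t) = exp(-Lambda_0(t))."""
    return np.exp(-np.asarray(cum_hazard, dtype=float))
```

With `beta` set to zero the estimate reduces to the Nelson-Aalen estimator, which gives a quick sanity check on small hand-computed examples.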

Load-bearing premise

The standard Cox proportional hazards model holds with the usual regularity conditions on the data-generating process and the federated differential privacy model accurately captures the heterogeneous per-server sample sizes and privacy budgets.

What would settle it

Vary the number of servers while holding total sample size and privacy budgets fixed, then check whether the observed estimation error rates for the coefficients transition exactly as predicted between the private and non-private regimes.
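The data side of such an experiment might be set up as below. This is only a sketch under stated assumptions: the unit baseline hazard, uniform covariates, exponential censoring, and the helper names `simulate_cox` and `split_across_servers` are all illustrative choices, not the paper's simulation design, and the private estimator itself is not shown.

```python
import numpy as np

def simulate_cox(n, beta, rng, censor_rate=0.5):
    """Draw (time, event, covariates) from a Cox model with unit baseline hazard.

    Illustrative design only: covariates are Uniform(-1, 1) and censoring
    times are exponential with rate `censor_rate`.
    """
    beta = np.asarray(beta, dtype=float)
    z = rng.uniform(-1.0, 1.0, size=(n, beta.size))
    # With baseline hazard 1, the event time is exponential with rate exp(beta^T z).
    event_time = rng.exponential(1.0 / np.exp(z @ beta))
    censor_time = rng.exponential(1.0 / censor_rate, size=n)
    time = np.minimum(event_time, censor_time)
    event = event_time <= censor_time
    return time, event, z

def split_across_servers(n_total, n_servers, rng):
    """Randomly partition indices 0..n_total-1 into n_servers near-equal groups."""
    return np.array_split(rng.permutation(n_total), n_servers)
```

One would then loop over server counts (say 1, 5, 25, 125) with the same total sample size, fit a private estimator on each partition, and plot the squared error of the coefficient estimate against the number of servers on a log-log scale to look for a change in slope.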

Original abstract

We study two foundational problems in distributed survival analysis under federated differential privacy (FDP): estimation of the Cox regression coefficients and of the cumulative baseline hazard functions, allowing for heterogeneous per-server sample sizes and privacy budgets. To quantify the fundamental cost of privacy, we derive minimax lower bounds together with upper bounds that match up to poly-logarithmic factors for the regression coefficients, thereby revealing server-level phase transitions between private and non-private regimes. We also consider a relaxed differential privacy framework with partially public information. Our analysis shows that the role of public covariates depends strongly on the privacy model. For cumulative hazard estimation, we propose a private tree-based version of the Breslow estimator for nonparametric integral estimation under FDP. As a by-product, this leads to a private survival function estimator that attains a nearly minimax optimal rate. Numerical experiments, including a real-data application, support the theoretical findings. The proposed methods are implemented in an accompanying R package FDPCox.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives matching minimax bounds for Cox coefficients under heterogeneous federated DP that show server-level phase transitions, plus a tree-based private Breslow estimator for cumulative hazards.


This paper pins down the privacy cost for federated Cox models. It derives minimax lower bounds for the regression coefficients, with upper bounds that match up to poly-log factors, and these bounds expose server-level phase transitions between private and non-private regimes. The work also handles heterogeneous sample sizes and privacy budgets across servers, which is a realistic touch. The authors extend the analysis to a relaxed DP setting with partially public information and show that its effect depends on the privacy model. For the cumulative hazard, they propose a private tree-based Breslow estimator that also yields a nearly optimal rate for the survival function. The accompanying R package and the numerical experiments, including a real-data application, back up the theory.

The derivations use standard information-theoretic packing arguments for the lower bounds and careful noise addition for the upper bounds, with no obvious gaps in the logic from what is described. One limitation is the reliance on the standard Cox model assumptions, such as proportional hazards holding exactly. In real medical data that assumption often fails, so the phase transitions might shift. The poly-log factors leave some room for tightening, but that is common in these analyses.

This is for researchers in statistical privacy and survival analysis. Anyone working on federated medical data analysis would get value from the explicit rates and the estimator. It deserves a serious referee because the contributions are grounded and the matching bounds are a clear step forward in the subfield. I recommend putting it through peer review.

Referee Report

0 major / 3 minor

Summary. The manuscript studies estimation of Cox regression coefficients and cumulative baseline hazards in a federated differential privacy (FDP) model that allows heterogeneous per-server sample sizes n_j and privacy budgets ε_j. It derives minimax lower bounds for the coefficients via information-theoretic arguments and constructs matching upper bounds (up to poly-log factors) using private score equations with noise addition, thereby identifying server-level phase transitions between private and non-private regimes. A relaxed FDP model with partially public covariates is analyzed, and a tree-based private Breslow estimator is proposed for the cumulative hazard, yielding a nearly minimax-optimal private survival function estimator. Theoretical results are supported by simulations, a real-data example, and an R package FDPCox.
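The review does not spell out how the tree-based estimator is built. As a point of reference, the classical binary-tree mechanism for releasing private cumulative sums is sketched below; that the paper's construction resembles it is an assumption, as are the function names, the use of Laplace noise, and the premise that one individual changes a single leaf by at most `sensitivity`. For a Breslow-type estimator the leaves would be hazard increments on a time grid, and bounding their sensitivity is a separate problem this sketch does not address.

```python
import numpy as np

def noisy_dyadic_tree(values, epsilon, sensitivity, rng):
    """Noisy sums over all dyadic blocks of `values` (binary-tree mechanism).

    Assumption: changing one individual's data changes one leaf by at most
    `sensitivity`, so the L1 change over all nodes is sensitivity * levels.
    """
    values = np.asarray(values, dtype=float)
    size = 1 << (len(values) - 1).bit_length()   # pad length to a power of two
    padded = np.zeros(size)
    padded[: len(values)] = values
    levels = size.bit_length()                   # each leaf lies in `levels` nodes
    scale = sensitivity * levels / epsilon       # Laplace scale for epsilon-DP overall
    tree = []
    for level in range(levels):
        sums = padded.reshape(-1, 1 << level).sum(axis=1)   # blocks of length 2**level
        tree.append(sums + rng.laplace(0.0, scale, size=sums.shape))
    return tree

def private_prefix_sum(tree, k):
    """Noisy sum of the first k leaves, using at most one node per level."""
    total, start = 0.0, 0
    for level in reversed(range(len(tree))):
        block = 1 << level
        if start + block <= k:                   # largest aligned block that still fits
            total += tree[level][start // block]
            start += block
    return total
```

The appeal of the tree layout is that every prefix sum touches only logarithmically many noisy nodes, so the accumulated noise grows poly-logarithmically in the grid size rather than linearly, which is consistent with the poly-logarithmic gaps mentioned in the summary.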

Significance. If the matching bounds hold, the work provides the first rigorous quantification of privacy costs in federated survival analysis, with explicit phase-transition thresholds that depend on the heterogeneity of n_j and ε_j. The extension to public information and the nonparametric hazard estimator broaden applicability to medical data settings. The reproducible implementation via the R package is a clear strength that supports adoption and verification.

minor comments (3)

[Abstract] The statement that upper bounds match lower bounds 'up to poly-logarithmic factors' would be more informative if the precise order of the logarithmic terms (e.g., log(n) or log(1/ε)) were indicated.
[§2, model] The composition rule used to aggregate heterogeneous per-server privacy budgets ε_j into the global FDP guarantee should be stated explicitly, as different composition theorems could affect the phase-transition thresholds.
[Numerical experiments] The figures illustrating the phase transitions would benefit from an additional panel or table that varies the degree of heterogeneity in (n_j, ε_j) to make the server-level effect visually clearer.

Simulated Author's Rebuttal

0 responses · 0 unresolved

Thank you for the positive referee report and the recommendation for minor revision. We appreciate the recognition of the contributions regarding the minimax lower and upper bounds for Cox coefficients under federated DP, the identification of server-level phase transitions, the analysis of relaxed FDP with public covariates, and the tree-based private Breslow estimator for cumulative hazards. We also note the positive remarks on the simulations, the real-data example, and the R package FDPCox. Since no major comments were raised, we will proceed with minor revisions to the manuscript as suggested.

Circularity Check

0 steps flagged

No significant circularity


The derivation relies on standard information-theoretic packing arguments for minimax lower bounds on Cox coefficients under heterogeneous federated DP, matched by upper bounds constructed from private score equations plus calibrated noise. Heterogeneity in per-server n_j and ε_j is explicitly parameterized in both directions, and phase-transition thresholds arise directly from comparing privacy-adjusted effective sample size to the non-private regime. No load-bearing step reduces by definition or self-citation to the target result; the analysis is self-contained against external benchmarks with no evident self-definitional, fitted-input, or ansatz-smuggling patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Relies on standard Cox model assumptions and federated DP definitions without introducing new free parameters or invented entities; no ad-hoc fitting mentioned.

axioms (2)

domain assumption: Cox proportional hazards model with standard regularity conditions
Invoked as the foundational model for regression coefficient estimation and hazard functions.
domain assumption: Federated differential privacy with heterogeneous per-server sample sizes and privacy budgets
Central modeling choice for the distributed setting and phase transition analysis.
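For readers who want the first axiom written out, the textbook form of the Cox model and the associated estimators is given below. The notation is an assumption of this note and may differ from the paper's: T_i is the observed time, δ_i the event indicator, Z_i the covariate vector, β the coefficient vector, and λ_0 the baseline hazard.

```latex
% Proportional hazards model
\lambda(t \mid Z) = \lambda_0(t)\, \exp(\beta^\top Z)

% Log partial likelihood, maximized to estimate \beta
\ell(\beta) = \sum_{i=1}^{n} \delta_i \Big[ \beta^\top Z_i
  - \log \sum_{j:\, T_j \ge T_i} \exp(\beta^\top Z_j) \Big]

% Breslow estimator of the cumulative baseline hazard
\widehat{\Lambda}_0(t) = \sum_{i:\, T_i \le t}
  \frac{\delta_i}{\sum_{j:\, T_j \ge T_i} \exp(\widehat{\beta}^\top Z_j)}

% Baseline survival function
S_0(t) = \exp\{-\Lambda_0(t)\}
```

These are the non-private objects. The private procedures discussed in the review perturb quantities derived from them, and their exact form is not reproduced here.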



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

We derive minimax lower bounds together with upper bounds that match up to poly-logarithmic factors for the regression coefficients, thereby revealing server-level phase transitions between private and non-private regimes.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

General Lower Bounds for Differentially Private Federated Learning with Arbitrary Public-Transcript Interactions
cs.LG · 2026-05 · unverdicted · novelty 8.0

Derives a federated van Trees lower bound under total clientwise sample-level zCDP for parameter estimation with squared l2 loss in federated learning protocols with arbitrary public-transcript interactions.

Benchmarking the Utility of Privacy-Preserving Cox Regression Under Data-Driven Clipping Bounds: A Multi-Dataset Simulation Study
cs.CR · 2026-04 · accept · novelty 4.0

At typical differential privacy levels, Cox models lose significance for about 90% of covariates and drop to random predictive performance, with usable results requiring much weaker privacy.