Private Federated Learning for High-dimensional Time Series

Kejun Chen; Qianqian Zhu

arxiv: 2604.07135 · v1 · submitted 2026-04-08 · 📊 stat.ME

Pith reviewed 2026-05-10 18:03 UTC · model grok-4.3

classification 📊 stat.ME

keywords private federated learning · high-dimensional time series · vector autoregressive models · differential privacy · low-rank structure · federated estimation · privacy-utility tradeoff · rank selection

The pith

A privacy-preserving federated learning framework allows multiple clients to improve high-dimensional vector autoregressive model estimates by sharing a common low-rank structure while keeping data private.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for federated learning in high-dimensional time series data that respects privacy constraints. It models each client's dynamics as sharing a low-rank component with sparse individual deviations, then uses a two-stage process to learn the shared part privately and personalize locally. This pooling helps when individual datasets are small, as shown by better accuracy in simulations and real applications like state electricity data and country macro forecasts. The work provides error bounds that quantify how privacy levels affect estimation quality and proves that a ridge-based method can consistently select the rank. Readers should care because it offers a way to leverage distributed data without centralizing sensitive information.

Core claim

Under the assumption of a shared low-rank structure plus sparse client-specific deviations in vector autoregressive models, the two-stage differentially private federated estimator achieves lower estimation error than single-client methods when local sample sizes are limited, with non-asymptotic bounds characterizing the privacy-utility tradeoff and a consistent ridge-type rank selection criterion.

What carries the argument

The two-stage estimation procedure that performs differentially private representation learning on the shared low-rank component across clients and then applies local personalization for the sparse deviations.
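The two-stage idea can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes a VAR(1) model, an additive "shared plus deviation" transition matrix, a Gaussian perturbation of clipped local least-squares estimates as a stand-in for the private representation step, and soft-thresholding as a stand-in for local personalization. All function names and the parameters `clip`, `sigma`, and `lam` are hypothetical, and calibrating `sigma` to a formal privacy guarantee is left outside the sketch.

```python
import numpy as np

def ols_var1(Y):
    """Least-squares VAR(1) transition estimate from a (T+1, p) series."""
    X, Z = Y[:-1], Y[1:]
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)  # solves X @ B ~ Z
    return B.T                                 # so that z_t ~ A @ x_t

def clip_fro(M, c):
    """Rescale M so that its Frobenius norm is at most c."""
    n = np.linalg.norm(M)
    return M if n <= c else M * (c / n)

def soft_threshold(M, lam):
    """Entrywise soft-thresholding, a simple sparsifying step."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)

def two_stage(series, rank, clip, sigma, lam, rng):
    """Stage 1: average noisy clipped local estimates, truncate to `rank`.
    Stage 2: each client keeps a sparse residual around the shared part."""
    local = [ols_var1(Y) for Y in series]
    p = local[0].shape[0]
    # Each client perturbs its own clipped estimate before sharing it.
    noisy = [clip_fro(A, clip) + sigma * rng.standard_normal((p, p))
             for A in local]
    avg = np.mean(noisy, axis=0)
    U, d, Vt = np.linalg.svd(avg)
    shared = (U[:, :rank] * d[:rank]) @ Vt[:rank]
    # Personalization uses only the client's own, un-noised local estimate.
    personalized = [shared + soft_threshold(A - shared, lam) for A in local]
    return shared, personalized

# Toy data: K clients sharing a rank-r matrix plus one client-specific entry.
rng = np.random.default_rng(0)
p, r, K, T = 6, 2, 4, 300
U0 = np.linalg.qr(rng.standard_normal((p, r)))[0]
V0 = np.linalg.qr(rng.standard_normal((p, r)))[0]
L = 0.5 * U0 @ V0.T
series = []
for _ in range(K):
    S = np.zeros((p, p))
    S[rng.integers(p), rng.integers(p)] = 0.1
    A = L + S
    Y = np.zeros((T + 1, p))
    for t in range(T):
        Y[t + 1] = A @ Y[t] + rng.standard_normal(p)
    series.append(Y)

shared, personalized = two_stage(series, rank=r, clip=5.0,
                                 sigma=0.05, lam=0.05, rng=rng)
```

The design choice worth noting is that only the stage-1 quantities leave a client, so only they are perturbed; the stage-2 residual is computed locally. Setting `lam=0` recovers each client's own least-squares fit, while a very large `lam` forces every client onto the shared estimate.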

Load-bearing premise

Each client's time series dynamics follow a common low-rank structure with only sparse client-specific deviations.
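To make the premise concrete, here is a minimal simulation sketch under one possible reading of it: a VAR(1) model in which client k has transition matrix A_k = L + S_k, with L shared and low-rank and S_k sparse. The additive form, the lag order, the function name `simulate_clients`, and all default values are assumptions chosen for illustration; the paper's exact parameterization is not given on this page.

```python
import numpy as np

def simulate_clients(K=5, p=10, r=2, s=3, T=200, seed=0):
    """Simulate K VAR(1) series whose transition matrices are L + S_k."""
    rng = np.random.default_rng(seed)
    # Shared part: rank r, all nonzero singular values equal to 0.5.
    U = np.linalg.qr(rng.standard_normal((p, r)))[0]
    V = np.linalg.qr(rng.standard_normal((p, r)))[0]
    L = 0.5 * U @ V.T
    S_list, Y_list = [], []
    for _ in range(K):
        # Client-specific part: s nonzero entries, each in [-0.1, 0.1].
        flat = np.zeros(p * p)
        idx = rng.choice(p * p, size=s, replace=False)
        flat[idx] = rng.uniform(-0.1, 0.1, size=s)
        S = flat.reshape(p, p)
        # Spectral norm of L + S is at most 0.5 + 0.1 * sqrt(s),
        # which is below 1 for the default s, so the process is stable.
        A = L + S
        Y = np.zeros((T + 1, p))
        for t in range(T):
            Y[t + 1] = A @ Y[t] + rng.standard_normal(p)
        S_list.append(S)
        Y_list.append(Y)
    return L, S_list, Y_list

L, S_list, Y_list = simulate_clients()
```

Shrinking `T` while keeping `K` fixed mimics the small-local-sample regime in which the review says pooling helps.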

What would settle it

If, in simulations or on real data where the low-rank plus sparse assumption does not hold, the federated method shows no accuracy improvement over local methods even with small samples, the benefit would be falsified.

Figures

Figures reproduced from arXiv: 2604.07135 by Kejun Chen, Qianqian Zhu.

**Figure 2.** Average Frobenius-norm estimation errors of

**Figure 3.** Heatmaps of average Frobenius-norm estimation errors (upper panel) for

**Figure 4.** Average Frobenius-norm estimation errors for

**Figure 5.** Average Frobenius-norm estimation errors of federated and single-client methods, and

**Figure 6.** Heatmaps of the estimated shared component

**Figure 7.** Heatmaps of the estimated shared component

Original abstract

In the era of big data, leveraging information from multiple clients while preserving data privacy has emerged as a critical challenge in modern statistical modeling and forecasting. This paper introduces a privacy-preserving federated learning framework for high-dimensional vector autoregressive models, where each client's dynamics are characterized by a common low-rank structure augmented with sparse client-specific deviations. We develop a two-stage estimation procedure that integrates differentially private representation learning for the shared component with local personalization for client-specific adjustments, enabling effective information pooling under selective privacy constraints. Non-asymptotic error bounds are established for both the single-client and federated estimators to characterize the inherent privacy-utility trade-off, and consistency of a ridge-type rank selection criterion is proved. Simulation studies demonstrate that federation substantially improves estimation accuracy when local sample sizes are limited. Two empirical applications, analyzing electricity-economy linkages across U.S. states and conducting multi-task macroeconomic forecasting across countries, highlight the superior predictive accuracy of the proposed method over existing single-client benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a two-stage federated estimator for high-dimensional VARs under differential privacy, with error bounds and rank selection, but the accuracy gains depend on the shared low-rank plus sparse deviations model being recoverable despite the noise.

The main thing to know is that this work combines differentially private representation learning for a common low-rank component with local sparse personalization in a federated high-dimensional VAR setting. It supplies non-asymptotic error bounds for the estimators and shows that a ridge-type criterion selects the rank consistently. Simulations indicate federation helps when each client has limited samples, and the authors apply it to electricity-economy data across states and multi-country macro forecasting, where it beats single-client baselines on prediction error.

Referee Report

2 major / 2 minor

Summary. The paper introduces a privacy-preserving federated learning framework for high-dimensional vector autoregressive models, where each client's dynamics follow a common low-rank structure plus sparse client-specific deviations. It proposes a two-stage procedure that first performs differentially private representation learning on the shared low-rank component and then local personalization for client-specific terms. Non-asymptotic error bounds are derived for both single-client and federated estimators to characterize the privacy-utility trade-off, consistency of a ridge-type rank selection criterion is proved, and simulations plus two empirical applications (electricity-economy linkages across U.S. states and multi-country macro forecasting) are used to show that federation improves accuracy when local sample sizes are small.

Significance. If the central results hold, the work provides a theoretically grounded approach to pooling information across distributed time series datasets while respecting differential privacy, which is relevant for sensitive applications in macroeconomics and energy. The non-asymptotic bounds and the rank-selection consistency proof are strengths that go beyond purely empirical federated methods; the simulation evidence of gains under limited local samples is also a concrete contribution.

major comments (2)

[§2] (model): The common low-rank plus sparse client-specific deviation assumption is load-bearing for the claimed federated accuracy gains and for the two-stage procedure to outperform single-client baselines. The non-asymptotic bounds and privacy-utility characterization are derived under this structure; if the true dynamics deviate even moderately or if DP noise prevents reliable extraction of the shared factors in the presence of the sparse terms, the federated estimator cannot be guaranteed to improve upon local estimators. The paper should provide a sensitivity analysis or explicit conditions quantifying how large the sparse deviations can be before recovery fails.
[§4] (theoretical results): The non-asymptotic error bounds for the federated estimator need to explicitly track the additional error induced by the sparse client-specific components when they are estimated after the DP-noisy shared representation is obtained. It is not clear from the stated bounds whether the privacy noise scale interacts with the sparsity level in a way that could inflate the total error beyond the single-client case; without this, the privacy-utility trade-off characterization is incomplete.

minor comments (2)

[Abstract] The term 'selective privacy constraints' appears in the abstract but is not defined or operationalized in the model or procedure sections; add a brief clarification or reference to the specific privacy mechanism used.
[Simulations] In the simulation section, the specific values chosen for the privacy budget, clipping thresholds, and the ridge parameter in rank selection should be reported in a table or appendix to support reproducibility of the reported accuracy gains.
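The request to report the privacy budget and clipping threshold matters because, in a standard Gaussian mechanism, those two numbers fix the noise scale. The sketch below uses the classical calibration (noise standard deviation equal to sensitivity times sqrt(2 ln(1.25/δ)) divided by ε, valid for 0 < ε < 1). Whether the paper uses this mechanism or this calibration is an assumption, and the function name `gaussian_sigma` is hypothetical.

```python
import math

def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise scale for the classical (epsilon, delta) Gaussian mechanism.

    The formula is the textbook calibration and is only valid for
    0 < epsilon < 1 and 0 < delta < 1.
    """
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

# A release clipped to Frobenius norm c changes by at most 2c when one
# client's data is replaced, so 2c is a valid L2 sensitivity here.
clip = 1.0
sigma = gaussian_sigma(l2_sensitivity=2 * clip, epsilon=0.5, delta=1e-5)
```

Halving ε doubles the noise scale, and doubling the clipping threshold does the same, which is why accuracy results are hard to reproduce without both values.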

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the role of our modeling assumptions and the completeness of the theoretical characterization. We address each major comment below and describe the revisions we will make to strengthen the manuscript.

Referee: [§2] (model): The common low-rank plus sparse client-specific deviation assumption is load-bearing for the claimed federated accuracy gains and for the two-stage procedure to outperform single-client baselines. The non-asymptotic bounds and privacy-utility characterization are derived under this structure; if the true dynamics deviate even moderately or if DP noise prevents reliable extraction of the shared factors in the presence of the sparse terms, the federated estimator cannot be guaranteed to improve upon local estimators. The paper should provide a sensitivity analysis or explicit conditions quantifying how large the sparse deviations can be before recovery fails.

Authors: We agree that the low-rank plus sparse deviation structure is fundamental to the federated gains and to the validity of the two-stage procedure. The non-asymptotic bounds in Theorems 4.1 and 4.2 are derived under Assumption 2.1, which already encodes the sparsity level s of the client-specific deviations; the error terms explicitly depend on s, the local sample size, and the privacy noise, thereby providing implicit conditions on the maximum allowable deviation magnitude before the federated estimator loses its advantage. To address the request for greater transparency, we will add a new subsection (Section 2.3) that derives an explicit threshold on the deviation norm beyond which the shared-component recovery fails with high probability, and we will include additional simulation experiments that vary both the sparsity level and the magnitude of the client-specific terms to illustrate the sensitivity.

revision: yes

Referee: [§4] (theoretical results): The non-asymptotic error bounds for the federated estimator need to explicitly track the additional error induced by the sparse client-specific components when they are estimated after the DP-noisy shared representation is obtained. It is not clear from the stated bounds whether the privacy noise scale interacts with the sparsity level in a way that could inflate the total error beyond the single-client case; without this, the privacy-utility trade-off characterization is incomplete.

Authors: The referee correctly notes that the interaction between privacy noise and the sparse-component estimation step merits explicit tracking. In the current proof of Theorem 4.2 the total error is already decomposed into the DP-induced error on the shared representation plus the subsequent sparse-deviation estimation error; the latter term grows with both the privacy noise scale and the sparsity level s because the noisy shared factors serve as the input to the local ridge regression. We will revise the theorem statement and the proof to isolate this interaction term and add a corollary that directly compares the federated bound with the single-client bound, stating the precise regime (in terms of ε, s, and local sample size) under which the federated estimator remains superior. This will make the privacy-utility trade-off fully explicit.

revision: yes
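The decomposition described in this exchange can be written out as a short sketch. The notation is an assumption, not taken from the paper: client k's transition matrix is taken to be additive, A_k = L + S_k, with estimator \hat A_k = \hat L + \hat S_k. Under that assumption the triangle inequality alone gives the two-term split.

```latex
\[
\|\hat A_k - A_k\|_F
  = \|(\hat L - L) + (\hat S_k - S_k)\|_F
  \le \underbrace{\|\hat L - L\|_F}_{\text{shared part, carries the DP noise}}
    + \underbrace{\|\hat S_k - S_k\|_F}_{\text{local sparse part}} .
\]
```

Because \hat S_k is fitted after \hat L, the second term also depends on the privacy noise through \hat L, which is the interaction the referee asks to see isolated. How that term scales with ε and s cannot be recovered from this page.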

Circularity Check

0 steps flagged

No circularity: bounds and consistency proved independently under stated model assumptions

The abstract and reader's summary indicate that non-asymptotic error bounds are derived for the estimators and consistency of the ridge-type rank selection is proved separately. No quoted equations or steps reduce a claimed prediction or result to a fitted parameter or self-citation by construction. The common low-rank plus sparse deviation structure is an explicit modeling assumption enabling the two-stage procedure, not a derived output. Simulations and applications provide external checks. This is the normal case of a self-contained derivation.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption of a common low-rank structure plus sparse client-specific deviations for the VAR dynamics, plus the differential privacy mechanism in the first stage; the only free parameters identifiable from the abstract are the rank, set by the selection criterion, and the privacy budget.

free parameters (2)

rank dimension: The dimension of the shared low-rank structure is chosen via the ridge-type criterion whose consistency is proved.
privacy budget: The differential privacy noise level controls the privacy-utility trade-off in the shared representation learning stage.
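For the rank entry above, one common form of a ridge-type criterion picks the index that maximizes the ratio of consecutive singular values after adding a small constant to numerator and denominator. The sketch below implements that form; the paper's exact criterion and its choice of ridge constant are not given on this page, so the function name `ridge_ratio_rank` and the parameter `ridge` are assumptions.

```python
import numpy as np

def ridge_ratio_rank(singular_values, ridge, max_rank=None):
    """Return the j maximizing (d_j + ridge) / (d_{j+1} + ridge), 1-indexed."""
    d = np.sort(np.asarray(singular_values, dtype=float))[::-1]
    m = len(d) - 1 if max_rank is None else min(max_rank, len(d) - 1)
    ratios = (d[:m] + ridge) / (d[1:m + 1] + ridge)
    return int(np.argmax(ratios)) + 1

# Two strong singular values followed by numerically tiny ones.
d = [5.0, 4.0, 1e-3, 1e-6, 1e-12]
rank_with_ridge = ridge_ratio_rank(d, ridge=0.1)   # selects 2
rank_without = ridge_ratio_rank(d, ridge=0.0)      # selects 4
```

The comparison shows the purpose of the ridge term: without it, ratios between near-zero singular values can dominate and push the selected rank too high.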

axioms (1)

domain assumption: Each client's high-dimensional vector autoregressive dynamics share a common low-rank structure augmented with sparse client-specific deviations.
This assumption underpins the two-stage estimation that pools information across clients while allowing personalization.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "each client's dynamics are characterized by a common low-rank structure augmented with sparse client-specific deviations... two-stage estimation procedure that integrates differentially private representation learning for the shared component with local personalization"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "non-asymptotic error bounds... consistency of a ridge-type rank selection criterion"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
