Privacy-preserving federated tensor decomposition of single-cell immune data: recovering multicellular programs across institutions
Pith reviewed 2026-06-26 05:58 UTC · model grok-4.3
The pith
Federated tensor decomposition recovers multicellular programs across institutions by merging local subspaces with stacked SVD after global-mean centering.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A coordinator merges local program subspaces by stacked SVD under federated global-mean centering. The merge is provably equivalent, up to truncation, to the centralized decomposition and is robust to site-label confounding, so multicellular programs such as the interferon response are recovered accurately without any site sharing cells.
What carries the argument
Stacked SVD merge of local subspaces after federated global-mean centering, which aligns the federated result with the global structure.
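The merge described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. It assumes each site's tensor has been unfolded into a donors x (cell type * gene) matrix, that sites share singular-value-weighted right factors rather than bare orthonormal bases, and that the global mean is assembled from per-site sums and counts. The function names and the synthetic data are hypothetical. With untruncated local factors the stacked matrix has the same Gram matrix as the pooled, globally centered data, which is why the two subspaces coincide in this example.

```python
import numpy as np

def federated_global_mean(site_matrices):
    """Global feature mean built from per-site column sums and donor counts only."""
    total = sum(X.sum(axis=0) for X in site_matrices)
    n = sum(X.shape[0] for X in site_matrices)
    return total / n

def local_factor(X, mu, rank=None):
    """Centre by the global mean; return the weighted right factor S V^T.

    rank=None keeps every local component (no truncation).
    """
    _, s, vt = np.linalg.svd(X - mu, full_matrices=False)
    return s[:rank, None] * vt[:rank]

def merge_by_stacked_svd(factors, rank):
    """Stack per-site factors row-wise and keep the top right singular vectors."""
    _, _, vt = np.linalg.svd(np.vstack(factors), full_matrices=False)
    return vt[:rank].T  # features x rank, orthonormal columns

# Synthetic stand-in for three sites (assumption: 3 shared programs, site offsets).
rng = np.random.default_rng(0)
p, k = 30, 3
loadings = rng.normal(size=(k, p))
sites = []
for n_donors, shift in [(40, 0.0), (25, 1.0), (35, -1.0)]:
    scores = rng.normal(size=(n_donors, k)) * np.array([5.0, 3.0, 2.0])
    noise = 0.1 * rng.normal(size=(n_donors, p))
    sites.append(scores @ loadings + noise + shift)

# Federated path: only mu and the per-site factors cross site boundaries.
mu = federated_global_mean(sites)
V_fed = merge_by_stacked_svd([local_factor(X, mu) for X in sites], rank=k)

# Centralized reference: pool the donors, centre, and decompose.
pooled = np.vstack(sites)
_, _, vt_cen = np.linalg.svd(pooled - pooled.mean(axis=0), full_matrices=False)
V_cen = vt_cen[:k].T

# Cosines of the principal angles between the two subspaces.
cosines = np.linalg.svd(V_fed.T @ V_cen, compute_uv=False)
print(cosines)
```

Passing a finite `rank` to `local_factor` gives the truncated variant, for which the match with the centralized subspace is approximate rather than exact.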
If this is right
- Recovers the canonical interferon program with ISG enrichment AUC 0.998 and case-control separation 0.958 on the SLE atlas across institution and ancestry partitions.
- Achieves subspace correlation 0.989 on three real COVID-19 sites and exact recovery (correlation 1.000) when no site observes all cell types.
- On the interstitial-lung-disease atlas the recovered program predicts disease with AUC 0.96 versus 0.91 for the best single cell type, and the advantage survives federation.
- Secure aggregation reduces membership-inference attack AUC from 0.91 to 0.61.
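The subspace correlations quoted in the list above compare a federated subspace with a centralized one. One common way to compute such a score is the mean cosine of the principal angles between the two subspaces. The sketch below uses that definition as an assumption, since the exact metric used in the paper is not stated here, and `subspace_correlation` is a hypothetical helper name.

```python
import numpy as np

def subspace_correlation(A, B):
    """Mean cosine of the principal angles between span(A) and span(B).

    A and B are features x rank matrices whose columns span each subspace;
    the columns need not be orthonormal.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(np.clip(cosines, 0.0, 1.0).mean())

basis = np.eye(5)[:, :2]
same_span = basis @ np.array([[2.0, 1.0], [0.0, 3.0]])  # same subspace, different basis
orthogonal = np.eye(5)[:, 2:4]                          # shares no direction with basis

score_same = subspace_correlation(basis, same_span)
score_orth = subspace_correlation(basis, orthogonal)
print(score_same, score_orth)
```

Because the score depends only on the spans, it is unaffected by rotations, sign flips, or rescaling of the individual program vectors.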
Where Pith is reading between the lines
- The same centering step could be tested on other tensor or matrix decompositions in genomics to handle batch effects without data pooling.
- Extending the approach to longitudinal or spatial single-cell datasets might allow recovery of dynamic multicellular programs across sites.
- The privacy gain from sharing only subspaces suggests direct applicability to other high-dimensional biological data held in institutional silos.
Load-bearing premise
Local program subspaces computed independently at each site contain sufficient information for the stacked SVD merge with global-mean centering to recover the global multicellular programs without material loss.
What would settle it
On the 261-donor SLE atlas, a direct run of the federated estimator yields an interferon-program AUC whose difference from the centralized result falls outside the reported 95% bootstrap interval of [-0.004, +0.012].
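A check of this kind could be run along the following lines. The sketch assumes donor-level resampling and a percentile interval, neither of which is confirmed by the text, and it uses synthetic program scores in place of the atlas. The helper names `auc` and `bootstrap_delta_auc` are hypothetical, and the number of replicates is an arbitrary choice.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def bootstrap_delta_auc(fed, cen, labels, n_boot=2000, seed=0):
    """Percentile bootstrap over donors for AUC(federated) - AUC(centralized)."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample donors with replacement
        y = labels[idx]
        if y.min() == y.max():            # skip replicates containing one class only
            continue
        deltas.append(auc(fed[idx], y) - auc(cen[idx], y))
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return float(np.mean(deltas)), (float(lo), float(hi))

# Synthetic stand-in for per-donor program scores (assumption: 261 donors).
rng = np.random.default_rng(1)
labels = (rng.random(261) < 0.6).astype(int)
cen_scores = 1.5 * labels + rng.normal(size=261)
fed_scores = cen_scores + 0.05 * rng.normal(size=261)

mean_delta, (ci_lo, ci_hi) = bootstrap_delta_auc(fed_scores, cen_scores, labels)
observed = auc(fed_scores, labels) - auc(cen_scores, labels)

# The settling criterion from the text: is the observed difference outside the reported interval?
outside_reported = not (-0.004 <= observed <= 0.012)
print(mean_delta, (ci_lo, ci_hi), outside_reported)
```

Resampling donors rather than cells keeps each donor's scores together, which matches the donor-level unit at which case-control status is defined.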
Original abstract
Tensor decomposition of donor $\times$ cell-type $\times$ gene single-cell data recovers \emph{multicellular programs}: coordinated axes of inter-individual transcriptional variation that span cell types and stratify disease. Yet immune single-cell atlases are increasingly multi-institution, multi-ancestry, and governed, so patient cells often cannot be pooled. We present a federated estimator: each site computes a local program subspace, and a coordinator merges these by stacked SVD under federated global-mean centering, provably equivalent (up to truncation) to the centralised decomposition. This centering makes the merge robust to site-label confounding (program AUC $0.957$ vs.\ $0.861$ for naive per-site centering). Only program subspaces leave a site, and aggregation is compatible with secure aggregation. On a 261-donor systemic lupus erythematosus atlas it recovers the canonical interferon program (ISG enrichment AUC $0.998$; case--control separation $0.958$; bootstrap $\Delta\text{AUC}=-0.000$, 95\% CI $[-0.004,+0.012]$ vs.\ centralised), across institution-scale and multi-ancestry partitions, and across three \emph{real} COVID-19 sites (subspace correlation $0.989$). It recovers the program when \emph{no site observes all cell types} (correlation $1.000$, exact by construction), which fixed-feature federated PCA cannot. On an interstitial-lung-disease atlas the recovered program predicts disease better than the best single cell type (AUC $0.96$ vs.\ $0.91$; gap 95\% CI excludes zero) and the advantage survives federation; a liver cohort is consistent ($p=0.005$). Membership-inference shows secure aggregation cuts attack AUC from $0.91$ to $0.61$. The method enables cross-institution, cross-ancestry recovery of multicellular immune programs without sharing cells.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a federated estimator for tensor decomposition of donor × cell-type × gene single-cell immune data. Each institution computes a local program subspace; a coordinator merges these via stacked SVD after federated global-mean centering. The method is claimed to be provably equivalent (up to truncation) to the centralized decomposition, robust to site-label confounding, and able to recover multicellular programs even when no site observes all cell types. Validation on a 261-donor SLE atlas, COVID-19 sites, an ILD atlas, and a liver cohort reports high AUCs (e.g., ISG enrichment 0.998, case-control 0.958), subspace correlations (0.989), bootstrap CIs overlapping centralized results, and exact recovery (correlation 1.000) in the incomplete cell-type case; secure aggregation reduces membership-inference attack AUC from 0.91 to 0.61.
Significance. If the equivalence and recovery claims hold, the work enables privacy-preserving cross-institution recovery of multicellular immune programs without pooling cells, addressing a key barrier in multi-ancestry and multi-site single-cell atlases. Strengths include the explicit robustness to site confounding via global-mean centering, the exact-by-construction recovery when cell-type coverage is incomplete (a case where fixed-feature federated PCA fails), and compatibility with secure aggregation. The empirical results on real disease cohorts (SLE, COVID, ILD) with metrics comparable to centralized baselines support practical utility.
minor comments (3)
- [Abstract] The phrase 'provably equivalent (up to truncation)' would benefit from a parenthetical note on the truncation rank or the precise condition under which equivalence holds, to aid readers who encounter only the abstract.
- [Results] The bootstrap procedure for the ΔAUC confidence interval is referenced but the number of replicates and resampling unit (donors vs. cells) are not stated in the provided summary; add these details in the methods for reproducibility.
- [Methods] Figure legends or methods should explicitly state the rank chosen for the tensor decomposition and how it was selected, as this directly affects the 'up to truncation' equivalence claim.
Simulated Author's Rebuttal
We thank the referee for the positive summary, recognition of the method's significance for multi-institution immune atlases, and the recommendation of minor revision. No specific major comments or requested changes were provided in the report.
Circularity Check
No significant circularity
The central claim is a mathematical equivalence (up to truncation) between the federated stacked-SVD merge under global-mean centering and the centralized tensor decomposition, supported by an explicit construction, a centering step that addresses site confounding, and direct empirical validation against centralized baselines (subspace correlations, AUCs within bootstrap CI, exact recovery when cell-type coverage is incomplete). No load-bearing step reduces by definition or by self-citation to the target result; the equivalence argument is presented as provable rather than fitted, and performance metrics are reported against external centralized references. The paper is therefore self-contained against its own benchmarks.