Data Collaboration Analysis with Orthonormal Basis Selection and Alignment
Pith reviewed 2026-05-24 03:34 UTC · model grok-4.3
The pith
Enforcing orthonormal bases turns data collaboration alignment into a closed-form Orthogonal Procrustes solution that makes performance invariant to the target basis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By selecting orthonormal secret and target bases, the resulting change-of-basis matrices achieve orthogonal concordance: all parties' representations are aligned up to a shared orthogonal transform. This renders downstream performance invariant to the target basis. Alignment reduces to the Orthogonal Procrustes problem and admits a closed-form solution that lowers complexity from O(min{a(cl)^2,a^2cl}) to O(acl^2).
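The closed-form step mentioned above can be illustrated with a minimal NumPy sketch of the classical Orthogonal Procrustes solution. This is the textbook SVD formula, not code from the paper; the function name `orthogonal_procrustes`, the matrix sizes, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Return the orthogonal Q minimizing ||A @ Q - B||_F.

    Textbook closed form: take the SVD of A^T B = U S V^T and set Q = U V^T.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Synthetic check (hypothetical sizes): B is A rotated by a known orthogonal matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 4))
Q_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B = A @ Q_true

Q = orthogonal_procrustes(A, B)
residual = float(np.linalg.norm(A @ Q - B))  # Frobenius norm of the misfit
```

No iteration or tuning parameter is involved: one small SVD of an l-by-l matrix recovers the transform, which is what makes the step cheap.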
What carries the argument
Orthonormal Data Collaboration (ODC), which forces the secret and target bases to be orthonormal so that alignment becomes the Orthogonal Procrustes problem and yields orthogonal concordance.
If this is right
- Alignment cost drops from O(min{a(cl)^2,a^2cl}) to O(acl^2): the new bound is linear in both a and c, whereas the old bound is quadratic in at least one of them.
- Empirical speedups reach up to 100 times on the evaluated benchmarks while accuracy stays equal or improves.
- One-round communication and the original privacy assumptions of data collaboration remain intact.
- Downstream model performance no longer depends on the particular choice of target basis.
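The size of the complexity reduction can be read off by dividing the two bounds quoted in the core claim. This is plain arithmetic on the stated expressions, with a, c, and l used as in the abstract (this summary does not define them); constants hidden by the O-notation are ignored, so the result is only a rough guide.

```latex
\[
\frac{\min\{a(cl)^2,\; a^2 cl\}}{a c l^2}
= \min\!\left\{\frac{a c^2 l^2}{a c l^2},\; \frac{a^2 c l}{a c l^2}\right\}
= \min\!\left\{c,\; \frac{a}{l}\right\}.
\]
```

Under that reading, the asymptotic bound improves by a factor of about min{c, a/l}.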
Where Pith is reading between the lines
- The invariance property could let practitioners pick the numerically most stable orthonormal target basis without accuracy trade-offs.
- The same orthonormal reduction might apply to other multi-party linear-projection schemes that currently solve alignment iteratively.
- Because the method is a drop-in replacement, existing data-collaboration codebases can adopt it with minimal refactoring.
Load-bearing premise
Forcing orthonormality on the bases still spans the common subspace and leaves the original linear-projection semantics, information content, and privacy properties unchanged.
What would settle it
An experiment in which the same downstream model is trained on ODC-aligned data using two different orthonormal target bases and accuracy differs by more than numerical precision, or a dataset where the closed-form Procrustes solution fails to produce exact orthogonal concordance.
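A toy version of such a check can be sketched as follows. It assumes the downstream learner is closed-form ridge regression, which is itself invariant to orthogonal feature transforms, and it uses random features in place of ODC-aligned data; the helper `ridge_fit_predict`, the sizes, and the penalty `lam` are all hypothetical. It does not reproduce the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)
n, l, lam = 60, 4, 0.1  # hypothetical sample count, dimension, ridge penalty

def ridge_fit_predict(Z_train, y, Z_test, lam):
    """Closed-form ridge regression; returns predictions on Z_test."""
    d = Z_train.shape[1]
    w = np.linalg.solve(Z_train.T @ Z_train + lam * np.eye(d), Z_train.T @ y)
    return Z_test @ w

# Stand-ins for aligned representations under one target basis.
Z_train = rng.standard_normal((n, l))
Z_test = rng.standard_normal((10, l))
y = Z_train @ rng.standard_normal(l) + 0.1 * rng.standard_normal(n)

# A second target basis is modeled as a shared orthogonal transform R.
R, _ = np.linalg.qr(rng.standard_normal((l, l)))

pred_a = ridge_fit_predict(Z_train, y, Z_test, lam)
pred_b = ridge_fit_predict(Z_train @ R, y, Z_test @ R, lam)
max_gap = float(np.max(np.abs(pred_a - pred_b)))  # expected at round-off level
```

A gap well above floating-point round-off in a real ODC run with such a learner would be the kind of evidence described above.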
Original abstract
Data Collaboration (DC) enables multiple parties to jointly train a model by sharing only linear projections of their private datasets. The core challenge in DC is to align the bases of these projections without revealing each party's secret basis. While existing theory suggests that any target basis spanning the common subspace should suffice, in practice, the choice of basis can substantially affect both accuracy and numerical stability. We introduce Orthonormal Data Collaboration (ODC), which enforces orthonormal secret and target bases, thereby reducing alignment to the classical Orthogonal Procrustes problem, which admits a closed-form solution. We prove that the resulting change-of-basis matrices achieve orthogonal concordance, aligning all parties' representations up to a shared orthogonal transform and rendering downstream performance invariant to the target basis. Computationally, ODC reduces the alignment complexity from O(min{a(cl)^2,a^2cl}) to O(acl^2), and empirical evaluations show up to 100 times speedups with equal or better accuracy across benchmarks. ODC preserves DC's one-round communication pattern and privacy assumptions, providing a simple and efficient drop-in improvement to existing DC pipelines.
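The alignment idea in the abstract can be made concrete with a toy multi-party sketch. Several simplifying assumptions are made here that the abstract does not state: every party's secret basis spans exactly the same subspace, the target is simply the first party's anchor representation (optionally rotated), and the same test rows `X` are pushed through every party's pipeline only so that agreement can be checked. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, l, n_anchor, n_parties = 8, 3, 40, 3  # hypothetical sizes

def random_orthonormal(rows, cols, rng):
    """Matrix with orthonormal columns, taken from a QR factorization."""
    Q, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return Q

def procrustes(A, B):
    """Orthogonal G minimizing ||A @ G - B||_F (closed form via SVD)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Assumption: all secret bases span the same l-dimensional subspace.
F0 = random_orthonormal(m, l, rng)
secret = [F0 @ random_orthonormal(l, l, rng) for _ in range(n_parties)]

anchor = rng.standard_normal((n_anchor, m))  # shared anchor data
X = rng.standard_normal((5, m))              # stand-in rows used for checking

def align(target):
    """Representation of X produced by each party after alignment to target."""
    reps = []
    for F in secret:
        G = procrustes(anchor @ F, target)   # change-of-basis matrix
        reps.append(X @ F @ G)
    return reps

target_1 = anchor @ secret[0]
R = random_orthonormal(l, l, rng)            # a different orthonormal target basis
target_2 = target_1 @ R

reps_1 = align(target_1)
reps_2 = align(target_2)
```

In this toy setting all parties end up with the same representation for a given target, and switching targets rotates every representation by the same orthogonal matrix `R`, which mirrors the orthogonal concordance property described above.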
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Orthonormal Data Collaboration (ODC) as an enhancement to standard Data Collaboration (DC). In DC, parties share only linear projections of their private data and must align the bases of those projections without revealing their secret bases. ODC enforces orthonormal secret and target bases, reducing the alignment step to the classical Orthogonal Procrustes problem, which has a closed-form solution via the SVD. The central claim is a proof that the resulting change-of-basis matrices achieve orthogonal concordance: all parties' representations become aligned up to a shared orthogonal transform, rendering downstream performance invariant to the choice of target basis. The method preserves the original one-round communication pattern and privacy model, reduces alignment complexity from O(min{a(cl)^2,a^2cl}) to O(acl^2), and reports empirical speedups of up to 100x with equal or better accuracy on benchmarks.
Significance. If the proof of orthogonal concordance holds, the result supplies a theoretically grounded, drop-in improvement that directly resolves the practical sensitivity of DC to basis selection while adding no communication or privacy overhead. The reduction to a standard, parameter-free problem (Orthogonal Procrustes) and the explicit complexity improvement are clear strengths; the empirical speedups and accuracy parity across benchmarks further support utility. The work credits the classical Procrustes literature and maintains the one-round privacy assumptions of prior DC papers.
Simulated Author's Rebuttal
We thank the referee for their positive review and recommendation to accept. The referee's summary accurately reflects the contributions of Orthonormal Data Collaboration (ODC), including the reduction of alignment to the Orthogonal Procrustes problem, the orthogonal concordance property, complexity reduction, and preservation of the original privacy model.
Circularity Check
No significant circularity; derivation self-contained
The paper proves that orthonormal secret and target bases reduce alignment to the classical Orthogonal Procrustes problem (an external, standard result in linear algebra) and yield change-of-basis matrices achieving orthogonal concordance. No equations or claims in the provided material reduce the result by construction to a fitted parameter, self-citation chain, or renamed input. The appeal to Procrustes is not load-bearing self-citation, and the one-round privacy model plus spanning-property preservation are stated without internal reduction to the target claim. The derivation is therefore independent of its own outputs.