A Typed Tensor Language for Federated Learning
Pith reviewed 2026-05-21 06:29 UTC · model grok-4.3
The pith
Federated learning programs in a typed tensor language factor through fixed-dimensional shared state whose size stays independent of client and record counts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Typed one-round programs factor through fixed-dimensional shared state whose size is independent of the number of clients and records, computed from client-local tensor expressions and merged across clients. We also prove a converse representability result: factorizations whose encoders and decoders are expressible in the language are realized by typed one-round programs, and the correspondence extends to iterative programs whose cross-round state is shared. If a per-record loss and its per-record gradient are represented by client-local tensor expressions, the global gradient is represented by record-axis summation of the federated gradient tensor.
What carries the argument
The shared-state factorization theory that reduces typed programs over federated and shared tensors to encode-merge-decode procedures with fixed shared dimension.
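As a rough illustration of the encode-merge-decode shape described here, the following sketch computes a mean and variance over partitioned records. It is not taken from the paper: the names `encode`, `merge`, `decode`, and `one_round`, the choice of statistic, and the use of componentwise addition as the merge are all assumptions made for illustration. The only point is that the shared state is three numbers no matter how many clients or records there are.

```python
from functools import reduce

def encode(records):
    """Client-local step: reduce one client's records to (count, sum, sum of squares)."""
    return (len(records), sum(records), sum(x * x for x in records))

def merge(a, b):
    """Merge two shared states by componentwise addition (associative and commutative)."""
    return tuple(x + y for x, y in zip(a, b))

def decode(state):
    """Shared-only post-processing: mean and population variance."""
    n, s, ss = state
    mean = s / n
    return mean, ss / n - mean * mean

def one_round(clients):
    """Hypothetical one-round program: encode on each client, merge, then decode."""
    return decode(reduce(merge, (encode(c) for c in clients), (0, 0.0, 0.0)))

records = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
two_clients = [records[:2], records[2:]]
three_clients = [records[:1], records[1:4], records[4:]]

mean_a, var_a = one_round(two_clients)
mean_b, var_b = one_round(three_clients)
```

Both partitions yield the same result because the decoder only ever sees the merged three-component state, never the partition itself.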
If this is right
- The global gradient used for learning equals the record-axis sum of the per-record federated gradients.
- Server-side gradient descent and shared linear-algebra second-order updates become typed iterative programs.
- Any factorization into encoders and decoders expressible in the language corresponds exactly to a typed one-round program.
- All communication in these programs passes only through fixed-dimensional shared state.
- The framework characterizes the class of federated computations whose communication cost is bounded independently of data volume and client number.
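The first two consequences in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's language or algorithm: the model is linear, the per-record loss is 0.5 * (theta . x - y)^2, and the helper names (`per_record_grad`, `local_grad_sum`, `global_grad`, `server_gradient_descent`), the learning rate, and the round count are all hypothetical.

```python
def per_record_grad(theta, x, y):
    """Gradient of the per-record loss 0.5 * (theta . x - y)^2 with respect to theta."""
    residual = sum(t * xi for t, xi in zip(theta, x)) - y
    return [residual * xi for xi in x]

def local_grad_sum(theta, records):
    """Client-local step: sum per-record gradients along the record axis."""
    total = [0.0] * len(theta)
    for x, y in records:
        g = per_record_grad(theta, x, y)
        total = [a + b for a, b in zip(total, g)]
    return total

def global_grad(theta, clients):
    """Merge step: add the fixed-size contribution of every client."""
    total = [0.0] * len(theta)
    for records in clients:
        total = [a + b for a, b in zip(total, local_grad_sum(theta, records))]
    return total

def server_gradient_descent(clients, dim, lr=0.05, rounds=500):
    """Iterative program whose cross-round state is only the shared parameter vector."""
    theta = [0.0] * dim
    for _ in range(rounds):
        g = global_grad(theta, clients)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

# Toy data generated from y = 2 * x0 - 1 * x1 (an assumption for the example).
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
clients = [data[:1], data[1:]]   # two clients with uneven record counts
pooled = [data]                  # the same records held in one place

theta0 = [0.5, -0.25]
g_fed = global_grad(theta0, clients)
g_pooled = global_grad(theta0, pooled)
theta_hat = server_gradient_descent(clients, dim=2)
```

In this sketch `g_fed` and `g_pooled` agree, and each round communicates a vector of length `dim` per client regardless of how many records the client holds.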
Where Pith is reading between the lines
- The factorization supplies an explicit recipe for rewriting many existing federated algorithms so their communication size is guaranteed to stay constant.
- Similar typing disciplines could be applied to other partitioned-data settings such as secure aggregation or privacy-preserving analytics.
- Tooling built on the type system might automatically detect and enforce the fixed-shared-state property during algorithm design.
- Empirical checks on standard federated benchmarks could measure how often real models fit inside the language while preserving the constant-size property.
Load-bearing premise
The semantics can be defined by comparison with a virtual global tensor used only as a reference, and all relevant federated computations are expressible as client-local tensor expressions whose per-record loss and gradient are representable in the language.
What would settle it
A concrete federated computation expressible in the language whose minimal shared state dimension grows with the number of clients or records, or whose required cross-round state for iterative programs depends on client count.
Original abstract
Federated learning and analytics are often described as collections of separate protocols, even when they share the same mathematical form: client-local tensor computation, mergeable aggregation into shared state, and shared-only post-processing. We introduce a typed tensor language that formalizes this structure. The language distinguishes federated tensors, whose records are partitioned across clients along a tracked record axis, from shared tensors, which are available globally. Its semantics are defined by comparison with a virtual global tensor, used only as a reference object. The main result is a shared-state factorization theory. We show that typed one-round programs factor through fixed-dimensional shared state whose size is independent of the number of clients and records, computed from client-local tensor expressions and merged across clients. We also prove a converse representability result; factorizations whose encoders and decoders are expressible in the language are realized by typed one-round programs, and the correspondence extends to iterative programs whose cross-round state is shared. This gives a formal account of the computations in the language that can be expressed as encode, merge, and decode procedures. We then develop a differentiable fragment for learning. If a per-record loss and its per-record gradient are represented by client-local tensor expressions, the global gradient is represented by record-axis summation of the federated gradient tensor. This yields typed iterative programs for server-side gradient descent and shared-linear-algebra second-order updates. The framework characterizes a broad class of federated learning computations whose communication passes through fixed-dimensional shared state.
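One way to write the abstract's two main statements symbolically is given below. The notation is ours and is an assumption, not the paper's: $K$ clients, client $k$ holding records $X_k = (x_{k,1}, \dots, x_{k,n_k})$, an encoder $e$, a merge operation $\oplus$, a decoder $d$, a shared-state dimension $m$, and a global loss $L$ taken to be the sum of per-record losses $\ell$.

```latex
% Shared-state factorization of a one-round program f
f(X_1, \dots, X_K) \;=\; d\!\left( \bigoplus_{k=1}^{K} e(X_k) \right),
\qquad e(X_k) \in \mathbb{R}^{m},
\quad m \text{ independent of } K \text{ and of } n = \sum_{k=1}^{K} n_k .

% Global gradient as record-axis summation of per-record gradients
\nabla_\theta L(\theta) \;=\; \sum_{k=1}^{K} \sum_{i=1}^{n_k} \nabla_\theta\, \ell(\theta;\, x_{k,i}) .

% Server-side gradient descent with step size \eta, where only \theta crosses rounds
\theta_{t+1} \;=\; \theta_t - \eta\, \nabla_\theta L(\theta_t) .
```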
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a typed tensor language for federated learning that distinguishes federated tensors (with records partitioned across clients along a tracked record axis) from shared tensors. Semantics are defined by reference to a virtual global tensor used only as a comparison object. The central result is a shared-state factorization theory: typed one-round programs factor through fixed-dimensional shared state (size independent of client and record counts) obtained from client-local tensor expressions merged across clients; a converse representability result holds for factorizations whose encoders/decoders are expressible in the language. The correspondence extends to iterative programs with shared cross-round state. A differentiable fragment is developed in which per-record losses and gradients represented by client-local expressions yield global gradients via record-axis summation, supporting typed programs for server-side gradient descent and shared-linear-algebra second-order methods. The framework characterizes federated computations whose communication passes through fixed-dimensional shared state.
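The second-order case mentioned in the summary can be pictured in the same way. The sketch below is a hedged illustration, not the paper's construction: it assumes a two-parameter least-squares model, uses hypothetical names (`encode`, `merge`, `newton_step`), and solves the 2x2 system with Cramer's rule. The shared state is a gradient vector plus a Hessian matrix, so its size depends only on the parameter dimension.

```python
def encode(theta, records):
    """Client-local step: sum per-record gradients and Hessians along the record axis."""
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in records:
        r = theta[0] * x[0] + theta[1] * x[1] - y
        for i in range(2):
            g[i] += r * x[i]
            for j in range(2):
                h[i][j] += x[i] * x[j]
    return g, h

def merge(a, b):
    """Merge two shared states by entrywise addition."""
    (ga, ha), (gb, hb) = a, b
    g = [ga[i] + gb[i] for i in range(2)]
    h = [[ha[i][j] + hb[i][j] for j in range(2)] for i in range(2)]
    return g, h

def newton_step(theta, state):
    """Shared-only linear algebra: solve H * delta = g, then update theta."""
    g, h = state
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    delta0 = (h[1][1] * g[0] - h[0][1] * g[1]) / det
    delta1 = (h[0][0] * g[1] - h[1][0] * g[0]) / det
    return [theta[0] - delta0, theta[1] - delta1]

# Toy data generated from y = 2 * x0 - 1 * x1 (an assumption for the example).
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
clients = [data[:1], data[1:]]

theta0 = [0.0, 0.0]
state = merge(encode(theta0, clients[0]), encode(theta0, clients[1]))
theta1 = newton_step(theta0, state)
```

Because the loss in this toy setting is quadratic, a single Newton step from the merged state lands on the least-squares solution.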
Significance. If the factorization and representability results hold, the work supplies a formal account of the encode-merge-decode structure common to many federated protocols and isolates the class of computations whose communication cost is bounded by fixed shared-state dimension. This could support systematic design and verification of communication-efficient algorithms. The explicit treatment of the differentiable fragment and its connection to iterative optimization is a concrete strength; the paper also supplies formal proofs for the factorization theory and differentiable fragment.
major comments (2)
- [§3] §3 (Semantics): The definition of semantics by reference to a virtual global tensor does not yet include an explicit invariant showing that no well-typed client-local expression can produce mergeable state whose dimension or content depends on per-client record counts or client-specific index alignments. Without such an invariant, the fixed-dimensional claim in the factorization theorem (Theorem 4.1) remains at risk for expressions involving local normalization or record-axis metadata.
- [Theorem 4.1] Theorem 4.1 and its proof: The factorization through fixed-dimensional shared state is stated for one-round programs, but the argument relies on the type system preventing record-axis dependencies; the provided sketch does not enumerate the critical typing rules (e.g., for contraction or normalization) that enforce this, leaving open whether the independence holds for all well-typed programs.
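The concern about local normalization raised in the major comments can be made concrete with a small sketch. It does not claim that either function below is or is not typeable in the paper's language; the names `mean_of_local_means` and `mean_via_shared_state` are hypothetical and exist only to show what a partition-dependent result looks like.

```python
def local_mean(records):
    """Normalize by the client's own record count."""
    return sum(records) / len(records)

def mean_of_local_means(clients):
    """Partition-dependent: every client normalizes locally before merging."""
    return sum(local_mean(c) for c in clients) / len(clients)

def mean_via_shared_state(clients):
    """Partition-independent: merge (sum, count) and normalize only after merging."""
    total = sum(sum(c) for c in clients)
    count = sum(len(c) for c in clients)
    return total / count

records = [1.0, 2.0, 3.0, 10.0]
split_a = [records[:1], records[1:]]   # client sizes 1 and 3
split_b = [records[:2], records[2:]]   # client sizes 2 and 2

print(mean_of_local_means(split_a), mean_of_local_means(split_b))
print(mean_via_shared_state(split_a), mean_via_shared_state(split_b))
```

The first function gives different answers for the two splits of the same records, while the second does not. An invariant of the kind the referee requests would need to rule out the first pattern, or show that the type system never lets its result reach shared state.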
minor comments (2)
- [§2] Notation for the record axis is introduced in §2 but used inconsistently in later figures; a single diagram clarifying the distinction between federated and shared tensors would improve readability.
- [Abstract] The abstract mentions that proofs exist for the factorization theory and differentiable fragment; the main text should indicate whether these proofs are machine-checked or include a brief outline of the induction strategy.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the potential of the factorization results. We address the two major comments below and will revise the manuscript to incorporate explicit invariants and an expanded proof sketch.
- Referee: [§3] §3 (Semantics): The definition of semantics by reference to a virtual global tensor does not yet include an explicit invariant showing that no well-typed client-local expression can produce mergeable state whose dimension or content depends on per-client record counts or client-specific index alignments. Without such an invariant, the fixed-dimensional claim in the factorization theorem (Theorem 4.1) remains at risk for expressions involving local normalization or record-axis metadata.
  Authors: We agree that an explicit invariant would strengthen the argument. In the revised manuscript we will add a lemma in §3 establishing that every well-typed client-local expression produces mergeable state whose dimension is independent of per-client record counts and free of client-specific index alignments. The lemma follows directly from the typing rules that isolate the record axis and forbid operations that could embed per-client metadata into shared tensors. We will also include a short paragraph clarifying that local normalization is performed strictly along the record axis within each client’s tensor and therefore cannot affect the dimension or content of the mergeable state.
  Revision: yes
- Referee: [Theorem 4.1] Theorem 4.1 and its proof: The factorization through fixed-dimensional shared state is stated for one-round programs, but the argument relies on the type system preventing record-axis dependencies; the provided sketch does not enumerate the critical typing rules (e.g., for contraction or normalization) that enforce this, leaving open whether the independence holds for all well-typed programs.
  Authors: The proof of Theorem 4.1 proceeds by induction on program structure and relies on the type system to guarantee that no record-axis dependency enters the shared state. To make this explicit, we will expand the proof sketch in the revised version to list the key typing rules that enforce the property, including the rules for contraction (which preserve record-axis independence) and normalization (which are confined to local axes). This enumeration will confirm that the fixed-dimensional factorization holds for every well-typed one-round program.
  Revision: yes
Circularity Check
No significant circularity; derivation self-contained via language definition and reference semantics
The paper introduces a new typed tensor language distinguishing federated and shared tensors, with semantics defined by comparison to a virtual global tensor serving only as a reference object. The shared-state factorization theorem for one-round programs, the converse representability result, and the extension to iterative programs are established directly from the type system, operational rules, and client-local tensor expressions. The differentiable fragment derives the global gradient via record-axis summation of per-record expressions without reducing to fitted parameters or prior self-citations. All load-bearing steps rely on the formal definitions and proofs internal to the language rather than circular reductions to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Semantics are defined by comparison with a virtual global tensor used only as a reference object.
invented entities (2)
- Federated tensor (with tracked record axis): no independent evidence
- Shared tensor: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability (tag: echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: typed one-round programs factor through fixed-dimensional shared state whose size is independent of the number of clients and records, computed from client-local tensor expressions and merged across clients
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: If a per-record loss and its per-record gradient are represented by client-local tensor expressions, the global gradient is represented by record-axis summation of the federated gradient tensor
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.