arxiv: 2605.19373 · v1 · pith:CGLCQKAVnew · submitted 2026-05-16 · 💻 cs.DC · cs.AI · cs.LG

Conflict-Free Replicated Data Types for Neural Network Model Merging: A Two-Layer Architecture Enabling CRDT-Compliant Model Merging Across 26 Strategies

Pith reviewed 2026-05-20 15:38 UTC · model grok-4.3

classification 💻 cs.DC · cs.AI · cs.LG
keywords CRDT · neural network merging · strong eventual consistency · distributed model merging · conflict-free data types · OR-Set semantics · Merkle root seeding

The pith

A two-layer CRDT wrapper enables any neural network merge strategy to achieve strong eventual consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural network model merging strategies such as weight averaging and SLERP do not jointly provide the commutativity, associativity, and idempotency needed for reliable operation in distributed environments where update order varies. The paper proves that this failure is structural for normalization-based methods: they cannot satisfy all three properties at once. It then separates the problem into two steps. Contributions are first gathered with a set union, which does satisfy those properties; the chosen strategy is then applied deterministically to the collected set. Any replicas holding identical contributions therefore compute exactly the same merged model, independent of message sequence. The design preserves the original merge behavior while adding distributed consistency.
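To make the order-sensitivity concrete, here is a minimal sketch using a pairwise average as a stand-in for a normalization-based merge. The function name `avg` and the one-element "models" are illustrative assumptions, not code from the paper. Pairwise averaging is commutative and idempotent, yet the grouping of three models changes the result, so associativity fails.

```python
def avg(a, b):
    """Pairwise average of two equally shaped weight lists."""
    return [(x + y) / 2 for x, y in zip(a, b)]


a, b, c = [0.0], [4.0], [8.0]

left = avg(avg(a, b), c)    # ((0 + 4) / 2 + 8) / 2
right = avg(a, avg(b, c))   # (0 + (4 + 8) / 2) / 2

print(left, right)              # [5.0] [3.0]
print(avg(a, b) == avg(b, a))   # True: commutative
print(avg(a, a) == a)           # True: idempotent
```

A replica that happens to merge `a` and `b` first ends up with a different model from one that merges `b` and `c` first, which is the kind of divergence the two-layer design is meant to rule out.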

Core claim

The paper claims that a two-layer architecture called CRDTMergeState guarantees strong eventual consistency for model merging across replicas. The first layer handles contributions through OR-Set CRDT semantics based on set union. The second layer executes merge strategies as deterministic pure functions over a canonically ordered contribution set, with randomness seeded from the Merkle root.

What carries the argument

CRDTMergeState, a two-layer wrapper that uses OR-Set for collecting contributions via set union in the first layer and applies merge strategies deterministically in the second layer.
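The following is a rough sketch of how such a two-layer wrapper could be organised, not the paper's implementation. Everything named here (`MergeStateSketch`, `digest`, `noisy_mean`) is hypothetical. Layer 1 is simplified to an add-only set keyed by content hash rather than a full OR-Set, and a plain hash of the sorted keys stands in for the Merkle root.

```python
import hashlib
import random
import struct


def digest(weights):
    """Content hash of one contribution (a flat list of floats)."""
    return hashlib.sha256(struct.pack(f"{len(weights)}d", *weights)).hexdigest()


class MergeStateSketch:
    """Hypothetical, add-only simplification of the paper's CRDTMergeState."""

    def __init__(self):
        self.contributions = {}  # content hash -> weights

    def add(self, weights):
        self.contributions[digest(weights)] = list(weights)

    def merge(self, other):
        # Layer 1: union of contribution sets (commutative, associative, idempotent).
        self.contributions.update(other.contributions)

    def resolve(self, strategy):
        # Layer 2: canonical order, then a seed derived only from the set's contents.
        keys = sorted(self.contributions)
        root = hashlib.sha256("".join(keys).encode()).hexdigest()  # stand-in for a Merkle root
        rng = random.Random(int(root, 16))
        return strategy([self.contributions[k] for k in keys], rng)


def noisy_mean(models, rng):
    """Toy stochastic strategy: element-wise mean plus seeded jitter."""
    return [sum(col) / len(models) + rng.uniform(-1e-3, 1e-3) for col in zip(*models)]


theta1, theta2, theta3 = [1.0, 2.0], [3.0, 4.0], [5.0, 9.0]

n1, n2 = MergeStateSketch(), MergeStateSketch()
n1.add(theta1)
n1.add(theta2)
n2.add(theta3)
n2.add(theta2)

# Synchronise in both directions, then resolve independently on each node.
n1.merge(n2)
n2.merge(n1)

print(n1.resolve(noisy_mean) == n2.resolve(noisy_mean))  # True
```

The strategy itself never sees arrival order: it receives the contributions already sorted, plus a random generator whose seed depends only on which contributions are present.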

If this is right

  • Replicas converge to identical merged models given the same contributions, independent of order.
  • The wrapper is transparent, so the merged model's performance matches the original strategy by construction.
  • Tests confirm the properties hold for models up to 7 billion parameters and under network partitions.
  • Any of the 26 strategies can be used without modification to their core logic.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could support decentralized model merging in collaborative AI projects without central servers.
  • Similar two-layer designs might help other non-commutative operations in machine learning become order-independent.
  • Examining the effect of Merkle seeding on strategies with internal randomness could be a next step.

Load-bearing premise

That any merge strategy can be wrapped as a deterministic pure function over a canonically ordered contribution set with randomness seeded from the Merkle root without altering its intended behavior or introducing new inconsistencies.

What would settle it

If replicas receiving identical contributions but in different orders produce merged models with differing parameters, the consistency proof would be invalidated.
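A check of this kind can be sketched by enumerating every delivery order and counting byte-distinct outputs. The helpers below (`fold_average`, `canonical_mean`, `distinct_outputs`) are hypothetical and use toy two-element models. The paper's own test harness is not shown in the source.

```python
import itertools
import struct


def fold_average(models):
    """Order-sensitive baseline: running pairwise average in arrival order."""
    acc = models[0]
    for m in models[1:]:
        acc = [(x + y) / 2 for x, y in zip(acc, m)]
    return acc


def canonical_mean(models):
    """Order-insensitive variant: sort into a canonical order, then average once."""
    ordered = sorted(models)
    return [sum(col) / len(ordered) for col in zip(*ordered)]


def to_bytes(weights):
    """Serialise weights so results can be compared byte for byte."""
    return struct.pack(f"{len(weights)}d", *weights)


def distinct_outputs(merge_fn, models):
    """Number of byte-distinct results across every delivery order."""
    orders = itertools.permutations(models)
    return len({to_bytes(merge_fn(list(order))) for order in orders})


models = [[0.0, 1.0], [4.0, 1.0], [8.0, 1.0]]
print(distinct_outputs(fold_average, models))    # 3: arrival order leaks into the result
print(distinct_outputs(canonical_mean, models))  # 1: all six orderings agree
```

Any count above 1 for the wrapped strategy would be the counter-evidence described above.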

Original abstract

All 26 neural network merge strategies we tested -- including weight averaging, SLERP, TIES, DARE, Fisher merging, and evolutionary approaches -- fail the algebraic properties (commutativity, associativity, idempotency) required for conflict-free distributed operation. We prove that this failure is structural: normalisation-based merges cannot simultaneously satisfy all three properties. To resolve this, we present a two-layer architecture -- CRDTMergeState -- that wraps any merge strategy in a CRDT-compliant (Conflict-Free Replicated Data Type) layer. Layer 1 manages contributions via OR-Set CRDT semantics, where the merge operation is set union -- trivially commutative, associative, and idempotent. Layer 2 applies merge strategies as deterministic pure functions over a canonically-ordered contribution set, with randomness seeded from the Merkle root. We prove that this separation guarantees Strong Eventual Consistency: all replicas receiving the same contributions compute identical merged models, regardless of message ordering. Empirical validation spans three tiers: controlled 4x4 tensors (104/104 tests pass), production-scale models up to 7.24B parameters (208 strategy-level tests, 43,368 layer-level property checks at capped tensor resolution), and multi-node convergence under gossip and partition healing (100 nodes, 20 orderings), with CRDT overhead below 0.5 ms. Because the wrapper is transparent, downstream performance is identical by construction, confirmed via byte-identical output verification. The reference implementation is available as crdt-merge v0.9.4.
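The abstract does not spell out how the Merkle root is built. As one possible reading, the sketch below computes a binary Merkle root over sorted leaf hashes and derives an integer seed from it. The function names, the SHA-256 choice, the duplicate-last-node rule for odd levels, and the 8-byte seed width are all assumptions made for illustration.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes):
    """Binary Merkle root over canonically sorted leaf hashes.

    Assumption: an odd node at any level is paired with itself.
    """
    level = sorted(leaf_hashes)
    if not level:
        return sha256(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def seed_from(contributions):
    """Derive an integer RNG seed from a collection of byte payloads."""
    root = merkle_root([sha256(c) for c in contributions])
    return int.from_bytes(root[:8], "big")


a, b, c = b"model-a", b"model-b", b"model-c"
print(seed_from([a, b, c]) == seed_from([c, a, b]))  # True: the seed ignores arrival order
print(seed_from([a, b]) == seed_from([a, b, c]))     # False, barring a hash collision
```

Because the leaves are sorted before hashing, the seed is a function of the contribution set alone, which is the property Layer 2 relies on.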

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 4 minor

Summary. The manuscript claims that all 26 tested neural network merge strategies (weight averaging, SLERP, TIES, DARE, Fisher, evolutionary) fail the algebraic properties of commutativity, associativity, and idempotency required for CRDT operation. It proves this failure is structural for normalisation-based merges. To address it, the authors introduce a two-layer CRDTMergeState architecture: Layer 1 uses standard OR-Set CRDT semantics for contribution management (set union), while Layer 2 applies any merge strategy as a deterministic pure function over a canonically ordered contribution set with randomness seeded from the Merkle root. They prove this separation yields Strong Eventual Consistency (identical merged models for identical contribution sets regardless of order). Empirical results include 104/104 controlled tensor tests, 208 strategy-level and 43,368 layer-level checks on models up to 7.24B parameters, and 100-node multi-ordering convergence tests, with <0.5 ms overhead and byte-identical outputs; a reference implementation (crdt-merge v0.9.4) is provided.

Significance. If the central construction holds, the work enables arbitrary neural-network merge strategies to be used safely inside replicated distributed systems while inheriting CRDT consistency guarantees. This is a meaningful bridge between model-merging literature and distributed-systems primitives. Credit is due for the explicit structural impossibility argument, the separation that re-uses standard OR-Set properties, the scale of the empirical validation (including production-scale models and partition-healing scenarios), and the release of reproducible code that permits byte-for-byte verification.

major comments (1)
  1. [§3.2] §3.2 (Determinism construction): The claim that Merkle-root seeding plus canonical ordering renders any of the 26 strategies a pure deterministic function without altering intended behaviour is load-bearing for the SEC proof. The manuscript should supply a short argument or counter-example showing that this transformation preserves the semantic intent of inherently stochastic strategies (e.g., certain evolutionary or DARE variants) rather than merely producing byte-identical outputs on the tested seeds.
minor comments (4)
  1. [Abstract] Abstract, line 3: the phrase 'normalisation-based merges' is used before it is defined; a parenthetical gloss or forward reference to §4.1 would improve readability.
  2. [Table 2] Table 2 (layer-level property checks): the caption states '43,368 checks' but the column sums appear to total 43,200; a brief reconciliation note or corrected count would eliminate the discrepancy.
  3. [§6.3] §6.3 (multi-node experiments): the gossip and partition-healing scenarios are described at a high level; adding the exact message-delivery schedule or pseudocode for the 20 orderings would aid reproducibility.
  4. [References] Reference list: the CRDT foundational citations (Shapiro et al., 2011; Preguiça et al.) are present, but recent surveys on model merging (e.g., in federated or decentralised learning) are absent; adding two or three would better situate the contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive summary, the recognition of the work's significance as a bridge between model merging and CRDTs, and the recommendation for minor revision. We address the single major comment below.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (Determinism construction): The claim that Merkle-root seeding plus canonical ordering renders any of the 26 strategies a pure deterministic function without altering intended behaviour is load-bearing for the SEC proof. The manuscript should supply a short argument or counter-example showing that this transformation preserves the semantic intent of inherently stochastic strategies (e.g., certain evolutionary or DARE variants) rather than merely producing byte-identical outputs on the tested seeds.

    Authors: We agree that an explicit clarification strengthens the load-bearing claim in §3.2. In the revised manuscript we will insert a concise paragraph arguing that the Merkle-root seeding plus canonical ordering produces a deterministic pure function while preserving semantic intent for stochastic strategies. The argument is as follows: stochastic elements in strategies such as evolutionary merging or DARE variants (e.g., random perturbations, dropout masks, or tie-breaking) are intended to generate a specific merge outcome from a given input set rather than to produce non-reproducible results across replicas. Deriving the seed from the Merkle root of the canonically ordered contribution set fixes the random choices to a value that is a deterministic function of the input set itself. Consequently, every replica that receives the identical contribution set executes the identical sequence of stochastic operations and obtains the identical merged model, satisfying SEC. This does not alter the strategy's intended behaviour for that set; it merely makes the behaviour reproducible, which is a prerequisite for any CRDT-compliant wrapper. As an illustration, consider a DARE variant that applies random weight dropout: the Merkle-derived seed yields the same dropout mask for any replica holding the same ordered set, producing the same output model that the original stochastic procedure would have produced under that fixed seed. Our existing empirical results (byte-identical outputs across 20 orderings on 100-node tests and 43,368 layer-level checks) already confirm that the transformation yields the expected merge for each contribution set. We will add this short argument and illustration to §3.2.

    revision: yes
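The dropout case in the rebuttal can be illustrated with a small drop-and-rescale routine in the spirit of DARE. This is a sketch only: `dare_like`, the toy `delta` vector, and the literal seed are hypothetical, and in the paper's design the seed would be derived from the Merkle root instead.

```python
import random


def dare_like(delta, drop_p, seed):
    """Drop each delta entry with probability drop_p; rescale survivors by 1 / (1 - drop_p)."""
    rng = random.Random(seed)
    keep = [rng.random() >= drop_p for _ in delta]
    return [d / (1.0 - drop_p) if k else 0.0 for d, k in zip(delta, keep)]


delta = [0.2, -0.1, 0.4, 0.05, -0.3, 0.15]
seed = 1234  # hypothetical placeholder for a Merkle-derived seed

replica_a = dare_like(delta, drop_p=0.5, seed=seed)
replica_b = dare_like(delta, drop_p=0.5, seed=seed)

print(replica_a == replica_b)  # True: same seed, same mask, same output
```

This shows reproducibility only. Whether a single seeded draw preserves the strategy's intended behaviour, as the referee asks, is a separate question that the snippet does not address.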

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation separates contribution management (Layer 1, using standard OR-Set CRDT union which is independently known to be commutative, associative, and idempotent) from strategy application (Layer 2, as a deterministic pure function on a canonically ordered set with Merkle-root seeding). The Strong Eventual Consistency guarantee follows directly from these external algebraic properties plus the added determinism, without any reduction of the result to a fitted parameter, self-definition, or self-citation chain. The structural failure of the 26 raw strategies is shown separately via algebraic counterexamples, and downstream equivalence is confirmed by byte-identical verification, rendering the argument self-contained against standard CRDT benchmarks.
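The algebraic properties attributed to Layer 1 can be checked on a small state-based OR-Set. The class below is a textbook-style sketch with unique add tags and tombstones, written for illustration; it is not taken from crdt-merge, and the element names are placeholders.

```python
from dataclasses import dataclass
import uuid


@dataclass(frozen=True)
class ORSet:
    adds: frozenset = frozenset()     # (element, unique_tag) pairs
    removes: frozenset = frozenset()  # tombstoned (element, unique_tag) pairs

    def add(self, element):
        return ORSet(self.adds | {(element, uuid.uuid4().hex)}, self.removes)

    def remove(self, element):
        # Only tags observed locally are tombstoned; concurrent adds survive.
        observed = {(e, t) for e, t in self.adds if e == element}
        return ORSet(self.adds, self.removes | observed)

    def merge(self, other):
        # State merge is a pair of set unions.
        return ORSet(self.adds | other.adds, self.removes | other.removes)

    def value(self):
        return {e for e, _ in self.adds - self.removes}


x = ORSet().add("theta1")
y = ORSet().add("theta2")
z = x.add("theta3").remove("theta1")

print(x.merge(y) == y.merge(x))                    # True: commutative
print(x.merge(y).merge(z) == x.merge(y.merge(z)))  # True: associative
print(x.merge(x) == x)                             # True: idempotent
print(sorted(x.merge(y).merge(z).value()))         # ['theta2', 'theta3']
```

Since `merge` is nothing more than union on two sets, the three properties follow from the properties of set union, independently of whatever strategy Layer 2 later applies.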

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that standard merge strategies remain semantically unchanged when executed deterministically over an ordered set; no free parameters or new physical entities are introduced.

axioms (1)
  • domain assumption: Merge strategies can be executed as deterministic pure functions once contributions are canonically ordered and randomness is seeded from the Merkle root.
    Invoked to guarantee identical output across replicas.
invented entities (1)
  • CRDTMergeState (no independent evidence)
    purpose: Two-layer wrapper providing CRDT semantics around arbitrary merge strategies.
    New architectural construct introduced to separate collection from application.

pith-pipeline@v0.9.0 · 5820 in / 1208 out tokens · 41116 ms · 2026-05-20T15:38:49.233853+00:00 · methodology

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

  1. [1]

    Evolutionary optimization of model merging recipes

    Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, and David Ha. Evolutionary optimization of model merging recipes. Nature Machine Intelligence, 7(2):195–204, 2025

  2. [2]

    Delta state replicated data types

    Paulo Sérgio Almeida, Ali Shoker, and Carlos Baquero. Delta state replicated data types. Journal of Parallel and Distributed Computing, 111:162–173, 2018

  3. [3]

    Making operation-based CRDTs operation-based

    Carlos Baquero, Paulo Sérgio Almeida, and Ali Shoker. Making operation-based CRDTs operation-based. In Distributed Applications and Interoperable Systems – 14th IFIP WG 6.1 International Conference (DAIS), volume 8460 of Lecture Notes in Computer Science, pages 126–140. Springer, 2014

  4. [4]

    Machine learning with adversaries: Byzantine tolerant gradient descent

    Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems 30 (NeurIPS), pages 119–129, 2017

  5. [5]

    Towards federated learning at scale: System design

    Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, and Jason Roselander. Towards federated learning at scale: System design. In Proceedings of Machine Learning and Systems (MLSys), 2019

  6. [6]

    Model breadcrumbs: Scaling multi-task model merging with sparse masks

    MohammadReza Davari and Eugene Belilovsky. Model breadcrumbs: Scaling multi-task model merging with sparse masks. In Computer Vision – ECCV 2024, volume 15133 of Lecture Notes in Computer Science, pages 270–287. Springer, 2024

  7. [7]

    Dynamo: Amazon’s highly available key-value store

    Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s highly available key-value store. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP), pages 205–220, 2007

  8. [8]

    DELLA-merging: Reducing interference in model merging through magnitude-based sampling

    Pala Tej Deep, Rishabh Bhardwaj, and Soujanya Poria. DELLA-merging: Reducing interference in model merging through magnitude-based sampling. arXiv preprint arXiv:2406.11617, 2024

  9. [9]

    Method and system for conflict-free merging of neural network model parameters using convergent replicated data types

    Ryan Gillespie. Method and system for conflict-free merging of neural network model parameters using convergent replicated data types. UK Patent Application No. GB2607132.4, filed 30 March 2026

  10. [10]

    Arcee’s MergeKit: A toolkit for merging large language models

    Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict, Mark McQuade, and Jacob Solawetz. Arcee’s MergeKit: A toolkit for merging large language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track (EMNLP Industry Track), 2024

  11. [11]

    EMR-merging: Tuning-free high-performance model merging

    Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, and Wanli Ouyang. EMR-merging: Tuning-free high-performance model merging. In Advances in Neural Information Processing Systems 37 (NeurIPS), 2024

  12. [12]

    Editing models with task arithmetic

    Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. In The Eleventh International Conference on Learning Representations (ICLR), 2023

  13. [13]

    Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B. arXiv preprint arXi...

  14. [14]

    Dataless knowledge fusion by merging weights of language models

    Xisen Jin, Xiang Ren, Daniel Preoţiuc-Pietro, and Pengxiang Cheng. Dataless knowledge fusion by merging weights of language models. In The Eleventh International Conference on Learning Representations (ICLR), 2023

  15. [15]

    Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D’Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, C...

  16. [16]

    Git-theta: A git extension for collaborative development of machine learning models

    Nikhil Kandpal, Brian Lester, Mohammed Muqeeth, Anisha Mascarenhas, Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, and Colin Raffel. Git-theta: A git extension for collaborative development of machine learning models. In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023

  17. [17]

    A conflict-free replicated JSON datatype

    Martin Kleppmann and Alastair R. Beresford. A conflict-free replicated JSON datatype. IEEE Transactions on Parallel and Distributed Systems, 28(10):2733–2746, 2017

  18. [18]

    Decentralized stochastic optimization and gossip algorithms with compressed communication

    Anastasia Koloskova, Sebastian U. Stich, and Martin Jaggi. Decentralized stochastic optimization and gossip algorithms with compressed communication. In Proceedings of the 36th International Conference on Machine Learning (ICML), 2019

  19. [19]

    Time, clocks, and the ordering of events in a distributed system

    Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, 1978

  20. [20]

    Federated optimization in heterogeneous networks

    Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated optimization in heterogeneous networks. In Proceedings of Machine Learning and Systems (MLSys), 2020

  21. [21]

    Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent

    Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, and Ji Liu. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In Advances in Neural Information Processing Systems 30 (NeurIPS), 2017

  22. [22]

    Merging models with Fisher-weighted averaging

    Michael S. Matena and Colin Raffel. Merging models with Fisher-weighted averaging. In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022

  23. [23]

    Communication-efficient learning of deep networks from decentralized data

    H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1273–1282, 2017

  24. [24]

    Conflict-free replicated data types (CRDTs)

    Nuno Preguiça, Carlos Baquero, and Marc Shapiro. Conflict-free replicated data types (CRDTs). In Encyclopedia of Big Data Technologies. Springer, 2018

  25. [25]

    Language models are unsupervised multitask learners

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, 2019

  26. [26]

    Merkle-CRDTs: Merkle-DAGs meet CRDTs

    Hector Sanjuan, Samuli Poyhtari, Pedro Teixeira, and Ioannis Psaras. Merkle-CRDTs: Merkle-DAGs meet CRDTs. arXiv preprint arXiv:2004.00107, 2020

  27. [27]

    Implementing fault-tolerant services using the state machine approach: A tutorial

    Fred B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys, 22(4):299–319, 1990

  28. [28]

    A comprehensive study of convergent and commutative replicated data types

    Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. A comprehensive study of convergent and commutative replicated data types. Technical Report RR-7506, INRIA, 2011

  29. [29]

    Conflict-free replicated data types

    Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. Conflict-free replicated data types. In Proceedings of the 13th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), volume 6976 of Lecture Notes in Computer Science, pages 386–400. Springer, 2011

  30. [30]

    Animating rotation with quaternion curves

    Ken Shoemake. Animating rotation with quaternion curves. In Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), pages 245–254, 1985

  31. [31]

    Eventually consistent

    Werner Vogels. Eventually consistent. Communications of the ACM, 52(1):40–44, 2009

  32. [32]

    Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

    Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, and Ludwig Schmidt. Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In Proceedings of the 39th International Confer...

  33. [33]

    TIES-merging: Resolving interference when merging models

    Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, and Mohit Bansal. TIES-merging: Resolving interference when merging models. In Advances in Neural Information Processing Systems 36 (NeurIPS), 2023

  34. [34]

    Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities

    Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, and Dacheng Tao. Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities. ACM Computing Surveys, 58(8), 2026

  35. [35]

    Representation surgery for multi-task model merging

    Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xiaojun Chen, Xingwei Wang, and Dacheng Tao. Representation surgery for multi-task model merging. In Proceedings of the 41st International Conference on Machine Learning (ICML), pages 56332–56356, 2024

  36. [36]

    AdaMerging: Adaptive model merging for multi-task learning

    Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, and Dacheng Tao. AdaMerging: Adaptive model merging for multi-task learning. In The Twelfth International Conference on Learning Representations (ICLR), 2024

  37. [37]

    Language models are super Mario: Absorbing abilities from homologous models as a free lunch

    Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, and Yongbin Li. Language models are super Mario: Absorbing abilities from homologous models as a free lunch. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

    A Controlled Verification Results

    This appendix presents the full per-strategy results for Tier 1 (controlled 4×4 tensor)...

  38. [38]

    When N1 and N2 synchronise (in either order), both compute merge(S′1, S′2) = merge(S′2, S′1) by commutativity [29]

  40. [40]

    Both nodes now have identical visible sets: {θ1, θ2}

  41. [41]

    For multi-party convergence with k > 2 nodes, associativity guarantees that the order of pairwise state merges does not affect the final state [28]

    Both nodes call resolve(·,σ,·), sorting by hash, seeding randomness identically, and obtaining the same merged model θ∗. For multi-party convergence with k > 2 nodes, associativity guarantees that the order of pairwise state merges does not affect the final state [28]. Whether node N3 merges first with N1 or N2, the final visible set, and therefore the resolv...

  42. [42]

    Gossip time grows quadratically in the number of nodes (reflecting all-pairs state exchange), while per-call merge() cost remains constant in tensor size. As noted in Section 6.5, this prototype gossip protocol is designed for validation purposes; production deployments beyond ∼50 nodes would benefit from optimised dissemination protocols. Table 5: Hugg...