Decentralized & Collaborative AI on Blockchain

Bo Waggoner; Justin D. Harris

arXiv: 1907.07247 · v1 · pith:DXSJSQMTnew · submitted 2019-07-16 · 💻 cs.CR · cs.AI · cs.HC


Pith reviewed 2026-05-24 20:40 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.HC

keywords: decentralized AI, blockchain, smart contracts, collaborative learning, incentive mechanisms, machine learning, Ethereum


The pith

Smart contracts on a blockchain can host collaboratively built datasets and continuously updated AI models that stay free for public inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a system where participants jointly assemble training data and rely on smart contracts to maintain and serve an AI model on a blockchain. The model remains publicly accessible at no cost for repeated inference tasks such as personal assistants or recommender systems. Financial rewards and gamified incentives are offered to contributors who supply accurate data that preserves performance on a held-out test set. This setup targets the problem of models becoming stale when data and retraining stay centralized and proprietary. An open-source Ethereum implementation is supplied to demonstrate the approach.

Core claim

A blockchain framework lets participants build a shared dataset and use smart contracts to host a model that updates continuously; the model is then available for free public inference, with financial and gamified incentives designed to keep accuracy stable on a test set for problems that involve many similar queries.

What carries the argument

Smart contracts that store the model, accept data contributions, apply incentives, and serve inference results on the blockchain.
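
To make that division of labour concrete, the following is a minimal off-chain sketch in Python, not the paper's Solidity contracts. It assumes a perceptron as the hosted model and a flat per-sample deposit as a stand-in for the incentive step; the names `ToyModelContract`, `add_data`, and `predict` are hypothetical and not taken from the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyModelContract:
    """Off-chain stand-in for a contract that hosts a binary perceptron."""

    n_features: int
    learning_rate: float = 1.0   # assumption: fixed step size
    required_deposit: int = 1    # assumption: flat deposit per sample
    weights: List[float] = field(default_factory=list)
    bias: float = 0.0
    deposits: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict(self, x: List[float]) -> int:
        """Read-only inference: returns class 0 or 1 and changes no state."""
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def add_data(self, sender: str, x: List[float], label: int, deposit: int) -> None:
        """Accept one labeled sample, hold the deposit, and update the model."""
        if len(x) != self.n_features:
            raise ValueError("wrong number of features")
        if label not in (0, 1):
            raise ValueError("label must be 0 or 1")
        if deposit < self.required_deposit:
            raise ValueError("deposit too small")
        # Incentive placeholder: only record the deposit; refund and reward
        # rules are deliberately left out of this sketch.
        self.deposits[sender] = self.deposits.get(sender, 0) + deposit
        # Perceptron rule: adjust the weights only when the prediction is wrong.
        error = label - self.predict(x)
        if error != 0:
            self.weights = [
                w + self.learning_rate * error * xi
                for w, xi in zip(self.weights, x)
            ]
            self.bias += self.learning_rate * error


contract = ToyModelContract(n_features=2)
for _ in range(5):
    contract.add_data("alice", [1.0, 0.0], 1, deposit=1)
    contract.add_data("bob", [0.0, 1.0], 0, deposit=1)
print(contract.predict([1.0, 0.0]), contract.predict([0.0, 1.0]))
```

In this sketch `predict` never writes state, which mirrors why inference could stay free on a real chain, whereas `add_data` is the state-changing path that would carry a transaction fee.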

If this is right

Models stay current without any single party owning the data or paying for retraining.
Inference becomes free and public for high-volume use cases such as games or recommenders.
Accuracy is preserved through ongoing data contributions rather than one-time training.
The same contract infrastructure can support multiple learning problems on the same chain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested first on narrow domains where query patterns are highly repetitive, making incentive costs easier to bound.
If incentives succeed, similar contract patterns might apply to collaborative maintenance of other shared resources such as knowledge bases.
Adoption would depend on whether transaction fees on the chosen blockchain stay low enough for frequent small data uploads.

Load-bearing premise

The proposed financial and gamified incentives will draw enough high-quality data contributions to keep the model's accuracy stable on a test set.

What would settle it

A deployed instance in which data contributions remain too few or too noisy for the model's accuracy on the test set to hold steady over time despite the incentives.

Figures

Figures reproduced from arXiv: 1907.07247 by Bo Waggoner, Justin D. Harris.

**Figure 2.** Bounty-based Incentive Mechanism. We use h to denote a machine learning model and D for a dataset. The loss function L(h, D) is clipped to the range [0, 1] by the smart contract. In the reward phase, the provider uploads the test dataset and the smart contract checks that it satisfies the commitment. Then, the smart contract determines rewards for all participants, as discussed next. 2) Reward calculatio…

**Figure 3.** Balance percentages and model accuracy in a simulation where an …

**Figure 4.** Overview of the class diagram for the framework; other members and …
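
The commitment check and the clipped loss mentioned in the Figure 2 caption can be sketched as follows. This is an illustrative Python sketch only: it assumes a salted SHA-256 hash as the commitment and mean squared error as the loss, neither of which the caption specifies, and all function names are hypothetical. The reward rule is left out, since the excerpt stops before describing it.

```python
import hashlib
import json
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (features, target)


def commit(dataset: Sequence[Sample], salt: str) -> str:
    """Hash the provider would publish before revealing the test set."""
    payload = json.dumps(
        {"salt": salt, "data": [[list(x), y] for x, y in dataset]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def satisfies_commitment(dataset: Sequence[Sample], salt: str, commitment: str) -> bool:
    """Reward-phase check: the revealed data must hash to the earlier commitment."""
    return commit(dataset, salt) == commitment


def clipped_loss(h: Callable[[List[float]], float], dataset: Sequence[Sample]) -> float:
    """L(h, D): mean squared error of h on D, clipped to the range [0, 1]."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    raw = sum((h(x) - y) ** 2 for x, y in dataset) / len(dataset)
    return min(1.0, max(0.0, raw))


test_set = [([1.0], 1.0), ([2.0], 0.0)]
commitment = commit(test_set, salt="example-salt")
print(satisfies_commitment(test_set, "example-salt", commitment))
print(clipped_loss(lambda x: 0.5, test_set))
```

Clipping matters here because squared error is unbounded: a model that is badly wrong on a few points would otherwise dominate any quantity computed from the loss.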

Original abstract

Machine learning has recently enabled large advances in artificial intelligence, but these tend to be highly centralized. The large datasets required are generally proprietary; predictions are often sold on a per-query basis; and published models can quickly become out of date without effort to acquire more data and re-train them. We propose a framework for participants to collaboratively build a dataset and use smart contracts to host a continuously updated model. This model will be shared publicly on a blockchain where it can be free to use for inference. Ideal learning problems include scenarios where a model is used many times for similar input such as personal assistants, playing games, recommender systems, etc. In order to maintain the model's accuracy with respect to some test set we propose both financial and non-financial (gamified) incentive structures for providing good data. A free and open source implementation for the Ethereum blockchain is provided at https://github.com/microsoft/0xDeCA10B.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a blockchain-based framework in which participants collaboratively contribute data to build and continuously update a machine learning model hosted via smart contracts; the resulting model is made publicly available on-chain for free inference. Financial and gamified incentives are described to encourage high-quality data contributions that preserve accuracy on a held-out test set. An open-source Ethereum implementation is provided.

Significance. If the incentive mechanisms can be shown to elicit sustained high-quality contributions, the design would offer a concrete path toward decentralized, publicly accessible, and updatable models for high-query-volume tasks such as recommendation and personal assistants. The accompanying open-source implementation is a concrete strength that demonstrates technical feasibility of the smart-contract layer.

major comments (1)

[Abstract / Incentive Structures] Abstract and the section describing incentive structures: the central claim that the proposed financial and gamified incentives will be sufficient to maintain model accuracy on a test set is presented as an assumption without any simulation, game-theoretic analysis, or empirical evaluation of participant behavior.

minor comments (2)

[Implementation] The manuscript would benefit from an explicit statement of the threat model (e.g., Sybil attacks, data poisoning) and how the smart-contract logic mitigates each.
[System Architecture] Clarify the on-chain storage and update costs for the model parameters; the current description leaves open whether full model weights or only gradients are stored.
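
A rough sense of the storage cost at issue in the second minor comment can be had from a back-of-the-envelope estimate. The sketch below assumes roughly 20,000 gas to write a previously empty 32-byte storage slot (the long-standing Ethereum `SSTORE` cost, before any cold-access surcharge), ignores transaction overhead and calldata, and uses a hypothetical 513-parameter linear model; none of these numbers come from the paper.

```python
import math

SSTORE_NEW_SLOT_GAS = 20_000  # assumption: gas to fill an empty 32-byte slot
SLOT_BYTES = 32


def storage_gas(n_params: int, bytes_per_param: int) -> int:
    """Gas to write n_params fresh values, packed into 32-byte slots."""
    if n_params < 0:
        raise ValueError("n_params must be non-negative")
    if not 1 <= bytes_per_param <= SLOT_BYTES:
        raise ValueError("bytes_per_param must be between 1 and 32")
    params_per_slot = SLOT_BYTES // bytes_per_param
    slots = math.ceil(n_params / params_per_slot)
    return slots * SSTORE_NEW_SLOT_GAS


def cost_eth(gas: int, gas_price_gwei: float) -> float:
    """Convert a gas amount to ETH at a given gas price (1 gwei = 1e-9 ETH)."""
    return gas * gas_price_gwei * 1e-9


# Hypothetical model: 512 weights plus one bias term.
packed = storage_gas(513, bytes_per_param=4)     # eight values per slot
unpacked = storage_gas(513, bytes_per_param=32)  # one value per slot
print(packed, unpacked, cost_eth(packed, gas_price_gwei=10))
```

Under these assumptions the packed layout needs 65 slots (1,300,000 gas) against 513 slots (10,260,000 gas) unpacked, which is why the question of what exactly is stored on-chain has a direct bearing on cost.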

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and constructive feedback on our framework proposal. We address the major comment below.

Point-by-point responses

Referee: [Abstract / Incentive Structures] Abstract and the section describing incentive structures: the central claim that the proposed financial and gamified incentives will be sufficient to maintain model accuracy on a test set is presented as an assumption without any simulation, game-theoretic analysis, or empirical evaluation of participant behavior.

Authors: We agree that the manuscript presents the incentive structures as a proposed design element without accompanying simulations, game-theoretic analysis, or empirical evaluation of participant behavior. The work is positioned as a technical framework for decentralized collaborative ML with an accompanying open-source Ethereum implementation to demonstrate feasibility of the smart-contract layer; the incentives are described at a conceptual level to address the problem of maintaining accuracy. We will revise the abstract and the incentive section to clarify that effectiveness is hypothesized rather than demonstrated, and add a limitations/future-work paragraph noting that behavioral validation would require separate studies.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; system design proposal without derivations or self-referential predictions

Full rationale

The manuscript is a systems-design proposal for a blockchain-based collaborative AI framework using smart contracts to host datasets and models, accompanied by an open-source Ethereum implementation. No equations, fitted parameters, predictions, or derivation chains are present in the abstract or described content. The incentive structures are proposed as mechanisms but receive no empirical evaluation or formal analysis that could reduce to self-defined quantities. The contribution is self-contained as an architectural outline rather than a mathematical result, with no self-citation load-bearing steps or ansatz smuggling. This is the expected honest non-finding for a non-mathematical systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper introduces a system architecture rather than new mathematical objects or fitted constants. No free parameters are defined. Axioms are standard assumptions about blockchain properties and participant rationality. No new physical or theoretical entities are postulated.

axioms (2)

Domain assumption: Blockchain smart contracts can reliably host and update machine learning models at acceptable cost and latency for the target applications.
Invoked when stating that the model will be shared publicly on a blockchain.
Ad hoc to paper: Participants respond to the described financial and gamified incentives by supplying data that improves model accuracy on a held-out test set.
Central to the claim that accuracy can be maintained.


Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

[1] A. Marathe, K. Narayanan, A. Gupta, and M. Pr, “DInEMMo: Decentralized incentivization for enterprise marketplace models,” 2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW), pp. 95–100, 2018
[2] A. B. Kurtulmus and K. Daniel, “Trustless machine learning contracts; evaluating and exchanging machine learning models on the ethereum blockchain,” 2018. [Online]. Available: https://algorithmia.com/research/ml-models-on-blockchain
[3] F. Daniel, P. Kucherbaev, C. Cappiello, B. Benatallah, and M. Allahbakhsh, “Quality control in crowdsourcing: A survey of quality attributes, assessment techniques and assurance actions,” CoRR, vol. abs/1801.02546, 2018. [Online]. Available: http://arxiv.org/abs/1801.02546
[4] V. Buterin, “A next generation smart contract & decentralized application platform,” 2015
[5] J. Kwon, “Tendermint: Consensus without mining,” 2014
[6] V. Buterin and V. Griffith, “Casper the friendly finality gadget,” 2017
[7] J. Lafferty, A. McCallum, and F. C. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” 2001
[8] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, Sep 1995. [Online]. Available: https://doi.org/10.1007/BF00994018
[9] J. C. Schlimmer and D. Fisher, “A case study of incremental concept induction,” in Proceedings of the Fifth AAAI National Conference on Artificial Intelligence, ser. AAAI’86. AAAI Press, 1986, pp. 496–501. [Online]. Available: http://dl.acm.org/citation.cfm?id=2887770.2887853
[10] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge University Press, 2008, ch. Vector space classification, http://nlp.stanford.edu/IR-book/html/htmledition/rocchio-classification-1.html
[11] D. Cer, Y. Yang, S. yi Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y.-H. Sung, B. Strope, and R. Kurzweil, “Universal sentence encoder,” 2018
[12] F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain,” Psychological Review, vol. 65, no. 6, p. 386, 1958
[13] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, “Learning word vectors for sentiment analysis,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland, Oregon, USA: Association for Computational Linguistics, June 2011, pp. 142–150. [Online]. Available: http:...
[14] Wikipedia contributors, “Wikipedia — Wikipedia, the free encyclopedia,” https://en.wikipedia.org/w/index.php?title=Wikipedia&oldid=889379633, 2019, [Online; accessed 25-March-2019]
[15] Stack Exchange contributors, “Stack exchange,” https://stackexchange.com, 2019, [Online; accessed 25-March-2019]
[16] B. Bornfeld and S. Rafaeli, “Gamifying with badges: A big data natural experiment on stack exchange,” First Monday, vol. 22, no. 6, 2017. [Online]. Available: https://firstmonday.org/ojs/index.php/fm/article/view/7299
[17] J. D. Abernethy and R. M. Frongillo, “A collaborative mechanism for crowdsourcing prediction problems,” in Advances in Neural Information Processing Systems 25, ser. NeurIPS ’11, 2011, pp. 2600–2608
[18] B. Waggoner, R. Frongillo, and J. D. Abernethy, “A market framework for eliciting private data,” in Advances in Neural Information Processing Systems 28, ser. NeurIPS ’15, 2015, pp. 3492–3500
[19] R. Hanson, “Combinatorial information market design,” Information Systems Frontiers, vol. 5, no. 1, pp. 107–119, 2003
[20] Wikipedia contributors, “Inversion of control — Wikipedia, the free encyclopedia,” https://en.wikipedia.org/w/index.php?title=Inversion_of_control&oldid=885334776, 2019, [Online; accessed 27-March-2019]
[21] Sergii Bomko, “Composition over inheritance - gas efficiency,” https://ethereum.stackexchange.com/a/60244/9564, 2018, [Online; accessed 27-March-2019]
[22] S. Nakamoto et al., “Bitcoin: A peer-to-peer electronic cash system,” 2008
[23] Y. Bengio, “Deep learning of representations for unsupervised and transfer learning,” in Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36
[24] J.-S. Weng, J. Weng, M. Li, Y. Zhang, and W. Luo, “DeepChain: Auditable and privacy-preserving deep learning with blockchain-based incentive,” IACR Cryptology ePrint Archive, vol. 2018, p. 679, 2018
[25] Provable, “Provable - Oraclize 2.0 - blockchain oracle service, enabling data-rich smart contracts,” https://provable.xyz/, 2019, [Online; accessed 5-April-2019]
