pith. machine review for the scientific record.

arxiv: 2604.18614 · v1 · submitted 2026-04-15 · 💻 cs.DC · cs.CR · cs.ET · cs.MA

Recognition: unknown

HadAgent: Harness-Aware Decentralized Agentic AI Serving with Proof-of-Inference Blockchain Consensus


Pith reviewed 2026-05-10 12:14 UTC · model grok-4.3

classification 💻 cs.DC · cs.CR · cs.ET · cs.MA
keywords decentralized AI · proof-of-inference · blockchain consensus · LLM serving · tamper detection · trust management · harness monitoring

The pith

HadAgent replaces proof-of-work mining with proof-of-inference consensus for decentralized LLM agent serving.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

HadAgent proposes a blockchain-based system for serving AI agents in which nodes earn block-creation rights by executing deterministic LLM inference tasks instead of solving hash puzzles. Verification requires only re-executing a single forward pass, enabling fast cross-node checks, while a three-lane block structure separates data, model, and proof channels, each protected by its own Merkle root. A two-tier node classification and a harness layer use heartbeats, recomputation-based anomaly detection, and automatic trust updates to isolate unreliable participants and promote consistent ones. This design aims to turn the computational effort of consensus into productive AI work while maintaining tamper resistance and self-correction in a decentralized environment.

Core claim

HadAgent establishes proof-of-inference consensus in which nodes perform LLM inference to validate blocks, organized into DATA, MODEL, and PROOF lanes with independent Merkle roots. A harness monitors behavior to classify nodes as trusted or non-trusted, allowing trusted nodes optimistic real-time service while non-trusted nodes undergo full verification, forming a feedback loop that excludes adversaries and elevates honest participants.
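The lane separation is easy to make concrete. Below is a minimal sketch of per-lane Merkle commitments, assuming SHA-256 and a duplicate-last-node rule for odd levels; the paper does not specify its hash or padding scheme, and `merkle_root` and `block_header` are illustrative names, not HadAgent's implementation:

```python
import hashlib

def merkle_root(leaves):
    """Binary Merkle root over a list of byte-string records."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    level = [hashlib.sha256(leaf).hexdigest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate last node on odd levels
            level.append(level[-1])
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

def block_header(data_records, model_records, proof_records):
    """One independent root per lane, as in the DATA/MODEL/PROOF design."""
    return {
        "data_root": merkle_root(data_records),
        "model_root": merkle_root(model_records),
        "proof_root": merkle_root(proof_records),
    }
```

Because each lane commits independently, altering a DATA record perturbs only `data_root`, which is what lets a verifier localize tampering to a lane without re-checking the whole block.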

What carries the argument

The harness layer, which monitors nodes via heartbeat probes, detects anomalies through deterministic recomputation, and manages trust levels to create a self-correcting exclusion of malicious nodes.
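The reported two-round exclusion and five-round promotion suggest a streak-based update rule, which can be sketched as a small state machine. The thresholds and field names below are assumptions reverse-engineered from the reported numbers, not the paper's actual policy:

```python
from dataclasses import dataclass

# Illustrative thresholds chosen to reproduce the reported behavior
# (exclusion within two rounds, promotion within five).
PROMOTE_AFTER = 5
EXCLUDE_AFTER = 2

@dataclass
class NodeState:
    status: str = "non-trusted"   # "trusted" | "non-trusted" | "excluded"
    good_streak: int = 0
    bad_streak: int = 0

def harness_round(state, heartbeat_ok, recomputation_match):
    """One monitoring round: a heartbeat probe plus deterministic recomputation.

    A clean round extends the good streak; any anomaly resets it, demotes
    the node, and after EXCLUDE_AFTER consecutive failures removes it.
    """
    if state.status == "excluded":
        return state                         # excluded nodes stay excluded
    if heartbeat_ok and recomputation_match:
        state.good_streak += 1
        state.bad_streak = 0
        if state.status == "non-trusted" and state.good_streak >= PROMOTE_AFTER:
            state.status = "trusted"
    else:
        state.bad_streak += 1
        state.good_streak = 0
        state.status = "excluded" if state.bad_streak >= EXCLUDE_AFTER else "non-trusted"
    return state

# Under these rules, five clean rounds promote an honest node and two
# anomalous rounds exclude an adversary:
honest, adversary = NodeState(), NodeState()
for _ in range(5):
    harness_round(honest, True, True)        # honest.status -> "trusted"
for _ in range(2):
    harness_round(adversary, True, False)    # adversary.status -> "excluded"
```

Note that under this reading the convergence numbers follow directly from the thresholds, a point the circularity audit below also raises.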

If this is right

  • Tampered records are detected at a 100 percent rate with zero false positives.
  • Record and hub operations complete validation in sub-millisecond time.
  • Adversarial nodes are excluded within two monitoring rounds.
  • Honest nodes reach trusted status within five rounds and can then serve inference optimistically.
  • The three-lane structure allows independent verification of data, models, and proofs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The system could extend to other deterministic compute workloads where re-execution serves both consensus and application needs.
  • Trusted-node optimistic paths might reduce latency in production agent deployments once convergence stabilizes.
  • Scaling would require addressing how model updates or input variations affect the determinism assumption across larger node sets.

Load-bearing premise

LLM inference produces identical results across nodes despite differences in hardware, software, and floating-point behavior.
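This premise is what a bit-exact verifier depends on, and it is the part most likely to break on heterogeneous hardware. A hypothetical sketch contrasting a hash-based commitment with the tolerance-based fallback the rebuttal later mentions (`bitwise_fingerprint` and `tolerant_match` are invented names, not from the paper):

```python
import hashlib
import numpy as np

def bitwise_fingerprint(logits):
    """Bit-exact commitment: any ULP-level drift yields a different hash."""
    return hashlib.sha256(np.ascontiguousarray(logits).tobytes()).hexdigest()

def tolerant_match(a, b, atol=1e-5):
    """Relaxed check for heterogeneous hardware: equal within a tolerance."""
    return bool(np.allclose(a, b, atol=atol))

# Two "honest" runs whose logits differ only by float-rounding noise:
a = np.array([0.12, -1.7, 3.4], dtype=np.float32)
b = a + np.float32(1e-6)

assert bitwise_fingerprint(a) != bitwise_fingerprint(b)  # strict check flags a mismatch
assert tolerant_match(a, b)                              # tolerance check accepts it
```

Under homogeneous, deterministic kernels the fingerprints match and strict equality is cheap; once rounding diverges across nodes, only the tolerance check survives, and the 100%/0% detection figures would need re-deriving.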

What would settle it

Two nodes running the same model and input under controlled conditions produce differing outputs, or a modified record passes all Merkle root checks and harness recomputation without detection.

Figures

Figures reproduced from arXiv: 2604.18614 by Bingyu Shen, Boyang Li, Jianming Liu, Landy Jimenez, Mariah Weatherspoon, Yi Sheng.

Figure 1: Overview of the HadAgent system architecture. From top to bottom: …
Figure 2: Overview of Block Design. Each block contains a header and a three-lane block body.
Figure 3: Consensus Flow with Interval Verification. The cycle proceeds in …
Figure 4: Demonstration of the Two-Tier Node Architecture and Inference …
Figure 5: Consensus Latency Performance. … incurs higher latency and occasional spikes due to additional structural and cryptographic checks. Pool operations exhibit negligible latency and are included for completeness. In the scale evaluation, the system processed over 2000 validation operations, including 1000 record validations and 1000 hub submissions. Latency remained consistent with baseline measurements, indica…
read the original abstract

Proof-of-Work (PoW) blockchain consensus consumes vast computational resources without producing useful output, while the rapid growth of large language model (LLM) agents has created unprecedented demand for GPU computation. We present HadAgent, a decentralized agentic AI serving system that replaces hash-based mining with Proof-of-Inference (PoI), a consensus mechanism in which nodes earn block-creation rights by executing deterministic LLM inference tasks. Because verification requires only re-executing a single forward pass under identical conditions, cross-node verification operates at consensus speed. HadAgent organizes validated records into a three-lane block body with dedicated DATA, MODEL, and PROOF channels, each protected by an independent Merkle root for fine-grained tamper detection. A two-tier node architecture classifies secondary nodes as trusted or non-trusted based on historical behavior: trusted nodes serve inference results in real time through optimistic execution, while non-trusted nodes must undergo full consensus verification. A harness layer monitors node behavior through heartbeat probes, anomaly detection via deterministic recomputation, and automated trust management, creating a self-correcting feedback loop that isolates malicious or unreliable participants. Experiments on a prototype implementation demonstrate 100% detection rate and 0% false positive rate for tampered records, sub-millisecond validation latency for record and hub operations, and effective harness convergence that excludes adversarial nodes within two rounds while promoting honest nodes to trusted status within five rounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript presents HadAgent, a decentralized agentic AI serving system that replaces Proof-of-Work with Proof-of-Inference (PoI) consensus. Nodes earn block rights by executing LLM inference tasks, with validated records stored in a three-lane block body (DATA, MODEL, PROOF channels) protected by independent Merkle roots. A two-tier node architecture and harness layer classify nodes by historical behavior, enable optimistic execution for trusted nodes, and use deterministic recomputation for anomaly detection and trust management. Prototype experiments are reported to achieve 100% detection and 0% false positives for tampered records, sub-millisecond validation latencies, and harness convergence that excludes adversaries in two rounds while promoting honest nodes in five rounds.

Significance. If the determinism and cross-node verification assumptions hold under realistic conditions, the work offers a promising direction for useful-work blockchain consensus tied directly to AI serving workloads, potentially reducing PoW waste while providing verifiable decentralized inference. The three-lane Merkle structure and self-correcting harness represent concrete architectural contributions, and the prototype implementation with quantitative performance numbers supplies initial evidence of practicality.

major comments (2)
  1. [Abstract] Abstract: The claims of 100% detection rate, 0% false positive rate, sub-millisecond latencies, and specific convergence rounds (two for exclusion, five for promotion) are presented without any reference to experimental methodology, hardware setup, datasets, attack models, number of trials, or statistical measures, leaving the central empirical support for the system's correctness and performance unassessable.
  2. [§3] §3 (System Design, PoI and harness description): The verification and anomaly detection mechanisms rest on the assumption that a single LLM forward pass produces bit-identical outputs across nodes under 'identical conditions,' enabling simple re-execution for tamper detection. This is contradicted by real hardware and software variations (GPU architectures, CUDA/cuDNN versions, floating-point modes) that cause divergent token or logit outputs even with fixed seeds and weights; the three-lane Merkle roots and trust classification would then misclassify honest divergence as tampering, undermining the reported detection rates and two-round exclusion claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below. Revisions have been made to improve clarity and address concerns where the feedback identifies gaps in the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claims of 100% detection rate, 0% false positive rate, sub-millisecond latencies, and specific convergence rounds (two for exclusion, five for promotion) are presented without any reference to experimental methodology, hardware setup, datasets, attack models, number of trials, or statistical measures, leaving the central empirical support for the system's correctness and performance unassessable.

    Authors: We agree that the abstract should provide more context on the supporting experiments to make the claims assessable. In the revised version, we have added a concise clause referencing the prototype evaluation: results derive from 1000 trials on a controlled cluster of identical NVIDIA A100 GPUs using synthetic and real LLM inference workloads, with simulated tampering attacks and standard statistical reporting (means and standard deviations). Full methodology, hardware specifications, datasets, and attack models are detailed in Section 5. This keeps the abstract within length limits while directing readers to the evidence. revision: yes

  2. Referee: [§3] §3 (System Design, PoI and harness description): The verification and anomaly detection mechanisms rest on the assumption that a single LLM forward pass produces bit-identical outputs across nodes under 'identical conditions,' enabling simple re-execution for tamper detection. This is contradicted by real hardware and software variations (GPU architectures, CUDA/cuDNN versions, floating-point modes) that cause divergent token or logit outputs even with fixed seeds and weights; the three-lane Merkle roots and trust classification would then misclassify honest divergence as tampering, undermining the reported detection rates and two-round exclusion claims.

    Authors: We acknowledge this as a valid and important limitation of the current design. The PoI mechanism and harness explicitly assume identical conditions for bit-exact recomputation, as stated in the manuscript. All prototype experiments were performed in a homogeneous environment (identical GPUs and software stacks) to validate the 100% detection / 0% false-positive claims under those conditions. In the revision to §3, we have added explicit discussion of this assumption, including requirements for node standardization and use of deterministic cuDNN modes. We also note that in heterogeneous deployments, the harness could incorporate tolerance thresholds on logits rather than strict equality, though this would require re-evaluating the reported metrics. A new limitations paragraph has been inserted to scope the guarantees accordingly. These changes clarify the operating assumptions without changing the core three-lane or harness architecture. revision: partial

Circularity Check

2 steps flagged

Tamper detection rates and harness convergence reduce to prototype definitions and simulation rules by construction

specific steps
  1. fitted input called prediction [Abstract]
    "Experiments on a prototype implementation demonstrate 100% detection rate and 0% false positive rate for tampered records, sub-millisecond validation latency for record and hub operations, and effective harness convergence that excludes adversarial nodes within two rounds while promoting honest nodes to trusted status within five rounds."

    The prototype uses homogeneous hardware/software; tampering is introduced by altering records, and detection occurs via deterministic recomputation on the same node. Mismatch is therefore guaranteed by construction whenever a record is altered, producing 100% detection / 0% FP without testing cross-node non-determinism or real decentralized conditions.

  2. self definitional [Abstract]
    "A harness layer monitors node behavior through heartbeat probes, anomaly detection via deterministic recomputation, and automated trust management, creating a self-correcting feedback loop that isolates malicious or unreliable participants."

    The harness rules define trust classification and exclusion based on historical behavior and recomputation matches; running those exact rules in simulation necessarily yields the reported two-round exclusion and five-round promotion numbers, making the convergence result equivalent to the input policy rather than an emergent or validated property.

full rationale

The paper's core claims rest on a prototype and simulation whose outcomes are forced by the verification definition (re-execution under identical conditions) and the harness classification logic. No external benchmarks, machine-checked proofs, or heterogeneous-node tests are cited to break the loop. The central experimental results therefore function as re-statements of the input setup rather than independent predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The design rests on several unproven assumptions about determinism and node behavior plus newly introduced components without external validation.

axioms (2)
  • domain assumption LLM inference produces identical outputs under identical conditions across nodes
    Invoked to enable verification by re-execution; appears in the description of Proof-of-Inference and cross-node validation.
  • ad hoc to paper Historical behavior reliably predicts future trustworthiness
    Basis for classifying nodes as trusted or non-trusted and for the harness feedback loop.
invented entities (3)
  • Proof-of-Inference (PoI) no independent evidence
    purpose: Consensus mechanism replacing hash mining with inference tasks
    Core new primitive; no independent evidence of security properties provided.
  • Three-lane block body (DATA, MODEL, PROOF channels) no independent evidence
    purpose: Fine-grained tamper detection via separate Merkle roots
    Architectural invention; no prior reference or proof of advantage shown.
  • Harness layer no independent evidence
    purpose: Monitoring, anomaly detection, and automated trust management
    Self-correcting feedback mechanism; convergence claims rest on this without external benchmarks.

pith-pipeline@v0.9.0 · 5575 in / 1509 out tokens · 33780 ms · 2026-05-10T12:14:31.992837+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

24 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Bitcoin: A peer-to-peer electronic cash system,

    S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008

  2. [2]

    Energy-recycling blockchain with proof-of-deep-learning,

    C. Chenli, B. Li, Y. Shi, and T. Jung, “Energy-recycling blockchain with proof-of-deep-learning,” in 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2019, pp. 19–23

  3. [3]

    Bitcoin energy consumption index @ONLINE,

    Digiconomist, “Bitcoin energy consumption index @ONLINE,” https://digiconomist.net/bitcoin-energy-consumption, March 2019 (accessed: 03.06.2019)

  4. [4]

    Proof of learning (pole): Empowering neural network training with consensus building on blockchains,

    Y. Liu, Y. Lan, B. Li, C. Miao, and Z. Tian, “Proof of learning (pole): Empowering neural network training with consensus building on blockchains,” Computer Networks, vol. 201, p. 108594, 2021

  5. [5]

    Coin.ai: A proof-of-useful-work scheme for blockchain-based distributed deep learning,

    A. Baldominos and Y. Saez, “Coin.ai: A proof-of-useful-work scheme for blockchain-based distributed deep learning,” Entropy, vol. 21, no. 8, p. 723, 2019

  6. [6]

    Dlbc: A deep learning-based consensus in blockchains for deep learning services,

    B. Li, C. Chenli, X. Xu, Y. Shi, and T. Jung, “Dlbc: A deep learning-based consensus in blockchains for deep learning services,” arXiv preprint arXiv:1904.07349, 2019

  7. [7]

    Ppcoin: Peer-to-peer crypto-currency with proof-of-stake,

    S. King and S. Nadal, “Ppcoin: Peer-to-peer crypto-currency with proof-of-stake,” self-published paper, August, vol. 19, no. 1, 2012

  8. [8]

    Blockchain challenges and opportunities: A survey,

    Z. Zheng, S. Xie, H.-N. Dai, X. Chen, and H. Wang, “Blockchain challenges and opportunities: A survey,” International Journal of Web and Grid Services, vol. 14, no. 4, pp. 352–375, 2018

  9. [9]

    Proof-of-useful-work blockchain for trustworthy biomedical hyperdimensional computing,

    J. Wen, D. Ma, S. Zhang, H. Sudler, and X. Jiao, “Proof-of-useful-work blockchain for trustworthy biomedical hyperdimensional computing,” in 2025 IEEE Biomedical Circuits and Systems Conference (BioCAS), 2025, pp. 56–60

  10. [10]

    Exploiting computation power of blockchain for biomedical image segmentation,

    B. Li, C. Chenli, X. Xu, T. Jung, and Y. Shi, “Exploiting computation power of blockchain for biomedical image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019

  11. [11]

    Proof-of-federated-learning-subchain: Free partner selection subchain based on federated learning,

    B. Li, B. Shen, Q. Lu, T. Jung, and Y. Shi, “Proof-of-federated-learning-subchain: Free partner selection subchain based on federated learning,” in 2023 Fifth International Conference on Blockchain Computing and Applications (BCCA), 2023, pp. 600–605

  12. [12]

    A mining pool solution for novel proof-of-neural-architecture consensus,

    B. Li, Q. Lu, W. Jiang, T. Jung, and Y. Shi, “A mining pool solution for novel proof-of-neural-architecture consensus,” in 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2021, pp. 1–3

  13. [13]

    Tidyblock: A novel consensus mechanism for DAG-based blockchain in IoT,

    X. Qu, S. Wang, K. Li, J. Huang, and X. Cheng, “Tidyblock: A novel consensus mechanism for DAG-based blockchain in IoT,” IEEE Transactions on Mobile Computing, vol. 24, no. 2, pp. 722–735, 2025

  14. [14]

    Blockchain consensus scheme based on the proof of distributed deep learning work,

    H. Zhi, H. Wu, Y. Huang, C. Tian, and S. Wang, “Blockchain consensus scheme based on the proof of distributed deep learning work,” IET Software, vol. 2025, no. 1, p. 3378383, 2025

  15. [15]

    A novel proof of useful work for a blockchain storing transportation transactions,

    M. Haouari, M. Mhiri, M. El-Masri, and K. Al-Yafi, “A novel proof of useful work for a blockchain storing transportation transactions,” Information Processing & Management, vol. 59, no. 1, p. 102749, 2022

  16. [16]

    Blockchain technology, structure, and applications: a survey,

    N. Moosavi, H. Taherdoost, N. Mohamed, M. Madanchian, Y. Farhaoui, and I. U. Khan, “Blockchain technology, structure, and applications: a survey,” Procedia Computer Science, vol. 237, pp. 645–658, 2024

  17. [17]

    Bittensor: A peer-to-peer intelligence market,

    Y. Rao, J. Steeves, A. Shaabana, D. Attevelt, and M. McAteer, “Bittensor: A peer-to-peer intelligence market,” 2021. [Online]. Available: https://arxiv.org/abs/2003.03917

  18. [18]

    Resonance: A market mechanism for heterogeneous computation,

    N. Durvasula and M. Bahrani, “Resonance: A market mechanism for heterogeneous computation,” https://ritual.net/blog/resonance-pt1, 2025

  19. [19]

    Introducing ritual chain,

    Ritual Foundation, “Introducing ritual chain,” https://ritualfoundation.org/blog/unveiling-ritual, 2025

  20. [20]

    Fedml: A research library and benchmark for federated machine learning,

    C. He, S. Li, J. So, X. Zeng, M. Zhang, H. Wang, X. Wang, P. Vepakomma, A. Singh, H. Qiu et al., “Fedml: A research library and benchmark for federated machine learning,” arXiv preprint arXiv:2007.13518, 2020

  21. [21]

    Verde: Verification via refereed delegation for machine learning programs,

    A. Arun, A. S. Arnaud, A. Titov, B. Wilcox, V. Kolobaric, M. Brinkmann, O. Ersoy, B. Fielding, and J. Bonneau, “Verde: Verification via refereed delegation for machine learning programs.” [Online]. Available: https://arxiv.org/abs/2502.19405

  23. [23]

    Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation

    Y. Wu, W. Chen, Z. Huang, J. Chen, Q. Liu, K. Wang, X. Zhou, and Y. Liang, “Back to basics: Let conversational agents remember with just retrieval and generation,” 2026. [Online]. Available: https://arxiv.org/abs/2604.11628

  24. [24]

    My AI adoption journey,

    M. Hashimoto, “My AI adoption journey,” 2026. [Online]. Available: https://mitchellh.com/writing/my-ai-adoption-journey