pith. machine review for the scientific record.

arxiv: 2604.18614 · v1 · submitted 2026-04-15 · 💻 cs.DC · cs.CR · cs.ET · cs.MA

Recognition: unknown

HadAgent: Harness-Aware Decentralized Agentic AI Serving with Proof-of-Inference Blockchain Consensus


Pith reviewed 2026-05-10 12:14 UTC · model grok-4.3

classification 💻 cs.DC · cs.CR · cs.ET · cs.MA
keywords decentralized AI · proof-of-inference · blockchain consensus · LLM serving · tamper detection · trust management · harness monitoring

The pith

HadAgent replaces proof-of-work mining with proof-of-inference consensus for decentralized LLM agent serving.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

HadAgent proposes a blockchain-based system for serving AI agents in which nodes earn block-creation rights by executing deterministic LLM inference tasks instead of solving hash puzzles. Verification requires only re-executing a single forward pass, enabling fast cross-node checks, while a three-lane block structure separates data, model, and proof channels, each protected by its own Merkle root. A two-tier node classification and a harness layer use heartbeats, recomputation-based anomaly detection, and automatic trust updates to isolate unreliable participants and promote consistent ones. This design aims to turn the computational effort of consensus into productive AI work while maintaining tamper resistance and self-correction in a decentralized environment.

Core claim

HadAgent establishes proof-of-inference consensus in which nodes perform LLM inference to validate blocks, organized into DATA, MODEL, and PROOF lanes with independent Merkle roots. A harness monitors behavior to classify nodes as trusted or non-trusted, allowing trusted nodes optimistic real-time service while non-trusted nodes undergo full verification, forming a feedback loop that excludes adversaries and elevates honest participants.
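The lane separation is easy to make concrete. Below is a minimal sketch of per-lane Merkle commitments, assuming SHA-256 and a duplicate-last-node rule for odd levels; the paper does not specify its hash or padding scheme, and `merkle_root` and `block_header` are illustrative names, not HadAgent's implementation:

```python
import hashlib

def merkle_root(leaves):
    """Binary Merkle root over a list of byte-string records."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    level = [hashlib.sha256(leaf).hexdigest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate last node on odd levels
            level.append(level[-1])
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

def block_header(data_records, model_records, proof_records):
    """One independent root per lane, as in the DATA/MODEL/PROOF design."""
    return {
        "data_root": merkle_root(data_records),
        "model_root": merkle_root(model_records),
        "proof_root": merkle_root(proof_records),
    }
```

Because each lane commits independently, altering a DATA record perturbs only `data_root`, which is what lets a verifier localize tampering to a lane without re-checking the whole block.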

What carries the argument

The harness layer, which monitors nodes via heartbeat probes, detects anomalies through deterministic recomputation, and manages trust levels to create a self-correcting exclusion of malicious nodes.
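The reported two-round exclusion and five-round promotion suggest a streak-based update rule, which can be sketched as a small state machine. The thresholds and field names below are assumptions reverse-engineered from the reported numbers, not the paper's actual policy:

```python
from dataclasses import dataclass

# Illustrative thresholds chosen to reproduce the reported behavior
# (exclusion within two rounds, promotion within five).
PROMOTE_AFTER = 5
EXCLUDE_AFTER = 2

@dataclass
class NodeState:
    status: str = "non-trusted"   # "trusted" | "non-trusted" | "excluded"
    good_streak: int = 0
    bad_streak: int = 0

def harness_round(state, heartbeat_ok, recomputation_match):
    """One monitoring round: a heartbeat probe plus deterministic recomputation.

    A clean round extends the good streak; any anomaly resets it, demotes
    the node, and after EXCLUDE_AFTER consecutive failures removes it.
    """
    if state.status == "excluded":
        return state                         # excluded nodes stay excluded
    if heartbeat_ok and recomputation_match:
        state.good_streak += 1
        state.bad_streak = 0
        if state.status == "non-trusted" and state.good_streak >= PROMOTE_AFTER:
            state.status = "trusted"
    else:
        state.bad_streak += 1
        state.good_streak = 0
        state.status = "excluded" if state.bad_streak >= EXCLUDE_AFTER else "non-trusted"
    return state

# Under these rules, five clean rounds promote an honest node and two
# anomalous rounds exclude an adversary:
honest, adversary = NodeState(), NodeState()
for _ in range(5):
    harness_round(honest, True, True)        # honest.status -> "trusted"
for _ in range(2):
    harness_round(adversary, True, False)    # adversary.status -> "excluded"
```

Note that under this reading the convergence numbers follow directly from the thresholds, a point the circularity audit below also raises.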

If this is right

  • Tampered records are detected at a 100 percent rate with zero false positives.
  • Record and hub operations complete validation in sub-millisecond time.
  • Adversarial nodes are excluded within two monitoring rounds.
  • Honest nodes reach trusted status within five rounds and can then serve inference optimistically.
  • The three-lane structure allows independent verification of data, models, and proofs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The system could extend to other deterministic compute workloads where re-execution serves both consensus and application needs.
  • Trusted-node optimistic paths might reduce latency in production agent deployments once convergence stabilizes.
  • Scaling would require addressing how model updates or input variations affect the determinism assumption across larger node sets.

Load-bearing premise

LLM inference produces identical results across nodes despite differences in hardware, software, and floating-point behavior.
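This premise is what a bit-exact verifier depends on, and it is the part most likely to break on heterogeneous hardware. A hypothetical sketch contrasting a hash-based commitment with the tolerance-based fallback the rebuttal later mentions (`bitwise_fingerprint` and `tolerant_match` are invented names, not from the paper):

```python
import hashlib
import numpy as np

def bitwise_fingerprint(logits):
    """Bit-exact commitment: any ULP-level drift yields a different hash."""
    return hashlib.sha256(np.ascontiguousarray(logits).tobytes()).hexdigest()

def tolerant_match(a, b, atol=1e-5):
    """Relaxed check for heterogeneous hardware: equal within a tolerance."""
    return bool(np.allclose(a, b, atol=atol))

# Two "honest" runs whose logits differ only by float-rounding noise:
a = np.array([0.12, -1.7, 3.4], dtype=np.float32)
b = a + np.float32(1e-6)

assert bitwise_fingerprint(a) != bitwise_fingerprint(b)  # strict check flags a mismatch
assert tolerant_match(a, b)                              # tolerance check accepts it
```

Under homogeneous, deterministic kernels the fingerprints match and strict equality is cheap; once rounding diverges across nodes, only the tolerance check survives, and the 100%/0% detection figures would need re-deriving.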

What would settle it

Two nodes running the same model and input under controlled conditions produce differing outputs, or a modified record passes all Merkle root checks and harness recomputation without detection.

Figures

Figures reproduced from arXiv: 2604.18614 by Bingyu Shen, Boyang Li, Jianming Liu, Landy Jimenez, Mariah Weatherspoon, Yi Sheng.

Figure 1: Overview of the HadAgent system architecture. From top to bottom: …
Figure 2: Overview of Block Design. Each block contains a header and a three-lane block body.
Figure 3: Consensus Flow with Interval Verification. The cycle proceeds in …
Figure 4: Demonstration of the Two-Tier Node Architecture and Inference …
Figure 5: Consensus Latency Performance. … incurs higher latency and occasional spikes due to additional structural and cryptographic checks. Pool operations exhibit negligible latency and are included for completeness. In the scale evaluation, the system processed over 2000 validation operations, including 1000 record validations and 1000 hub submissions. Latency remained consistent with baseline measurements, indica…
read the original abstract

Proof-of-Work (PoW) blockchain consensus consumes vast computational resources without producing useful output, while the rapid growth of large language model (LLM) agents has created unprecedented demand for GPU computation. We present HadAgent, a decentralized agentic AI serving system that replaces hash-based mining with Proof-of-Inference (PoI), a consensus mechanism in which nodes earn block-creation rights by executing deterministic LLM inference tasks. Because verification requires only re-executing a single forward pass under identical conditions, cross-node verification operates at consensus speed. HadAgent organizes validated records into a three-lane block body with dedicated DATA, MODEL, and PROOF channels, each protected by an independent Merkle root for fine-grained tamper detection. A two-tier node architecture classifies secondary nodes as trusted or non-trusted based on historical behavior: trusted nodes serve inference results in real time through optimistic execution, while non-trusted nodes must undergo full consensus verification. A harness layer monitors node behavior through heartbeat probes, anomaly detection via deterministic recomputation, and automated trust management, creating a self-correcting feedback loop that isolates malicious or unreliable participants. Experiments on a prototype implementation demonstrate 100% detection rate and 0% false positive rate for tampered records, sub-millisecond validation latency for record and hub operations, and effective harness convergence that excludes adversarial nodes within two rounds while promoting honest nodes to trusted status within five rounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript presents HadAgent, a decentralized agentic AI serving system that replaces Proof-of-Work with Proof-of-Inference (PoI) consensus. Nodes earn block rights by executing LLM inference tasks, with validated records stored in a three-lane block body (DATA, MODEL, PROOF channels) protected by independent Merkle roots. A two-tier node architecture and harness layer classify nodes by historical behavior, enable optimistic execution for trusted nodes, and use deterministic recomputation for anomaly detection and trust management. Prototype experiments are reported to achieve 100% detection and 0% false positives for tampered records, sub-millisecond validation latencies, and harness convergence that excludes adversaries in two rounds while promoting honest nodes in five rounds.

Significance. If the determinism and cross-node verification assumptions hold under realistic conditions, the work offers a promising direction for useful-work blockchain consensus tied directly to AI serving workloads, potentially reducing PoW waste while providing verifiable decentralized inference. The three-lane Merkle structure and self-correcting harness represent concrete architectural contributions, and the prototype implementation with quantitative performance numbers supplies initial evidence of practicality.

major comments (2)
  1. [Abstract] Abstract: The claims of 100% detection rate, 0% false positive rate, sub-millisecond latencies, and specific convergence rounds (two for exclusion, five for promotion) are presented without any reference to experimental methodology, hardware setup, datasets, attack models, number of trials, or statistical measures, leaving the central empirical support for the system's correctness and performance unassessable.
  2. [§3] §3 (System Design, PoI and harness description): The verification and anomaly detection mechanisms rest on the assumption that a single LLM forward pass produces bit-identical outputs across nodes under 'identical conditions,' enabling simple re-execution for tamper detection. This is contradicted by real hardware and software variations (GPU architectures, CUDA/cuDNN versions, floating-point modes) that cause divergent token or logit outputs even with fixed seeds and weights; the three-lane Merkle roots and trust classification would then misclassify honest divergence as tampering, undermining the reported detection rates and two-round exclusion claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below. Revisions have been made to improve clarity and address concerns where the feedback identifies gaps in the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claims of 100% detection rate, 0% false positive rate, sub-millisecond latencies, and specific convergence rounds (two for exclusion, five for promotion) are presented without any reference to experimental methodology, hardware setup, datasets, attack models, number of trials, or statistical measures, leaving the central empirical support for the system's correctness and performance unassessable.

    Authors: We agree that the abstract should provide more context on the supporting experiments to make the claims assessable. In the revised version, we have added a concise clause referencing the prototype evaluation: results derive from 1000 trials on a controlled cluster of identical NVIDIA A100 GPUs using synthetic and real LLM inference workloads, with simulated tampering attacks and standard statistical reporting (means and standard deviations). Full methodology, hardware specifications, datasets, and attack models are detailed in Section 5. This keeps the abstract within length limits while directing readers to the evidence. revision: yes

  2. Referee: [§3] §3 (System Design, PoI and harness description): The verification and anomaly detection mechanisms rest on the assumption that a single LLM forward pass produces bit-identical outputs across nodes under 'identical conditions,' enabling simple re-execution for tamper detection. This is contradicted by real hardware and software variations (GPU architectures, CUDA/cuDNN versions, floating-point modes) that cause divergent token or logit outputs even with fixed seeds and weights; the three-lane Merkle roots and trust classification would then misclassify honest divergence as tampering, undermining the reported detection rates and two-round exclusion claims.

    Authors: We acknowledge this as a valid and important limitation of the current design. The PoI mechanism and harness explicitly assume identical conditions for bit-exact recomputation, as stated in the manuscript. All prototype experiments were performed in a homogeneous environment (identical GPUs and software stacks) to validate the 100% detection / 0% false-positive claims under those conditions. In the revision to §3, we have added explicit discussion of this assumption, including requirements for node standardization and use of deterministic cuDNN modes. We also note that in heterogeneous deployments, the harness could incorporate tolerance thresholds on logits rather than strict equality, though this would require re-evaluating the reported metrics. A new limitations paragraph has been inserted to scope the guarantees accordingly. These changes clarify the operating assumptions without changing the core three-lane or harness architecture. revision: partial

Circularity Check

2 steps flagged

Tamper detection rates and harness convergence reduce to prototype definitions and simulation rules by construction

specific steps
  1. fitted input called prediction [Abstract]
    "Experiments on a prototype implementation demonstrate 100% detection rate and 0% false positive rate for tampered records, sub-millisecond validation latency for record and hub operations, and effective harness convergence that excludes adversarial nodes within two rounds while promoting honest nodes to trusted status within five rounds."

    The prototype uses homogeneous hardware/software; tampering is introduced by altering records, and detection occurs via deterministic recomputation on the same node. Mismatch is therefore guaranteed by construction whenever a record is altered, producing 100% detection / 0% FP without testing cross-node non-determinism or real decentralized conditions.

  2. self definitional [Abstract]
    "A harness layer monitors node behavior through heartbeat probes, anomaly detection via deterministic recomputation, and automated trust management, creating a self-correcting feedback loop that isolates malicious or unreliable participants."

    The harness rules define trust classification and exclusion based on historical behavior and recomputation matches; running those exact rules in simulation necessarily yields the reported two-round exclusion and five-round promotion numbers, making the convergence result equivalent to the input policy rather than an emergent or validated property.

full rationale

The paper's core claims rest on a prototype and simulation whose outcomes are forced by the verification definition (re-execution under identical conditions) and the harness classification logic. No external benchmarks, machine-checked proofs, or heterogeneous-node tests are cited to break the loop. The central experimental results therefore function as re-statements of the input setup rather than independent predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The design rests on several unproven assumptions about determinism and node behavior plus newly introduced components without external validation.

axioms (2)
  • domain assumption LLM inference produces identical outputs under identical conditions across nodes
    Invoked to enable verification by re-execution; appears in the description of Proof-of-Inference and cross-node validation.
  • ad hoc to paper Historical behavior reliably predicts future trustworthiness
    Basis for classifying nodes as trusted or non-trusted and for the harness feedback loop.
invented entities (3)
  • Proof-of-Inference (PoI) no independent evidence
    purpose: Consensus mechanism replacing hash mining with inference tasks
    Core new primitive; no independent evidence of security properties provided.
  • Three-lane block body (DATA, MODEL, PROOF channels) no independent evidence
    purpose: Fine-grained tamper detection via separate Merkle roots
    Architectural invention; no prior reference or proof of advantage shown.
  • Harness layer no independent evidence
    purpose: Monitoring, anomaly detection, and automated trust management
    Self-correcting feedback mechanism; convergence claims rest on this without external benchmarks.

pith-pipeline@v0.9.0 · 5575 in / 1509 out tokens · 33780 ms · 2026-05-10T12:14:31.992837+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

24 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Bitcoin: A peer-to-peer electronic cash system,

    S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008

  2. [2]

    Energy-recycling blockchain with proof-of-deep-learning,

    C. Chenli, B. Li, Y. Shi, and T. Jung, “Energy-recycling blockchain with proof-of-deep-learning,” in 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2019, pp. 19–23

  3. [3]

    Bitcoin energy consumption index @ONLINE,

    Digiconomist, “Bitcoin energy consumption index @ONLINE,” https://digiconomist.net/bitcoin-energy-consumption, March 2019 (accessed: 03.06.2019)

  4. [4]

    Proof of learning (pole): Empowering neural network training with consensus building on blockchains,

    Y. Liu, Y. Lan, B. Li, C. Miao, and Z. Tian, “Proof of learning (pole): Empowering neural network training with consensus building on blockchains,” Computer Networks, vol. 201, p. 108594, 2021

  5. [5]

    Coin.ai: A proof-of-useful-work scheme for blockchain-based distributed deep learning,

    A. Baldominos and Y. Saez, “Coin.ai: A proof-of-useful-work scheme for blockchain-based distributed deep learning,” Entropy, vol. 21, no. 8, p. 723, 2019

  6. [6]

    Dlbc: A deep learning-based consensus in blockchains for deep learning services,

    B. Li, C. Chenli, X. Xu, Y. Shi, and T. Jung, “Dlbc: A deep learning-based consensus in blockchains for deep learning services,” arXiv preprint arXiv:1904.07349, 2019

  7. [7]

    Ppcoin: Peer-to-peer crypto-currency with proof-of-stake,

    S. King and S. Nadal, “Ppcoin: Peer-to-peer crypto-currency with proof-of-stake,” self-published paper, August, vol. 19, no. 1, 2012

  8. [8]

    Blockchain challenges and opportunities: A survey,

    Z. Zheng, S. Xie, H.-N. Dai, X. Chen, and H. Wang, “Blockchain challenges and opportunities: A survey,” International Journal of Web and Grid Services, vol. 14, no. 4, pp. 352–375, 2018

  9. [9]

    Proof-of-useful-work blockchain for trustworthy biomedical hyperdimensional computing,

    J. Wen, D. Ma, S. Zhang, H. Sudler, and X. Jiao, “Proof-of-useful-work blockchain for trustworthy biomedical hyperdimensional computing,” in 2025 IEEE Biomedical Circuits and Systems Conference (BioCAS), 2025, pp. 56–60

  10. [10]

    Exploiting computation power of blockchain for biomedical image segmentation,

    B. Li, C. Chenli, X. Xu, T. Jung, and Y. Shi, “Exploiting computation power of blockchain for biomedical image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019

  11. [11]

    Proof-of-federated-learning-subchain: Free partner selection subchain based on federated learning,

    B. Li, B. Shen, Q. Lu, T. Jung, and Y. Shi, “Proof-of-federated-learning-subchain: Free partner selection subchain based on federated learning,” in 2023 Fifth International Conference on Blockchain Computing and Applications (BCCA), 2023, pp. 600–605

  12. [12]

    A mining pool solution for novel proof-of-neural-architecture consensus,

    B. Li, Q. Lu, W. Jiang, T. Jung, and Y. Shi, “A mining pool solution for novel proof-of-neural-architecture consensus,” in 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2021, pp. 1–3

  13. [13]

    Tidyblock: A novel consensus mechanism for DAG-based blockchain in IoT,

    X. Qu, S. Wang, K. Li, J. Huang, and X. Cheng, “Tidyblock: A novel consensus mechanism for DAG-based blockchain in IoT,” IEEE Transactions on Mobile Computing, vol. 24, no. 2, pp. 722–735, 2025

  14. [14]

    Blockchain consensus scheme based on the proof of distributed deep learning work,

    H. Zhi, H. Wu, Y. Huang, C. Tian, and S. Wang, “Blockchain consensus scheme based on the proof of distributed deep learning work,” IET Software, vol. 2025, no. 1, p. 3378383, 2025

  15. [15]

    A novel proof of useful work for a blockchain storing transportation transactions,

    M. Haouari, M. Mhiri, M. El-Masri, and K. Al-Yafi, “A novel proof of useful work for a blockchain storing transportation transactions,” Information Processing & Management, vol. 59, no. 1, p. 102749, 2022

  16. [16]

    Blockchain technology, structure, and applications: a survey,

    N. Moosavi, H. Taherdoost, N. Mohamed, M. Madanchian, Y. Farhaoui, and I. U. Khan, “Blockchain technology, structure, and applications: a survey,” Procedia Computer Science, vol. 237, pp. 645–658, 2024

  17. [17]

    Bittensor: A peer-to-peer intelligence market,

    Y. Rao, J. Steeves, A. Shaabana, D. Attevelt, and M. McAteer, “Bittensor: A peer-to-peer intelligence market,” 2021. [Online]. Available: https://arxiv.org/abs/2003.03917

  18. [18]

    Resonance: A market mechanism for heterogeneous computation,

    N. Durvasula and M. Bahrani, “Resonance: A market mechanism for heterogeneous computation,” https://ritual.net/blog/resonance-pt1, 2025

  19. [19]

    Introducing ritual chain,

    Ritual Foundation, “Introducing ritual chain,” https://ritualfoundation.org/blog/unveiling-ritual, 2025

  20. [20]

    Fedml: A research library and benchmark for federated machine learning,

    C. He, S. Li, J. So, X. Zeng, M. Zhang, H. Wang, X. Wang, P. Vepakomma, A. Singh, H. Qiu et al., “Fedml: A research library and benchmark for federated machine learning,” arXiv preprint arXiv:2007.13518, 2020

  21. [21]

    Verde: Verification via refereed delegation for machine learning programs,

    A. Arun, A. S. Arnaud, A. Titov, B. Wilcox, V. Kolobaric, M. Brinkmann, O. Ersoy, B. Fielding, and J. Bonneau, “Verde: Verification via refereed delegation for machine learning programs.” [Online]. Available: https://arxiv.org/abs/2502.19405

  23. [23]

    Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation

    Y. Wu, W. Chen, Z. Huang, J. Chen, Q. Liu, K. Wang, X. Zhou, and Y. Liang, “Back to basics: Let conversational agents remember with just retrieval and generation,” 2026. [Online]. Available: https://arxiv.org/abs/2604.11628

  24. [24]

    My AI adoption journey,

    M. Hashimoto, “My AI adoption journey,” 2026. [Online]. Available: https://mitchellh.com/writing/my-ai-adoption-journey