Can Trustless Agents Be Trusted? An Empirical Study of the ERC-8004 Decentralized AI Agent Ecosystem
Pith reviewed 2026-06-25 19:29 UTC · model grok-4.3
The pith
The ERC-8004 reputation registry cannot serve as a reliable trust signal because its ratings are not based on verifiable interactions and can be manipulated at low cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Registry, as currently deployed, cannot function as a trust signal: values are not commensurable, feedback records are rarely grounded in verifiable interactions, and reputation can be manipulated at minimal cost. Consistent with these design weaknesses, a substantial fraction of reviewers exhibit coordinated Sybil behavior. After removing Sybil-flagged feedback, 15.5%, 72.3%, and 89.4% of rated agents on Ethereum, BSC, and Base, respectively, are left with no valid feedback.
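The post-removal statistic in the core claim can be illustrated with a minimal sketch. It assumes feedback is available as (reviewer, agent) pairs and that a set of Sybil-flagged reviewers has already been produced by some detection step; the function name, the record layout, and the toy addresses are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def no_valid_feedback_rate(feedback, sybil_reviewers):
    """Share of rated agents left with zero feedback once flagged reviewers are dropped.

    feedback: iterable of (reviewer, agent_id) pairs (hypothetical layout).
    sybil_reviewers: set of reviewer addresses flagged by a separate detection step.
    """
    remaining = defaultdict(int)
    for reviewer, agent in feedback:
        remaining[agent] += 0  # make sure every rated agent is counted
        if reviewer not in sybil_reviewers:
            remaining[agent] += 1
    if not remaining:
        return 0.0
    stranded = sum(1 for count in remaining.values() if count == 0)
    return stranded / len(remaining)

# Toy data: agent 1 keeps one honest review, agents 2 and 3 lose everything.
feedback = [("0xA", 1), ("0xB", 1), ("0xA", 2), ("0xC", 3)]
rate = no_valid_feedback_rate(feedback, sybil_reviewers={"0xA", "0xC"})
```

The denominator is the set of agents that received any feedback at all, which matches the phrase "rated agents" in the claim; agents that were never rated are outside this measure.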
What carries the argument
The ERC-8004 Reputation Registry, which stores reviewer scores and identities on-chain, together with the authors' analysis of commensurability, verifiability, and coordinated reviewer patterns.
If this is right
- Only 3%, 4%, and 15% of registered identities on Ethereum, BSC, and Base, respectively, expose a valid registration file with a live service endpoint.
- Reputation values recorded on-chain cannot be compared directly because different reviewers apply inconsistent scales.
- Most feedback entries lack evidence of an actual transaction between the reviewer and the rated agent.
- Between 59.2% and 90.6% of reviewers on each of the three chains exhibit coordinated Sybil behavior.
- Protocol revisions are needed to tie feedback to verifiable interactions before the registry can support trust decisions.
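The commensurability point in the list above can be made concrete with a small sketch. It assumes two hypothetical reviewers who use different scales (0 to 5 and 0 to 100) and shows that averaging raw values ranks agents differently from averaging rescaled values. The per-reviewer scale table is an assumption for illustration only; nothing in the summary suggests the registry records such a scale.

```python
from statistics import mean

# Hypothetical raw feedback: reviewer r1 rates on 0-5, reviewer r2 on 0-100.
raw = {
    "agent_x": [("r1", 5), ("r1", 5)],    # top marks on a 0-5 scale
    "agent_y": [("r2", 40), ("r2", 35)],  # poor marks on a 0-100 scale
}

# Assumed to be known here purely for illustration.
scale_max = {"r1": 5, "r2": 100}

def naive_score(entries):
    """Average the raw values as if they shared one scale."""
    return mean(value for _, value in entries)

def rescaled_score(entries):
    """Average after mapping each value onto [0, 1] using the reviewer's scale."""
    return mean(value / scale_max[reviewer] for reviewer, value in entries)

naive = {agent: naive_score(entries) for agent, entries in raw.items()}
rescaled = {agent: rescaled_score(entries) for agent, entries in raw.items()}
```

With the raw values, agent_y appears to outrank agent_x; after rescaling, the order reverses. A consumer that cannot recover each reviewer's scale has no way to tell which ordering is meaningful.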
Where Pith is reading between the lines
- Future versions of the protocol could require cryptographic proof of a completed service call before a review is accepted.
- Similar empirical audits of other on-chain reputation systems would likely reveal comparable gaps between recorded scores and actual usage.
- Agent developers may need to combine the on-chain registry with off-chain verification layers until the protocol is updated.
- The high fraction of placeholder registrations suggests that economic incentives for genuine agent deployment remain weak.
Load-bearing premise
The crawled on-chain events, off-chain files, and payment records give a complete picture of activity and the chosen criteria correctly separate coordinated manipulation from legitimate reviewer behavior.
What would settle it
Discovery of a large set of on-chain x402 payment transactions whose timing and counterparties match the recorded feedback at rates far above those observed in the study, or independent verification that the flagged reviewer clusters consist of distinct real users.
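One way to picture the test described above is a sketch that measures how many feedback records have a matching payment from the same reviewer to the same agent shortly beforehand. The record layouts, the one-hour window, and the simplification that an agent is identified by its payee address are all assumptions for illustration; the paper's actual matching criteria are not described in this summary.

```python
from bisect import bisect_left

def grounded_rate(feedback, payments, window_s=3600):
    """Fraction of feedback records preceded by a matching payment.

    feedback: list of (reviewer, agent, timestamp) tuples (hypothetical layout).
    payments: list of (payer, payee, timestamp) tuples (hypothetical layout).
    A record counts as grounded if the reviewer paid the agent within
    window_s seconds before (or exactly at) the feedback timestamp.
    """
    by_pair = {}
    for payer, payee, ts in payments:
        by_pair.setdefault((payer, payee), []).append(ts)
    for times in by_pair.values():
        times.sort()

    grounded = 0
    for reviewer, agent, ts in feedback:
        times = by_pair.get((reviewer, agent), [])
        # First payment at or after the start of the window.
        i = bisect_left(times, ts - window_s)
        if i < len(times) and times[i] <= ts:
            grounded += 1
    return grounded / len(feedback) if feedback else 0.0

payments = [("0xA", "0xS1", 1000), ("0xB", "0xS1", 5000)]
feedback = [("0xA", "0xS1", 1500), ("0xB", "0xS1", 4000), ("0xC", "0xS2", 2000)]
rate = grounded_rate(feedback, payments)
```

In the toy data only the first record is grounded: the second reviewer paid after leaving feedback, and the third never paid at all. A rate far above what the study observed would be the kind of evidence the paragraph above calls for.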
Original abstract
As autonomous AI agents increasingly transact across organizational boundaries, a fundamental trust challenge emerges: how can an agent assess whether an unknown counterpart is trustworthy? The ERC-8004 protocol addresses this challenge with the first permissionless trust layer for AI agent economies, built around three on-chain registries for Identity, Reputation, and Validation. Despite its rapid adoption, the protocol has not been studied empirically, leaving it unclear whether the information it records provides a trustworthy basis for decision-making. To address this gap, we present the first empirical study of ERC-8004 across three chains: Ethereum, BNB Smart Chain (BSC), and Base, covering the period from protocol deployment through May 13, 2026. We crawl on-chain Identity and Reputation events, off-chain files, and x402 payment transactions. On the identity side, we find that most registrations are placeholders rather than active agents, with only a small fraction (3%, 4%, and 15% across Ethereum, BSC, and Base) exposing a valid ERC-8004 registration file with at least one live service endpoint. On the reputation side, we show that the Registry, as currently deployed, cannot function as a trust signal: values are not commensurable, feedback records are rarely grounded in verifiable interactions, and reputation can be manipulated at minimal cost. Consistent with these design weaknesses, we find that a substantial fraction of reviewers (73.6%, 59.2%, and 90.6% across Ethereum, BSC, and Base) exhibit coordinated Sybil behavior. After removing Sybil-flagged feedback, 15.5%, 72.3%, and 89.4% of rated agents, respectively, are left with no valid feedback. We then turn these findings into concrete recommendations for future revisions of ERC-8004. Our study yields actionable protocol-design implications and establishes an empirical baseline for research on AI agent markets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first empirical study of the ERC-8004 protocol across Ethereum, BSC, and Base, crawling on-chain Identity/Reputation events, off-chain files, and x402 transactions through May 13, 2026. It reports that only 3%, 4%, and 15% of registrations expose valid files with live endpoints. It concludes that the Reputation Registry cannot serve as a trust signal because values are incommensurable, feedback is rarely grounded in verifiable interactions, and manipulation is low-cost. In support, it finds that 73.6%, 59.2%, and 90.6% of reviewers exhibit coordinated Sybil behavior; once Sybil-flagged feedback is removed, 15.5%, 72.3%, and 89.4% of rated agents retain no valid feedback. The work ends with concrete recommendations for protocol revisions.
Significance. If the data collection and Sybil detection hold, the study supplies a valuable empirical baseline for permissionless trust layers in AI agent economies and identifies actionable design weaknesses that could inform revisions to ERC-8004 and similar systems. The provision of specific on-chain measurements and protocol recommendations is a strength.
major comments (2)
- [Sybil detection / reputation analysis section] The section describing the Sybil flagging procedure (the mapping from on-chain Identity/Reputation events and x402 transactions to the headline percentages) provides no description of clustering features, similarity thresholds, calibration against ground-truth labels, or false-positive assessment against legitimate patterns such as shared developer teams or protocol batching. This is load-bearing for the central claim that the Registry cannot function as a trust signal.
- [Data collection / methods section] The data collection and crawling methods section supplies no details on completeness verification, potential indexing biases across the three chains, or error analysis for the crawled events and files. These omissions directly affect the reliability of all reported fractions (3-15% valid registrations, Sybil rates, and post-removal no-valid-feedback rates).
minor comments (2)
- [Abstract] The abstract states the study covers 'through May 13, 2026'; clarify whether this is a projected or actual cutoff and ensure the methods section cross-references the exact block ranges used.
- [Introduction / related work] The claim of being the 'first empirical study' would benefit from an explicit related-work subsection that surveys any prior on-chain analyses of ERC-8004 or similar registries.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the value of this first empirical study as a baseline for permissionless trust layers. We address each major comment below and will revise the manuscript to improve methodological transparency.
- Referee: [Sybil detection / reputation analysis section] The section describing the Sybil flagging procedure (the mapping from on-chain Identity/Reputation events and x402 transactions to the headline percentages) provides no description of clustering features, similarity thresholds, calibration against ground-truth labels, or false-positive assessment against legitimate patterns such as shared developer teams or protocol batching. This is load-bearing for the central claim that the Registry cannot function as a trust signal.
  Authors: We acknowledge that the manuscript does not currently provide a detailed description of the Sybil flagging procedure. In the revision we will expand the relevant section to specify the clustering features (address co-occurrence via x402 payments and shared metadata), similarity thresholds, any calibration steps performed, and an explicit false-positive assessment that considers legitimate patterns such as shared developer teams and protocol-level batching. This addition will directly address the load-bearing nature of the claim.
  Revision: yes
- Referee: [Data collection / methods section] The data collection and crawling methods section supplies no details on completeness verification, potential indexing biases across the three chains, or error analysis for the crawled events and files. These omissions directly affect the reliability of all reported fractions (3-15% valid registrations, Sybil rates, and post-removal no-valid-feedback rates).
  Authors: We agree that the current methods section lacks these details. The revised manuscript will add a dedicated subsection describing completeness verification (cross-checks against multiple RPC providers and block explorers), potential indexing biases across Ethereum, BSC, and Base, and error analysis for event and file crawling. Where feasible we will also reference the open data-collection scripts.
  Revision: yes
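The clustering features named in the first response (address co-occurrence via x402 payments and shared metadata) suggest a graph-style grouping of reviewers. The sketch below is one possible reading, not the authors' procedure: it links reviewers that share any attribute key using union-find and flags clusters at or above a size threshold. The key format, the `min_size` threshold, and the toy addresses are hypothetical.

```python
from collections import defaultdict

def cluster_reviewers(links):
    """Group reviewers that share at least one key, transitively.

    links: iterable of (reviewer, shared_key) pairs, where shared_key is any
    attribute two reviewers might have in common (hypothetical examples:
    a common payer address, an identical metadata hash).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}
    for reviewer, key in links:
        find(reviewer)
        if key in first_seen:
            union(first_seen[key], reviewer)
        else:
            first_seen[key] = reviewer

    clusters = defaultdict(set)
    for reviewer in list(parent):
        clusters[find(reviewer)].add(reviewer)
    return list(clusters.values())

def flag_sybils(links, min_size=3):
    """Flag every reviewer in a cluster of at least min_size members."""
    flagged = set()
    for cluster in cluster_reviewers(links):
        if len(cluster) >= min_size:
            flagged |= cluster
    return flagged

links = [
    ("0xA", "payer:0xF"), ("0xB", "payer:0xF"),
    ("0xC", "meta:QmX"), ("0xB", "meta:QmX"),
    ("0xD", "payer:0xG"),
]
flagged = flag_sybils(links)
```

A threshold of this kind is exactly where the referee's false-positive concern bites: a legitimate developer team sharing a funding address would be flagged the same way as a Sybil ring, so any real procedure would need the calibration the authors promise to document.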
Circularity Check
Empirical measurement study with no derivations or self-referential constructions
The paper is a data-driven empirical study that crawls on-chain Identity/Reputation events, off-chain files, and x402 transactions across Ethereum, BSC, and Base. It reports observed fractions of placeholder registrations, incommensurable reputation values, ungrounded feedback, and reviewer coordination patterns. No equations, fitted parameters, predictions derived from inputs, uniqueness theorems, or ansatzes appear in the provided text. Central claims rest on external on-chain observations rather than internal definitions or self-citation chains. The Sybil-flagging procedure is an applied detection method on raw events and does not reduce the reported statistics to the inputs by construction.