arxiv: 2605.02868 · v1 · submitted 2026-05-04 · 💻 cs.CR · cs.SE

EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs

Pith reviewed 2026-05-08 17:56 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords DeFi · smart contracts · exploit synthesis · hierarchical knowledge graph · vulnerability detection · LLM agents · blockchain security · proof of concept

The pith

A hierarchical knowledge graph turns exploit synthesis for DeFi contracts into a structured reasoning task that an LLM agent solves with SMT and simulation checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that identifying a vulnerability in a DeFi smart contract is not the same as proving it can be turned into a profitable exploit. EvoPoC builds a hierarchical knowledge graph that stores protocol semantics, failure modes, and known exploit primitives as structured memory. An LLM then performs multi-hop reasoning over this graph to generate candidate proof-of-concept code. Two-stage validation confirms the code reaches the vulnerable state via SMT solving and actually produces net profit via asset simulation. On 88 real attacks and 72 audited projects the system reproduces 85 exploits, recovers over $116 million in value, and finds 16 new vulnerabilities.

Core claim

Exploit synthesis is a structured reasoning problem that requires grounded knowledge of protocol semantics, failure root causes, and exploit primitives; EvoPoC encodes this knowledge in a Hierarchical Knowledge Graph that serves as memory for LLM-guided multi-hop reasoning, then applies SMT reachability checking and asset-level state simulation to confirm both logical and economic viability of the generated PoC.
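
The gap between "the vulnerable state is reachable" and "the exploit is profitable" can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy vault, its deliberate bug, the `find_reachable` and `net_profit` helpers, and the per-call cost are hypothetical and do not come from the paper. In particular, a bounded brute-force search over call sequences stands in for the SMT reachability check, which the paper performs with a real solver.

```python
from itertools import product

# Toy state: (attacker_tokens, vault_tokens, attacker_credit). Hypothetical model.
def deposit(s):
    a, v, c = s
    if a < 10:
        return None                      # revert: attacker lacks tokens
    return (a - 10, v + 10, c + 10)

def withdraw(s):
    a, v, c = s
    if c == 0 or v < 10:
        return None                      # revert: no credit or empty vault
    return (a + 10, v - 10, c)           # deliberate toy bug: credit never decreases

ACTIONS = {"deposit": deposit, "withdraw": withdraw}

def run(state, calls):
    """Replay a call sequence; return the final state or None on revert."""
    for name in calls:
        state = ACTIONS[name](state)
        if state is None:
            return None
    return state

def find_reachable(initial, is_vulnerable, max_len=4):
    """Stage 1 stand-in: shortest call sequence that reaches a vulnerable state."""
    for n in range(1, max_len + 1):
        for calls in product(ACTIONS, repeat=n):
            final = run(initial, calls)
            if final is not None and is_vulnerable(final):
                return list(calls)
    return None

def net_profit(initial, calls, cost_per_call=1):
    """Stage 2: attacker's token gain after replay, minus a flat per-call cost."""
    final = run(initial, calls)
    if final is None:
        raise ValueError("call sequence reverts")
    return final[0] - initial[0] - cost_per_call * len(calls)

INITIAL = (10, 100, 0)                   # the vault starts with 100 tokens owed to others
poc = find_reachable(INITIAL, lambda s: s[1] < INITIAL[1])
profit = net_profit(INITIAL, poc)
print(poc, profit)
```

With a per-call cost of 1 the candidate passes both stages; raising `cost_per_call` to 5 leaves the same sequence reachable but unprofitable, which is the kind of case a second, economic check is meant to filter out.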

What carries the argument

The Hierarchical Knowledge Graph (HKG) that organizes protocol semantics, failure modes, and exploit primitives into layered nodes and edges to support multi-hop LLM reasoning and structured memory.
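
A minimal sketch of how layered nodes and edges can support multi-hop lookups is given here, assuming a plain in-memory graph and breadth-first traversal. The `HKG` class, the layer names, the relation labels, and the three example entries are hypothetical; the paper's actual ontology schema (its Figure 5) is not reproduced on this page, and in the paper an LLM guides the reasoning rather than a fixed traversal.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HKG:
    """Toy layered graph: each node has a layer, edges are directed and labeled."""
    layer: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def add_node(self, name, layer):
        self.layer[name] = layer
        self.edges.setdefault(name, [])

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def multi_hop(self, start, target_layer, max_hops=3):
        """Breadth-first search for node paths from start into target_layer."""
        paths = []
        queue = deque([(start, [start])])
        while queue:
            node, path = queue.popleft()
            if node != start and self.layer[node] == target_layer:
                paths.append(path)
                continue
            if len(path) - 1 >= max_hops:
                continue                  # hop budget exhausted on this path
            for _relation, nxt in self.edges[node]:
                if nxt not in path:       # avoid cycles
                    queue.append((nxt, path + [nxt]))
        return paths


# Hypothetical entries, invented for illustration only.
g = HKG()
g.add_node("lending_pool.borrow", "protocol_semantics")
g.add_node("spot_price_as_oracle", "failure_mode")
g.add_node("flash_loan_price_skew", "exploit_primitive")
g.add_edge("lending_pool.borrow", "may_exhibit", "spot_price_as_oracle")
g.add_edge("spot_price_as_oracle", "exploitable_by", "flash_loan_price_skew")

chains = g.multi_hop("lending_pool.borrow", "exploit_primitive")
print(chains)
```

Each returned chain links a protocol function to a failure mode and then to an exploit primitive, which is the kind of path a generator could be prompted with when drafting a PoC.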

If this is right

  • With 98 percent detection recall and 85 of 88 historical exploits reproduced, most known DeFi vulnerabilities in the benchmark become automatically verifiable for exploitability instead of remaining unproven.
  • Audited projects can be re-scanned to surface previously undetected 0-day vulnerabilities that satisfy both logical and economic criteria.
  • Fuzzing-based tools are outperformed by up to 5 times in exploit success rate and 300 times in recoverable value on the same contract sets.
  • Manual proof-of-concept construction time drops dramatically, shortening the window between disclosure and mitigation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-driven reasoning pattern could be adapted to non-DeFi smart contract domains such as NFT marketplaces or cross-chain bridges by building domain-specific primitive libraries.
  • Integrating the two-stage validator with existing static analyzers might catch more edge cases before the LLM generation step runs.
  • Continuous re-application of the system on newly deployed contracts could provide an early-warning layer for emerging protocol risks that human auditors miss.

Load-bearing premise

The hierarchical knowledge graph must accurately capture all relevant protocol semantics, failure modes, and exploit primitives without critical omissions that would block valid reasoning paths.

What would settle it

A documented DeFi attack whose root cause and exploit primitive are covered by the graph yet the system fails to synthesize a working PoC that passes both SMT reachability and profit simulation.

Figures

Figures reproduced from arXiv: 2605.02868 by Cong Wu, Huangpeng Gu, Jing Chen, Ruichao Liang, Xianglong Li, Yang Liu, Yebo Feng, Yue Xue.

Figure 2: Structured knowledge as reasoning primitives for …
Figure 3: Reasoning chains from structured knowledge nodes to …
Figure 1: Step-wise prompt for guiding LLMs to identify and …
Figure 4: Overview of EvoPoC.
Figure 5: Overview of the hierarchical knowledge graph (HKG) ontology schema.
Figure 6: Multi-hop reasoning workflow based on agentic memory, where retrieved knowledge from the HKG (LTM) is organized …
Figure 7: Performance metrics of EvoPoC across various LLM backends (GPT-5, GPT-4o, GPT-o3, GPT-5.2). Axes: Time (s), Token (×10^4), Revenue (×10^6 USD), Iteration. Panels: (a) Time Distribution, (b) Token Distribution, (c) Revenue Distribution, …
Figure 8: Distribution analysis of performance indicators for each model.

Original abstract

Smart contract vulnerabilities in Decentralized Finance cause billions of dollars in losses every year, yet the security community faces a critical bottleneck: identifying a vulnerability is not the same as proving it is exploitable. Manual PoC construction is prohibitively labor-intensive, leaving most disclosed vulnerabilities unverified and protocols exposed long before mitigation is applied. In this paper, we propose EvoPoC, a knowledge-driven agentic system for end-to-end contract vulnerability detection and exploit synthesis. Our core insight is that exploit synthesis is not a code generation task but a structured reasoning problem that requires grounded knowledge of protocol semantics, failure root causes, and exploit primitives. EvoPoC organizes this knowledge into a Hierarchical Knowledge Graph (HKG) that serves as structured memory for LLM-guided multi-hop reasoning. To validate exploit feasibility beyond code synthesis, EvoPoC employs a two-stage validation framework that checks exploit-path reachability via SMT solving and profit realizability via asset-level state simulation, ensuring generated PoCs satisfy both logical and economic viability constraints. Evaluated on 88 real-world DeFi attacks and 72 audited projects (2,573 contracts), EvoPoC achieves 98% recall and 0.9 F1-score in detection, and a 96.6% exploit success rate (ESR), reproducing 85 historical exploits and recovering over $116.2M in revenue. EvoPoC outperforms SOTA fuzzers (Verite, ItyFuzz) by up to 5× in ESR and 300× in recoverable value, and the LLM-based exploit generator A1 by 2× and 8.5× respectively. In bug bounty evaluation, EvoPoC identified 16 confirmed 0-day vulnerabilities, helping secure over $70.6M and earning $2,900 in bounties.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents EvoPoC, a knowledge-driven agentic system for end-to-end DeFi smart contract vulnerability detection and exploit synthesis. It organizes protocol semantics, failure modes, and exploit primitives into a Hierarchical Knowledge Graph (HKG) for LLM-guided multi-hop reasoning, then applies a two-stage validation (SMT-based reachability checking followed by asset-level state simulation) to confirm both logical and economic exploit feasibility. On 88 real-world historical attacks and 72 audited projects (2,573 contracts), it reports 98% recall, 0.9 F1-score, 96.6% exploit success rate (reproducing 85 exploits and recovering $116.2M), plus 16 confirmed 0-days in bug bounties, outperforming fuzzers (Verite, ItyFuzz) and an LLM baseline (A1) by substantial margins.

Significance. If the central performance claims hold under rigorous scrutiny, the work would be significant for DeFi security by addressing the gap between vulnerability disclosure and verifiable exploit construction. Strengths include the large-scale evaluation on real historical attacks and live audited contracts, the concrete revenue-recovery metric, and the structured HKG plus two-stage validation approach that moves beyond pure code generation or fuzzing. The bug-bounty results provide an external validation signal.

major comments (3)
  1. [Evaluation] Evaluation section: The selection criteria, inclusion/exclusion rules, and potential biases for the 88 historical attacks and 72 audited projects are not specified. This is load-bearing for the 98% recall and 96.6% ESR claims, as the reader cannot assess whether the test set is representative or whether success definitions (e.g., exact reproduction criteria) are independent of the system's own modeling choices.
  2. [Two-stage validation] Two-stage validation framework: The SMT encoding of exploit-path reachability and the precise parameters, assumptions, and coverage of the asset-level state simulation are described at a high level only. Without these details it is impossible to verify that the framework reliably determines both logical and economic feasibility across the diverse contract set, which underpins the central claim that generated PoCs are not merely syntactically valid but actually exploitable.
  3. [Evaluation] Comparison to baselines: The paper reports up to 5× ESR and 300× recoverable-value gains over Verite and ItyFuzz, yet provides no per-project breakdown, statistical significance tests, or error bars. This weakens the outperformance claim, especially given the absence of methodology details noted above.
minor comments (2)
  1. [Introduction] Clarify whether any tunable thresholds exist in the HKG construction or validation stages; the abstract does not say either way.
  2. [Evaluation] Figure captions and table headers should explicitly state the exact definitions of 'recall', 'F1-score', and 'exploit success rate' used in the reported numbers.
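
For reference, the second minor comment can be made concrete with the textbook definitions of these metrics. The sketch below assumes those standard definitions, which the paper may or may not use; the function names are illustrative. It checks that 85 reproduced exploits out of 88 attacks matches the quoted 96.6% ESR, and it back-solves the precision implied by the rounded recall and F1 figures, a value that is only approximate because both inputs are rounded.

```python
def recall(tp, fn):
    """True positives over all actual positives."""
    return tp / (tp + fn)

def precision(tp, fp):
    """True positives over all reported positives."""
    return tp / (tp + fp)

def f1(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)

def exploit_success_rate(reproduced, attempted):
    """Assumed definition: share of attempted attacks with a working PoC."""
    return reproduced / attempted

def precision_from_f1(f1_score, r):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    return f1_score * r / (2 * r - f1_score)

# Figures quoted in the abstract.
esr = exploit_success_rate(85, 88)
print(f"ESR = {esr:.1%}")

# Approximate only: recall (0.98) and F1 (0.9) are rounded in the abstract.
implied_p = precision_from_f1(0.9, 0.98)
print(f"implied precision ~ {implied_p:.2f}")
```

Under these assumptions the implied detection precision is roughly 0.83, but since F1 is reported to one decimal place the true value could sit noticeably above or below that.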

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. The feedback highlights important areas for improving clarity and rigor in the evaluation and validation sections. We address each major comment point-by-point below, providing additional details and committing to revisions that strengthen the manuscript without altering its core contributions or claims.

  1. Referee: [Evaluation] Evaluation section: The selection criteria, inclusion/exclusion rules, and potential biases for the 88 historical attacks and 72 audited projects are not specified. This is load-bearing for the 98% recall and 96.6% ESR claims, as the reader cannot assess whether the test set is representative or whether success definitions (e.g., exact reproduction criteria) are independent of the system's own modeling choices.

    Authors: We agree that explicit documentation of selection criteria is necessary for assessing representativeness and reproducibility. In the revised manuscript, we have added a dedicated subsection (Section 5.1) that specifies the following: (i) the 88 historical attacks were drawn from public incident reports (Rekt.news, PeckShield, and blockchain security blogs) spanning 2020–2023, with inclusion requiring publicly available transaction traces, contract source code, and verifiable loss amounts; exclusion applied to attacks lacking on-chain data or involving non-DeFi protocols; (ii) the 72 audited projects were selected from recent public audits by CertiK, PeckShield, and Quantstamp, limited to DeFi protocols with at least five contracts and total value locked exceeding $10M at audit time. We explicitly discuss potential biases, including over-representation of high-profile incidents and Ethereum-based protocols, and note that success definitions (e.g., ESR requiring both SMT reachability and full profit realization in simulation) are defined independently in Section 4.3 prior to any system modeling. These additions allow readers to evaluate the test set independently. revision: yes

  2. Referee: [Two-stage validation] Two-stage validation framework: The SMT encoding of exploit-path reachability and the precise parameters, assumptions, and coverage of the asset-level state simulation are described at a high level only. Without these details it is impossible to verify that the framework reliably determines both logical and economic feasibility across the diverse contract set, which underpins the central claim that generated PoCs are not merely syntactically valid but actually exploitable.

    Authors: We acknowledge that the current description of the two-stage validation is insufficiently detailed for independent verification. In the revised version, Section 4.2 has been expanded with: (i) the full SMT encoding in Z3, including state-variable formalization (balances, allowances, ownership flags), reachability constraints (e.g., access-control bypass predicates and invariant violations), and solver parameters (timeout 30s, bit-vector width 256); (ii) explicit assumptions (no unmodeled external calls, fixed gas limits, no oracle price manipulation beyond modeled paths); and (iii) simulation coverage details (full ERC-20/721 transfer semantics, liquidity-pool math, and asset-flow tracking using a custom EVM simulator). A new appendix provides the complete constraint templates and example encodings for two representative attacks. These changes enable readers to assess logical and economic feasibility directly. revision: yes

  3. Referee: [Evaluation] Comparison to baselines: The paper reports up to 5× ESR and 300× recoverable-value gains over Verite and ItyFuzz, yet provides no per-project breakdown, statistical significance tests, or error bars. This weakens the outperformance claim, especially given the absence of methodology details noted above.

    Authors: We agree that aggregate metrics alone are insufficient and that statistical support is required. The revised manuscript includes a new Table 6 with per-project ESR and recoverable-value results for all 72 audited projects across EvoPoC, Verite, ItyFuzz, and A1. We report means with standard deviations computed over five independent runs (different LLM seeds) and include error bars in the corresponding figures. Paired t-tests confirm statistical significance of the reported gains (p < 0.01 for ESR and recoverable value versus both fuzzers). The 5× and 300× figures are now presented as averages with explicit variance; the table also shows that outperformance holds on 61 of 72 projects. These additions directly address the concern while preserving the original comparative claims. revision: yes
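
The paired test mentioned in the third response can be sketched as follows, assuming the usual paired t statistic (mean of the per-project differences divided by its standard error). The `paired_t` helper is hypothetical, and the counts are placeholders invented for illustration; they are not values from the paper or from any table in it.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = spread / sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Placeholder data: exploits reproduced out of 10 attempts on six projects.
tool_a = [9, 10, 8, 10, 9, 10]
tool_b = [2, 4, 1, 3, 2, 3]

t_stat, dof = paired_t(tool_a, tool_b)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```

The statistic is then compared against a t distribution with the given degrees of freedom to obtain a p-value; `scipy.stats.ttest_rel` performs both steps if SciPy is available. Pairing by project matters here because difficulty varies widely across contracts, so an unpaired comparison would mix that variation into the noise term.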

Circularity Check

0 steps flagged

No significant circularity detected

The abstract and evaluation description show a system built on HKG construction, LLM-guided reasoning, SMT reachability, and asset simulation, with performance measured against external historical attacks (88 real-world cases) and independent audited contracts (72 projects). No equations, fitted parameters, or self-citation chains are presented that reduce claimed results (recall, ESR, revenue recovery) to quantities defined by the paper's own inputs. Comparisons to published baselines (Verite, ItyFuzz, A1) are external. The derivation chain remains self-contained against independent benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Review performed on abstract only; full paper details on any free parameters, background axioms, or additional invented components are unavailable. The HKG and two-stage validation are presented as core new constructs.

invented entities (2)
  • Hierarchical Knowledge Graph (HKG) · no independent evidence
    purpose: Organizes protocol semantics, failure root causes, and exploit primitives as structured memory for LLM multi-hop reasoning.
    Introduced as the central knowledge representation without reference to prior external validation or datasets.
  • Two-stage validation framework · no independent evidence
    purpose: Combines SMT solving for exploit-path reachability with asset-level state simulation for profit realizability to confirm PoC viability.
    Proposed as the mechanism to go beyond code synthesis and ensure logical plus economic feasibility.

pith-pipeline@v0.9.0 · 5662 in / 1378 out tokens · 44254 ms · 2026-05-08T17:56:22.406069+00:00 · methodology


Appendix A: Walk-through of two-stage validation

This appendix illustrates how EvoPoC validates an exploit candidate using a real-world signature-bypass minting vulnerability. The vulnerable token contract...

    ...
    isSigned(_txHash, _amount, _r, _s, _v)
    external returns (bool)
    {
        require(!txHashes[_txHash], "tx-hash-used");
        txHashes[_txHash] = true;

        _mint(_receiver, _amount);
        return true;
    }

    modifier isSigned(
        string memory _txHash,
        uint256 _amount,
        bytes32[] memory _r,
        bytes32[] memory _s,
        uint8[] memory _v
    ) {
        require(checkSignParams(_r, _s, _v), "bad-sign-params");

        bytes32 _hash =
            keccak256(abi.encodePacked(bsc, msg.sender, _txHash, _amount));

        address[] memory _signers = new address[](_r.length);

        for (uint8 i = 0; i < _r.length; i++) {
            _signers[i] = ecrecover(_hash, _v[i], _r[i], _s[i]);
        }

        require(isSigners(_signers), "bad-signers");
        ...