OpDiffer: LLM-Assisted Opcode-Level Differential Testing of Ethereum Virtual Machine

Haoyu Wang; Jie Ma; Jinwen Xi; Mingzhe Xing; Ningyu He; Ying Gao; Yinliang Yue

arxiv: 2504.12034 · v1 · pith:ZOWGIEPDnew · submitted 2025-04-16 · 💻 cs.SE · cs.CR

Jie Ma, Ningyu He, Jinwen Xi, Mingzhe Xing, Haoyu Wang, Ying Gao, Yinliang Yue

Pith reviewed 2026-05-22 20:31 UTC · model grok-4.3

classification 💻 cs.SE cs.CR

keywords EVM security · differential testing · LLM-assisted testing · Ethereum Virtual Machine · smart contract bugs · opcode testing · bug detection · blockchain security

The pith

OpDiffer uses LLMs to generate opcode-level test cases that expose 26 previously unknown bugs across nine Ethereum Virtual Machine implementations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents OpDiffer, a framework that generates opcode-level test inputs for the Ethereum Virtual Machine with the help of large language models and then runs those inputs on multiple EVM implementations to find behavioral differences. It addresses two gaps in earlier work: test inputs that often had invalid semantics or lacked diversity, and the absence of automatic ways to identify bugs and trace their root causes. The central idea is that comparing how different EVMs respond to the same input can reliably surface implementation errors that affect smart contract behavior or network stability. Evaluation across nine EVMs uncovered 26 previously unknown bugs, 22 of which were confirmed by developers, along with code coverage gains of up to 71.06%, 148.40%, and 655.56% over three state-of-the-art baselines. The authors also estimate that 7.21% of real-world deployed Ethereum contracts could trigger these bugs under certain environmental settings.

Core claim

OpDiffer is a differential testing framework that combines LLMs with static analysis to produce semantically valid opcode sequences and automatically detect and localize bugs by observing inconsistent execution results across distinct EVM implementations. The framework was applied to nine EVMs and identified 26 previously unknown bugs, of which 22 were confirmed by developers and three received CNVD identifiers. The same evaluation showed coverage improvements of up to 71.06 percent, 148.40 percent, and 655.56 percent relative to three prior baselines, respectively, and analysis of deployed contracts indicated that 7.21 percent could trigger the discovered bugs under certain environmental settings.
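The core idea of differential testing can be illustrated with a minimal sketch. Everything in it is hypothetical: the two toy stack machines, the tuple-based program format, and the majority-vote rule for picking the outlier are assumptions made for illustration and are not OpDiffer's actual components or its localization method.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A result is (status, final_stack). All names in this sketch are hypothetical.
Instruction = Tuple[str, int]
Result = Tuple[str, Tuple[int, ...]]
Interpreter = Callable[[List[Instruction]], Result]

WORD = 2**256  # EVM words are 256 bits wide


def reference_vm(program: List[Instruction]) -> Result:
    """Toy stack machine supporting PUSH and ADD with 256-bit wraparound."""
    stack: List[int] = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg % WORD)
        elif op == "ADD":
            if len(stack) < 2:
                return ("stack_underflow", tuple(stack))
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % WORD)
        else:
            return ("invalid_opcode", tuple(stack))
    return ("ok", tuple(stack))


def buggy_vm(program: List[Instruction]) -> Result:
    """Same machine, but ADD forgets to wrap at 2**256 (a deliberately seeded bug)."""
    stack: List[int] = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg % WORD)
        elif op == "ADD":
            if len(stack) < 2:
                return ("stack_underflow", tuple(stack))
            a, b = stack.pop(), stack.pop()
            stack.append(a + b)  # missing "% WORD"
        else:
            return ("invalid_opcode", tuple(stack))
    return ("ok", tuple(stack))


def differential_test(program: List[Instruction], vms: Dict[str, Interpreter]) -> List[str]:
    """Run one program on every VM and return the names that disagree with the majority."""
    results = {name: vm(program) for name, vm in vms.items()}
    majority, _ = Counter(results.values()).most_common(1)[0]
    return sorted(name for name, res in results.items() if res != majority)


vms = {"vm_a": reference_vm, "vm_b": reference_vm, "vm_c": buggy_vm}
overflow_case = [("PUSH", WORD - 1), ("PUSH", 1), ("ADD", 0)]
benign_case = [("PUSH", 1), ("PUSH", 2), ("ADD", 0)]

print(differential_test(overflow_case, vms))  # only the overflow input separates the VMs
print(differential_test(benign_case, vms))
```

The benign input produces no divergence, which is why input diversity matters: a seeded bug stays hidden until some input reaches the faulty path. A majority vote is also only a heuristic, since the majority itself can be wrong.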

What carries the argument

OpDiffer, the differential testing framework that uses LLMs to synthesize opcode-level inputs and static analysis to identify root causes of behavioral divergences between EVM implementations.

If this is right

Developers of individual EVMs receive concrete, reproducible test cases that expose implementation errors before deployment.
Higher code coverage during testing increases the chance of catching security problems that could cause inconsistent smart-contract outcomes.
An estimated 7.21 percent of deployed contracts may encounter triggering conditions for the identified bugs under certain environmental settings.
Routine use of the approach would reduce the risk of denial-of-service or unexpected behavior propagating through the Ethereum network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same LLM-driven input generation could be applied to virtual machines of other blockchains that share similar opcode structures.
Combining these differential results with on-chain monitoring might allow early detection of contracts that would hit the bugs in practice.
If the generated tests can be turned into a public regression suite, they would provide ongoing protection as new EVM versions are released.

Load-bearing premise

Behavioral differences observed when running LLM-generated opcode sequences on different EVMs reliably signal real bugs instead of valid implementation choices or false positives.

What would settle it

A follow-up review in which developers reject most of the reported differences as non-bugs, or in which the same inputs produce no observable failures when replayed on a live Ethereum node, would show the method over-reports issues.

Figures

Figures reproduced from arXiv: 2504.12034 by Haoyu Wang, Jie Ma, Jinwen Xi, Mingzhe Xing, Ningyu He, Ying Gao, Yinliang Yue.

**Figure 2.** An example illustrating a semantically invalid case.

**Figure 3.** The workflow overview of OpDiffer. Accompanying text from the paper: for differential testing, OpDiffer initiates the execution context for all EVMs. After execution, the results and runtime data from the instrumented EVMs are parsed into a uniform format for bug identification, which is then used for root cause localization. Finally, OpDiffer outputs the corresponding bug reports.

**Figure 4.** Prompt for generating seed generator of the … (caption truncated in the source)

**Figure 5.** A generated seed generator for the BYTE opcode. (Extraction residue from the adjacent control-flow diagram: panel (a) is titled "Without considering ICFG"; the branch conditions compare index against 32, gas_left against 3, and the stack length against 0 and 1024; the failure outcomes shown are Stack Underflow, Out of Gas, and Stack Overflow.)

**Figure 6.** The control flow of BYTE in EVM, where the specification and line numbers in … (caption truncated in the source)

**Figure 7.** Prompt for implementing control-flow-oriented mutation for opcode … (caption truncated in the source)

**Figure 8.** Coverage results of OpDiffer, EVMFuzzer, NeoDiff and FuzzyVM on Geth and SealEVM. Bars represent the achieved code coverage, while lines represent the coverage improvement by which OpDiffer exceeds EVMFuzzer, NeoDiff and FuzzyVM. Accompanying text from the paper (truncated in the source): although baselines may generate more test inputs than OpDiffer, their generated inputs cannot further improve the coverage across target EVM impl…

**Figure 9.** Case study #1: the buggy implementation of … (caption truncated in the source)

**Figure 10.** Case study #2: the buggy implementation of … (caption truncated in the source)
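The branch conditions visible in the BYTE figures (stack length, `gas_left` against 3, `index` against 32, and the 1024-entry stack limit) can be made concrete with a small sketch. It follows the commonly documented semantics of BYTE: pop an index and a word, charge 3 gas, and push the selected byte, where index 0 is the most significant byte and any index of 32 or more yields 0. The helper names `pop`, `push`, `op_byte`, and `ExecutionError` are hypothetical and are not taken from the paper or from any specific EVM.

```python
STACK_LIMIT = 1024  # maximum EVM stack depth
BYTE_GAS = 3        # static gas cost of the BYTE opcode


class ExecutionError(Exception):
    """Raised for exceptional halts (hypothetical helper type)."""


def pop(stack):
    # Popping from an empty stack is an exceptional halt.
    if len(stack) == 0:
        raise ExecutionError("stack underflow")
    return stack.pop()


def push(stack, value):
    # Pushing onto a full stack is an exceptional halt.
    if len(stack) >= STACK_LIMIT:
        raise ExecutionError("stack overflow")
    stack.append(value)


def op_byte(stack, gas_left):
    """Execute BYTE on `stack` (top of stack is the last element); return remaining gas."""
    index = pop(stack)  # byte position, 0 = most significant byte
    word = pop(stack)   # 256-bit word to read from
    if gas_left < BYTE_GAS:
        raise ExecutionError("out of gas")
    gas_left -= BYTE_GAS
    if index < 32:
        result = (word >> (8 * (31 - index))) & 0xFF
    else:
        result = 0
    push(stack, result)
    return gas_left


stack = [0xAB << 248, 0]  # word with 0xAB as its most significant byte, index 0 on top
remaining = op_byte(stack, 10)
print(stack, remaining)
```

Each raise statement corresponds to one exceptional path in the diagram, so a test generator that only varies `index` around 32 would miss the underflow and out-of-gas branches entirely.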

Original abstract

As Ethereum continues to thrive, the Ethereum Virtual Machine (EVM) has become the cornerstone powering tens of millions of active smart contracts. Intuitively, security issues in EVMs could lead to inconsistent behaviors among smart contracts or even denial-of-service of the entire blockchain network. However, to the best of our knowledge, only a limited number of studies focus on the security of EVMs. Moreover, they suffer from 1) insufficient test input diversity and invalid semantics; and 2) the inability to automatically identify bugs and locate root causes. To bridge this gap, we propose OpDiffer, a differential testing framework for EVM, which takes advantage of LLMs and static analysis methods to address the above two limitations. We conducted the largest-scale evaluation, covering nine EVMs and uncovering 26 previously unknown bugs, 22 of which have been confirmed by developers and three have been assigned CNVD IDs. Compared to state-of-the-art baselines, OpDiffer can improve code coverage by at most 71.06%, 148.40% and 655.56%, respectively. Through an analysis of real-world deployed Ethereum contracts, we estimate that 7.21% of the contracts could trigger our identified EVM bugs under certain environmental settings, potentially resulting in severe negative impact on the Ethereum ecosystem.
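Coverage improvements above 100% are easiest to read as relative gains over a baseline. The sketch below assumes the usual definition, (ours - baseline) / baseline; the text available here does not state the paper's exact formula, and the function name and input values are hypothetical and are not measurements from the paper.

```python
def relative_improvement(ours: float, baseline: float) -> float:
    """Return the percentage by which `ours` exceeds `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline coverage must be positive")
    return (ours - baseline) / baseline * 100.0


# Hypothetical coverage values, for illustration only.
print(relative_improvement(30.0, 20.0))  # a modest gain over a strong baseline
print(relative_improvement(40.0, 10.0))  # a gain above 100% means more than double
```

Under this definition a very large percentage can come from a baseline with very low absolute coverage, so the absolute figures behind each percentage matter when comparing tools.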

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces OpDiffer, an LLM-assisted differential testing framework for Ethereum Virtual Machines that combines LLM-generated opcode sequences with static analysis to produce semantically valid inputs and automatically identify behavioral divergences across EVM implementations. It reports the largest-scale evaluation on nine EVMs, uncovering 26 previously unknown bugs (22 confirmed by developers, three assigned CNVD IDs), substantial coverage gains over baselines (up to 71.06%, 148.40%, 655.56%), and an estimate that 7.21% of real-world contracts could trigger the identified bugs.

Significance. If the bug identifications prove sound and the coverage claims reproducible, the work would advance automated security testing for EVMs by addressing input diversity and root-cause localization limitations of prior studies. The scale across nine implementations and the real-world impact estimate add practical value to the Ethereum ecosystem.

major comments (3)

[Abstract] Abstract: The central claims of discovering 26 bugs and achieving specific coverage improvements supply no details on validation procedures, how false positives were ruled out, or how coverage was measured (e.g., tool, metric, or baseline configurations), so the empirical results cannot be assessed.
[Method] Method and evaluation description: The differential testing procedure treats any observable behavioral divergence on LLM-generated inputs as a bug, but provides no independent oracle or reference semantics (e.g., Yellow Paper model) to distinguish errors from allowed implementation variations or spec ambiguities; developer confirmation occurs post-hoc and does not establish that inputs lie in the defined behavior space.
[Evaluation] Evaluation results: The reported coverage improvements (71.06%, 148.40%, 655.56%) and bug counts lack any description of the measurement methodology or controls for input validity, undermining the comparison to state-of-the-art baselines and the claim of largest-scale evaluation.

minor comments (1)

[Abstract] The abstract mentions nine EVMs and real-world contract analysis but does not name the specific EVM implementations or the contract dataset used, which would aid immediate understanding.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below, providing clarifications and indicating planned revisions where appropriate to improve the presentation of our results.

Referee: [Abstract] Abstract: The central claims of discovering 26 bugs and achieving specific coverage improvements supply no details on validation procedures, how false positives were ruled out, or how coverage was measured (e.g., tool, metric, or baseline configurations), so the empirical results cannot be assessed.

Authors: We agree that the abstract is concise and omits key methodological details. In the revised manuscript, we will expand the abstract with brief statements noting that the 26 bugs were validated via developer confirmations (22 cases) and CNVD assignments (3 cases), and that coverage gains were measured using standard profiling tools with explicit baseline configurations as described in Section 5. Full validation procedures and measurement details remain in Sections 4 and 5. This change will make the central claims more assessable while preserving abstract length.

revision: yes

Referee: [Method] Method and evaluation description: The differential testing procedure treats any observable behavioral divergence on LLM-generated inputs as a bug, but provides no independent oracle or reference semantics (e.g., Yellow Paper model) to distinguish errors from allowed implementation variations or spec ambiguities; developer confirmation occurs post-hoc and does not establish that inputs lie in the defined behavior space.

Authors: We respectfully note that our approach follows standard differential testing practice for systems like the EVM, where the Yellow Paper specification is informal and contains known ambiguities. Divergences are not automatically labeled as bugs; they undergo manual triage and are only reported after developer confirmation, which serves as domain-expert validation that the input triggers unintended behavior. We will add a clarifying paragraph in the Method section explaining this rationale, the role of static analysis in ensuring semantic validity, and why a formal reference oracle is not feasible or necessary here. This addresses the concern on substance without requiring changes to the core methodology.

revision: partial

Referee: [Evaluation] Evaluation results: The reported coverage improvements (71.06%, 148.40%, 655.56%) and bug counts lack any description of the measurement methodology or controls for input validity, undermining the comparison to state-of-the-art baselines and the claim of largest-scale evaluation.

Authors: We acknowledge that additional explicit details would strengthen reproducibility. In the revised Evaluation section, we will add a dedicated subsection describing the coverage measurement tools and metrics employed for each of the nine EVMs, the exact baseline configurations used for comparison, and the input validity controls provided by the static analysis component. These additions will directly support the reported coverage figures and the largest-scale evaluation claim.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical tool evaluation with no derivations or fitted predictions

The paper describes an empirical differential testing framework (OpDiffer) that uses LLMs and static analysis to generate opcode sequences, runs them across nine EVM implementations, and reports observed behavioral differences as bugs (with post-hoc developer confirmation). No equations, parameters, uniqueness theorems, or first-principles derivations appear in the provided text. The central claims rest on experimental outcomes rather than any reduction of a 'prediction' to an input quantity defined by the authors. This is a standard self-contained experimental report; the skeptic's concern about whether differences constitute bugs is a question of external validity, not circularity in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical software-testing study; the central claims rest on experimental outcomes rather than mathematical axioms, free parameters, or newly postulated entities.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages

[1]

Blue Alloy. 2025. Github revm repository. Retrieved 2025-04-15 from https://github.com/bluealloy/revm

work page 2025
[2]

Ether Alpha. 2025. Ethereum Client Diversity. Retrieved 2025-04-15 from https://clientdiversity.org

work page 2025
[3]

2021.EIP-3540: EOF - EVM Object Format v1

Alex Beregszaszi, Paweł Bylica, Andrei Maiboroda, and Matt Garnett. 2021.EIP-3540: EOF - EVM Object Format v1 . Retrieved 2025-04-15 from https://eips.ethereum.org/EIPS/eip-3540

work page 2021
[4]

Lukas Bernhard, Tobias Scharnowski, Moritz Schloegel, Tim Blazytko, and Thorsten Holz. 2022. JIT-Picking: Differential Fuzzing of JavaScript Engines. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (Los Angeles, CA, USA) (CCS ’22). Association for Computing Machinery, New York, NY, USA, 351–364. doi:10.1145/3548606.3560624

work page doi:10.1145/3548606.3560624 2022
[5]

Chad Brubaker, Suman Jana, Baishakhi Ray, Sarfraz Khurshid, and Vitaly Shmatikov. 2014. Using Frankencerts for Automated Adversarial Testing of Certificate Validation in SSL/TLS Implementations. In 2014 IEEE Symposium on Security and Privacy (SP) (Berkeley, CA, USA). IEEE, 114–129. doi:10.1109/SP.2014.15

work page doi:10.1109/sp.2014.15 2014
[6]

Vitalik Buterin et al. 2013. Ethereum white paper. GitHub repository 1 (2013), 22–23. Retrieved 2025-04-15 from https://ethereum.org/en/whitepaper

work page 2013
[7]

Shangtong Cao, Ningyu He, Xinyu She, Yixuan Zhang, Mu Zhang, and Haoyu Wang. 2024. WASMaker: Differential Testing of WebAssembly Runtimes via Semantic-Aware Binary Generation. InProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (Vienna, Austria) (ISSTA 2024). Association for Computing Machinery, New York, NY, USA...

work page doi:10.1145/3650212.3680358 2024
[8]

Chu Chen, Pinghong Ren, Zhenhua Duan, Cong Tian, Xu Lu, and Bin Yu. 2023. SBDT: Search-Based Differential Testing of Certificate Parsers in SSL/TLS Implementations. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (Seattle, WA, USA) (ISSTA 2023). Association for Computing Machinery, New York, NY, USA, 967–979...

work page doi:10.1145/3597926.3598110 2023
[9]

Yuanliang Chen, Fuchen Ma, Yuanhang Zhou, Yu Jiang, Ting Chen, and Jiaguang Sun. 2023. Tyr: Finding Consensus Failure Bugs in Blockchain System with Behaviour Divergent Model. In 2023 IEEE Symposium on Security and Privacy (SP) (San Francisco, CA, USA). IEEE, 2517–2532. doi:10.1109/SP46215.2023.10179386

work page doi:10.1109/sp46215.2023.10179386 2023
[10]

Yuting Chen, Ting Su, and Zhendong Su. 2019. Deep Differential Testing of JVM Implementations. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE ’2019) . 1257–1268. doi:10.1109/ICSE.2019.00127

work page doi:10.1109/icse.2019.00127 2019
[11]

Yuting Chen, Ting Su, Chengnian Sun, Zhendong Su, and Jianjun Zhao. 2016. Coverage-directed differential testing of JVM implementations. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (Santa Barbara, CA, USA) (PLDI ’16). Association for Computing Machinery, New York, NY, USA, 85–99. doi:10.1145/2908080.2908095

work page doi:10.1145/2908080.2908095 2016
[12]

CoinMarketCap. 2025. Ethereum price today . Retrieved 2025-04-15 from https://coinmarketcap.com/currencies/ ethereum

work page 2025
[13]

Ethereum community. 2025. Ethereum Improvement Proposals. Retrieved 2025-04-15 from https://ethereum.org/en/eips

work page 2025
[14]

Ethereum Javascript Community. 2025. Github ethereumjs repository. Retrieved 2025-04-15 from https://github.com/ ethereumjs/ethereumjs-monorepo

work page 2025
[15]

Dan, Mario Vega, Mukul Kolpe, Spencer Taylor-Brown, and omahs. 2025. State Transition Tests, Ethereum Execution Spec Tests. Retrieved 2025-04-15 from https://ethereum.github.io/execution-spec-tests/main/tutorials/state_transition

work page 2025
[16]

DappRadar. 2025. Top Ethereum Games. Retrieved 2025-04-15 from https://dappradar.com/rankings/protocol/ethereum/ category/games

work page 2025
[17]

National Vulnerability Database. 2021. CVE-2021-39137 Detail. Retrieved 2025-04-15 from https://nvd.nist.gov/vuln/ detail/CVE-2021-39137

work page 2021
[18]

Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, and Xuanjing Huang. 2024. What’s Wrong with Your Code Generated by Large L...

work page arXiv 2024
[19]

Ethereum. 2025. Github evmone repository. Retrieved 2025-04-15 from https://github.com/ethereum/evmone

work page 2025
[20]

Ethereum. 2025. Github execution-specs repository. Retrieved 2025-04-15 from https://github.com/ethereum/execution- specs

work page 2025
[21]

Ethereum. 2025. Github Go Ethereum repository. Retrieved 2025-04-15 from https://github.com/ethereum/go-ethereum

work page 2025
[22]

Ethereum. 2025. Github Py-EVM repository. Retrieved 2025-04-15 from https://github.com/ethereum/py-evm

work page 2025
[23]

Ethereum.org. 2025. Decentralized finance (DeFi). Retrieved 2025-04-15 from https://ethereum.org/en/defi

work page 2025
[24]

Ethereum.org. 2025. Ethereum Virtual Machine (EVM) implementations . Retrieved 2025-04-15 from https://ethereum. org/en/developers/docs/evm

work page 2025
[25]

Ethereum.org. 2025. The history of Ethereum . Retrieved 2025-04-15 from https://ethereum.org/en/history

work page 2025
[26]

Ethereum.org. 2025. Non-fungible tokens (NFT). Retrieved 2025-04-15 from https://ethereum.org/en/nft

work page 2025
[27]

Etherscan. 2025. The Ethereum Blockchain Explorer . Retrieved 2025-04-15 from https://etherscan.io

work page 2025
[28]

Ying Fu, Meng Ren, Fuchen Ma, Heyuan Shi, Xin Yang, Yu Jiang, Huizhong Li, and Xiang Shi. 2019. EVMFuzzer: detect EVM vulnerabilities via fuzz testing. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Tallinn, Estonia) (ESEC/FSE 2019). Association for Co...

work page doi:10.1145/3338906.3341175 2019
[29]

Google. 2025. Coverage profiling support for integration tests. Retrieved 2025-04-15 from https://go.dev/doc/build-cover

work page 2025
[30]

Neville Grech, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. 2019. Gigahorse: Thorough, Declarative Decompilation of Smart Contracts. In 2019 IEEE/ACM 41st International Conference on Software Engineering . 1176–1186. doi:10.1109/ICSE.2019.00120

work page doi:10.1109/icse.2019.00120 2019
[31]

Qiuhan Gu. 2023. LLM-Based Code Generation Method for Golang Compiler Testing. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (San Francisco CA USA, 2023-11-30) (ESEC/FSE 2023). Association for Computing Machinery, 2201–2203. doi:10.1145/3611643.3617850

work page doi:10.1145/3611643.3617850 2023
[32]

Ningyu He, Ruiyi Zhang, Haoyu Wang, Lei Wu, Xiapu Luo, Yao Guo, Ting Yu, and Xuxian Jiang. 2021. EOSAFE: Security Analysis of EOSIO Smart Contracts. In 30th USENIX Security Symposium (USENIX Security 21) . USENIX Association, 1271–1288. https://www.usenix.org/conference/usenixsecurity21/presentation/he-ningyu

work page 2021
[33]

Hyperledger. 2025. Github Besu Ethereum Client repository. Retrieved 2025-04-15 from https://github.com/hyperledger/ besu/ ISSTA069:22 Jie Ma, Ningyu He, Jinwen Xi, Mingzhe Xing, Haoyu Wang, Ying Gao, and Yinliang Yue

work page 2025
[34]

Bo Jiang, Ye Liu, and W. K. Chan. 2018. ContractFuzzer: fuzzing smart contracts for vulnerability detection. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (Montpellier, France) (ASE ’18). Association for Computing Machinery, New York, NY, USA, 259–269. doi:10.1145/3238147.3238177

work page doi:10.1145/3238147.3238177 2018
[35]

Shinhae Kim and Sungjae Hwang. 2023. EtherDiffer: Differential Testing on RPC Services of Ethereum Nodes. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (San Francisco, CA, USA) (ESEC/FSE 2023). Association for Computing Machinery, New York, NY, USA, 1333–1344. doi:10....

work page doi:10.1145/3611643.3616251 2023
[36]

Kai Li, Jiaqi Chen, Xianghong Liu, Yuzhe Richard Tang, XiaoFeng Wang, and Xiapu Luo. 2021. As Strong As Its Weakest Link: How to Break Blockchain DApps at RPC Service. In28th Annual Network and Distributed System Security Symposium, NDSS 2021, virtually, February 21-25, 2021 . The Internet Society. https://www.ndss-symposium.org/ndss- paper/as-strong-as-i...

work page 2021
[37]

Li Li, Jiawei Wang, and Haowei Quan. 2022. Scalpel: The Python Static Analysis Framework. arXiv:2202.11840 [cs.SE] https://arxiv.org/abs/2202.11840

work page arXiv 2022
[38]

Tsz-On Li, Wenxi Zong, Yibo Wang, Haoye Tian, Ying Wang, Shing-Chi Cheung, and Jeff Kramer. 2023. Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE ’23) . 14–26. doi:10.1109/ASE56229.2023.00089

work page doi:10.1109/ase56229.2023.00089 2023
[39]

Wen Li, Haoran Yang, Xiapu Luo, Long Cheng, and Haipeng Cai. 2023. PyRTFuzz: Detecting Bugs in Python Runtimes via Two-Level Collaborative Fuzzing. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (Copenhagen, Denmark) (CCS ’23). Association for Computing Machinery, New York, NY, USA, 1645–1659. doi:10.1145/3576915.3623166

work page doi:10.1145/3576915.3623166 2023
[40]

Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, and Qing Wang. 2024. Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Comp...

work page doi:10.1145/3597503.3639180 2024
[41]

Fuchen Ma, Yuanliang Chen, Meng Ren, Yuanhang Zhou, Yu Jiang, Ting Chen, Huizhong Li, and Jiaguang Sun. 2023. LOKI: State-Aware Fuzzing Framework for the Implementation of Blockchain Consensus Protocols. In30th Annual Network and Distributed System Security Symposium, NDSS 2023, San Diego, California, USA, February 27 - March 3,

work page 2023
[42]

https://www.ndss-symposium.org/ndss-paper/loki-state-aware-fuzzing-framework-for- the-implementation-of-blockchain-consensus-protocols/

The Internet Society. https://www.ndss-symposium.org/ndss-paper/loki-state-aware-fuzzing-framework-for- the-implementation-of-blockchain-consensus-protocols/

work page
[43]

Jie Ma. 2025. OpDiffer: LLM-Assisted Opcode-Level Differential Testing of Ethereum Virtual Machine. doi:10.5281/zenodo. 15195943

work page doi:10.5281/zenodo 2025
[44]

Pengxiang Ma, Ningyu He, Yuhua Huang, Haoyu Wang, and Xiapu Luo. 2024. Abusing the Ethereum Smart Contract Verification Services for Fun and Profit. In31st Annual Network and Distributed System Security Symposium, NDSS 2024, San Diego, California, USA, February 26 - March 1, 2024 . The Internet Society. https://www.ndss-symposium.org/ndss- paper/abusing-t...

work page 2024
[45]

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. 2023. SELF-REFINE: iterative refinement with self-feedback. In Proceedings of the 37th International ...

work page 2023
[46]

Dominik Maier, Fabian Fäßler, and Jean-Pierre Seifert. 2022. Uncovering Smart Contract VM Bugs Via Differen- tial Fuzzing. In Reversing and Offensive-Oriented Trends Symposium (Vienna, Austria) (ROOTS’21). Association for Computing Machinery, New York, NY, USA, 11–22. doi:10.1145/3503921.3503923

work page doi:10.1145/3503921.3503923 2022
[47]

Marius van der Wijden Martin Holst Swende. 2020. EIP-3155: EVM trace specification [DRAFT] . Retrieved 2025-04-15 from https://eips.ethereum.org/EIPS/eip-3155

work page 2020
[48]

NethermindEth. 2025. Github Nethermind Ethereum client repository . Retrieved 2025-04-15 from https://github.com/ NethermindEth/nethermind

work page 2025
[49]

Beijing Academy of Blockchain and Edge Computing. 2025. chainmaker document . Retrieved 2025-04-15 from https://docs.chainmaker.org.cn

work page 2025
[50]

Theofilos Petsios, Adrian Tang, Salvatore Stolfo, Angelos D Keromytis, and Suman Jana. 2017. Nezha: Efficient domain-independent differential testing. In 2017 IEEE Symposium on security and privacy (SP) (Berkeley, CA, USA). IEEE, 615–632. doi:10.1109/SP.2017.27

work page doi:10.1109/sp.2017.27 2017
[51]

Moritz Schloegel, Nils Bars, Nico Schiller, Lukas Bernhard, Tobias Scharnowski, Addison Crump, Arash Ale-Ebrahim, Nicolai Bissantz, Marius Muench, and Thorsten Holz. 2024. SoK: Prudent Evaluation Practices for Fuzzing. In2024 IEEE Symposium on Security and Privacy (SP) (San Francisco, CA, USA). IEEE, 1974–1993. doi:10.1109/SP54263.2024.00137

work page doi:10.1109/sp54263.2024.00137 2024
[52]

SealSC. 2025. Github SealEVM repository. Retrieved 2025-04-15 from https://github.com/SealSC/SealEVM OpDiffer: LLM-Assisted Opcode-Level Differential Testing of Ethereum Virtual Machine ISSTA069:23

work page 2025
[53]

Chaofan Shou, Shangyin Tan, and Koushik Sen. 2023. ItyFuzz: Snapshot-Based Fuzzer for Smart Contract. InProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (Seattle, WA, USA) (ISSTA 2023). Association for Computing Machinery, New York, NY, USA, 322–333. doi:10.1145/3597926.3598059

work page doi:10.1145/3597926.3598059 2023
[54]

smlXL Inc. 2025. An Ethereum Virtual Machine Opcodes Interactive Reference . Retrieved 2025-04-15 from https: //www.evm.codes/?fork=cancun

work page 2025
[55]

Tianle Sun, Ningyu He, Jiang Xiao, Yinliang Yue, Xiapu Luo, and Haoyu Wang. 2024. All Your Tokens are Belong to Us: Demystifying Address Verification Vulnerabilities in Solidity Smart Contracts. In33rd USENIX Security Sympo- sium (USENIX Security 24) . USENIX Association, Philadelphia, PA, 3567–3584. https://www.usenix.org/conference/ usenixsecurity24/pre...

work page 2024
[56]

Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Haijun Wang, Zhengzi Xu, Xiaofei Xie, and Yang Liu. 2024. GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New...

work page doi:10.1145/3597503.3639117 2024
[57]

Martin Holst Swende. 2025. Github Go evmlab repository . Retrieved 2025-04-15 from https://github.com/holiman/ goevmlab

work page 2025
[58]

Christof Ferreira Torres, Antonio Ken Iannillo, Arthur Gervais, and Radu State. 2021. ConFuzzius: A Data Dependency- Aware Hybrid Fuzzer for Smart Contracts. In2021 IEEE European Symposium on Security and Privacy (EuroS&P) (Vienna, Austria). IEEE, 103–119. doi:10.1109/EuroSP51992.2021.00018

work page doi:10.1109/eurosp51992.2021.00018 2021
[59]

Petar Tsankov, Andrei Dan, Dana Drachsler-Cohen, Arthur Gervais, Florian Bünzli, and Martin Vechev. 2018. Securify: Practical Security Analysis of Smart Contracts. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (Toronto, Canada) (CCS ’18). Association for Computing Machinery, New York, NY, USA, 67–82. doi:10.1145/...

work page doi:10.1145/3243734.3243780 2018
[60]

Marius van der Wijden. 2025. Github FuzzyVM repository . Retrieved 2025-04-15 from https://github.com/ MariusVanDerWijden/FuzzyVM

work page 2025
[61]

Sam Wilson. 2023. Ethereum Execution Layer Specification . Retrieved 2025-04-15 from https://blog.ethereum.org/2023/ 08/29/eel-spec

work page 2023
[62]

Winter, Florena Buse, Daan de Graaf, Klaus von Gleissenthall, and Burcu Kulahcioglu Ozkan

Levin N. Winter, Florena Buse, Daan de Graaf, Klaus von Gleissenthall, and Burcu Kulahcioglu Ozkan. 2023. Randomized Testing of Byzantine Fault Tolerant Algorithms. Proc. ACM Program. Lang. 7, OOPSLA1, Article 101 (April 2023), 32 pages. doi:10.1145/3586053

work page doi:10.1145/3586053 2023
[63]

Gavin Wood et al. 2014. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper 151, 2014 (2014), 1–32

work page 2014
[64]

Shuohan Wu, Zihao Li, Luyi Yan, Weimin Chen, Muhui Jiang, Chenxu Wang, Xiapu Luo, and Hao Zhou. 2024. Are We There Yet? Unraveling the State-of-the-Art Smart Contract Fuzzers. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article 1... doi:10.1145/3597503.3639152

[65]

Zhiyi Xue, Liangguo Li, Senyue Tian, Xiaohong Chen, Pingping Li, Liangyu Chen, Tingting Jiang, and Min Zhang. 2024. LLM4Fin: Fully Automating LLM-Powered Test Case Generation for FinTech Software Acceptance Testing. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (Vienna, Austria) (ISSTA 2024). Association for Computing Machinery, New York, NY, USA, 1643–1655. doi:10.1145/3650212.3680388

[67]

Youngseok Yang, Taesoo Kim, and Byung-Gon Chun. 2021. Finding Consensus Bugs in Ethereum via Multi-transaction Differential Fuzzing. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). USENIX Association, 349–365. https://www.usenix.org/conference/osdi21/presentation/yang

[68]

Wuqi Zhang, Zhuo Zhang, Qingkai Shi, Lu Liu, Lili Wei, Yepang Liu, Xiangyu Zhang, and Shing-Chi Cheung. 2024. Nyx: Detecting Exploitable Front-Running Vulnerabilities in Smart Contracts. In 2024 IEEE Symposium on Security and Privacy (SP) (San Francisco, CA, USA). IEEE, 2198–2216. doi:10.1109/SP54263.2024.00146

[69]

Zhijie Zhong, Zibin Zheng, Hong-Ning Dai, Qing Xue, Junjia Chen, and Yuhong Nan. 2024. PrettySmart: Detecting Permission Re-delegation Vulnerability for Token Behaviors in Smart Contracts. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New York, NY, U... doi:10.1145/3597503.3639140

[70]

Shiyao Zhou, Muhui Jiang, Weimin Chen, Hao Zhou, Haoyu Wang, and Xiapu Luo. 2024. WADIFF: A Differential Testing Framework for WebAssembly Runtimes. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (Echternach, Luxembourg) (ASE ’23). IEEE, 939–950. doi:10.1109/ASE56229.2023.00188

Received 2024-10-31; accepted ...

