An empirical analysis of vulnerability detection tools for Solidity smart contracts
Pith reviewed 2026-05-22 13:38 UTC · model grok-4.3
The pith
Combining three vulnerability detection tools for Solidity smart contracts reaches 76.78% coverage in under one minute on average.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Leveraging the SmartBugs 2.0 framework and a dataset of 2,182 Solidity contracts manually labeled at the line level with DASP TOP 10 vulnerability categories, the study measures detection rates across twenty tools. Different tools catch largely non-overlapping subsets of issues. The combination of three specific tools detects up to 76.78 percent of known vulnerabilities with average runtime below one minute, while LLM-based detection yields unreliable outcomes on real contracts.
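The idea of picking a small tool ensemble by coverage can be sketched as follows. This is a minimal illustration, not the authors' procedure: the exhaustive search, the `(contract, line)` label format, and the tool names are all assumptions made for the example, and the toy data is invented.

```python
from itertools import combinations


def best_combination(detections, ground_truth, k=3):
    """Return (tools, coverage) for the k-tool subset that covers the most labels.

    detections: dict mapping a tool name to the set of labels it reported.
    ground_truth: set of labels known to be vulnerable.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    best_tools, best_cov = (), 0.0
    # Exhaustive search over all k-subsets; feasible for 20 tools and k = 3.
    for tools in combinations(sorted(detections), k):
        # Only findings that match an annotated label count as covered.
        found = set().union(*(detections[t] for t in tools)) & ground_truth
        cov = len(found) / len(ground_truth)
        if cov > best_cov:
            best_tools, best_cov = tools, cov
    return best_tools, best_cov


# Hypothetical toy data: labels are (contract, line) pairs.
ground_truth = {("a.sol", 10), ("a.sol", 42), ("b.sol", 7), ("c.sol", 3), ("c.sol", 19)}
detections = {
    "tool_a": {("a.sol", 10), ("b.sol", 7)},
    "tool_b": {("a.sol", 42)},
    "tool_c": {("c.sol", 3), ("a.sol", 10)},
    "tool_d": {("b.sol", 7), ("z.sol", 1)},  # one finding is a false positive
}
tools, coverage = best_combination(detections, ground_truth, k=3)
print(tools, coverage)
```

Coverage here ignores false positives and runtime; a fuller comparison would track both alongside the detection rate.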
What carries the argument
The manually annotated dataset of 2,182 smart contracts with line-level vulnerability labels serving as ground truth to compare the twenty tools inside the SmartBugs 2.0 framework.
Load-bearing premise
The manual annotations of the 2,182 contracts correctly and completely identify the true vulnerabilities without systematic bias or missed cases.
What would settle it
Independent re-labeling of a random sample of the 2,182 contracts that changes the vulnerability status of a large fraction of instances, or discovery of a real deployed contract containing a vulnerability missed by the three-tool combination.
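A re-labeling check of this kind could be quantified roughly as below. The sketch assumes binary per-instance labels (1 = vulnerable, 0 = not) and invented sample data; it reports the fraction of instances whose status changed and Cohen's kappa between the original and the independent labels. What counts as "a large fraction" is left open in the text, so no threshold is applied here.

```python
from collections import Counter


def changed_fraction(original, relabeled):
    """Share of instances whose label differs between the two passes."""
    return sum(o != r for o, r in zip(original, relabeled)) / len(original)


def cohens_kappa(a, b):
    """Cohen's kappa for two equally long sequences of nominal labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical sample of ten re-labeled instances.
original = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
relabeled = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
print(changed_fraction(original, relabeled))  # 0.2
print(round(cohens_kappa(original, relabeled), 3))  # 0.6
```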
Original abstract
The rapid adoption of blockchain technology highlighted the importance of ensuring the security of smart contracts due to their critical role in automated business logic execution on blockchain platforms. This paper provides an empirical evaluation of automated vulnerability analysis tools specifically designed for Solidity smart contracts. Leveraging the extensive SmartBugs 2.0 framework, which includes 20 analysis tools, we conducted a comprehensive assessment using an annotated dataset of 2,182 instances we manually annotated with line-level vulnerability labels. Our evaluation highlights the detection effectiveness of these tools in detecting various types of vulnerabilities, as categorized by the DASP TOP 10 taxonomy. We evaluated the effectiveness of a Large Language Model-based detection method on two popular datasets. In this case, we obtained inconsistent results with the two datasets, showing unreliable detection when analyzing real-world smart contracts. Our study identifies significant variations in the accuracy and reliability of different tools and demonstrates the advantages of combining multiple detection methods to improve vulnerability identification. We identified a set of 3 tools that, combined, achieve up to 76.78% found vulnerabilities taking less than one minute to run, on average. This study contributes to the field by releasing the largest dataset of manually analyzed smart contracts with line-level vulnerability annotations and the empirical evaluation of the greatest number of tools to date.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper empirically evaluates 20 vulnerability detection tools for Solidity smart contracts via the SmartBugs 2.0 framework on a new manually annotated dataset of 2,182 contracts with line-level labels for DASP TOP 10 categories. It also tests an LLM-based detector on two datasets and reports inconsistent results on real-world contracts. A key result is that a combination of three tools detects up to 76.78% of vulnerabilities while averaging under one minute of runtime. The authors release the annotated dataset as the largest such resource.
Significance. If the ground-truth annotations hold, the work is significant for releasing the largest manually labeled smart-contract vulnerability dataset to date and for providing the broadest empirical comparison (20 tools) of detection effectiveness. The practical finding that a small tool ensemble reaches substantial coverage quickly could inform developer workflows, and the LLM inconsistency result highlights limitations of current generative approaches on real contracts.
major comments (2)
- [Abstract / Dataset construction] Abstract and dataset-construction section: the headline metrics (including the 76.78% coverage by the best three-tool combination) rest entirely on the authors' line-level manual annotations of 2,182 contracts. No information is supplied on annotator count, inter-rater agreement, disagreement-resolution protocol, or external validation against known-vulnerable contracts (e.g., SWC registry or prior SmartBugs labels). Without these details, systematic labeling errors for subtle issues such as reentrancy or access control cannot be ruled out, rendering recall figures and the practical-utility claim difficult to assess.
- [LLM-based detection evaluation] LLM-evaluation section: the claim of 'inconsistent results' and 'unreliable detection' on real-world contracts is load-bearing for the paper's broader conclusions about automated methods. The manuscript does not specify the two datasets, the exact prompting or few-shot setup, how inconsistencies were quantified, or any statistical test used to compare performance across datasets.
minor comments (2)
- Add a table that lists every one of the 20 tools together with version, configuration flags, and average runtime on the dataset to improve reproducibility.
- [Abstract] Clarify in the abstract and introduction whether the 2,182 contracts are distinct from or overlap with existing SmartBugs corpora; this affects the novelty claim of 'the largest dataset'.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and reproducibility.
Point-by-point responses
- Referee: [Abstract / Dataset construction] Abstract and dataset-construction section: the headline metrics (including the 76.78% coverage by the best three-tool combination) rest entirely on the authors' line-level manual annotations of 2,182 contracts. No information is supplied on annotator count, inter-rater agreement, disagreement-resolution protocol, or external validation against known-vulnerable contracts (e.g., SWC registry or prior SmartBugs labels). Without these details, systematic labeling errors for subtle issues such as reentrancy or access control cannot be ruled out, rendering recall figures and the practical-utility claim difficult to assess.
  Authors: We acknowledge that the manuscript does not currently provide these details on the annotation process. In the revised version, we will add a new subsection in the dataset construction section that specifies the number of annotators (two authors with prior Solidity experience), the inter-rater agreement (Cohen's kappa), the disagreement resolution protocol (discussion until consensus), and external validation steps against a sample of contracts cross-checked with the SWC registry and prior SmartBugs labels. These additions will directly address concerns about potential systematic errors in labeling subtle vulnerabilities.
  Revision: yes
- Referee: [LLM-based detection evaluation] LLM-evaluation section: the claim of 'inconsistent results' and 'unreliable detection' on real-world contracts is load-bearing for the paper's broader conclusions about automated methods. The manuscript does not specify the two datasets, the exact prompting or few-shot setup, how inconsistencies were quantified, or any statistical test used to compare performance across datasets.
  Authors: We agree that additional methodological details are required for the LLM evaluation to support the claims of inconsistent results. In the revision, we will explicitly name the two datasets, describe the prompting templates and few-shot examples used, explain how inconsistencies were quantified (via detection overlap and performance deltas), and include statistical comparisons (e.g., McNemar's test) between the datasets. This will strengthen the reproducibility and evidential basis of the LLM findings.
  Revision: yes
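For readers unfamiliar with the test named in the rebuttal, here is a minimal sketch of an exact McNemar test. McNemar's test applies to paired outcomes, so the example assumes two detectors (or two configurations of one detector) evaluated on the same labeled vulnerabilities; how the authors intend to pair observations across two datasets is not stated. The function names and the sample data are hypothetical.

```python
from math import comb


def discordant_counts(hits_a, hits_b):
    """Count pairs where exactly one of the two detectors found the vulnerability."""
    b = sum(x and not y for x, y in zip(hits_a, hits_b))  # only A detected
    c = sum(y and not x for x, y in zip(hits_a, hits_b))  # only B detected
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired outcomes on twelve labeled vulnerabilities.
hits_a = [True] * 8 + [False] * 2 + [True, False]
hits_b = [False] * 8 + [True] * 2 + [True, False]
b, c = discordant_counts(hits_a, hits_b)
print(b, c, mcnemar_exact(b, c))  # 8 2 0.109375
```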
Circularity Check
No circularity: purely empirical tool comparison against manual ground truth
full rationale
The paper performs an empirical evaluation of 20 existing analysis tools (via SmartBugs 2.0) plus an LLM-based method on a dataset of 2,182 manually line-level annotated Solidity contracts labeled for DASP TOP 10 categories. No derivations, equations, fitted parameters, or predictions appear; the reported figures (e.g., 76.78% combined recall) are direct counts of tool matches against the authors' annotations. This is standard supervised evaluation practice and does not reduce any claimed result to its inputs by construction. The dataset is released for external scrutiny, satisfying the self-contained benchmark criterion.
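The "direct counts of tool matches" described above can be illustrated with a short sketch. The matching rule used here (a finding counts only if contract, line, and category all agree with an annotation) is an assumption for the example; the page does not say how the paper matches findings to labels. The category names and data are invented.

```python
from collections import defaultdict


def recall_by_category(annotations, findings):
    """Per-category recall: matched annotations divided by all annotations.

    Both arguments are sets of (contract, line, category) tuples.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for label in annotations:
        category = label[2]
        totals[category] += 1
        if label in findings:  # exact match on contract, line, and category
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}


# Hypothetical ground truth and one tool's findings.
annotations = {
    ("a.sol", 10, "reentrancy"),
    ("a.sol", 42, "access_control"),
    ("b.sol", 7, "reentrancy"),
    ("c.sol", 3, "arithmetic"),
}
findings = {
    ("a.sol", 10, "reentrancy"),
    ("c.sol", 3, "arithmetic"),
    ("c.sol", 99, "arithmetic"),  # not annotated, so it does not affect recall
}
recall = recall_by_category(annotations, findings)
print(recall)
```

Because every number is a count against the annotations, any labeling error propagates directly into the reported recall, which is why the annotation process is the load-bearing premise.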
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Manual line-level annotations of vulnerabilities in the 2,182 contracts are accurate and complete.