Evaluating Cryptographic API Misuse Detectors for Go
Pith reviewed 2026-05-08 03:08 UTC · model grok-4.3
The pith
An evaluation of four detectors on 328 Go projects finds 7,473 cryptographic API misuses with large differences in what each tool catches.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish a consolidated taxonomy of 14 cryptographic API misuse classes for Go and apply four state-of-the-art detectors to 328 security-critical open-source projects. This evaluation identifies 7,473 instances of misuse, with the tools showing significant variations in the types and numbers of issues they report.
What carries the argument
The consolidated taxonomy of 14 misuse classes used to systematically compare the coverage of CodeQL, Gopher, Gosec, and Snyk Code.
If this is right
- Security engineers should run multiple detectors together rather than rely on any single one.
- Research on new detectors should target the misuse classes that current tools cover least.
- The high count of misuses indicates that Go programmers need clearer guidance on correct crypto API usage.
- Tool developers can use the taxonomy to benchmark and expand their own detection rules.
- Security audits of Go codebases would gain from standardized evaluation methods based on this taxonomy.
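The first implication above, running several detectors together, comes down to merging their alerts and seeing which sites only one tool reports. The following is a minimal sketch of that step, not the paper's pipeline: the `Finding` record, the tool names, the class labels, and the sample alerts are all hypothetical, and real detector output would first need to be parsed into this shape.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical, tool-neutral alert record.
type Finding struct {
	Tool  string // detector that raised the alert
	File  string // path of the flagged source file
	Line  int    // line of the flagged API call
	Class string // misuse class label (illustrative, not the paper's names)
}

// site identifies a flagged location independently of the reporting tool.
type site struct {
	File  string
	Line  int
	Class string
}

// Merge groups findings by location and class and returns, for each site,
// the sorted list of tools that reported it.
func Merge(findings []Finding) map[site][]string {
	seen := make(map[site]map[string]bool)
	for _, f := range findings {
		k := site{f.File, f.Line, f.Class}
		if seen[k] == nil {
			seen[k] = make(map[string]bool)
		}
		seen[k][f.Tool] = true
	}
	merged := make(map[site][]string, len(seen))
	for k, tools := range seen {
		for t := range tools {
			merged[k] = append(merged[k], t)
		}
		sort.Strings(merged[k])
	}
	return merged
}

// UniqueByTool counts, per tool, the sites that no other tool reported.
func UniqueByTool(merged map[site][]string) map[string]int {
	unique := make(map[string]int)
	for _, tools := range merged {
		if len(tools) == 1 {
			unique[tools[0]]++
		}
	}
	return unique
}

func main() {
	// Made-up alerts for illustration only; not data from the paper.
	findings := []Finding{
		{"toolA", "tls.go", 10, "weak-tls-version"},
		{"toolB", "tls.go", 10, "weak-tls-version"},
		{"toolA", "hash.go", 42, "weak-hash"},
		{"toolB", "rng.go", 7, "insecure-randomness"},
	}
	merged := Merge(findings)
	fmt.Println("distinct sites:", len(merged))
	fmt.Println("sites found by one tool only:", UniqueByTool(merged))
}
```

Keying on exact file, line, and class is a simplification: two tools may flag the same misuse at slightly different lines, so a real merge would likely need a tolerance window or a match on the enclosing function.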
Where Pith is reading between the lines
- Many deployed Go applications may carry avoidable weaknesses in encryption or authentication.
- Embedding the taxonomy directly into development environments could catch misuses earlier.
- Comparable studies in other languages would show whether the Go results reflect a general problem or a language-specific one.
- If one detector consistently finds more unique cases, it could become a recommended baseline for Go security checks.
Load-bearing premise
The four chosen tools represent current detectors and the 14-class taxonomy captures the relevant misuses in Go.
What would settle it
Re-running the same analysis on the 328 projects with new detectors that report either zero misuses or identical coverage to the existing four would challenge the reported prevalence and variation.
Original abstract
Cryptographic API misuse represents a critical vulnerability class that undermines the security foundations of modern software. Yet, it remains largely unexplored in Go despite its dominance in security-critical infrastructure. This paper presents the first comprehensive study of cryptographic API misuse detection in Go, identifying and analyzing 4 state-of-the-art tools (CodeQL, Gopher, Gosec, and Snyk Code) and establishing a consolidated taxonomy of 14 relevant misuse classes. Through an experimental evaluation of 328 security-critical open-source Go projects, we discovered 7,473 cryptographic API misuses, providing insights into the prevalence and distribution of these vulnerabilities. Our systematic comparison reveals significant variations in misuse coverage, with immediate practical implications for security engineers and long-term implications for research in this domain.
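For readers unfamiliar with what a cryptographic API misuse looks like in Go, the sketch below contrasts a common one with its fix. It is illustrative only: this page does not list the paper's 14 classes, so treating nonce reuse in AES-GCM as one of them is an assumption, and the helper names `Seal` and `Open` are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16-, 24-, or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts plaintext and returns nonce || ciphertext.
func Seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	// Misuse (do not do this): leaving the nonce as a constant, e.g. the
	// all-zero slice allocated below, and reusing it under the same key
	// breaks GCM's confidentiality and integrity guarantees.
	nonce := make([]byte, gcm.NonceSize())
	// Fix: fill the nonce from the operating system's CSPRNG on every call.
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Open splits nonce || ciphertext and authenticates and decrypts it.
func Open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed message shorter than nonce")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := Seal(key, []byte("example message"))
	if err != nil {
		panic(err)
	}
	plain, err := Open(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println("recovered:", string(plain))
}
```

A static detector would typically look for the misuse variant by checking whether the nonce argument to `Seal` can be traced back to a constant.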
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first comprehensive study of cryptographic API misuse detection in Go. It evaluates four state-of-the-art tools (CodeQL, Gopher, Gosec, Snyk Code) against a consolidated taxonomy of 14 misuse classes, applies them to 328 security-critical open-source Go projects, and reports 7,473 detected misuses, together with significant variation in tool coverage and observations on how prevalent each kind of misuse is.
Significance. If the empirical results hold after validation, the work would provide the first large-scale data on cryptographic API misuse prevalence in Go, a language dominant in security-critical infrastructure. The scale (328 projects) and systematic tool comparison offer practical value for security engineers selecting detectors and could guide future research on improving misuse detection coverage.
major comments (2)
- Abstract and Evaluation section: The central claims of 7,473 discovered misuses, prevalence insights, and 'significant variations in misuse coverage' rest entirely on raw detector outputs. No description is given of how detected misuses were validated, how false positives were handled, or the precise configuration of each tool, leaving the quantitative results only partially supported and potentially confounded by differing false-positive rates or alert styles.
- Taxonomy and methodology: The consolidated taxonomy of 14 misuse classes is used to aggregate and compare results, yet the paper provides no independent validation or mapping of these classes to actual Go crypto API usage patterns in the studied projects. This directly affects the reliability of the coverage comparison and prevalence distribution claims.
minor comments (2)
- The abstract states the study has 'immediate practical implications' for security engineers, but the manuscript would benefit from more concrete, actionable recommendations tied to the observed tool differences.
- Ensure all tool versions, query configurations, and project selection criteria are documented with sufficient detail for full reproducibility of the 328-project corpus and alert aggregation process.
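One inexpensive way to address the validation and reproducibility concerns above is a seeded random sample of alerts for manual review, so that the same subset can be re-drawn by anyone holding the alert list. The sketch below assumes alerts are addressed by index; the function names, the seed, and the toy verdicts are hypothetical, and only the total of 7,473 comes from the paper.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// SampleIndices returns k distinct indices in [0, total), drawn with a fixed
// seed so the sample is reproducible. math/rand is acceptable here because
// the sample is not security-sensitive; key material would need crypto/rand.
func SampleIndices(total, k int, seed int64) []int {
	if total <= 0 || k <= 0 {
		return nil
	}
	if k > total {
		k = total
	}
	r := rand.New(rand.NewSource(seed))
	idx := r.Perm(total)[:k]
	sort.Ints(idx)
	return idx
}

// Precision returns the fraction of reviewed alerts confirmed as real misuses.
func Precision(confirmed []bool) float64 {
	if len(confirmed) == 0 {
		return 0
	}
	hits := 0
	for _, ok := range confirmed {
		if ok {
			hits++
		}
	}
	return float64(hits) / float64(len(confirmed))
}

func main() {
	// 7473 is the alert total reported in the paper; the sample size and
	// seed are arbitrary choices for this sketch.
	idx := SampleIndices(7473, 200, 20250530)
	fmt.Println("alerts to review:", len(idx))

	// Toy reviewer verdicts, invented for illustration; not study results.
	verdicts := []bool{true, true, false, true}
	fmt.Printf("precision on toy verdicts: %.2f\n", Precision(verdicts))
}
```

Sampling per tool or per misuse class rather than over the pooled alerts would give separate precision estimates, which matters if the tools differ in false-positive rate.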
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects of result validation and taxonomy transparency that we will address through targeted revisions to improve clarity and support for our claims.
Point-by-point responses
- Referee: Abstract and Evaluation section: The central claims of 7,473 discovered misuses, prevalence insights, and 'significant variations in misuse coverage' rest entirely on raw detector outputs. No description is given of how detected misuses were validated, how false positives were handled, or the precise configuration of each tool, leaving the quantitative results only partially supported and potentially confounded by differing false-positive rates or alert styles.
Authors: We agree that additional transparency is needed. The 7,473 figure represents the total alerts raised by the four tools (with deduplication across overlapping detections) when applied to the 328 projects. Our evaluation intentionally reports raw detector outputs to compare coverage as practitioners would encounter it. We did not perform exhaustive manual validation of all alerts due to scale, but we will revise the Evaluation section to: (1) specify the exact version and configuration parameters used for each tool (CodeQL, Gopher, Gosec, Snyk Code), (2) add a limitations subsection discussing potential false positives and alert-style differences, and (3) report results from a sampled manual review of 200 alerts to provide some empirical grounding. These changes will better contextualize the prevalence insights and coverage variations without altering the core comparative findings.
Revision: partial
- Referee: Taxonomy and methodology: The consolidated taxonomy of 14 misuse classes is used to aggregate and compare results, yet the paper provides no independent validation or mapping of these classes to actual Go crypto API usage patterns in the studied projects. This directly affects the reliability of the coverage comparison and prevalence distribution claims.
Authors: The 14-class taxonomy was constructed by merging the misuse categories explicitly supported by the four detectors with patterns from prior cryptographic misuse studies. Each detected alert was then manually mapped to one of the 14 classes based on the flagged API call and context. We will expand the Taxonomy and Methodology sections to include: (1) the full derivation process with references to source taxonomies, (2) a table showing how each tool's alerts map to the classes, and (3) concrete examples of Go crypto API usage patterns (e.g., from the studied projects) for each class. This added detail will strengthen the justification for the aggregation and comparison.
Revision: partial
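The mapping table promised in the second response can be thought of as a lookup from (tool, rule) to taxonomy class, from which per-tool coverage follows directly. The sketch below is a guess at that structure, not the authors' artifact: the class labels and the second tool's rule name are placeholders, and although G401, G402, and G404 are used as familiar gosec rule identifiers, their assignment to these classes is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// RuleKey identifies one detection rule of one tool.
type RuleKey struct {
	Tool string
	Rule string
}

// Coverage inverts a rule-to-class mapping into, for each tool, the sorted
// list of distinct taxonomy classes that at least one of its rules targets.
func Coverage(mapping map[RuleKey]string) map[string][]string {
	sets := make(map[string]map[string]bool)
	for k, class := range mapping {
		if sets[k.Tool] == nil {
			sets[k.Tool] = make(map[string]bool)
		}
		sets[k.Tool][class] = true
	}
	cov := make(map[string][]string, len(sets))
	for tool, classes := range sets {
		for c := range classes {
			cov[tool] = append(cov[tool], c)
		}
		sort.Strings(cov[tool])
	}
	return cov
}

func main() {
	// Placeholder mapping for illustration; class labels are hypothetical.
	mapping := map[RuleKey]string{
		{"gosec", "G401"}:              "weak-hash",
		{"gosec", "G402"}:              "insecure-tls-config",
		{"gosec", "G404"}:              "insecure-randomness",
		{"other-tool", "example-rule"}: "weak-hash",
	}
	const totalClasses = 14 // size of the taxonomy stated in the paper

	cov := Coverage(mapping)
	tools := make([]string, 0, len(cov))
	for t := range cov {
		tools = append(tools, t)
	}
	sort.Strings(tools)
	for _, t := range tools {
		fmt.Printf("%s covers %d of %d classes: %v\n", t, len(cov[t]), totalClasses, cov[t])
	}
}
```

Keeping the mapping as data rather than as prose makes the coverage comparison checkable: a reader who disagrees with one rule's class can change a single entry and recompute.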
Circularity Check
No circularity: pure empirical evaluation with no derivations or self-referential claims
This is an empirical tool-evaluation study that runs four external detectors (CodeQL, Gopher, Gosec, Snyk Code) on 328 external Go projects, counts alerts, and compares coverage under a 14-class taxonomy. No equations, fitted parameters, predictions, or first-principles derivations appear in the abstract or described methodology. Central quantitative claims (7,473 misuses, coverage variations) are direct aggregates of tool outputs rather than reductions to the paper's own inputs. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz. The skeptic's concern about missing ground-truth precision is a validity issue, not a circularity issue in the derivation chain. The study is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The four selected tools represent the state-of-the-art for cryptographic API misuse detection in Go.
- ad hoc to paper: The consolidated taxonomy of 14 misuse classes is relevant and sufficiently complete for Go.