pith.

arxiv: 2604.24085 · v1 · submitted 2026-04-27 · 💻 cs.CR · cs.SE

Evaluating Cryptographic API Misuse Detectors for Go

Pith reviewed 2026-05-08 03:08 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords: cryptographic API misuse · Go programming · static analysis · vulnerability detection · security tools evaluation · open source projects · misuse taxonomy · crypto vulnerabilities

The pith

An evaluation of four detectors on 328 Go projects finds 7,473 cryptographic API misuses with large differences in what each tool catches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how well current detectors identify cryptographic API misuses in Go code. It applies four tools to 328 security-critical open-source projects and classifies the problems they report against a shared list of 14 misuse types. Together the tools report thousands of misuses across the projects (7,473 in total), and they differ substantially in the types and numbers of issues they flag. This matters because Go underpins much of the security-critical infrastructure that depends on cryptography being used correctly.

Core claim

The authors establish a consolidated taxonomy of 14 cryptographic API misuse classes for Go and apply four state-of-the-art detectors to 328 security-critical open-source projects. This evaluation identifies 7,473 instances of misuse, with the tools showing significant variations in the types and numbers of issues they report.

What carries the argument

The consolidated taxonomy of 14 misuse classes used to systematically compare the coverage of CodeQL, Gopher, Gosec, and Snyk Code.
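A consolidated taxonomy works by mapping each tool's own rule identifiers onto shared classes, so that alerts from different detectors can be counted side by side. The sketch below assumes a hypothetical Alert record and hypothetical class names; the Gosec rule IDs (G401 to G404) and the CodeQL query ID go/insecure-tls are given as examples and should be checked against each tool's documentation. It is not the paper's actual mapping.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a hypothetical normalized detector finding: which tool raised it
// and under which of that tool's own rule identifiers.
type Alert struct {
	Tool   string
	RuleID string
}

// ruleToClass maps (tool, rule ID) pairs onto unified misuse classes. The
// class names are hypothetical placeholders, not the paper's 14 classes.
var ruleToClass = map[[2]string]string{
	{"gosec", "G401"}:             "weak-hash",
	{"gosec", "G402"}:             "insecure-tls",
	{"gosec", "G403"}:             "short-rsa-key",
	{"gosec", "G404"}:             "insecure-random",
	{"codeql", "go/insecure-tls"}: "insecure-tls",
}

// coverage counts alerts per (class, tool). Alerts whose rule has no entry
// in ruleToClass are collected under "unmapped" so they are not lost.
func coverage(alerts []Alert) map[string]map[string]int {
	m := make(map[string]map[string]int)
	for _, a := range alerts {
		class, ok := ruleToClass[[2]string{a.Tool, a.RuleID}]
		if !ok {
			class = "unmapped"
		}
		if m[class] == nil {
			m[class] = make(map[string]int)
		}
		m[class][a.Tool]++
	}
	return m
}

func main() {
	alerts := []Alert{
		{Tool: "gosec", RuleID: "G402"},
		{Tool: "codeql", RuleID: "go/insecure-tls"},
		{Tool: "gosec", RuleID: "G404"},
		{Tool: "gosec", RuleID: "G999"}, // hypothetical rule with no mapping
	}
	cov := coverage(alerts)

	// Print classes in a stable order; fmt already sorts map keys when
	// printing the inner per-tool maps.
	classes := make([]string, 0, len(cov))
	for c := range cov {
		classes = append(classes, c)
	}
	sort.Strings(classes)
	for _, c := range classes {
		fmt.Println(c, cov[c])
	}
}
```

Routing unmapped rules into an explicit bucket keeps gaps in the mapping visible, which matters when the goal is to compare coverage across tools.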

If this is right

  • Security engineers should run multiple detectors together rather than rely on any single one.
  • Research on new detectors should target the misuse classes that current tools cover least.
  • The high count of misuses indicates that Go programmers need clearer guidance on correct crypto API usage.
  • Tool developers can use the taxonomy to benchmark and expand their own detection rules.
  • Security audits of Go codebases would gain from standardized evaluation methods based on this taxonomy.
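Running several detectors together raises the question of which findings overlap. A minimal sketch, assuming every alert has already been normalized to a file, a line, and a unified class (the Finding type and its fields are hypothetical): sites are grouped on that triple, the number of groups gives the distinct findings, and sites flagged by exactly one tool show what each tool alone contributes. Matching on identical lines is a simplification, since real tools often report different spans for the same issue; this is not the paper's deduplication procedure.

```go
package main

import "fmt"

// Finding is a hypothetical normalized alert from one tool.
type Finding struct {
	Tool  string
	File  string
	Line  int
	Class string
}

// site identifies a finding independently of the tool that reported it.
type site struct {
	File  string
	Line  int
	Class string
}

// overlap returns the number of distinct sites across all tools and, for
// each tool, how many sites only that tool flagged.
func overlap(fs []Finding) (int, map[string]int) {
	toolsAt := make(map[site]map[string]bool)
	for _, f := range fs {
		s := site{File: f.File, Line: f.Line, Class: f.Class}
		if toolsAt[s] == nil {
			toolsAt[s] = make(map[string]bool)
		}
		toolsAt[s][f.Tool] = true // repeated alerts from one tool collapse here
	}
	unique := make(map[string]int)
	for _, tools := range toolsAt {
		if len(tools) == 1 {
			for t := range tools {
				unique[t]++
			}
		}
	}
	return len(toolsAt), unique
}

func main() {
	fs := []Finding{
		{"gosec", "a.go", 10, "insecure-tls"},
		{"codeql", "a.go", 10, "insecure-tls"},
		{"gosec", "b.go", 5, "insecure-random"},
		{"gosec", "b.go", 5, "insecure-random"}, // duplicate alert
		{"snyk", "c.go", 7, "weak-hash"},
	}
	union, unique := overlap(fs)
	fmt.Println("distinct sites:", union)               // 3
	fmt.Println("sites only one tool flagged:", unique) // map[gosec:1 snyk:1]
}
```

A tool with many single-tool sites is the one whose removal would lose the most findings, which is the practical argument for combining detectors.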

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Many deployed Go applications may carry avoidable weaknesses in encryption or authentication.
  • Embedding the taxonomy directly into development environments could catch misuses earlier.
  • Comparable studies in other languages would show whether the Go results reflect a general problem or a language-specific one.
  • If one detector consistently finds more unique cases, it could become a recommended baseline for Go security checks.
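To illustrate how a lightweight, editor-integrated check could work, here is a minimal syntactic detector built on Go's standard go/parser and go/ast packages. It flags two example patterns, an import of crypto/md5 and a literal InsecureSkipVerify: true; the function check and the sample file are hypothetical. Production tools such as Gosec and CodeQL typically also use type information and, in some cases, data-flow analysis, so this sketch should not be read as how any of them is implemented.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// check parses a single Go source file and reports two illustrative
// patterns: an import of crypto/md5, and a key-value pair
// `InsecureSkipVerify: true` (as in a tls.Config literal).
// It is purely syntactic: nothing is type-checked.
func check(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var findings []string
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err == nil && path == "crypto/md5" {
			findings = append(findings,
				fmt.Sprintf("%s: weak hash import %q", fset.Position(imp.Pos()), path))
		}
	}
	ast.Inspect(file, func(n ast.Node) bool {
		kv, ok := n.(*ast.KeyValueExpr)
		if !ok {
			return true
		}
		key, keyOK := kv.Key.(*ast.Ident)
		val, valOK := kv.Value.(*ast.Ident)
		if keyOK && valOK && key.Name == "InsecureSkipVerify" && val.Name == "true" {
			findings = append(findings,
				fmt.Sprintf("%s: TLS certificate verification disabled", fset.Position(kv.Pos())))
		}
		return true
	})
	return findings, nil
}

// sample is a hypothetical file containing both patterns.
const sample = `package demo

import (
	"crypto/md5"
	"crypto/tls"
)

var _ = md5.New
var cfg = &tls.Config{InsecureSkipVerify: true}
`

func main() {
	findings, err := check("demo.go", sample)
	if err != nil {
		panic(err)
	}
	for _, f := range findings {
		fmt.Println(f)
	}
}
```

Because it only matches literal syntax, the sketch misses cases such as cfg.InsecureSkipVerify = true or a value passed through a variable, which is one reason detectors built on different analyses can report different sets of misuses.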

Load-bearing premise

The four chosen tools are representative of current detectors, and the 14-class taxonomy captures the misuses relevant to Go.

What would settle it

Re-running the same analysis on the 328 projects with new detectors that report either zero misuses or identical coverage to the existing four would challenge the reported prevalence and variation.

Figures

Figures reproduced from arXiv: 2604.24085 by Martin Monperrus, Vivi Andersson.

Figure 1. Lines of code per repository.

2.4 Protocol for RQ1. We conduct a systematic qualitative comparison of the 4 tools identified in subsection 2.2: CodeQL, Gopher, Gosec, and Snyk Code. We compare the detection coverage across tools and their technical implementation. Taxonomy Rules. To enable systematic comparison across tools with divergent coverage and classification approaches, we establish a unified taxono…

Figure 2. Detection agreement across all rules shows that…

Figure 3. Tool detection overlaps for 3 representative rules. Rule 05 is extreme in the lack of agreement. Rule 11 shows that…
Original abstract

Cryptographic API misuse represents a critical vulnerability class that undermines the security foundations of modern software. Yet, it remains largely unexplored in Go despite Go's dominance in security-critical infrastructure. This paper presents the first comprehensive study of cryptographic API misuse detection in Go, identifying and analyzing 4 state-of-the-art tools (CodeQL, Gopher, Gosec, and Snyk Code) and establishing a consolidated taxonomy of 14 relevant misuse classes. Through an experimental evaluation of 328 security-critical open-source Go projects, we discovered 7,473 cryptographic API misuses, providing insights into the prevalence and distribution of these vulnerabilities. Our systematic comparison reveals significant variations in misuse coverage, with immediate practical implications for security engineers and long-term implications for research in this domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents the first comprehensive study of cryptographic API misuse detection in Go. It evaluates four state-of-the-art tools (CodeQL, Gopher, Gosec, Snyk Code) against a consolidated taxonomy of 14 misuse classes, applies them to 328 security-critical open-source Go projects, and reports discovering 7,473 misuses along with significant variations in tool coverage and prevalence insights.

Significance. If the empirical results hold after validation, the work would provide the first large-scale data on cryptographic API misuse prevalence in Go, a language dominant in security-critical infrastructure. The scale (328 projects) and systematic tool comparison offer practical value for security engineers selecting detectors and could guide future research on improving misuse detection coverage.

major comments (2)
  1. Abstract and Evaluation section: The central claims of 7,473 discovered misuses, prevalence insights, and 'significant variations in misuse coverage' rest entirely on raw detector outputs. No description is given of how detected misuses were validated, how false positives were handled, or the precise configuration of each tool, leaving the quantitative results only partially supported and potentially confounded by differing false-positive rates or alert styles.
  2. Taxonomy and methodology: The consolidated taxonomy of 14 misuse classes is used to aggregate and compare results, yet the paper provides no independent validation or mapping of these classes to actual Go crypto API usage patterns in the studied projects. This directly affects the reliability of the coverage comparison and prevalence distribution claims.
minor comments (2)
  1. The abstract states the study has 'immediate practical implications' for security engineers, but the manuscript would benefit from more concrete, actionable recommendations tied to the observed tool differences.
  2. Ensure all tool versions, query configurations, and project selection criteria are documented with sufficient detail for full reproducibility of the 328-project corpus and alert aggregation process.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects of result validation and taxonomy transparency that we will address through targeted revisions to improve clarity and support for our claims.

Point-by-point responses
  1. Referee: Abstract and Evaluation section: The central claims of 7,473 discovered misuses, prevalence insights, and 'significant variations in misuse coverage' rest entirely on raw detector outputs. No description is given of how detected misuses were validated, how false positives were handled, or the precise configuration of each tool, leaving the quantitative results only partially supported and potentially confounded by differing false-positive rates or alert styles.

    Authors: We agree that additional transparency is needed. The 7,473 figure represents the total alerts raised by the four tools (with deduplication across overlapping detections) when applied to the 328 projects. Our evaluation intentionally reports raw detector outputs to compare coverage as practitioners would encounter it. We did not perform exhaustive manual validation of all alerts due to scale, but we will revise the Evaluation section to: (1) specify the exact version and configuration parameters used for each tool (CodeQL, Gopher, Gosec, Snyk Code), (2) add a limitations subsection discussing potential false positives and alert-style differences, and (3) report results from a sampled manual review of 200 alerts to provide some empirical grounding. These changes will better contextualize the prevalence insights and coverage variations without altering the core comparative findings. revision: partial

  2. Referee: Taxonomy and methodology: The consolidated taxonomy of 14 misuse classes is used to aggregate and compare results, yet the paper provides no independent validation or mapping of these classes to actual Go crypto API usage patterns in the studied projects. This directly affects the reliability of the coverage comparison and prevalence distribution claims.

    Authors: The 14-class taxonomy was constructed by merging the misuse categories explicitly supported by the four detectors with patterns from prior cryptographic misuse studies. Each detected alert was then manually mapped to one of the 14 classes based on the flagged API call and context. We will expand the Taxonomy and Methodology sections to include: (1) the full derivation process with references to source taxonomies, (2) a table showing how each tool's alerts map to the classes, and (3) concrete examples of Go crypto API usage patterns (e.g., from the studied projects) for each class. This added detail will strengthen the justification for the aggregation and comparison. revision: partial

Circularity Check

0 steps flagged

No circularity: pure empirical evaluation with no derivations or self-referential claims

full rationale

This is an empirical tool-evaluation study that runs four external detectors (CodeQL, Gopher, Gosec, Snyk Code) on 328 external Go projects, counts alerts, and compares coverage under a 14-class taxonomy. No equations, fitted parameters, predictions, or first-principles derivations appear in the abstract or described methodology. Central quantitative claims (7,473 misuses, coverage variations) are direct aggregates of tool outputs rather than reductions to the paper's own inputs. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz. The skeptic's concern about missing ground-truth precision is a validity issue, not a circularity issue in the derivation chain. Because its evidence comes from external tools applied to external code, the study is self-contained in this respect.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two domain assumptions about tool representativeness and taxonomy completeness; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: The four selected tools represent the state of the art for cryptographic API misuse detection in Go.
    The abstract describes them as '4 state-of-the-art tools' without further justification.
  • ad hoc to paper: The consolidated taxonomy of 14 misuse classes is relevant and sufficiently complete for Go.
    The paper 'establishes a consolidated taxonomy' as a core contribution.

pith-pipeline@v0.9.0 · 5414 in / 1364 out tokens · 76690 ms · 2026-05-08T03:08:30.787609+00:00 · methodology

