Weaponizing the Commons: A Taxonomy and Detection Framework of Abuse on GitHub
Pith reviewed 2026-05-10 04:32 UTC · model grok-4.3
The pith
GitHub abuse behaviors can be organized into a taxonomy of symptoms and root causes to support one unified detection framework that flags every category on repositories and accounts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors review existing reports of GitHub abuse and analyze 392 publicly available cases to create a taxonomy that groups behaviors by their observable symptoms and root causes, viewed from a software security perspective. They then design a unified detection framework that applies this taxonomy to identify every category across both repositories and user accounts. When evaluated on the same labeled dataset, the framework reaches F1-scores above 89 percent across categories, which the authors present as groundwork for future large-scale, systematic study of the platform.
What carries the argument
The taxonomy of abuse categories, defined by symptoms and root causes, which directly structures the unified detection framework that operates on both repositories and accounts.
If this is right
- All abuse categories can be identified with a single system rather than separate tools for each type.
- Large-scale, systematic scanning of GitHub becomes feasible for security monitoring.
- Software supply chain security improves through earlier detection of malicious repository or account activity.
- Security teams gain a shared vocabulary for discussing and responding to GitHub misuse.
Where Pith is reading between the lines
- Organizations that depend on GitHub could run the framework internally to flag risky dependencies before they enter production.
- The same taxonomy structure could be tested on other code-hosting platforms to see whether similar abuse patterns appear.
- Platform operators might incorporate the detection logic into automated moderation to reduce manual review workload.
- Repeated application over time would reveal whether new abuse forms emerge outside the current categories.
Load-bearing premise
The 392 hand-labeled GitHub instances are representative of the full range of abuse and the taxonomy built from them captures every important symptom and cause without significant omissions.
What would settle it
A new collection of several hundred recent GitHub abuse cases, labeled independently, contains many examples that fit none of the taxonomy categories or on which the detection framework scores below 80 percent F1.
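The settling condition above can be written down as a small check. This is only a sketch under stated assumptions: the source does not quantify "many", so `max_outside_fraction` is a hypothetical threshold, and applying the 80 percent floor per category (rather than to one overall score) is one possible reading. The category names are placeholders, not the paper's taxonomy.

```python
from typing import Dict


def fails_replication(per_category_f1: Dict[str, float],
                      n_cases: int,
                      n_outside_taxonomy: int,
                      f1_floor: float = 0.80,
                      max_outside_fraction: float = 0.10) -> bool:
    """Return True if an independent dataset would count against the claim.

    f1_floor comes from the text (80 percent F1).
    max_outside_fraction is an ASSUMPTION standing in for "many examples".
    """
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")

    # Share of new cases that no taxonomy category could absorb.
    outside_fraction = n_outside_taxonomy / n_cases

    # Categories on which the detector falls below the floor.
    low_f1 = [name for name, f1 in per_category_f1.items() if f1 < f1_floor]

    return outside_fraction > max_outside_fraction or bool(low_f1)


if __name__ == "__main__":
    # Placeholder category names and made-up scores, for illustration only.
    scores = {"category_a": 0.91, "category_b": 0.75}
    print(fails_replication(scores, n_cases=300, n_outside_taxonomy=5))
```

Keeping both thresholds as explicit parameters makes the criterion auditable: a reader who disagrees with the assumed 10 percent cutoff can rerun the check with a different value.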
Original abstract
GitHub plays a critical role in modern software supply chains, making its security an important research concern. Existing studies have primarily focused on CI/CD automation, collaboration patterns, and community management, while abuse behaviors on GitHub have received little systematic investigation. In this paper, we systematically review and summarize reported GitHub abuse behaviors and conduct an empirical analysis of publicly available abuse cases, curating a manually labeled dataset of 392 GitHub instances. Based on this investigation, we propose a comprehensive taxonomy that characterizes their diverse symptoms and root causes from a software security perspective. Building on this taxonomy, we develop a unified detection framework capable of identifying all abuse categories across repositories and user accounts. Evaluated on the constructed dataset, the proposed framework achieves high performance across all categories (e.g., F1-score exceeding 89%). Collectively, this work advances the understanding of GitHub abuse behaviors and lays the groundwork for large-scale, systematic analysis of the GitHub platform to strengthen software supply chain security.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reviews reported GitHub abuse behaviors, curates a manually labeled dataset of 392 instances, derives a taxonomy of symptoms and root causes from a software security perspective, and presents a unified detection framework that identifies all abuse categories across repositories and accounts. The framework is evaluated on the constructed dataset and reports F1-scores exceeding 89% across categories.
Significance. If the taxonomy proves comprehensive and the framework generalizes beyond the construction data, the work would fill a notable gap in systematic study of GitHub abuse and supply-chain security threats. The empirical curation of public cases and the attempt at a unified detector are positive steps; however, the absence of independent validation or external test data limits the demonstrated contribution to a proof-of-concept on a small, internally derived set.
major comments (3)
- [Abstract and evaluation section] Abstract and dataset section: the central performance claim (F1 > 89% across all categories) is evaluated exclusively on the same 392 manually labeled instances used to build the taxonomy, with no mention of train/test splits, cross-validation, or held-out data. This makes it impossible to assess whether the reported scores reflect generalization or simply fit to the construction set.
- [Dataset and taxonomy sections] Dataset construction: no information is supplied on sampling method, selection criteria for the 392 instances, labeling guidelines, number of raters, or inter-rater agreement metrics. These details are load-bearing for both the taxonomy's claimed comprehensiveness and the framework's reliability.
- [Detection framework and evaluation sections] Framework evaluation: the abstract states the detector identifies 'all abuse categories' with high performance, yet supplies no baseline comparisons, ablation studies, or analysis of false-positive rates on non-abuse repositories. Without these, the 'unified' and 'high-performance' claims cannot be properly evaluated.
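The third comment asks for false-positive rates in addition to F1. A minimal sketch of why both are needed follows, computing the metrics from confusion counts for one category. The `Counts` type and the numbers in the usage lines are hypothetical; nothing here is taken from the paper's results.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one abuse category (hypothetical container)."""
    tp: int  # abusive instances correctly flagged
    fp: int  # benign instances wrongly flagged
    fn: int  # abusive instances missed
    tn: int  # benign instances correctly left alone


def precision(c: Counts) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Counts) -> float:
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1(c: Counts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def false_positive_rate(c: Counts) -> float:
    benign = c.fp + c.tn
    return c.fp / benign if benign else 0.0


if __name__ == "__main__":
    # Made-up counts: 50 abusive and 950 benign instances.
    counts = Counts(tp=45, fp=5, fn=5, tn=945)
    print(f"F1  = {f1(counts):.3f}")
    print(f"FPR = {false_positive_rate(counts):.4f}")
```

F1 never uses the true-negative count, so a score computed on a dataset made up only of abuse cases says nothing about how often benign repositories would be flagged. That is why the false-positive rate has to be measured separately on non-abuse samples.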
minor comments (2)
- [Abstract] The abstract refers to 'publicly available abuse cases' but does not cite the specific sources or repositories used; adding explicit references would improve traceability.
- [Taxonomy section] Notation for abuse categories in the taxonomy could be clarified with a summary table early in the paper to help readers track the mapping to the detection features.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.
- Referee: [Abstract and evaluation section] Abstract and dataset section: the central performance claim (F1 > 89% across all categories) is evaluated exclusively on the same 392 manually labeled instances used to build the taxonomy, with no mention of train/test splits, cross-validation, or held-out data. This makes it impossible to assess whether the reported scores reflect generalization or simply fit to the construction set.
  Authors: We acknowledge that the reported F1 scores were obtained by applying the detection framework to the full set of 392 instances used to derive the taxonomy. The framework is a rule-based system that directly encodes the symptoms and root causes identified in the taxonomy, rather than a statistical or machine-learning model trained on the data. Consequently, the metrics reflect the framework's coverage of the defined abuse categories on the known cases rather than overfitting in the conventional sense. To address the concern about generalization, we will add k-fold cross-validation results (splitting the labeled instances) and a dedicated limitations subsection discussing the proof-of-concept nature of the current evaluation in the revised manuscript.
  Revision: partial
- Referee: [Dataset and taxonomy sections] Dataset construction: no information is supplied on sampling method, selection criteria for the 392 instances, labeling guidelines, number of raters, or inter-rater agreement metrics. These details are load-bearing for both the taxonomy's claimed comprehensiveness and the framework's reliability.
  Authors: We agree that these methodological details are essential and were inadvertently omitted. In the revised dataset section we will explicitly describe: (1) the sampling sources and selection criteria (public GitHub security reports, advisories, and issue trackers), (2) the labeling guidelines used to map instances to taxonomy categories, and (3) the number of raters together with inter-rater agreement statistics (or a statement that labeling was performed by a single researcher with subsequent review).
  Revision: yes
- Referee: [Detection framework and evaluation sections] Framework evaluation: the abstract states the detector identifies 'all abuse categories' with high performance, yet supplies no baseline comparisons, ablation studies, or analysis of false-positive rates on non-abuse repositories. Without these, the 'unified' and 'high-performance' claims cannot be properly evaluated.
  Authors: We concur that the evaluation would be more convincing with additional controls. In the revised evaluation section we will: (1) include baseline comparisons against simpler keyword-based detectors and any publicly available GitHub abuse tools, (2) present ablation results by disabling individual framework components, and (3) report false-positive rates on a newly sampled set of non-abuse repositories and accounts. These additions will allow readers to better assess the unified nature and practical performance of the framework.
  Revision: yes
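The responses promise inter-rater agreement statistics without naming one. Cohen's kappa is a common choice for two raters, so the sketch below assumes that statistic; the rebuttal does not confirm it, and the labels in the usage lines are placeholders rather than taxonomy categories.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable],
                 rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over all labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Placeholder labels for four instances, for illustration only.
    a = ["x", "x", "y", "y"]
    b = ["x", "x", "y", "x"]
    print(cohens_kappa(a, b))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when a few categories dominate the dataset. If labeling was done by a single researcher, as the response allows, no such statistic can be computed.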
Circularity Check
Taxonomy derived from 392-instance dataset; detection framework evaluated on identical set with no held-out validation.
The paper's core chain proceeds from systematic review of reported cases to manual curation of 392 labeled instances, taxonomy construction from that analysis, framework development on the taxonomy, and performance reporting on the same 392 instances. No mathematical derivations, parameter fitting, or self-citations appear in the provided text that would reduce any claim to an input by construction. The work relies on external public reports and manual labeling rather than internal loops, satisfying the default expectation of no significant circularity. The shared dataset for taxonomy and evaluation is a methodological limitation for generalizability but does not constitute a definitional or fitted-input reduction per the enumerated patterns.
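The shared-dataset limitation is what the authors' proposed k-fold cross-validation is meant to address. A minimal sketch of a stratified fold assignment follows, so that every category appears in every fold. The function name and the deterministic round-robin scheme are assumptions for illustration; the paper does not describe its splitting procedure. For a rule-based detector the split only helps if the rules are derived from the training folds alone, which is also an assumption here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int) -> List[List[int]]:
    """Assign instance indices to k folds, spreading each label evenly.

    Deterministic round-robin; a real evaluation would usually shuffle
    within each label first (omitted to keep the sketch reproducible).
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    # Group instance indices by their category label.
    by_label: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_label):
        for index in by_label[label]:
            # Continue the rotation across labels to keep fold sizes balanced.
            folds[position % k].append(index)
            position += 1
    return folds


if __name__ == "__main__":
    # Placeholder labels: six instances of one category, three of another.
    labels = ["a"] * 6 + ["b"] * 3
    for fold_id, held_out in enumerate(stratified_folds(labels, k=3)):
        train = [i for i in range(len(labels)) if i not in held_out]
        print(fold_id, "held out:", held_out, "train:", train)
```

In each round the held-out fold is scored with rules that were fixed before looking at it, and per-category metrics are then averaged over the folds.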