Investigating Notable Metadata Practices in PyPI Libraries: An Empirical Study about Repository and Donation Platform URLs
Pith reviewed 2026-05-16 12:03 UTC · model grok-4.3
The pith
PyPI maintainers most often leave out repository links due to oversight and skip donation links because of skepticism about their value.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study finds that missing or outdated repository links arise mainly from oversight, lack of awareness, or perceived irrelevance, while donation platform links are omitted chiefly due to skepticism, limited perceived benefit, or lack of knowledge. Platform dominance for these links is shaped by ideological, technical, and organizational factors, with GitHub preferred for visibility. The LLM topic modeling pipeline using LLaMA 3.3 70B proved robust, reaching up to 88 percent lexical and 92 percent semantic similarity across 30 runs, and produced topics judged high-quality by experts in roughly 77-78 percent of cases.
What carries the argument
The LLM-based topic modeling pipeline that preprocesses survey responses, extracts topics with LLaMA 3.3 70B, merges similar topics, and evaluates stability via Jaccard and cosine similarity plus expert assessment.
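The two stability metrics named here can be sketched in a few lines. The topic word lists and the bag-of-words "embedding" below are invented stand-ins for the paper's actual LLaMA 3.3 70B outputs and sentence embeddings, so this illustrates the metrics themselves, not the pipeline:

```python
import math
from collections import Counter

def jaccard(a: set[str], b: set[str]) -> float:
    """Lexical overlap between two topic word sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors
    (a stand-in for the sentence embeddings used in the paper)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two hypothetical topic extractions from repeated runs.
run1 = ["oversight", "awareness", "irrelevance", "metadata"]
run2 = ["oversight", "awareness", "skepticism", "metadata"]

lexical = jaccard(set(run1), set(run2))         # 3 shared / 5 total = 0.6
semantic = cosine(Counter(run1), Counter(run2))
```

Averaging such pairwise scores over all 30-run pairs yields the reported stability figures.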
If this is right
- Dependency monitoring tools could flag and auto-suggest missing repository links based on the dominant oversight pattern.
- Donation platforms would gain more entries if they offered one-click GitHub integration rather than separate profile setup.
- Security scanners would encounter fewer broken or stale references once maintainers are prompted about link relevance.
- Platform maintainers could reduce inconsistency by surfacing visibility statistics that show GitHub's advantage for donation links.
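The first bullet above could be prototyped against the public PyPI JSON API (`https://pypi.org/pypi/<name>/json`). A minimal sketch: the helper operates on an already-fetched metadata dict, the host list is a plausible but incomplete heuristic, and the metadata fragments are invented:

```python
# Sketch of a check a dependency-monitoring tool might run on the
# "info" object returned by https://pypi.org/pypi/<name>/json.
REPO_HOSTS = ("github.com", "gitlab.com", "bitbucket.org", "codeberg.org")

def missing_repo_link(info: dict) -> bool:
    """Return True when no project URL points at a known repository host."""
    urls = (info.get("project_urls") or {}).values()
    return not any(host in url for url in urls for host in REPO_HOSTS)

# Illustrative metadata fragments (not real packages).
with_repo = {"project_urls": {"Source": "https://github.com/example/pkg"}}
without_repo = {"project_urls": {"Homepage": "https://example.org"}}

print(missing_repo_link(with_repo))     # False
print(missing_repo_link(without_repo))  # True
```

A real tool would then auto-suggest a candidate repository, which is essentially what PyRadar (cited by the paper) attempts.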
Where Pith is reading between the lines
- The same oversight pattern may appear in other package registries such as npm or crates.io, suggesting a shared tooling opportunity across ecosystems.
- If awareness prompts prove effective in a follow-up experiment, they could be built directly into the PyPI upload workflow to raise metadata completeness without extra maintainer effort.
- The preference for GitHub placement implies that donation platforms might increase adoption by allowing direct embedding of donation buttons on repository README files.
- High robustness of the LLM pipeline indicates it could be reused on similar open-ended maintainer surveys in other software-engineering domains with modest additional validation.
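The embedding suggested in the third bullet already has a partial mechanism on GitHub: a `FUNDING.yml` file in a repository's `.github` directory renders a "Sponsor" button on the repository page. A minimal sketch with placeholder account names (a real integration would substitute the maintainer's own identifiers):

```yaml
# .github/FUNDING.yml — enables GitHub's sponsor button
github: [octocat]                         # GitHub Sponsors username(s); placeholder
open_collective: example-project          # Open Collective slug; placeholder
custom: ["https://example.org/donate"]    # arbitrary donation URL; placeholder
```

The paper's finding is that maintainers prefer exactly this GitHub-side placement over standalone donation profiles.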
Load-bearing premise
The 1,776 self-reported survey answers accurately capture the real motivations and practices of PyPI maintainers without systematic bias or social-desirability effects.
What would settle it
A controlled follow-up that tracks whether sending maintainers simple reminders about repository links actually increases the number of valid, up-to-date links in their PyPI entries within six months.
Original abstract
Background: Open source software (OSS) libraries are critical components of modern software systems, yet their metadata, particularly links to source code repositories and donation platforms, is often incomplete, outdated, or inconsistent. Such deficiencies hinder dependency monitoring, security assessment, and the sustainability of OSS projects. Aims: This study aims to explain notable metadata practices in PyPI libraries, focusing on platform dominance, outdated links, and missing references to repositories and donation platforms. As this investigation relies on large-scale qualitative survey data, we further evaluate the robustness and quality of the LLM-based topic modeling approach used to derive the findings. Method: We conducted two surveys targeting PyPI authors and maintainers, collecting 1,776 open-ended responses. To analyze these responses, we developed an LLM-based topic modeling pipeline using LLaMA 3.3 70B, including preprocessing, topic extraction, and topic merging. Robustness was assessed across 30 repeated runs using Jaccard and cosine similarity, while topic quality was evaluated by 23 experts using a structured assessment framework and Randolph's Kappa. Results: The findings reveal that missing or outdated repository links are primarily associated with oversight, lack of awareness, or perceived irrelevance, while platform dominance is driven by ideological, technical, and organizational factors. Donation platform links are often omitted due to skepticism, limited perceived benefit, or lack of knowledge, and are preferentially placed on GitHub for visibility reasons. The topic modeling approach demonstrated high robustness (up to 88% lexical and 92% semantic similarity) and produced high-quality topics, with approximately 77-78% meeting all evaluation criteria and moderate inter-rater agreement.
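The agreement statistic named in the abstract, Randolph's free-marginal multirater kappa, is simple enough to sketch. This is a generic implementation of the published formula, not the paper's own code, and the toy ratings below are invented:

```python
def randolph_kappa(ratings: list[list[int]], k: int) -> float:
    """Free-marginal multirater kappa (Randolph 2005):
    kappa = (P_obs - 1/k) / (1 - 1/k), where P_obs is the mean
    proportion of agreeing rater pairs per item and k is the
    number of available categories."""
    n = len(ratings[0])  # raters per item
    p_obs = 0.0
    for item in ratings:
        counts = [item.count(c) for c in set(item)]
        p_obs += sum(c * (c - 1) for c in counts) / (n * (n - 1))
    p_obs /= len(ratings)
    p_chance = 1.0 / k
    return (p_obs - p_chance) / (1 - p_chance)

# Toy example: 3 raters judge 4 topics on a binary quality scale (k = 2).
ratings = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
kappa = randolph_kappa(ratings, k=2)
```

Unlike Fleiss' kappa, the chance term here is fixed at 1/k rather than estimated from the raters' marginal distributions, which is why it suits raters who are free to use any category.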
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an empirical study of metadata practices in PyPI libraries, focusing on repository and donation platform URLs. Through two surveys yielding 1,776 open-ended responses from authors and maintainers, and an LLM-based topic modeling pipeline using LLaMA 3.3 70B, it identifies key reasons for incomplete metadata (e.g., oversight for repos, skepticism for donations) and evaluates the robustness of the analysis method via repeated runs and expert review.
Significance. If the sampling and validation concerns are addressed, the work provides valuable insights into maintainer behaviors affecting OSS metadata quality, with implications for dependency monitoring, security, and sustainability in package ecosystems. The methodological assessment of the LLM pipeline (robustness metrics and expert ratings) is a strength that could inform future qualitative SE research.
major comments (2)
- [Method] Method section: The survey description provides no sampling frame, total population size, response rate, or analysis of non-response bias. This undermines the generalizability of the central claims about motivations (oversight for repository links; skepticism for donations) drawn from the 1,776 responses.
- [Method and Results] Method and Results: The LLM topic modeling is validated only for internal robustness (Jaccard/cosine across 30 runs) and expert-perceived coherence (77-78% high-quality topics, moderate Kappa), but lacks comparison to human-coded ground truth or direct fidelity checks against actual maintainer interpretations, leaving open the risk of model-induced artifacts in the extracted themes.
minor comments (1)
- [Abstract] Abstract: Clarify whether the 'two surveys' are distinct instruments or differ in targeting (authors vs. maintainers) and how responses were combined.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our methodological approach. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
Referee: [Method] Method section: The survey description provides no sampling frame, total population size, response rate, or analysis of non-response bias. This undermines the generalizability of the central claims about motivations (oversight for repository links; skepticism for donations) drawn from the 1,776 responses.
Authors: We acknowledge this limitation in the current manuscript. The surveys were distributed by extracting author contact information from PyPI package metadata for packages meeting inclusion criteria and sending targeted invitations, but the dynamic and decentralized nature of the ecosystem prevented precise tracking of the total contacted population or a formal response rate calculation. In the revised version, we will expand the Method section to detail the distribution process, report the number of packages from which authors were contacted where feasible, and add an explicit limitations paragraph discussing generalizability and potential non-response bias. We will also qualify the central claims to reflect these constraints while retaining the value of the observed response patterns. revision: yes
Referee: [Method and Results] Method and Results: The LLM topic modeling is validated only for internal robustness (Jaccard/cosine across 30 runs) and expert-perceived coherence (77-78% high-quality topics, moderate Kappa), but lacks comparison to human-coded ground truth or direct fidelity checks against actual maintainer interpretations, leaving open the risk of model-induced artifacts in the extracted themes.
Authors: We agree that a direct comparison to human-coded ground truth would provide stronger validation. Given the scale of 1,776 responses, exhaustive human coding was not feasible with available resources; the 23-expert review was selected as a practical external quality check on coherence and relevance. We will revise the Method and Results sections to explicitly discuss this limitation, clarify the rationale for the chosen robustness metrics and expert evaluation, and note the potential for model-induced artifacts. If space and resources allow, we will also include a small-scale pilot comparison between LLM-derived topics and human coding on a subset of responses. revision: yes
Circularity Check
No circularity: purely empirical survey with external validation
Full rationale
The paper conducts two surveys yielding 1,776 open-ended responses from PyPI maintainers, then applies an LLM topic modeling pipeline (LLaMA 3.3 70B) whose outputs are checked for robustness via 30 repeated runs (Jaccard/cosine similarity) and for quality via structured ratings from 23 experts plus Randolph's Kappa. No equations, fitted parameters, predictions, or derivations exist that could reduce claims to inputs by construction. Central findings on motivations for missing or outdated links derive directly from the validated topic extractions rather than from any self-referential loop or load-bearing self-citation. The methodology is checked against external benchmarks (expert review, repeated-run metrics) with no ansatz smuggling, uniqueness theorems, or renaming of known results.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Survey responses from PyPI authors and maintainers accurately reflect their metadata practices and motivations
- domain assumption The LLM-based topic modeling pipeline reliably extracts meaningful and stable themes from qualitative text