Investigating Notable Metadata Practices in PyPI Libraries: An Empirical Study about Repository and Donation Platform URLs
Pith reviewed 2026-05-16 12:03 UTC · model grok-4.3
The pith
PyPI maintainers most often leave out repository links due to oversight and skip donation links because of skepticism about their value.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study finds that missing or outdated repository links arise mainly from oversight, lack of awareness, or perceived irrelevance, while donation platform links are omitted chiefly due to skepticism, limited perceived benefit, or lack of knowledge. Platform dominance for these links is shaped by ideological, technical, and organizational factors, with GitHub preferred for visibility. The LLM topic modeling pipeline using LLaMA 3.3 70B proved robust, reaching up to 88 percent lexical and 92 percent semantic similarity across 30 runs, and produced topics judged high-quality by experts in roughly 77-78 percent of cases.
What carries the argument
The LLM-based topic modeling pipeline that preprocesses survey responses, extracts topics with LLaMA 3.3 70B, merges similar topics, and evaluates stability via Jaccard and cosine similarity plus expert assessment.
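The two stability metrics named here can be sketched in a few lines. The topic word lists and the bag-of-words "embedding" below are invented stand-ins for the paper's actual LLaMA 3.3 70B outputs and sentence embeddings, so this illustrates the metrics themselves, not the pipeline:

```python
import math
from collections import Counter

def jaccard(a: set[str], b: set[str]) -> float:
    """Lexical overlap between two topic word sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors
    (a stand-in for the sentence embeddings used in the paper)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two hypothetical topic extractions from repeated runs.
run1 = ["oversight", "awareness", "irrelevance", "metadata"]
run2 = ["oversight", "awareness", "skepticism", "metadata"]

lexical = jaccard(set(run1), set(run2))         # 3 shared / 5 total = 0.6
semantic = cosine(Counter(run1), Counter(run2))
```

Averaging such pairwise scores over all 30-run pairs yields the reported stability figures.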
If this is right
- Dependency monitoring tools could flag and auto-suggest missing repository links based on the dominant oversight pattern.
- Donation platforms would gain more entries if they offered one-click GitHub integration rather than separate profile setup.
- Security scanners would encounter fewer broken or stale references once maintainers are prompted about link relevance.
- Platform maintainers could reduce inconsistency by surfacing visibility statistics that show GitHub's advantage for donation links.
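The first bullet above could be prototyped against the public PyPI JSON API (`https://pypi.org/pypi/<name>/json`). A minimal sketch: the helper operates on an already-fetched metadata dict, the host list is a plausible but incomplete heuristic, and the metadata fragments are invented:

```python
# Sketch of a check a dependency-monitoring tool might run on the
# "info" object returned by https://pypi.org/pypi/<name>/json.
REPO_HOSTS = ("github.com", "gitlab.com", "bitbucket.org", "codeberg.org")

def missing_repo_link(info: dict) -> bool:
    """Return True when no project URL points at a known repository host."""
    urls = (info.get("project_urls") or {}).values()
    return not any(host in url for url in urls for host in REPO_HOSTS)

# Illustrative metadata fragments (not real packages).
with_repo = {"project_urls": {"Source": "https://github.com/example/pkg"}}
without_repo = {"project_urls": {"Homepage": "https://example.org"}}

print(missing_repo_link(with_repo))     # False
print(missing_repo_link(without_repo))  # True
```

A real tool would then auto-suggest a candidate repository, which is essentially what PyRadar (cited by the paper) attempts.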
Where Pith is reading between the lines
- The same oversight pattern may appear in other package registries such as npm or crates.io, suggesting a shared tooling opportunity across ecosystems.
- If awareness prompts prove effective in a follow-up experiment, they could be built directly into the PyPI upload workflow to raise metadata completeness without extra maintainer effort.
- The preference for GitHub placement implies that donation platforms might increase adoption by allowing direct embedding of donation buttons on repository README files.
- High robustness of the LLM pipeline indicates it could be reused on similar open-ended maintainer surveys in other software-engineering domains with modest additional validation.
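The embedding suggested in the third bullet already has a partial mechanism on GitHub: a `FUNDING.yml` file in a repository's `.github` directory renders a "Sponsor" button on the repository page. A minimal sketch with placeholder account names (a real integration would substitute the maintainer's own identifiers):

```yaml
# .github/FUNDING.yml — enables GitHub's sponsor button
github: [octocat]                         # GitHub Sponsors username(s); placeholder
open_collective: example-project          # Open Collective slug; placeholder
custom: ["https://example.org/donate"]    # arbitrary donation URL; placeholder
```

The paper's finding is that maintainers prefer exactly this GitHub-side placement over standalone donation profiles.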
Load-bearing premise
The 1,776 self-reported survey answers accurately capture the real motivations and practices of PyPI maintainers without systematic bias or social-desirability effects.
What would settle it
A controlled follow-up that tracks whether sending maintainers simple reminders about repository links actually increases the number of valid, up-to-date links in their PyPI entries within six months.
Original abstract
Background: Open source software (OSS) libraries are critical components of modern software systems, yet their metadata, particularly links to source code repositories and donation platforms, is often incomplete, outdated, or inconsistent. Such deficiencies hinder dependency monitoring, security assessment, and the sustainability of OSS projects. Aims: This study aims to explain notable metadata practices in PyPI libraries, focusing on platform dominance, outdated links, and missing references to repositories and donation platforms. As this investigation relies on large-scale qualitative survey data, we further evaluate the robustness and quality of the LLM-based topic modeling approach used to derive the findings. Method: We conducted two surveys targeting PyPI authors and maintainers, collecting 1,776 open-ended responses. To analyze these responses, we developed an LLM-based topic modeling pipeline using LLaMA 3.3 70B, including preprocessing, topic extraction, and topic merging. Robustness was assessed across 30 repeated runs using Jaccard and cosine similarity, while topic quality was evaluated by 23 experts using a structured assessment framework and Randolph's Kappa. Results: The findings reveal that missing or outdated repository links are primarily associated with oversight, lack of awareness, or perceived irrelevance, while platform dominance is driven by ideological, technical, and organizational factors. Donation platform links are often omitted due to skepticism, limited perceived benefit, or lack of knowledge, and are preferentially placed on GitHub for visibility reasons. The topic modeling approach demonstrated high robustness (up to 88% lexical and 92% semantic similarity) and produced high-quality topics, with approximately 77-78% meeting all evaluation criteria and moderate inter-rater agreement.
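The agreement statistic named in the abstract, Randolph's free-marginal multirater kappa, is simple enough to sketch. This is a generic implementation of the published formula, not the paper's own code, and the toy ratings below are invented:

```python
def randolph_kappa(ratings: list[list[int]], k: int) -> float:
    """Free-marginal multirater kappa (Randolph 2005):
    kappa = (P_obs - 1/k) / (1 - 1/k), where P_obs is the mean
    proportion of agreeing rater pairs per item and k is the
    number of available categories."""
    n = len(ratings[0])  # raters per item
    p_obs = 0.0
    for item in ratings:
        counts = [item.count(c) for c in set(item)]
        p_obs += sum(c * (c - 1) for c in counts) / (n * (n - 1))
    p_obs /= len(ratings)
    p_chance = 1.0 / k
    return (p_obs - p_chance) / (1 - p_chance)

# Toy example: 3 raters judge 4 topics on a binary quality scale (k = 2).
ratings = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
kappa = randolph_kappa(ratings, k=2)
```

Unlike Fleiss' kappa, the chance term here is fixed at 1/k rather than estimated from the raters' marginal distributions, which is why it suits raters who are free to use any category.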
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an empirical study of metadata practices in PyPI libraries, focusing on repository and donation platform URLs. Through two surveys yielding 1,776 open-ended responses from authors and maintainers, and an LLM-based topic modeling pipeline using LLaMA 3.3 70B, it identifies key reasons for incomplete metadata (e.g., oversight for repos, skepticism for donations) and evaluates the robustness of the analysis method via repeated runs and expert review.
Significance. If the sampling and validation concerns are addressed, the work provides valuable insights into maintainer behaviors affecting OSS metadata quality, with implications for dependency monitoring, security, and sustainability in package ecosystems. The methodological assessment of the LLM pipeline (robustness metrics and expert ratings) is a strength that could inform future qualitative SE research.
major comments (2)
- [Method] Method section: The survey description provides no sampling frame, total population size, response rate, or analysis of non-response bias. This undermines the generalizability of the central claims about motivations (oversight for repository links; skepticism for donations) drawn from the 1,776 responses.
- [Method and Results] Method and Results: The LLM topic modeling is validated only for internal robustness (Jaccard/cosine across 30 runs) and expert-perceived coherence (77-78% high-quality topics, moderate Kappa), but lacks comparison to human-coded ground truth or direct fidelity checks against actual maintainer interpretations, leaving open the risk of model-induced artifacts in the extracted themes.
minor comments (1)
- [Abstract] Abstract: Clarify whether the 'two surveys' are distinct instruments or differ in targeting (authors vs. maintainers) and how responses were combined.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our methodological approach. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
Referee: [Method] Method section: The survey description provides no sampling frame, total population size, response rate, or analysis of non-response bias. This undermines the generalizability of the central claims about motivations (oversight for repository links; skepticism for donations) drawn from the 1,776 responses.
Authors: We acknowledge this limitation in the current manuscript. The surveys were distributed by extracting author contact information from PyPI package metadata for packages meeting inclusion criteria and sending targeted invitations, but the dynamic and decentralized nature of the ecosystem prevented precise tracking of the total contacted population or a formal response rate calculation. In the revised version, we will expand the Method section to detail the distribution process, report the number of packages from which authors were contacted where feasible, and add an explicit limitations paragraph discussing generalizability and potential non-response bias. We will also qualify the central claims to reflect these constraints while retaining the value of the observed response patterns. revision: yes
Referee: [Method and Results] Method and Results: The LLM topic modeling is validated only for internal robustness (Jaccard/cosine across 30 runs) and expert-perceived coherence (77-78% high-quality topics, moderate Kappa), but lacks comparison to human-coded ground truth or direct fidelity checks against actual maintainer interpretations, leaving open the risk of model-induced artifacts in the extracted themes.
Authors: We agree that a direct comparison to human-coded ground truth would provide stronger validation. Given the scale of 1,776 responses, exhaustive human coding was not feasible with available resources; the 23-expert review was selected as a practical external quality check on coherence and relevance. We will revise the Method and Results sections to explicitly discuss this limitation, clarify the rationale for the chosen robustness metrics and expert evaluation, and note the potential for model-induced artifacts. If space and resources allow, we will also include a small-scale pilot comparison between LLM-derived topics and human coding on a subset of responses. revision: yes
Circularity Check
No circularity: purely empirical survey with external validation
Full rationale
The paper conducts two surveys yielding 1,776 open-ended responses from PyPI maintainers, then applies an LLM topic modeling pipeline (LLaMA 3.3 70B) whose outputs are checked for robustness via 30 repeated runs (Jaccard/cosine similarity) and for quality via structured ratings from 23 experts plus Randolph's Kappa. No equations, fitted parameters, predictions, or derivations exist that could reduce claims to inputs by construction. Central findings on motivations for missing or outdated links derive directly from the validated topic extractions rather than from any self-referential loop or load-bearing self-citation. The methodology is checked against external benchmarks (expert review, repeated-run metrics) with no ansatz smuggling, uniqueness theorems, or renaming of known results.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Survey responses from PyPI authors and maintainers accurately reflect their metadata practices and motivations
- domain assumption The LLM-based topic modeling pipeline reliably extracts meaningful and stable themes from qualitative text