pith. machine review for the scientific record.

arxiv: 2601.15139 · v3 · submitted 2026-01-21 · 💻 cs.SE

Recognition: no theorem link

Investigating Notable Metadata Practices in PyPI Libraries: An Empirical Study about Repository and Donation Platform URLs

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 12:03 UTC · model grok-4.3

classification 💻 cs.SE
keywords PyPI · metadata · repository links · donation platforms · survey study · LLM topic modeling · open source software · maintainer practices

The pith

PyPI maintainers most often leave out repository links due to oversight and skip donation links because of skepticism about their value.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to explain why metadata links to source repositories and donation platforms are frequently missing, outdated, or inconsistent in PyPI libraries. It draws on 1,776 open-ended survey responses from authors and maintainers and applies an LLM-based topic modeling pipeline to surface the main reasons. A sympathetic reader would care because incomplete metadata blocks reliable dependency tracking, security checks, and long-term project support. If the reported patterns hold, they point to straightforward fixes such as awareness prompts for repository links and better visibility options for donations. The work also tests the reliability of its own LLM analysis method across repeated runs and expert review.

Core claim

The study finds that missing or outdated repository links arise mainly from oversight, lack of awareness, or perceived irrelevance, while donation platform links are omitted chiefly due to skepticism, limited perceived benefit, or lack of knowledge. Platform dominance for these links is shaped by ideological, technical, and organizational factors, with GitHub preferred for visibility. The LLM topic modeling pipeline using LLaMA 3.3 70B proved robust, reaching up to 88 percent lexical and 92 percent semantic similarity across 30 runs, and produced topics judged high-quality by experts in roughly 77-78 percent of cases.
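The abstract does not define how the lexical and semantic similarity figures were computed across runs. The sketch below is one plausible operationalization, assuming token-level Jaccard over topic labels and cosine similarity over sentence embeddings; the all-MiniLM-L6-v2 model and the nearest-match pairing are our stand-ins, not the paper's stated choices.

```python
# Hedged sketch: plausible run-to-run stability metrics in the spirit of the
# paper's lexical (Jaccard) and semantic (cosine) checks. Pairing strategy
# and embedding model are assumptions, not the authors' settings.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

def lexical_jaccard(topics_a: list[str], topics_b: list[str]) -> float:
    """Jaccard overlap between the word sets of two runs' topic labels."""
    words_a = {w.lower() for t in topics_a for w in t.split()}
    words_b = {w.lower() for t in topics_b for w in t.split()}
    return len(words_a & words_b) / len(words_a | words_b)

def semantic_cosine(topics_a: list[str], topics_b: list[str]) -> float:
    """Mean cosine similarity after matching each run-A topic to its
    nearest run-B topic."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb_a = model.encode(topics_a, normalize_embeddings=True)
    emb_b = model.encode(topics_b, normalize_embeddings=True)
    sims = emb_a @ emb_b.T  # cosine matrix, since embeddings are unit-norm
    return float(sims.max(axis=1).mean())

run_a = ["oversight and forgetfulness", "metadata seen as irrelevant"]
run_b = ["simply forgot to add the link", "perceived irrelevance of metadata"]
print(lexical_jaccard(run_a, run_b), semantic_cosine(run_a, run_b))
```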

What carries the argument

The LLM-based topic modeling pipeline that preprocesses survey responses, extracts topics with LLaMA 3.3 70B, merges similar topics, and evaluates stability via Jaccard and cosine similarity plus expert assessment.
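For concreteness, a minimal skeleton of those stages follows. The prompt wording, the Ollama transport (the reference list cites Ollama's docs [38], but the paper's abstract does not confirm it), the embedding model, and the 0.8 merge threshold are all illustrative assumptions rather than the authors' settings.

```python
# Hedged skeleton of the described pipeline stages: preprocess -> extract -> merge.
import ollama  # pip install ollama; assumes a local LLaMA 3.3 70B served via Ollama
from sentence_transformers import SentenceTransformer

def preprocess(responses):
    """Stage 1: normalize whitespace, drop empty and duplicate answers."""
    seen, clean = set(), []
    for r in responses:
        r = " ".join(r.split())
        if r and r.lower() not in seen:
            seen.add(r.lower())
            clean.append(r)
    return clean

def extract_topics(responses, model="llama3.3:70b"):
    """Stage 2: ask the LLM for short topic labels, one per line."""
    prompt = ("List the distinct reasons mentioned in these survey answers, "
              "one short topic label per line:\n\n" + "\n".join(responses))
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    lines = reply["message"]["content"].splitlines()
    return [ln.strip("-• ").strip() for ln in lines if ln.strip()]

def merge_topics(topics, threshold=0.8):
    """Stage 3: collapse near-duplicate labels by embedding similarity."""
    enc = SentenceTransformer("all-MiniLM-L6-v2")
    emb = enc.encode(topics, normalize_embeddings=True)
    kept, kept_emb = [], []
    for label, vec in zip(topics, emb):
        if all(float(vec @ k) < threshold for k in kept_emb):
            kept.append(label)
            kept_emb.append(vec)
    return kept
```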

If this is right

  • Dependency monitoring tools could flag and auto-suggest missing repository links based on the dominant oversight pattern (a minimal detector of this kind is sketched after this list).
  • Donation platforms would gain more entries if they offered one-click GitHub integration rather than separate profile setup.
  • Security scanners would encounter fewer broken or stale references once maintainers are prompted about link relevance.
  • Platform maintainers could reduce inconsistency by surfacing visibility statistics that show GitHub's advantage for donation links.
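As a concrete illustration of the first bullet, the missing-link case is already detectable from PyPI's public JSON API. The repository-host heuristic below is our assumption, not the paper's method.

```python
# Hedged sketch: flag PyPI packages whose metadata lacks a recognizable
# repository link, using PyPI's public JSON API. The REPO_HOSTS heuristic
# is ours, not the paper's.
import json
import urllib.request

REPO_HOSTS = ("github.com", "gitlab.com", "bitbucket.org", "codeberg.org")

def missing_repo_link(package: str) -> bool:
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url) as resp:
        info = json.load(resp)["info"]
    candidates = list((info.get("project_urls") or {}).values())
    if info.get("home_page"):
        candidates.append(info["home_page"])
    return not any(host in (u or "") for u in candidates for host in REPO_HOSTS)

for pkg in ("requests", "numpy"):
    print(pkg, "missing repo link:", missing_repo_link(pkg))
```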

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same oversight pattern may appear in other package registries such as npm or crates.io, suggesting a shared tooling opportunity across ecosystems.
  • If awareness prompts prove effective in a follow-up experiment, they could be built directly into the PyPI upload workflow to raise metadata completeness without extra maintainer effort.
  • The preference for GitHub placement implies that donation platforms might increase adoption by allowing direct embedding of donation buttons on repository README files (GitHub's existing sponsor-button config, shown after this list, is the natural vehicle).
  • High robustness of the LLM pipeline indicates it could be reused on similar open-ended maintainer surveys in other software-engineering domains with modest additional validation.
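The third bullet already has a partial precedent: GitHub renders a "Sponsor" button from a .github/FUNDING.yml file (ref. [27]). A minimal example, with placeholder account names; the platform keys follow GitHub's documented schema.

```yaml
# .github/FUNDING.yml: GitHub's sponsor-button config (see ref. [27]).
# Keys are from GitHub's documented schema; account names are placeholders.
github: [example-maintainer]
open_collective: example-project
ko_fi: example_maintainer
custom: ["https://example.org/donate"]
```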

Load-bearing premise

The 1,776 self-reported survey answers accurately capture the real motivations and practices of PyPI maintainers without systematic bias or social-desirability effects.

What would settle it

A controlled follow-up that tracks whether sending maintainers simple reminders about repository links actually increases the number of valid, up-to-date links in their PyPI entries within six months.

Figures

Figures reproduced from arXiv: 2601.15139 by Alexandros Tsakpinis, Nicolas Raube, and Alexander Pretschner.

Figure 1. Overview of the topic modeling pipeline. The robustness assessment component is omitted to enhance visual clarity.
Original abstract

Background: Open source software (OSS) libraries are critical components of modern software systems, yet their metadata, particularly links to source code repositories and donation platforms, is often incomplete, outdated, or inconsistent. Such deficiencies hinder dependency monitoring, security assessment, and the sustainability of OSS projects. Aims: This study aims to explain notable metadata practices in PyPI libraries, focusing on platform dominance, outdated links, and missing references to repositories and donation platforms. As this investigation relies on large-scale qualitative survey data, we further evaluate the robustness and quality of the LLM-based topic modeling approach used to derive the findings. Method: We conducted two surveys targeting PyPI authors and maintainers, collecting 1,776 open-ended responses. To analyze these responses, we developed an LLM-based topic modeling pipeline using LLaMA 3.3 70B, including preprocessing, topic extraction, and topic merging. Robustness was assessed across 30 repeated runs using Jaccard and cosine similarity, while topic quality was evaluated by 23 experts using a structured assessment framework and Randolph's Kappa. Results: The findings reveal that missing or outdated repository links are primarily associated with oversight, lack of awareness, or perceived irrelevance, while platform dominance is driven by ideological, technical, and organizational factors. Donation platform links are often omitted due to skepticism, limited perceived benefit, or lack of knowledge, and are preferentially placed on GitHub for visibility reasons. The topic modeling approach demonstrated high robustness (up to 88% lexical and 92% semantic similarity) and produced high-quality topics, with approximately 77-78% meeting all evaluation criteria and moderate inter-rater agreement.
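For readers unfamiliar with the agreement statistic the abstract cites, Randolph's free-marginal multirater kappa [50] is simple to compute: with q categories, chance agreement is fixed at 1/q rather than estimated from the raters' marginals. The sketch below uses toy counts, not the study's data.

```python
# Hedged sketch: Randolph's free-marginal multirater kappa (ref. [50]).
# counts[i][j] = number of the n raters who put subject i in category j.
def randolph_kappa(counts):
    n = sum(counts[0])   # raters per subject (assumed constant)
    q = len(counts[0])   # number of rating categories
    N = len(counts)      # number of rated subjects
    # Observed agreement: fraction of agreeing rater pairs, averaged over subjects.
    p_o = sum(c * (c - 1) for row in counts for c in row) / (N * n * (n - 1))
    p_e = 1.0 / q        # free-marginal chance agreement
    return (p_o - p_e) / (1.0 - p_e)

# Toy example: 3 raters judging 4 topics against 2 categories (pass/fail).
print(randolph_kappa([[3, 0], [2, 1], [3, 0], [1, 2]]))  # ≈ 0.33
```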

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents an empirical study of metadata practices in PyPI libraries, focusing on repository and donation platform URLs. Through two surveys yielding 1,776 open-ended responses from authors and maintainers, and an LLM-based topic modeling pipeline using LLaMA 3.3 70B, it identifies key reasons for incomplete metadata (e.g., oversight for repos, skepticism for donations) and evaluates the robustness of the analysis method via repeated runs and expert review.

Significance. If the sampling and validation concerns are addressed, the work provides valuable insights into maintainer behaviors affecting OSS metadata quality, with implications for dependency monitoring, security, and sustainability in package ecosystems. The methodological assessment of the LLM pipeline (robustness metrics and expert ratings) is a strength that could inform future qualitative SE research.

major comments (2)
  1. [Method] Method section: The survey description provides no sampling frame, total population size, response rate, or analysis of non-response bias. This undermines the generalizability of the central claims about motivations (oversight for repository links; skepticism for donations) drawn from the 1,776 responses.
  2. [Method and Results] Method and Results: The LLM topic modeling is validated only for internal robustness (Jaccard/cosine across 30 runs) and expert-perceived coherence (77-78% high-quality topics, moderate Kappa), but lacks comparison to human-coded ground truth or direct fidelity checks against actual maintainer interpretations, leaving open the risk of model-induced artifacts in the extracted themes.
minor comments (1)
  1. [Abstract] Abstract: Clarify whether the 'two surveys' are distinct instruments or differ in targeting (authors vs. maintainers) and how responses were combined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our methodological approach. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Method] Method section: The survey description provides no sampling frame, total population size, response rate, or analysis of non-response bias. This undermines the generalizability of the central claims about motivations (oversight for repository links; skepticism for donations) drawn from the 1,776 responses.

    Authors: We acknowledge this limitation in the current manuscript. The surveys were distributed by extracting author contact information from PyPI package metadata for packages meeting inclusion criteria and sending targeted invitations, but the dynamic and decentralized nature of the ecosystem prevented precise tracking of the total contacted population or a formal response rate calculation. In the revised version, we will expand the Method section to detail the distribution process, report the number of packages from which authors were contacted where feasible, and add an explicit limitations paragraph discussing generalizability and potential non-response bias. We will also qualify the central claims to reflect these constraints while retaining the value of the observed response patterns. revision: yes

  2. Referee: [Method and Results] Method and Results: The LLM topic modeling is validated only for internal robustness (Jaccard/cosine across 30 runs) and expert-perceived coherence (77-78% high-quality topics, moderate Kappa), but lacks comparison to human-coded ground truth or direct fidelity checks against actual maintainer interpretations, leaving open the risk of model-induced artifacts in the extracted themes.

    Authors: We agree that a direct comparison to human-coded ground truth would provide stronger validation. Given the scale of 1,776 responses, exhaustive human coding was not feasible with available resources; the 23-expert review was selected as a practical external quality check on coherence and relevance. We will revise the Method and Results sections to explicitly discuss this limitation, clarify the rationale for the chosen robustness metrics and expert evaluation, and note the potential for model-induced artifacts. If space and resources allow, we will also include a small-scale pilot comparison between LLM-derived topics and human coding on a subset of responses. revision: yes
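If the proposed pilot materializes, the comparison itself is mechanical. A sketch under the assumption that each response in the subset receives one human code and one LLM topic; the labels are illustrative only.

```python
# Hedged sketch of the pilot the rebuttal proposes: score agreement between
# human codes and LLM topic assignments on the same response subset.
from sklearn.metrics import adjusted_rand_score, cohen_kappa_score

human_codes = ["oversight", "oversight", "irrelevance", "skepticism", "oversight"]
llm_topics  = ["forgot",    "forgot",    "irrelevance", "skepticism", "irrelevance"]

# Chance-corrected clustering agreement; label names need not match.
print("ARI:", adjusted_rand_score(human_codes, llm_topics))

# After mapping LLM labels onto the human codebook, Cohen's kappa applies.
mapped = ["oversight" if t == "forgot" else t for t in llm_topics]
print("kappa:", cohen_kappa_score(human_codes, mapped))
```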

Circularity Check

0 steps flagged

No circularity: purely empirical survey with external validation

Full rationale

The paper conducts two surveys yielding 1,776 open-ended responses from PyPI maintainers, then applies an LLM topic modeling pipeline (LLaMA 3.3 70B) whose outputs are checked for robustness via 30 repeated runs (Jaccard/cosine similarity) and for quality via 23-expert structured ratings plus Randolph's Kappa. No equations, fitted parameters, predictions, or derivations exist that could reduce claims to inputs by construction. The central findings on motivations for missing or outdated links derive directly from the validated topic extractions rather than from any self-referential loop or load-bearing self-citation. The methodology is validated against external benchmarks (expert review, repeated-run metrics), with no ansatz smuggling, uniqueness theorems, or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two standard empirical assumptions rather than new parameters or entities: that self-reported reasons reflect actual practices and that the LLM pipeline produces stable, high-quality topics.

axioms (2)
  • domain assumption Survey responses from PyPI authors and maintainers accurately reflect their metadata practices and motivations
    The study treats open-ended answers as direct evidence of reasons such as oversight or skepticism without independent verification of actual metadata states.
  • domain assumption The LLM-based topic modeling pipeline reliably extracts meaningful and stable themes from qualitative text
    Robustness is assessed via repeated runs and expert ratings, but the assumption that the model does not introduce systematic bias remains untested against human-coded baselines in the provided abstract.

pith-pipeline@v0.9.0 · 5606 in / 1476 out tokens · 53063 ms · 2026-05-16T12:03:14.699569+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · 2 internal anchors

  1. [1]

    Replication Package. 2026. https://figshare.com/s/b510f5274eb1333deb95

  2. [2]

    Rabe Abdalkareem, Vinicius Oda, Suhaib Mujahid, and Emad Shihab. 2020. On the impact of using trivial packages: An empirical case study on npm and pypi. Empirical Software Engineering 25 (2020), 1168–1204

  3. [3]

    Aly Abdelrazek, Yomna Eid, Eman Gawish, Walaa Medhat, and Ahmed Hassan. 2023. Topic modeling algorithms and applications: A survey. Information Systems 112 (2023), 102131

  5. [5]

    Nikta Akbarpour, Ahmad Saleem Mirza, Erfan Raoofian, Fatemeh Fard, and Gema Rodríguez-Pérez. 2025. Unveiling Ruby: Insights from Stack Overflow and Developer Survey. arXiv preprint arXiv:2503.19238 (2025)

  6. [6]

    Apache Software Foundation. 2021. Log4j 2. https://logging.apache.org/log4j/2.x/. Accessed: 2026-01-21

  7. [7]

    Sebastian Baltes, Florian Angermeir, Chetan Arora, Marvin Muñoz Barón, Chunyang Chen, Lukas Böhme, Fabio Calefato, Neil Ernst, Davide Falessi, Brian Fitzgerald, et al. 2025. Evaluation Guidelines for Empirical Studies in Software Engineering involving LLMs. arXiv preprint arXiv:2508.15503 (2025)

  8. [8]

    Sebastian Baltes and Paul Ralph. 2022. Sampling in software engineering research: A critical review and guidelines. Empirical Software Engineering 27, 4 (2022), 94

  9. [9]

    Veronika Bauer, Lars Heinemann, and Florian Deissenboeck. 2012. A structured approach to assess third-party library usage. In 2012 28th IEEE International Conference on Software Maintenance (ICSM). IEEE, 483–492

  10. [10]

    Isabele Bittencourt, Aparna S Varde, and Pankaj Lal. 2024. Opinion mining on offshore wind energy for environmental engineering. In International IOT, Electronics and Mechatronics Conference. Springer, 487–505

  11. [11]

    David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022

  12. [12]

    SRBH Chaturvedi and RC Shweta. 2015. Evaluation of inter-rater agreement and inter-rater reliability for observational data: an overview of concepts and methods. Journal of the Indian Academy of Applied Psychology 41, 3 (2015), 20–27

  13. [13]

    Weisi Chen, Fethi Rabhi, Wenqi Liao, and Islam Al-Qudah. 2023. Leveraging state-of-the-art topic modeling for news impact analysis on financial markets: a comparative study. Electronics 12, 12 (2023), 2605

  14. [14]

    Gerard Chung, Maria Rodriguez, Paul Lanier, and Daniel Gibbs. 2022. Text-mining open-ended survey responses using structural topic modeling: A practical demonstration to understand parents' coping methods during the COVID-19 pandemic in Singapore. Journal of Technology in Human Services 40, 4 (2022), 296–318

  15. [15]

    Rob Churchill and Lisa Singh. 2022. The evolution of topic modeling. Comput. Surveys 54, 10s (2022), 1–35

  16. [16]

    Russ Cox. 2019. Surviving software dependencies. Commun. ACM 62, 9 (2019), 36–43

  17. [17]

    Russ Cox. 2025. Fifty Years of Open Source Software Supply Chain Security: For decades, software reuse was only a lofty goal. Now it's very real. Queue 23, 1 (2025), 84–107

  18. [18]

    Alexandre Decan, Tom Mens, and Maelick Claes. 2016. On the topology of package dependency networks: A comparison of three programming language ecosystems. In Proceedings of the 10th European Conference on Software Architecture Workshops. 1–4

  19. [19]

    Alexandre Decan, Tom Mens, and Eleni Constantinou. 2018. On the impact of security vulnerabilities in the npm package dependency network. In Proceedings of the 15th International Conference on Mining Software Repositories. 181–191

  20. [20]

    Tomoki Doi, Masaru Isonuma, and Hitomi Yanaka. 2024. Topic modeling for short texts with large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop). 21–33

  21. [21]

    Christof Ebert. 2008. Open source software in industry. IEEE Software 25, 3 (2008), 52–53

  22. [22]

    Roman Egger and Joanne Yu. 2022. A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts. Frontiers in Sociology 7 (2022), 886498

  23. [23]

    Youmei Fan, Tao Xiao, Hideaki Hata, Christoph Treude, and Kenichi Matsumoto. 2024. "My GitHub Sponsors profile is live!" Investigating the Impact of Twitter/X Mentions on GitHub Sponsors. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–12

  25. [25]

    Kai Gao, Weiwei Xu, Wenhao Yang, and Minghui Zhou. 2024. PyRadar: Towards Automatically Retrieving and Validating Source Code Repository Information for PyPI Packages. Proceedings of the ACM on Software Engineering 1, FSE (2024), 2608–2631

  26. [26]

    GitHub. 2025. Octoverse: A new developer joins GitHub every second as AI leads TypeScript to #1. https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/. Accessed: 2026-01-21

  27. [27]

    GitHub Docs. 2023. Displaying a sponsor button in your repository. https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/displaying-a-sponsor-button-in-your-repository. Accessed: 2026-01-21

  28. [28]

    Maarten Grootendorst. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794 (2022)

  29. [29]

    Sonja Hahn, Ulf Kroehne, and Samuel Merk. 2024. Improving and analyzing open-ended survey responses: A case study linking psychological theories and analysis approaches for text data. Zeitschrift für Psychologie 232, 3 (2024), 171

  30. [30]

    Clayton Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 8. 216–225

  31. [31]

    Indeed Engineering. 2019. FOSS Fund: Six Months In. https://engineering.indeedblog.com/blog/2019/07/foss-fund-six-months-in/. Accessed: 2026-01-21

  32. [32]

    Trishia Khandelwal. 2024. Investigating the Impact of Text Summarization on Topic Modeling. arXiv preprint arXiv:2410.09063 (2024)

  33. [33]

    Dai Li, Bolun Zhang, and Yimang Zhou. 2023. Can Large Language Models (LLM) label topics from a topic model? (2023)

  34. [34]

    Poonacha K Medappa, Murat M Tunc, and Xitong Li. 2023. Sponsorship Funding in Open-Source Software: Effort Reallocation and Spillover Effects in Knowledge-Sharing Ecosystems. Available at SSRN 4484403 (2023)

  35. [35]

    Microsoft. 2023. FOSS Fund. https://github.com/microsoft/foss-fund. Accessed: 2026-01-21

  36. [36]

    Yida Mu, Peizhen Bai, Kalina Bontcheva, and Xingyi Song. 2024. Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling. arXiv preprint arXiv:2405.00611 (2024)

  37. [37]

    Yida Mu, Chun Dong, Kalina Bontcheva, and Xingyi Song. 2024. Large language models offer an alternative to the traditional approach of topic modelling. arXiv preprint arXiv:2403.16248 (2024)

  38. [38]

    Ollama Contributors. 2025. Ollama Documentation. https://github.com/ollama/ollama/tree/main/docs. Accessed: 2026-01-21

  39. [39]

    OpenSSF. 2023. OpenSSF Scorecard. https://github.com/ossf/scorecard. Accessed: 2026-01-21

  40. [40]

    OpenSSF. 2024. XZ Backdoor (CVE-2024-3094). https://openssf.org/blog/2024/03/30/xz-backdoor-cve-2024-3094/. Accessed: 2026-01-21

  41. [41]

    OpenSSF and LF. 2022. The Open Source Software Security Mobilization Plan. https://openssf.org/oss-security-mobilization-plan/. Accessed: 2026-01-21

  42. [42]

    Cassandra Overney, Jens Meinicke, Christian Kästner, and Bogdan Vasilescu. 2020. How to not get rich: An empirical study of donations in open source. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 1209–1221

  44. [44]

    Andra-Selina Pietsch and Stefan Lessmann. 2018. Topic modeling for analyzing open-ended survey responses. Journal of Business Analytics 1, 2 (2018), 93–116

  45. [45]

    Mike Pittenger. 2016. Open source security analysis: The state of open source security in commercial applications. Black Duck Software, Tech. Rep. (2016)

  46. [46]

    Python Packaging Authority (PyPA). 2025. Packaging Python Projects: Configuring Metadata. https://packaging.python.org/en/latest/tutorials/packaging-projects/#configuring-metadata. Accessed: 2026-01-21

  47. [47]

    Python Packaging Authority (PyPA). 2025. PyPA Official Website. https://www.pypa.io/. Accessed: 2026-01-21

  48. [48]

    Steven Raemaekers, Arie van Deursen, and Joost Visser. 2011. Exploring risks in the usage of third-party libraries. In Proceedings of the BElgian-NEtherlands software eVOLution seminar. 31

  49. [49]

    Kristiina Rahkema and Dietmar Pfahl. 2022. SwiftDependencyChecker: Detecting Vulnerable Dependencies Declared Through CocoaPods, Carthage and Swift PM. In 2022 IEEE/ACM 9th International Conference on Mobile Software Engineering and Systems (MobileSoft). IEEE, 107–111

  50. [50]

    Justus J Randolph. 2005. Free-Marginal Multirater Kappa (multirater K [free]): An Alternative to Fleiss' Fixed-Marginal Multirater Kappa. Online submission (2005)

  51. [51]

    Nils Reimers and Iryna Gurevych. 2023. Sentence-Transformers: Pretrained Models Documentation. https://www.sbert.net/docs/sentence_transformer/pretrained_models.html. Accessed: 2026-01-21

  52. [52]

    Emil Rijcken, Floortje Scheepers, Kalliopi Zervanou, Marco Spruit, Pablo Mosteiro, and Uzay Kaymak. 2023. Towards interpreting topic models with ChatGPT. In The 20th World Congress of the International Fuzzy Systems Association

  53. [53]

    Margaret E Roberts, Brandon M Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G Rand. 2014. Structural topic models for open-ended survey responses. American Journal of Political Science 58, 4 (2014), 1064–1082

  54. [54]

    Per Runeson and Martin Höst. 2009. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering 14, 2 (2009), 131–164

  55. [55]

    Henry Sauermann and Michael Roach. 2013. Increasing web survey response rates in innovation research: An experimental study of static and dynamic contact design features. Research Policy 42, 1 (2013), 273–286

  56. [56]

    Vida Sharifian-Attar, Suparna De, Sanaz Jabbari, Jenny Li, Harry Moss, and Jon Johnson. 2022. Analysing longitudinal social science questionnaires: topic modelling with BERT-based embeddings. In 2022 IEEE International Conference on Big Data (Big Data). IEEE, 5558–5567

  57. [57]

    Naomichi Shimada, Tao Xiao, Hideaki Hata, Christoph Treude, and Kenichi Matsumoto. 2022. GitHub Sponsors: exploring a new way to contribute to open source. In Proceedings of the 44th International Conference on Software Engineering. 1058–1069

  58. [58]

    Charlotte Siska, Katerina Marazopoulou, Melissa Ailem, and James Bono. 2024. Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 10406–10421

  59. [59]

    Snyk Security Team. 2022. The Colors and Faker NPM Packages Go Rogue. https://snyk.io/de/blog/open-source-npm-packages-colors-faker/. Accessed: 2026-01-21

  60. [60]

    Stripe. 2023. Why Stripe Sponsors Open Source. https://resources.github.com/open-source/why-stripe-sponsors-open-source/. Accessed: 2026-01-21

  61. [61]

    Tidelift. 2024. The 2024 Tidelift State of the Open Source Maintainer Report. https://tidelift.com/open-source-maintainer-survey-2024. Accessed: 2026-01-21

  62. [62]

    Alexandros Tsakpinis. 2023. Analyzing Maintenance Activities of Software Libraries. In Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering. 313–318

  63. [63]

    Alexandros Tsakpinis and Alexander Pretschner. 2024. Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering. 345–350

  64. [64]

    Alexandros Tsakpinis and Alexander Pretschner. 2025. Analyzing the Usage of Donation Platforms for PyPI Libraries. In Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering. 628–633

  65. [65]

    Matteo Vaccargiu, Sabrina Aufiero, Silvia Bartolucci, Rumyana Neykova, Roberto Tonelli, and Giuseppe Destefanis. 2024. Sustainability in blockchain development: A BERT-based analysis of Ethereum developer discussions. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering. 381–386

  66. [66]

    Roberto Verdecchia and Justus Bogner. 2025. Notes On Writing Effective Empirical Software Engineering Papers: An Opinionated Primer. ACM SIGSOFT Software Engineering Notes 50, 3 (2025), 24–36

  67. [67]

    Stefan Wagner, Marvin Muñoz Barón, Davide Falessi, and Sebastian Baltes. 2024. Towards Evaluation Guidelines for Empirical Studies involving LLMs. arXiv preprint arXiv:2411.07668 (2024)

  68. [68]

    Han Wang, Nirmalendu Prakash, Nguyen Khoi Hoang, Ming Shan Hee, Usman Naseem, and Roy Ka-Wei Lee. 2023. Prompting large language models for topic modeling. In 2023 IEEE International Conference on Big Data (BigData). IEEE, 1236–1241

  69. [69]

    X Documentation. 2025. Counting characters when composing Tweets. https://docs.x.com/fundamentals/counting-characters. Accessed: 2026-01-21

  70. [70]

    Ayfer Ezgi Yilmaz and Tulay Saracbasi. 2017. Assessing agreement between raters from the point of coefficients and log-linear models. Journal of Data Science 15, 1 (2017), 1–24

  71. [71]

    Shuoxiao Zhang, Enyi Tang, Xinyu Gao, Zhekai Zhang, Yixiao Shan, Haofeng Zhang, Ziyang He, Jianhua Zhao, and Xuandong Li. 2025. Exploring the effectiveness of open-source donation platform: An empirical study on Opencollective. (2025)

  72. [72]

    Xunhui Zhang, Tao Wang, Yue Yu, Qiubing Zeng, Zhixing Li, and Huaimin Wang. 2022. Who, what, why and how? Towards the monetary incentive in crowd collaboration: A case study of GitHub's sponsor mechanism. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–18