When AI Models Become Dependencies: Studying the Evolution of Pre-Trained Model Reuse in Downstream Software Systems
Pith reviewed 2026-05-10 04:23 UTC · model grok-4.3
The pith
Pre-trained models are added late in software projects and accumulate over time rather than being replaced.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study of 4,988 releases in 323 GitHub repositories finds that PTMs are typically added late in the project life-cycle and tend to accumulate rather than be replaced as a project matures. PTM changes occur in only 406 of 2,814 release transitions, three times less frequently than library changes. PTM changes are less routinely documented yet more likely to include explicit rationale, and unlike reactive library evolution, PTM changes are proactively driven by capability expansion with a distinctive rationale of testing uncertainty.
What carries the argument
Empirical comparison of PTM versus library change frequency, timing, and documented rationales extracted from release notes and repository metadata across 323 projects.
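The comparison described above can be sketched as a set-diff over per-release dependency snapshots. This is a minimal illustration with invented snapshot data; the paper's actual extraction pipeline is not reproduced here.

```python
# Sketch: count release transitions in which a project's PTM set or
# library set changed. The snapshot data below is invented for
# illustration; the study's real detection method is not described here.

def count_changes(snapshots, key):
    """Count consecutive transitions where the set under `key` differs."""
    changed = 0
    for prev, curr in zip(snapshots, snapshots[1:]):
        if prev[key] != curr[key]:
            changed += 1
    return changed

# Hypothetical per-release snapshots: PTMs and libraries as sets.
releases = [
    {"ptms": set(),                 "libs": {"numpy", "requests"}},
    {"ptms": set(),                 "libs": {"numpy", "requests", "flask"}},
    {"ptms": {"bert-base"},         "libs": {"numpy", "requests", "flask"}},  # PTM added late
    {"ptms": {"bert-base"},         "libs": {"numpy", "flask"}},
    {"ptms": {"bert-base", "clip"}, "libs": {"numpy", "flask"}},  # accumulation, not replacement
]

transitions = len(releases) - 1
ptm_changes = count_changes(releases, "ptms")
lib_changes = count_changes(releases, "libs")
print(f"{ptm_changes}/{transitions} PTM-changing transitions, "
      f"{lib_changes}/{transitions} library-changing transitions")
```

The toy data also mirrors the late-addition and accumulation pattern: the first PTM appears mid-history and a second is added without removing the first.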
If this is right
- PTMs require dedicated tracking methods separate from standard library dependency tools.
- Maintenance teams should expect PTM updates to be driven by new capabilities rather than routine fixes.
- Documentation standards for PTM changes need to capture their explicit rationales more consistently.
- Systems may accumulate multiple PTM instances for different tasks, increasing long-term complexity.
- Software engineering processes should treat PTMs as multi-role dependencies rather than single libraries.
Where Pith is reading between the lines
- If accumulation continues unchecked, long-lived projects could face growing integration and version conflicts across PTM instances.
- Automated tools that scan release notes for testing-uncertainty mentions could help flag risky PTM adoptions early.
- The proactive capability-driven pattern may not hold in closed-source or enterprise environments where update decisions follow different incentives.
- Extending the analysis to measure actual runtime impact of infrequent PTM changes could test whether lower frequency correlates with higher stability or hidden risks.
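The tooling idea above could start from a simple keyword scan of release notes. A minimal sketch follows; the keyword patterns and release notes are invented, and a real tool would need a validated vocabulary plus actual PTM-change detection.

```python
import re

# Sketch: flag release notes that pair a change with testing-uncertainty
# language. Patterns and notes are hypothetical illustrations.

UNCERTAINTY_PATTERNS = [
    r"\bnot (?:fully|yet) tested\b",
    r"\buntested\b",
    r"\bexperimental\b",
    r"\bmay (?:not )?work\b",
]

def flags_uncertainty(note: str) -> bool:
    """True if the note matches any uncertainty pattern (case-insensitive)."""
    note = note.lower()
    return any(re.search(p, note) for p in UNCERTAINTY_PATTERNS)

notes = {
    "v1.2.0": "Swapped in whisper-large; experimental, not fully tested on long audio.",
    "v1.3.0": "Bump requests to 2.31; routine dependency update.",
}

risky = [tag for tag, note in notes.items() if flags_uncertainty(note)]
print(risky)  # -> ['v1.2.0']
```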
Load-bearing premise
The 323 GitHub repositories and 4,988 releases accurately represent typical downstream systems that reuse open-source pre-trained models, and PTM changes can be reliably identified from metadata and notes.
What would settle it
A study of a larger or more diverse set of repositories showing PTM change frequency equal to or higher than library changes, or lacking the late-addition and accumulation pattern, would undermine the claim of a qualitatively distinct evolution.
Figures
original abstract
Modern software systems have transitioned from purely code-based architectures to AI-integrated systems where pre-trained models (PTMs) serve as permanent dependencies. However, while the evolution of traditional software libraries is well-documented, we lack a clear understanding of how these "PTM dependencies" change over time. Unlike libraries, PTMs are characterized by opaque internals and less standardized, rapidly evolving release cycles. Furthermore, their multi-role nature enables developers to treat individual instances of a single PTM as separate functional dependencies based on their specific downstream tasks. This raises a critical question for software maintenance: do PTMs change like standard software libraries or do they follow a divergent pattern? To answer this, we present the first empirical study of downstream PTM changes, analyzing a comprehensive dataset of 4,988 releases across 323 GitHub OSS repositories that reuse open-source PTMs. Using traditional software libraries as a baseline, we find that PTMs follow a qualitatively distinct pattern. PTMs are typically added late in the project life-cycle and tend to accumulate rather than be replaced as a project matures. Our findings show that PTM changes are three times less frequent (406 of 2,814 release transitions) than library changes. PTM changes are also less routinely documented, but more likely to carry explicit rationale. Unlike libraries, which evolve reactively, PTM evolution is proactively driven by capability expansion, with a unique documented rationale of PTM testing uncertainty. Our work calls for a rethinking of how PTMs are tracked and managed as dependencies in modern software engineering.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first empirical study of pre-trained model (PTM) evolution as dependencies in downstream software systems. Analyzing 4,988 releases from 323 GitHub OSS repositories that reuse open-source PTMs, and using traditional libraries as a baseline, it claims PTMs follow a qualitatively distinct pattern: added late in the project lifecycle, tending to accumulate rather than be replaced; PTM changes occur three times less frequently than library changes (406 of 2,814 release transitions); PTM changes are less routinely documented but more likely to carry explicit rationale; and PTM evolution is proactively driven by capability expansion with a unique rationale of PTM testing uncertainty, unlike the reactive evolution of libraries.
Significance. If the results hold after addressing detection validity, the work is significant as the first large-scale quantitative and qualitative characterization of PTM dependency evolution in software engineering. It supplies concrete counts from a substantial dataset (323 repositories, 4,988 releases) and identifies actionable differences from library management, supporting calls to rethink tracking and maintenance practices for AI components. The combination of frequency statistics and rationale categorization provides falsifiable observations that can guide future tooling and empirical work.
major comments (2)
- [Abstract and §3] Abstract and §3 (Data Collection/Analysis): The central claim that PTM changes are three times less frequent than library changes (406 of 2,814 release transitions) rests on the PTM change detection pipeline. No description is provided of the extraction method from release notes and metadata, handling of dynamic loading or multi-role PTM instances, inter-rater reliability for categorization, or any validation against ground truth. This directly affects the quantitative distinctness result and the comparison baseline, as differential miss rates would inflate the reported frequency gap.
- [§4] §4 (Results) and selection criteria: The representativeness claim for the 323 repositories and 4,988 releases as typical downstream systems is load-bearing for generalizing the late-addition and accumulation patterns. The manuscript provides no details on selection bias controls, inclusion/exclusion criteria beyond GitHub OSS, or comparison to broader PTM reuse populations, leaving the qualitative pattern vulnerable to sampling artifacts.
minor comments (2)
- [Abstract] The abstract states PTM changes are 'less routinely documented' without providing the exact documentation rates or statistical test used for the comparison to libraries.
- [Results] Figure or table presenting the 406/2,814 counts and rationale categories would benefit from explicit confidence intervals or effect sizes to support the 'three times less frequent' statement.
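The interval the referee asks for can be computed directly from the reported counts. A sketch using the Wilson score interval follows; only the 406/2,814 counts come from the paper, and the "three times less frequent" ratio would additionally need the library-change counts, which are not given here.

```python
import math

# Sketch: 95% Wilson score interval for the reported PTM-change
# proportion (406 of 2,814 release transitions).

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin

lo, hi = wilson_ci(406, 2814)
print(f"PTM-change rate: {406/2814:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With n = 2,814 the interval is narrow (roughly a ±1.3 percentage-point margin), so sampling noise alone is unlikely to explain a threefold frequency gap, though the detection-validity concerns above still apply.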
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive suggestions. We have addressed each of the major comments in detail below and revised the manuscript to enhance methodological transparency and address concerns about generalizability.
point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (Data Collection/Analysis): The central claim that PTM changes are three times less frequent than library changes (406 of 2,814 release transitions) rests on the PTM change detection pipeline. No description is provided of the extraction method from release notes and metadata, handling of dynamic loading or multi-role PTM instances, inter-rater reliability for categorization, or any validation against ground truth. This directly affects the quantitative distinctness result and the comparison baseline, as differential miss rates would inflate the reported frequency gap.
Authors: We agree with the referee that the description of the PTM change detection pipeline in the original manuscript was insufficiently detailed, which is important for validating the central quantitative claim. We have revised §3 to include a comprehensive description of the extraction method from release notes and metadata, our handling of dynamic loading and multi-role PTM instances (by identifying distinct usage contexts in the code), the inter-rater reliability assessment for categorization, and the validation against ground truth on a sample of releases. These revisions ensure the reported frequency difference is robustly supported. revision: yes
-
Referee: [§4] §4 (Results) and selection criteria: The representativeness claim for the 323 repositories and 4,988 releases as typical downstream systems is load-bearing for generalizing the late-addition and accumulation patterns. The manuscript provides no details on selection bias controls, inclusion/exclusion criteria beyond GitHub OSS, or comparison to broader PTM reuse populations, leaving the qualitative pattern vulnerable to sampling artifacts.
Authors: We acknowledge the importance of discussing selection criteria and potential biases for the generalizability of our findings on late-addition and accumulation patterns. The original manuscript described the dataset as 323 GitHub OSS repositories with 4,988 releases that reuse open-source PTMs, but provided limited details on the exact selection process and bias controls. In the revised version, we have added explicit inclusion and exclusion criteria in §3 and a new subsection in §4 addressing threats to validity, including selection bias and limitations in representing the broader population of PTM-reusing systems. While a full comparative analysis to all PTM reuse instances is beyond the scope of this study, we discuss how our sample aligns with known characteristics of AI-integrated OSS projects. revision: yes
Circularity Check
No significant circularity: purely observational empirical study
full rationale
This paper conducts an empirical analysis of 4,988 releases from 323 GitHub repositories, reporting observed frequencies (e.g., 406 PTM changes out of 2,814 transitions) and patterns directly from external repository metadata and release notes. No derivations, equations, fitted parameters, or predictions exist that could reduce to inputs by construction. All claims rest on data extraction and categorization rather than self-definitions, self-citations as load-bearing premises, or ansatzes smuggled from prior author work. The study is self-contained against external benchmarks, with findings grounded in observable repository events instead of internal logic.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The 323 GitHub OSS repositories reusing open-source PTMs are representative of broader downstream software systems
Reference graph
Works this paper leans on
-
[1]
Software reuse research: status and future,
W. B. Frakes and K. Kang, “Software reuse research: status and future,” IEEE Trans. Softw. Eng., vol. 31, no. 7, pp. 529–536, 2005
2005
-
[2]
Predicting software reuse using machine learning techniques—A case study on open-source Java software systems,
M. Y. H. Yeow, C. Y. Chong, M. K. Lim, and Y. Yee Yen, “Predicting software reuse using machine learning techniques—A case study on open-source Java software systems,” PLoS ONE, vol. 20, no. 2, p. e0314512, Feb. 2025
2025
-
[3]
Do developers update their library dependencies?: An empirical study on the impact of security advisories on library migration,
R. G. Kula, D. M. German, A. Ouni, T. Ishio, and K. Inoue, “Do developers update their library dependencies?: An empirical study on the impact of security advisories on library migration,” Empirical Software Engineering, vol. 23, no. 1, pp. 384–417, 2018
2018
-
[4]
An empirical comparison of dependency network evolution in seven software packaging ecosystems,
A. Decan, T. Mens, and P. Grosjean, “An empirical comparison of dependency network evolution in seven software packaging ecosystems,” Empirical Software Engineering, vol. 24, no. 1, pp. 381–416, 2019
2019
-
[5]
An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry,
W. Jiang et al., “An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry,” in Proceedings of the 45th International Conference on Software Engineering (ICSE 2023). Piscataway, NJ, USA: IEEE, 2023, pp. 2463–2475
2023
-
[6]
Pre-trained models: Past, present and future,
X. Han et al., “Pre-trained models: Past, present and future,” AI Open, vol. 2, pp. 225–250, 2021
2021
-
[7]
Software Dependencies 2.0: An Empirical Study of Reuse and Integration of Pre-Trained Models in Open-Source Projects,
J. Yasmin, W. Jiang, and C. D. Y. Tian, “Software Dependencies 2.0: An Empirical Study of Reuse and Integration of Pre-Trained Models in Open-Source Projects,” 2026
2026
-
[8]
Deep Learning Model Reuse in the HuggingFace Community: Challenges, Benefit and Trends,
M. Taraghi, G. Dorcelus, A. Foundjem, F. Tambon, and F. Khomh, “Deep Learning Model Reuse in the HuggingFace Community: Challenges, Benefit and Trends,” in Proceedings of the 31st IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2024). Piscataway, NJ, USA: IEEE, Mar. 2024, pp. 512–523
2024
-
[9]
Challenges of Using Pre-trained Models: the Practitioners’ Perspective,
X. Tan et al., “Challenges of Using Pre-trained Models: the Practitioners’ Perspective,” in Proceedings of the 31st IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2024). Los Alamitos, CA, USA: IEEE, Mar. 2024, pp. 67–78
2024
-
[10]
From release to adoption: Challenges in reusing pre-trained ai models for downstream developers,
P. Banyongrakkul, M. Zahedi, P. Thongtanunam, C. Treude, and H. Gao, “From release to adoption: Challenges in reusing pre-trained ai models for downstream developers,” in Proceedings of the 41st IEEE International Conference on Software Maintenance and Evolution (ICSME 2025). Piscataway, NJ, USA: IEEE, 2025, pp. 1–13
2025
-
[11]
What do we know about hugging face? a systematic literature review and quantitative validation of qualitative claims,
J. Jones et al., “What do we know about hugging face? a systematic literature review and quantitative validation of qualitative claims,” in Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2024), vol. 1, no. 1. New York, NY, USA: ACM, 2024, pp. 13–24
2024
-
[12]
Documenting ethical considerations in open source ai models,
H. Gao, M. Zahedi, C. Treude, S. Rosenstock, and M. Cheong, “Documenting ethical considerations in open source ai models,” in Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2024). New York, NY, USA: ACM, 2024, pp. 177–188
2024
-
[13]
Towards semantic versioning of open pre-trained language model releases on hugging face,
A. Ajibode, A. A. Bangash, F. R. Cogo, B. Adams, and A. E. Hassan, “Towards semantic versioning of open pre-trained language model releases on hugging face,” Empirical Software Engineering, vol. 30, no. 3, pp. 1–63, 2025
2025
-
[14]
PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software,
W. Jiang et al., “PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software,” in Proceedings of the 21st IEEE/ACM International Conference on Mining Software Repositories (MSR 2024), vol. 1. New York, NY, USA: ACM, 2024, pp. 431–443
2024
-
[15]
Reusing Deep Learning Models: Challenges and Directions in Software Engineering,
J. C. Davis et al., “Reusing Deep Learning Models: Challenges and Directions in Software Engineering,” in Proceedings of the 2023 IEEE John Vincent Atanasoff Symposium on Modern Computing (JVA 2023). Piscataway, NJ, USA: IEEE, 2023, pp. 17–30
2023
-
[16]
From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI
M.-A. Storey, “From technical debt to cognitive and intent debt: Rethinking software health in the age of ai,” 2026. [Online]. Available: https://arxiv.org/abs/2603.22106
2026
-
[17]
On the adoption and effects of source code reuse on defect proneness and maintenance effort,
G. Giordano et al., “On the adoption and effects of source code reuse on defect proneness and maintenance effort,” Empirical Software Engineering, vol. 29, no. 1, p. 20, 2023
2023
-
[18]
Design patterns: Abstraction and reuse of object-oriented design,
E. Gamma, R. Helm, R. Johnson, and J. Vlissides, “Design patterns: Abstraction and reuse of object-oriented design,” in Proceedings of the European Conference on Object-Oriented Programming (ECOOP 1993), ser. Lecture Notes in Computer Science. Cham, Switzerland: Springer Nature, 1993, vol. 707, pp. 406–431
1993
-
[19]
Surviving Software Dependencies,
R. Cox, “Surviving Software Dependencies,” Queue, vol. 17, no. 2, pp. 24–47, 2019
2019
-
[20]
An Empirical Analysis of Technical Lag in npm Package Dependencies,
A. Zerouali, E. Constantinou, T. Mens et al., “An Empirical Analysis of Technical Lag in npm Package Dependencies,” in Proceedings of the 17th International Conference on Software Reuse (ICSR 2018), ser. Lecture Notes in Computer Science, vol. 10826. Cham, Switzerland: Springer, 2018, pp. 95–110
2018
-
[21]
Characterizing usages, updates and risks of third-party libraries in java projects,
K. Huang et al., “Characterizing usages, updates and risks of third-party libraries in java projects,” Empirical Software Engineering, vol. 27, no. 4, p. 78, 2022
2022
-
[22]
How the apache community upgrades dependencies: an evolutionary study,
G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella, “How the apache community upgrades dependencies: an evolutionary study,” Empirical Software Engineering, vol. 20, no. 5, pp. 1275–1317, 2015
2015
-
[23]
A large scale analysis of semantic versioning in npm,
D. Pinckney, F. Cassano, A. Guha, and J. Bell, “A large scale analysis of semantic versioning in npm,” in Proceedings of the 20th IEEE/ACM International Conference on Mining Software Repositories (MSR 2023). Piscataway, NJ, USA: IEEE, 2023, pp. 485–497
2023
-
[24]
A study of library migrations in java,
C. Teyton, J. R. Falleri, M. Palyart, and X. Blanc, “A study of library migrations in java,” Journal of Software: Evolution and Process, vol. 26, no. 11, pp. 1030–1052, 2014
2014
-
[25]
An Empirical Study on the Reuse of Third-Party Libraries in Open-Source Software Development,
A. Zaimi et al., “An Empirical Study on the Reuse of Third-Party Libraries in Open-Source Software Development,” in Proceedings of the 7th Balkan Conference on Informatics Conference (BCI 2015). New York, NY, USA: ACM, 2015
2015
-
[26]
Logging library migrations: a case study for the apache software foundation projects,
S. Kabinna, C.-P. Bezemer, W. Shang, and A. E. Hassan, “Logging library migrations: a case study for the apache software foundation projects,” in Proceedings of the 13th International Conference on Mining Software Repositories (MSR 2016). ACM, 2016, pp. 154–164
2016
-
[27]
A large-scale empirical study on Java library migrations: prevalence, trends, and rationales,
H. He, R. He, H. Gu, and M. Zhou, “A large-scale empirical study on Java library migrations: prevalence, trends, and rationales,” in Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021). New York, NY, USA: ACM, 2021, pp. 478–490
2021
-
[28]
How and why developers migrate python tests,
L. Barbosa and A. Hora, “How and why developers migrate python tests,” in Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2022). Piscataway, NJ, USA: IEEE, 2022, pp. 538–548
2022
-
[29]
Software Reuse and Evolution in JavaScript Applications,
A. Terzi, “Software Reuse and Evolution in JavaScript Applications,” in Proceedings of the 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2022). IEEE, 2022, pp. 263–269
2022
-
[30]
A Preliminary Study on Self-contained Libraries in the NPM Ecosystem,
P. Jaisri, B. Reid, and R. G. Kula, A Preliminary Study on Self-contained Libraries in the NPM Ecosystem. Cham, Switzerland: Springer Nature, 2025, pp. 53–65
2025
-
[31]
Pymigbench: A benchmark for python library migration,
M. Islam, A. K. Jha, S. Nadi, and I. Akhmetov, “Pymigbench: A benchmark for python library migration,” in Proceedings of the IEEE/ACM 20th International Conference on Mining Software Repositories (MSR 2023). Piscataway, NJ, USA: IEEE, 2023, pp. 511–515
2023
-
[32]
Self-Admitted Library Migrations in Java, JavaScript, and Python Packaging Ecosystems: A Comparative Study,
H. Gu, H. He, and M. Zhou, “Self-Admitted Library Migrations in Java, JavaScript, and Python Packaging Ecosystems: A Comparative Study,” in Proceedings of the 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2023). Piscataway, NJ, USA: IEEE, 2023, pp. 627–638
2023
-
[33]
A Qualitative Study of Dependency Management and Its Security Implications,
I. Pashchenko, D. L. Vu, and F. Massacci, “A Qualitative Study of Dependency Management and Its Security Implications,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS 2020). ACM, 2020, pp. 1513–1531
2020
-
[34]
Characterizing python library migrations,
M. Islam, A. K. Jha, I. Akhmetov, and S. Nadi, “Characterizing python library migrations,” Proceedings of the ACM on Software Engineering, vol. 1, no. FSE, pp. 92–114, 2024
2024
-
[35]
Cramming: training a language model on a single GPU in one day,
J. Geiping et al., “Cramming: training a language model on a single GPU in one day,” in Proceedings of the 40th International Conference on Machine Learning (ICML 2023), vol. 202. Cambridge, MA, USA: PMLR Press, Jul. 2023, pp. 11117–11143
2023
-
[36]
How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study,
F. Pepe et al., “How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study,” in Proceedings of the 32nd IEEE International Conference on Program Comprehension (ICPC 2024), no. iii. New York, NY, USA: ACM, 2024, pp. 370–381
2024
-
[37]
Phi-4-reasoning technical report,
M. Abdin, S. Agarwal, A. Awadallah et al., “Phi-4-reasoning technical report,” 2025. [Online]. Available: https://arxiv.org/abs/2504.21318
2025
-
[38]
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning,
D. Guo, D. Yang, H. Zhang et al., “DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning,” Nature, vol. 645, no. 8081, pp. 633–638, 2025
2025
-
[39]
High-Resolution Image Synthesis with Latent Diffusion Models,
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-Resolution Image Synthesis with Latent Diffusion Models,” in Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). Los Alamitos, CA, USA: IEEE, Jun. 2022, pp. 10674–10685
2022
-
[40]
Analyzing the Evolution and Maintenance of ML Models on Hugging Face,
J. Castano, S. Martinez-Fernandez, X. Franch, and J. Bogner, “Analyzing the Evolution and Maintenance of ML Models on Hugging Face,” in Proceedings of the IEEE/ACM 21st International Conference on Mining Software Repositories (MSR 2024), vol. 1, no. 1. New York, NY, USA: ACM, 2024, pp. 607–618
2024
-
[41]
Navigating dataset documentations in ai: A large-scale analysis of dataset cards on hugging face,
X. Yang, W. Liang, and J. Zou, “Navigating dataset documentations in ai: A large-scale analysis of dataset cards on hugging face,” in Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). OpenReview.net, 2024
2024
-
[42]
“i see models being a whole other thing”: an empirical study of pre-trained model naming conventions and a tool for enhancing naming consistency,
W. Jiang et al., ““i see models being a whole other thing”: an empirical study of pre-trained model naming conventions and a tool for enhancing naming consistency,” Empirical Software Engineering, vol. 30, no. 6, p. 155, 2025
2025
-
[43]
What Is the Intended Usage Context of This Model? An Exploratory Study of Pre-Trained Models on Various Model Repositories,
L. Gong, J. Zhang, M. Wei, H. Zhang, and Z. Huang, “What Is the Intended Usage Context of This Model? An Exploratory Study of Pre-Trained Models on Various Model Repositories,” ACM Trans. Softw. Eng. Methodol., vol. 32, no. 3, pp. 1–57, May 2023
2023
-
[44]
An Empirical Study of Artifacts and Security Risks in the Pre-trained Model Supply Chain,
W. Jiang, N. Synovic, R. Sethi et al., “An Empirical Study of Artifacts and Security Risks in the Pre-trained Model Supply Chain,” in Proceedings of the 2022 ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED 2022). New York, NY, USA: ACM, 2022, pp. 105–114
2022
-
[45]
Discrepancies among pre-trained deep neural networks: A new threat to model zoo reliability,
D. Montes, P. Peerapatanapokin, J. Schultz, C. Guo, W. Jiang, and J. C. Davis, “Discrepancies among pre-trained deep neural networks: A new threat to model zoo reliability,” in Proceedings of the 30th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022). New York, NY, USA: A...
2022
-
[46]
Exploring the Carbon Footprint of Hugging Face’s ML Models: A Repository Mining Study,
J. Castano, S. Martinez-Fernandez, X. Franch, and J. Bogner, “Exploring the Carbon Footprint of Hugging Face’s ML Models: A Repository Mining Study,” in Proceedings of the 17th IEEE/ACM International Symposium on Empirical Software Engineering and Measurement (ESEM 2023). Piscataway, NJ, USA: IEEE, 2023, pp. 1–12
2023
-
[47]
A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT,
C. Zhou et al., “A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT,” International Journal of Machine Learning and Cybernetics, 2023
2023
-
[48]
Does reusing pre-trained NLP model propagate bugs?
M. Chakraborty, “Does reusing pre-trained NLP model propagate bugs?” in Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021). ACM, 2021, pp. 1686–1688
2021
-
[49]
AI Safety in the Eyes of the Downstream Developer: A First Look at Concerns, Practices, and Challenges,
H. Gao, M. Zahedi, W. Jiang, H. Y. Lin, J. Davis, and C. Treude, “AI Safety in the Eyes of the Downstream Developer: A First Look at Concerns, Practices, and Challenges,” vol. 1, no. 1, pp. 1–29, 2025
2025
-
[50]
An exploratory study of dataset and model management in open source machine learning applications,
T. R. Toma and C. P. Bezemer, “An exploratory study of dataset and model management in open source machine learning applications,” in Proceedings of the 3rd IEEE/ACM International Conference on AI Engineering – Software Engineering for AI (CAIN 2024). New York, NY, USA: ACM, 2024, pp. 64–74
2024
-
[51]
The State of the ML-universe: 10 Years of Artificial Intelligence & Machine Learning Software Development on GitHub,
D. Gonzalez, T. Zimmermann, and N. Nagappan, “The State of the ML-universe: 10 Years of Artificial Intelligence & Machine Learning Software Development on GitHub,” in Proceedings of the 17th IEEE/ACM International Conference on Mining Software Repositories (MSR 2020). New York, NY, USA: ACM, 2020, pp. 431–442
2020
-
[52]
Comparison of release engineering practices in a large mature company and a startup,
E. Laukkanen, M. Paasivaara, J. Itkonen et al., “Comparison of release engineering practices in a large mature company and a startup,” Empirical Software Engineering, vol. 23, no. 6, pp. 3535–3577, 2018
2018
-
[53]
Identifying unmaintained projects in github,
J. Coelho, M. T. Valente, L. L. Silva, and E. Shihab, “Identifying unmaintained projects in github,” in Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2018). New York, NY, USA: ACM, 2018
2018
-
[54]
RapidRelease - A dataset of projects and issues on github with rapid releases,
S. D. Joshi and S. Chimalakonda, “RapidRelease - A dataset of projects and issues on github with rapid releases,” in Proceedings of the IEEE/ACM 16th International Conference on Mining Software Repositories (MSR 2019), vol. 2019-May. Piscataway, NJ, USA: IEEE, 2019, pp. 587–591
2019
-
[55]
Keep the Ball Rolling: Analyzing Release Cadence in GitHub Projects,
O. Kilic, N. Bowness, and O. Baysal, “Keep the Ball Rolling: Analyzing Release Cadence in GitHub Projects,” in Proceedings of the 20th IEEE/ACM International Conference on Mining Software Repositories (MSR 2023). Piscataway, NJ, USA: IEEE, 2023, pp. 372–376
2023
-
[56]
Release conventions of open-source software: An exploratory study,
D. Chakroborti, S. S. Nath, K. A. Schneider, and C. K. Roy, “Release conventions of open-source software: An exploratory study,” Journal of Software: Evolution and Process, vol. 35, no. 1, p. e2499, 2023
2023
-
[57]
Semantic versioning 2.0.0,
T. Preston-Werner and Contributors, “Semantic versioning 2.0.0,” https://semver.org, 2023, accessed: Oct. 13, 2025
2023
-
[58]
On a test of whether one of two random variables is stochastically larger than the other,
H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The Annals of Mathematical Statistics, vol. 18, no. 1, pp. 50–60, 1947
1947
-
[59]
Cohen’s d,
M. J. Diener, “Cohen’s d,” in The Corsini Encyclopedia of Psychology. John Wiley & Sons, Ltd, 2010, p. 1
2010
-
[60]
The measurement of observer agreement for categorical data,
J. R. Landis and G. G. Koch, “The measurement of observer agreement for categorical data,” Biometrics, vol. 33, no. 1, pp. 159–174, Mar. 1977
1977
-
[61]
Interrater reliability: the kappa statistic,
M. L. McHugh, “Interrater reliability: the kappa statistic,” Biochemia medica, vol. 22, no. 3, pp. 276–282, 2012
2012
-
[62]
(2025) Commit 041f39a
foundation-model-stack/aiu-fms-testing-utils. (2025) Commit 041f39a. (Accessed: 2025-12-01). [Online]. Available: https://github.com/foundation-model-stack/aiu-fms-testing-utils/commit/041f39a9
2025
-
[63]
(2023) Commit 7ee8e07
ZFTurbo/Music-Source-Separation-Training. (2023) Commit 7ee8e07. (Accessed: 2025-12-01). [Online]. Available: https://github.com/ZFTurbo/Music-Source-Separation-Training/commit/7ee8e074e6a9f6cd217f66a360a82c84cc2b174a
2023
-
[64]
A General Inductive Approach for Analyzing Qualitative Evaluation Data,
D. R. Thomas, “A General Inductive Approach for Analyzing Qualitative Evaluation Data,” American Journal of Evaluation, vol. 27, no. 2, pp. 237–246, 2006
2006
-
[65]
Wilcoxon signed-rank test,
R. F. Woolson, “Wilcoxon signed-rank test,” in Wiley Encyclopedia of Clinical Trials. John Wiley & Sons, Ltd, 2008, pp. 1–3
2008
-
[66]
(2025) Example 1
luxonis/datadreamer. (2025) Example 1. (Accessed: 2025-12-01). [Online]. Available: https://github.com/luxonis/datadreamer/pull/77
2025
-
[67]
(2023) Example 2
centre-for-humanities-computing/conspiracies. (2023) Example 2. (Accessed: 2025-12-01). [Online]. Available: https://github.com/centre-for-humanities-computing/conspiracies/commit/2c3d5e32318dd0713770d32b485b63ff986e67ac
2023
-
[68]
(2024) Example 3
hpcaitech/ColossalAI. (2024) Example 3. (Accessed: 2025-12-01). [Online]. Available: https://github.com/hpcaitech/ColossalAI/releases/tag/v0.4.3
2024
-
[69]
(2024) Example 4
koito19960406/ZenSVI. (2024) Example 4. (Accessed: 2025-12-01). [Online]. Available: https://github.com/koito19960406/ZenSVI/pull/91
2024
-
[70]
(2024) Example 5
luxonis/datadreamer. (2024) Example 5. (Accessed: 2025-12-01). [Online]. Available: https://github.com/vllm-project/vllm/issues/4141
2024
-
[71]
(2021) Example 6
castorini/pyserini. (2021) Example 6. (Accessed: 2025-12-01). [Online]. Available: https://github.com/castorini/pyserini/pull/620
2021
-
[72]
(2025) Example 7
huggingface/trl. (2025) Example 7. (Accessed: 2025-12-01). [Online]. Available: https://github.com/huggingface/trl/pull/3415
2025
-
[73]
(2025) Example 8
biopragmatics/bioregistry. (2025) Example 8. (Accessed: 2025-12-01). [Online]. Available: https://github.com/biopragmatics/bioregistry/pull/1439/commits/e29600af8d57c7dacf28d7bddddb3b629f2e0b1a
2025
-
[74]
(2024) Example 9
luxonis/datadreamer. (2024) Example 9. (Accessed: 2025-12-01). [Online]. Available: https://github.com/TransformerLensOrg/TransformerLens/pull/777
2024
-
[75]
(2025) Example 10
vllm-project/vllm. (2025) Example 10. (Accessed: 2025-12-01). [Online]. Available: https://github.com/vllm-project/vllm/pull/14422
2025
-
[76]
(2025) Example 11
PrunaAI/pruna. (2025) Example 11. (Accessed: 2025-12-01). [Online]. Available: https://github.com/PrunaAI/pruna/commit/bc1ece9b77f4fd426fbaf43e03b2f5eb66f2dc96
2025
-
[77]
(2025) Example 12
arthur-ai/arthur-engine. (2025) Example 12. (Accessed: 2025-12-01). [Online]. Available: https://github.com/arthur-ai/arthur-engine/pull/310
2025
-
[78]
(2023) Example 13
mlflow/mlflow. (2023) Example 13. (Accessed: 2025-12-01). [Online]. Available: https://github.com/mlflow/mlflow/pull/8623
2023
-
[79]
(2024) Example 14
mlflow/mlflow. (2024) Example 14. (Accessed: 2025-12-01). [Online]. Available: https://github.com/mlflow/mlflow/issues/10887
2024
-
[80]
(2025) Example 14
vllm-project/vllm. (2025) Example 14. (Accessed: 2025-12-01). [Online]. Available: https://github.com/vllm-project/vllm/pull/21169
2025