Code Sharing In Prediction Model Research: A Scoping Review
Pith reviewed 2026-05-15 10:16 UTC · model grok-4.3
The pith
Prediction model studies share code in only 12 percent of cases, and shared code is frequently not reusable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Among 3,967 eligible articles citing TRIPOD or TRIPOD+AI, 12.2 percent included code-sharing statements. Repository assessment against 14 reproducibility features revealed substantial heterogeneity: 80.5 percent had a README, yet only 37.6 percent specified dependencies (21.6 percent with version constraints) and 42.4 percent were modular. Code sharing was higher in TRIPOD+AI-citing studies and increased over time, but overall remained uncommon and often fell short of supporting reuse.
What carries the argument
A scoping review of PubMed-indexed articles citing the TRIPOD statements, using an LLM-assisted pipeline to extract code-availability statements and evaluate linked repositories against 14 predefined reproducibility features.
If this is right
- TRIPOD-Code should define explicit requirements for documentation, dependency specification, licensing, and executable structure.
- Code sharing rates differ by journal and country, indicating that targeted journal policies could raise participation.
- Reproducibility in prediction model research requires standards beyond merely making code available.
- The current 12.2 percent baseline can be used to measure whether new guidelines increase both the quantity and quality of shared code.
Where Pith is reading between the lines
- Researchers might adopt standardized repository templates to meet future guidelines more easily.
- Low reusability of shared code could slow cumulative validation of prediction models across studies.
- Journal editors could require a minimal reproducibility checklist at submission to accelerate improvement.
Load-bearing premise
The LLM-assisted screening and extraction pipeline correctly identifies code availability statements and accurately assesses the 14 reproducibility features without substantial misclassification.
What would settle it
A manual audit of a random sample of 200 papers that directly compares the LLM-derived code-sharing rate and feature scores against human judgments.
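Concretely, such an audit reduces to comparing two binary label vectors, one from the LLM pipeline and one from human reviewers. A minimal sketch, where `precision_recall` is a hypothetical helper and the label lists are illustrative placeholders rather than study data:

```python
def precision_recall(llm_labels, human_labels):
    """Precision/recall of LLM labels against a human gold standard.

    Positive class = "paper shares code" (label 1).
    """
    tp = sum(1 for l, h in zip(llm_labels, human_labels) if l == 1 and h == 1)
    fp = sum(1 for l, h in zip(llm_labels, human_labels) if l == 1 and h == 0)
    fn = sum(1 for l, h in zip(llm_labels, human_labels) if l == 0 and h == 1)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

# Toy audit of 10 papers (1 = code-sharing statement found)
llm_labels   = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
human_labels = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
p, r = precision_recall(llm_labels, human_labels)  # p = 0.75, r = 0.75
```

Precision asks how often a human agrees when the LLM says code is shared; recall asks what fraction of truly code-sharing papers the LLM catches. Both would need to be high for the 12.2% headline to stand.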
Figures
Original abstract
Analytical code is essential for reproducing diagnostic and prognostic prediction model research, yet code availability in the published literature remains limited. While the TRIPOD statements set standards for reporting prediction model methods, they do not define explicit standards for repository structure and documentation. This review quantifies current code-sharing practices to inform the development of TRIPOD-Code, a TRIPOD extension reporting guideline focused on code sharing. We conducted a scoping review of PubMed-indexed articles citing TRIPOD or TRIPOD+AI as of Aug 11, 2025, restricted to studies retrievable via the PubMed Central Open Access API. Eligible studies developed, updated, or validated multivariable prediction models. A large language model-assisted pipeline was developed to screen articles and extract code availability statements and repository links. Repositories were assessed with the same LLM against 14 predefined reproducibility-related features. Our code is made publicly available. Among 3,967 eligible articles, 12.2% included code sharing statements. Code sharing increased over time, reaching 15.8% in 2025, and was higher among TRIPOD+AI-citing studies than TRIPOD-citing studies. Sharing prevalence varied widely by journal and country. Repository assessment showed substantial heterogeneity in reproducibility features: most repositories contained a README file (80.5%), but fewer specified dependencies (37.6%; version-constrained 21.6%) or were modular (42.4%). In prediction model research, code sharing remains relatively uncommon, and when shared, often falls short of being reusable. These findings provide an empirical baseline for the TRIPOD-Code extension and underscore the need for clearer expectations beyond code availability, including documentation, dependency specification, licensing, and executable structure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts a scoping review of 3,967 PubMed-indexed articles citing TRIPOD or TRIPOD+AI (retrieved via PMC OA API) that develop, update, or validate multivariable prediction models. Using an LLM-assisted pipeline, it finds that 12.2% of articles include code-sharing statements (rising to 15.8% in 2025 and higher for TRIPOD+AI citations), with substantial variation by journal and country. Assessment of the shared repositories against 14 reproducibility features shows heterogeneity (e.g., 80.5% contain a README, 37.6% specify dependencies with 21.6% version-constrained, 42.4% are modular). The work provides an empirical baseline to inform the TRIPOD-Code extension guideline.
Significance. If the LLM extraction proves reliable, the large sample and direct counts supply a useful descriptive baseline on code-sharing prevalence and quality in prediction-model research. This directly supports development of TRIPOD-Code by quantifying gaps in documentation, dependency specification, and reusability beyond mere availability statements.
major comments (2)
- [Methods] Methods (LLM-assisted screening and extraction pipeline): No validation metrics (precision, recall, kappa, or human gold-standard comparison) are reported for either the article screening/code-statement extraction step or the subsequent 14-feature repository assessment. Systematic LLM errors could materially shift the headline 12.2% prevalence or the reported feature rates (e.g., dependency specification), undermining the claim that sharing is 'relatively uncommon' and 'often falls short of being reusable'.
- [Results] Results (prevalence and feature percentages): The reported figures (12.2%, 15.8%, 37.6%, etc.) are presented without confidence intervals, sensitivity analyses, or discussion of potential misclassification rates, leaving the central empirical claims vulnerable to extraction bias.
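One way to make this vulnerability concrete is to adjust the observed prevalence for assumed misclassification rates using the Rogan-Gladen estimator. A sketch; the sensitivity and specificity values below are hypothetical placeholders, not figures reported by the paper:

```python
def rogan_gladen(observed, sensitivity, specificity):
    """Misclassification-corrected prevalence estimate, clipped to [0, 1].

    true_prev = (observed + specificity - 1) / (sensitivity + specificity - 1)
    """
    corrected = (observed + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(corrected, 0.0), 1.0)

# Hypothetical example: if the LLM detected code-sharing statements with
# 95% sensitivity and 99% specificity, the observed 12.2% would correspond to:
adjusted = rogan_gladen(0.122, 0.95, 0.99)  # about 0.119, i.e. 11.9%
```

Even modest specificity losses move the headline figure noticeably, which is why reporting validated error rates matters.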
minor comments (2)
- [Abstract] Abstract: The cutoff date 'Aug 11, 2025' should be clarified (projection, typo, or actual search date) to avoid confusion.
- [Methods] Methods: Releasing the exact LLM prompts and model version used would improve reproducibility of the pipeline, consistent with the paper's own emphasis on code sharing.
Simulated Author's Rebuttal
Thank you for the detailed and constructive referee report. We have carefully considered the comments and provide point-by-point responses below. We plan to make revisions to address the concerns raised regarding the validation of our LLM-assisted methods and the statistical reporting of results.
Point-by-point responses
-
Referee: [Methods] Methods (LLM-assisted screening and extraction pipeline): No validation metrics (precision, recall, kappa, or human gold-standard comparison) are reported for either the article screening/code-statement extraction step or the subsequent 14-feature repository assessment. Systematic LLM errors could materially shift the headline 12.2% prevalence or the reported feature rates (e.g., dependency specification), undermining the claim that sharing is 'relatively uncommon' and 'often falls short of being reusable'.
Authors: We thank the referee for highlighting this important point. While the manuscript did not include formal validation metrics, the full pipeline code is publicly available to allow independent verification. To strengthen the work, we will add a validation subsection in the Methods, including a human review of a subsample (e.g., 100 articles) to compute precision and recall for screening and extraction steps, as well as inter-rater agreement. This will be reported in the revised manuscript. revision: yes
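For the proposed inter-rater agreement, Cohen's kappa on binary fields is a natural summary. A self-contained sketch; the two label vectors are toy data, not annotations from the study:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary raters (labels coded 0/1)."""
    n = len(a)
    p_observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # chance agreement: both positive plus both negative
    p_chance = (sum(a) / n) * (sum(b) / n) + (1 - sum(a) / n) * (1 - sum(b) / n)
    return (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0

# Toy audit of 8 repository-feature judgments (0 = absent, 1 = present)
llm_rater   = [1, 1, 0, 0, 1, 0, 0, 0]
human_rater = [1, 0, 0, 0, 1, 0, 0, 1]
kappa = cohens_kappa(llm_rater, human_rater)  # 7/15, about 0.47
```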
-
Referee: [Results] Results (prevalence and feature percentages): The reported figures (12.2%, 15.8%, 37.6%, etc.) are presented without confidence intervals, sensitivity analyses, or discussion of potential misclassification rates, leaving the central empirical claims vulnerable to extraction bias.
Authors: We agree that the absence of confidence intervals and sensitivity analyses is a limitation. In the revised manuscript, we will compute and report 95% confidence intervals for all key prevalence estimates using the binomial exact method. Additionally, we will include sensitivity analyses that adjust the reported rates under plausible misclassification scenarios informed by the validation results. A brief discussion of potential extraction bias will be added to the Results and Discussion sections. revision: yes
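The binomial exact (Clopper-Pearson) interval the authors mention can be computed with the standard library alone by inverting the binomial CDF. A sketch; the count 484 of 3,967 is back-calculated from the reported 12.2% and is illustrative:

```python
import math

def _log_pmf(i, n, p):
    # log of the Binomial(n, p) probability mass at i
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))

def _cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p), summed in log space for stability
    logs = [_log_pmf(i, n, p) for i in range(k + 1)]
    m = max(logs)
    return math.exp(m) * sum(math.exp(t - m) for t in logs)

def clopper_pearson(x, n, alpha=0.05):
    """Exact binomial (Clopper-Pearson) confidence interval via bisection."""
    def invert(f):
        # f is decreasing in p; bisect for its root in (0, 1)
        lo, hi = 1e-12, 1.0 - 1e-12
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if f(mid) > 0.0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    lower = 0.0 if x == 0 else invert(lambda p: _cdf(x - 1, n, p) - (1.0 - alpha / 2.0))
    upper = 1.0 if x == n else invert(lambda p: _cdf(x, n, p) - alpha / 2.0)
    return lower, upper

# Roughly 12.2% of 3,967 eligible articles is 484 code-sharing statements
low, high = clopper_pearson(484, 3967)  # interval of about (0.112, 0.133)
```

With n near 4,000 the interval is about one percentage point wide on each side, so the qualitative conclusion that sharing is uncommon is unlikely to change; reporting it simply makes that explicit.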
Circularity Check
No circularity: empirical scoping review reports direct counts without derivations or self-referential reductions
Full rationale
This scoping review quantifies code-sharing prevalence via screening of 3,967 articles and direct assessment of repository features against 14 criteria. No equations, fitted parameters, predictions, or derivations appear in the methods or results. The LLM-assisted pipeline is a methodological tool whose outputs (counts such as 12.2% sharing rate) are presented as empirical observations, not as outputs forced by prior inputs or self-citations. TRIPOD citations provide background context for the review's motivation but do not bear the load of the reported statistics. No self-definitional loops, renamed known results, or uniqueness claims imported from the authors' prior work are present. The central findings remain independent of any internal construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The PubMed Central Open Access API yields a representative sample of TRIPOD-citing prediction model studies for the chosen date range.
Reference graph
Works this paper leans on
-
[1]
Gary S. Collins, Rebecca Whittle, Garrett S. Bullock, Patricia Logullo, Paula Dhiman, Jennifer A. de Beyer, Richard D. Riley, and Michael M. Schlussel. Open science practices need substantial improvement in prognostic model studies in oncology using machine learning. Journal of Clinical Epidemiology, 165, January 2024. ISSN 0895-4356, 1878-5921. doi: 10...
-
[2]
Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, A...
-
[3]
Iveta Simera, David Moher, Allison Hirst, John Hoey, Kenneth F. Schulz, and Douglas G. Altman. Transparent and accurate reporting increases reliability, utility, and impact of your research: Reporting guidelines and the EQUATOR Network.BMC Medicine, 8(1):24, April 2010. ISSN 1741-7015. doi: 10.1186/1741-7015-8-24
-
[4]
Does your code stand up to scrutiny?Nature, 555(7695):142–142, March 2018. doi: 10.1038/d41586-018-02741-4
-
[5]
Theodora Bloom, Emma Ganley, and Margaret Winker. Data Access for the Open Access Literature: PLOS’s Data Policy.PLoS Medicine, 11(2):e1001607, February 2014. ISSN 1549-1277. doi: 10.1371/journal.pmed.1001607
-
[6]
Giovanni Colavizza, Iain Hrynaszkiewicz, Isla Staden, Kirstie Whitaker, and Barbara McGillivray. The citation advantage of linking publications to research data.PLOS ONE, 15(4):e0230416, April 2020. ISSN 1932-6203. doi: 10.1371/journal.pone.0230416
-
[7]
Gary S. Collins, Johannes B. Reitsma, Douglas G. Altman, and Karel G.M. Moons. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement.Annals of Internal Medicine, 162(1):55–63, January 2015. ISSN 0003-4819. doi: 10.7326/M14-0697
-
[8]
Karel G.M. Moons, Douglas G. Altman, Johannes B. Reitsma, John P.A. Ioannidis, Petra Macaskill, Ewout W. Steyerberg, Andrew J. Vickers, David F. Ransohoff, and Gary S. Collins. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration.Annals of Internal Medicine, 162(1):W1–W73, Ja...
-
[9]
Gary S Collins, Karel G M Moons, Paula Dhiman, Richard D Riley, Andrew L Beam, Ben Van Calster, Marzyeh Ghassemi, Xiaoxuan Liu, Johannes B Reitsma, Maarten Van Smeden, Anne-Laure Boulesteix, Jennifer Catherine Camaradou, Leo Anthony Celi, Spiros Denaxas, Alastair K Denniston, Ben Glocker, Robert M Golub, Hugh Harvey, Georg Heinze, Michael M Hoffman, André...
-
[10]
Amir H Zamanipoor Najafabadi, Chava L Ramspek, Friedo W Dekker, Pauline Heus, Lotty Hooft, Karel G M Moons, Wilco C Peul, Gary S Collins, Ewout W Steyerberg, and Merel van Diepen. TRIPOD statement: A preliminary pre-post analysis of reporting and methods of prediction models.BMJ Open, 10(9):e041537, September 2020. ISSN 2044-6055. doi: 10.1136/bmjopen-2020-041537
-
[11]
Anna Maria Streiber, Sanne J W Hoepel, Elisabet Blok, Frank J A van Rooij, Julia Neitzel, Jeremy Labrecque, M Kamran Ikram, and Daniel Bos. Improving reproducibility of data analysis and code in medical research: 5 recommendations to get started. BMJ Open, 15(10):e104691, October 2025. ISSN 2044-6055. doi: 10.1136/bmjopen-2025-104691
-
[12]
Daniel G. Hamilton, Kyungwan Hong, Hannah Fraser, Anisa Rowhani-Farid, Fiona Fidler, and Matthew J. Page. Prevalence and predictors of data and code sharing in the medical and health sciences: Systematic review with meta-analysis of individual participant data. BMJ, 382:e075767, July 2023. ISSN 1756-1833. doi: 10.1136/bmj-2023-075767
-
[13]
Nitesh Kumar Sharma, Ram Ayyala, Dhrithi Deshpande, Yesha Patel, Viorel Munteanu, Dumitru Ciorba, Viorel Bostan, Andrada Fiscutean, Mohammad Vahed, Aditya Sarkar, Ruiwei Guo, Andrew Moore, Nicholas Darci-Maher, Nicole Nogoy, Malak Abedalthagafi, and Serghei Mangul. Analytical code sharing practices in biomedical research. PeerJ Computer Science, 10:e2066,...
-
[14]
Sheeba Samuel and Daniel Mietchen. Computational reproducibility of Jupyter notebooks from biomedical publications. GigaScience, 13:giad113, January 2024. ISSN 2047-217X. doi: 10.1093/gigascience/giad113
-
[15]
Attila Simkó, Anders Garpebring, Joakim Jonsson, Tufve Nyholm, and Tommy Löfstedt. Reproducibility of the Methods in Medical Imaging with Deep Learning. InMedical Imaging with Deep Learning, pages 95–106. PMLR, January 2024
-
[16]
Sophia J. Wagner, Christian Matek, Sayedali Shetab Boushehri, Melanie Boxberg, Lorenz Lamm, Ario Sadafi, Dominik J.E. Winter, Carsten Marr, and Tingying Peng. Built to Last? Reproducibility and Reusability of Deep Learning Algorithms in Computational Pathology.Modern Pathology, 37(1):100350, January 2024. ISSN 08933952. doi: 10.1016/j.modpat.2023.100350
-
[17]
Sully F. Chen, Anton Alyakin, Andreas Seas, Eunice Yang, Joanne J. Choi, Jin Vivian Lee, Amelia L. Chen, Pranav I. Warman, Rochelle T. Bitolas, Robert J. Steele, Daniel A. Alber, and Eric K. Oermann. LLM-assisted systematic review of large language models in clinical medicine.Nature Medicine, pages 1–8, March 2026. ISSN 1546-170X. doi: 10.1038/s41591-026-04229-5
-
[18]
Fernando M. Delgado-Chaves, Matthew J. Jennings, Antonio Atalaia, Justus Wolff, Rita Horvath, Zeinab M. Mamdouh, Jan Baumbach, and Linda Baumbach. Transforming literature screening: The emerging role of large language models in systematic reviews.Proceedings of the National Academy of Sciences of the United States of America, 122(2):e2411962122, January 2...
-
[19]
Tom Pollard, Thomas Sounack, Catherine A. Gao, Leo Anthony Celi, Charlotta Lindvall, Hyeonhoon Lee, Hyung- Chul Lee, Karel G. M. Moons, and Gary S. Collins. Protocol for development of a reporting guideline (TRIPOD-Code) for code repositories associated with diagnostic and prognostic prediction model studies.Diagnostic and Prognostic Research, 10(1):4, Fe...
-
[20]
Andrea C. Tricco, Erin Lillie, Wasifa Zarin, Kelly K. O’Brien, Heather Colquhoun, Danielle Levac, David Moher, Micah D.J. Peters, Tanya Horsley, Laura Weeks, Susanne Hempel, Elie A. Akl, Christine Chang, Jessie McGowan, Lesley Stewart, Lisa Hartling, Adrian Aldcroft, Michael G. Wilson, Chantelle Garritty, Simon Lewin, Christina M. Godfrey, Marilyn T. Macd...
-
[21]
Thomas Sounack, Raffaele Giancotti, Catherine A. Gao, Lasai Barreñada, Hyeonhoon Lee, Hyung-Chul Lee, Leo A. Celi, Karel G.M. Moons, Gary S. Collins, Charlotta Lindvall, and Tom Pollard. Code sharing in prediction model research: A protocol for a scoping review. February 2026. doi: 10.37766/inplasy2026.2.0080
-
[22]
PMC Open Access Subset, 2003
-
[23]
Reporting standards and availability of data, materials, code and protocols | npj Computational Materials. https://www.nature.com/npjcompumats/editorial-policies/reporting-standards. ISSN 2057-3960
-
[24]
Materials, software and code sharing | PLOS Digital Health. https://journals.plos.org/digitalhealth/s/materials-software-and-code-sharing
-
[25]
Materials, software and code sharing | PLOS Medicine. https://journals.plos.org/plosmedicine/s/materials-software-and-code-sharing
-
[26]
Riaz Ahmed Agha, Alexander J. Fowler, Christopher Limb, Katharine Whitehurst, Robert Coe, Harkiran Sagoo, Daniyal J. Jafree, Charmilie Chandrakumar, and Buket Gundogan. Impact of the mandatory implementation of reporting guidelines on reporting quality in a surgical journal: A before and after study.International Journal of Surgery, 30:169–172, June 2016....
-
[27]
Nikola Panic, Emanuele Leoncini, Giulio de Belvis, Walter Ricciardi, and Stefania Boccia. Evaluation of the endorsement of the preferred reporting items for systematic reviews and meta-analysis (PRISMA) statement on the quality of published systematic review and meta-analyses.PloS One, 8(12):e83138, 2013. ISSN 1932-6203. doi: 10.1371/journal.pone.0083138
-
[28]
Victoria Leclercq, Charlotte Beaudart, Sara Ajamieh, Véronique Rabenda, Ezio Tirelli, and Olivier Bruyère. Meta- analyses indexed in PsycINFO had a better completeness of reporting when they mention PRISMA.Journal of Clinical Epidemiology, 115:46–54, November 2019. ISSN 1878-5921. doi: 10.1016/j.jclinepi.2019.06.014
-
[29]
Ho Jung Choi, Yeong Eun Kim, Jung-Man Namgoong, Inki Kim, Jun Sung Park, Woo Im Baek, Byong Sop Lee, Hee Mang Yoon, Young Ah Cho, Jin Seong Lee, Jung Ok Shim, Seak Hee Oh, Jin Soo Moon, Jae Sung Ko, Dae Yeon Kim, and Kyung Mo Kim. Development and Validation of a Machine Learning–Based Prediction Model for Detection of Biliary Atresia.Gastro Hep Advances, ...
-
[30]
Yihao Yu, Yuqi Yang, Qian Li, Jing Yuan, and Yan Zha. Predicting metabolic dysfunction associated steatotic liver disease using explainable machine learning methods. Scientific Reports, 15:12382, April 2025. ISSN 2045-2322. doi: 10.1038/s41598-025-96478-6
-
[31]
Tran Quoc Bao Tran, Stefanie Lip, Clea du Toit, Tejas Kumar Kalaria, Ravi K. Bhaskar, Alison Q. O’Neil, Beata Graff, Michał Hoffmann, Anna Szyndler, Katarzyna Polonis, Jacek Wolf, Sandeep Reddy, Krzysztof Narkiewicz, Indranil Dasgupta, Anna F. Dominiczak, Shyam Visweswaran, Linsay McCallum, and Sandosh Padmanabhan. Assessing Machine Learning for Diagnosti...
-
[32]
Maarten C. Ottenhoff, Lucas A. Ramos, Wouter Potters, Marcus L. F. Janssen, Deborah Hubers, Shi Hu, Egill A. Fridgeirsson, Dan Piña-Fuentes, Rajat Thomas, Iwan C. C. van der Horst, Christian Herff, Pieter Kubben, Paul W. G. Elbers, Henk A. Marquering, Max Welling, Suat Simsek, Martijn D. de Kruif, Tom Dormans, Lucas M. Fleuren, Michiel Schinkel, Peter G. ...
-
[33]
Mamoon Habib, Rafaella Cazé de Medeiros, Syed Muhammad Ahsan, Aidan McDonald Wojciechowski, Maria A. Donahue, Deborah Blacker, Joseph P. Newhouse, Lee H. Schwamm, M. Brandon Westover, and Lidia Mvr Moura. A Claims-Based Machine Learning Classifier of Modified Rankin Scale in Acute Ischemic Stroke.medRxiv: The Preprint Server for Health Sciences, page 2025...
-
[34]
Rui Zhang, Fang Long, Jingyi Wu, and Ruoming Tan. Distinct immunological signatures define three sepsis recovery trajectories: A multi-cohort machine learning study.Frontiers in Medicine, 12:1575237, April 2025. ISSN 2296-858X. doi: 10.3389/fmed.2025.1575237
-
[35]
Anand S. Pandit, Arif H. B. Jalal, Ahmed K. Toma, and Parashkev Nachev. Analyzing historical and future acute neurosurgical demand using an AI-enabled predictive dashboard. Scientific Reports, 12(1):7603, May 2022. ISSN 2045-2322. doi: 10.1038/s41598-022-11607-9
-
[36]
, 22d e s c r i p t i o n =" Whether t h e p a p e r meets t h e i n c l u s i o n c r i t e r i a f o r a m u l t i v a r i a b l e p r e d i c t i o n model s t u d y . " , 23) 24r e a s o n : s t r = F i e l d (
-
[37]
, 26d e s c r i p t i o n =" B r i e f e x p l a n a t i o n j u s t i f y i n g why t h e p a p e r does o r does n o t meet t h e c r i t e r i a . " , 27) 28c o u n t r y _ f i r s t _ a u t h o r _ i n s t i t u t i o n : s t r = F i e l d ( 16
-
[38]
32" R e t u r n ' n o t r e p o r t e d ' i f t h e i n f o r m a t i o n i s n o t found
, 30d e s c r i p t i o n =( 31" The c o u n t r y o f o r i g i n b a s e d on t h e a f f i l i a t i o n o f t h e f i r s t a u t h o r . Use t h e ISO 3166 s t a n d a r d name o f t h e c o u n t r y i n your r e s p o n s e . " 32" R e t u r n ' n o t r e p o r t e d ' i f t h e i n f o r m a t i o n i s n o t found "
-
[39]
, 34) 35r e p o _ u r l : O p t i o n a l [ s t r ] = F i e l d (
-
[40]
URL t o t h e p a p e r ' s code r e p o s i t o r y i f t h e p a p e r i s a match
, 37d e s c r i p t i o n =( 38"URL t o t h e p a p e r ' s code r e p o s i t o r y i f t h e p a p e r i s a match . " 39" Use ' Appendix ' i f code i s e x p l i c i t l y s t a t e d t o be i n s u p p l e m e n t a r y m a t e r i a l s "
-
[41]
, 41) 42c o d e _ s t a t e m e n t _ l o c a t i o n s : O p t i o n a l [ L i s t [ C o d e S t a t e m e n t L o c a t i o n ] ] = F i e l d (
-
[42]
, 44d e s c r i p t i o n =( 45" A l l l o c a t i o n s i n t h e p a p e r where a code a v a i l a b i l i t y s t a t e m e n t a p p e a r s i f a r e p o _ u r l i s found . " 46" Use [ ' o t h e r ' ] i f t h e code a v a i l a b i l i t y s t a t e m e n t l o c a t i o n does n o t f i t t h e a v a i l a b l e c a t e g o r i e s "
-
[43]
, 48) 49c o d e _ s t a t e m e n t _ s e n t e n c e : O p t i o n a l [ s t r ] = F i e l d (
-
[44]
, 51d e s c r i p t i o n =" I f r e p o _ u r l i s found , t h e s e n t e n c e i n t r o d u c i n g t h e r e p o s i t o r y u r l ( w i t h o u t t h e u r l i t s e l f ) , eg . ' The code can be found h e r e : ' " , 52) 17 C Code repository characterization prompt and output schema C.1 Prompt You will be provided the tree of a repository and its...
-
[45]
, 5d e s c r i p t i o n =( 6" Whether t h e r e p o s i t o r y i s empty . C o n s i d e r i t empty i f i t c o n t a i n s no f i l e s , " 7" o n l y empty f i l e s , o r o n l y a README f i l e . "
-
[46]
, 9) 10 11# README 12c o n t a i n s _ r e a d m e : b o o l = F i e l d (
-
[47]
, 14d e s c r i p t i o n =( 15" Whether t h e r e p o s i t o r y c o n t a i n s u s a g e / s t r u c t u r e i n s t r u c t i o n s ( e . g . , README . md /README. t x t /README) . "
-
[48]
, 17) 18r e a d m e _ p u r p o s e _ a n d _ o u t p u t s : O p t i o n a l [ b o o l ] = F i e l d (
-
[49]
, 20d e s c r i p t i o n =( 18 21" I f c o n t a i n s _ r e a d m e i s True , w h e t h e r t h e README p r o v i d e s an o v e r v i e w o f t h e r e p o s i t o r y p u r p o s e " 22" and e x p e c t e d o u t p u t s . Do n o t r e t u r n a n y t h i n g i f c o n t a i n s _ r e a d m e i s F a l s e . "
-
[50]
, 24) 25 26# R e q u i r e m e n t s 27c o n t a i n s _ r e q u i r e m e n t s : b o o l = F i e l d (
-
[51]
, 29d e s c r i p t i o n =( 30" Whether t h e r e p o s i t o r y s p e c i f i e s s o f t w a r e d e p e n d e n c i e s e i t h e r i n a d e d i c a t e d f i l e " 31" ( e . g . , r e q u i r e m e n t s . t x t , e n v i r o n m e n t . yml , p y p r o j e c t . toml ) o r i n t h e README . "
-
[52]
, 33) 34r e q u i r e m e n t s _ d e p e n d e n c y _ v e r s i o n s : O p t i o n a l [ b o o l ] = F i e l d (
-
[53]
, 36d e s c r i p t i o n =( 37" I f c o n t a i n s _ r e q u i r e m e n t s i s True , w h e t h e r d e p e n d e n c i e s i n c l u d e v e r s i o n c o n s t r a i n t s " 38" ( e . g . , package = = 1 . 2 . 3 , >= , ~=) . Do n o t r e t u r n a n y t h i n g i f c o n t a i n s _ r e q u i r e m e n t s i s F a l s e . "
-
[54]
, 40) 41 42# L i c e n s e 43c o n t a i n s _ l i c e n s e : b o o l = F i e l d (
-
[55]
, 45d e s c r i p t i o n =" Whether t h e r e p o s i t o r y i n c l u d e s a l i c e n s e f i l e d e s c r i b i n g u s a g e p e r m i s s i o n s . " , 46) 47 48# Documentation 49s u f f i c i e n t _ c o d e _ d o c u m e n t a t i o n : b o o l = F i e l d (
-
[56]
, 51d e s c r i p t i o n =( 52" Whether t h e code c o n t a i n s s u f f i c i e n t i n l i n e comments / d o c s t r i n g s e x p l a i n i n g key components " 53" so a u s e r can u n d e r s t a n d t h e l o g i c . "
-
[57]
, 55) 56 57# M o d u l a r i t y 58i s _ m o d u l a r _ a n d _ s t r u c t u r e d : b o o l = F i e l d (
-
[58]
, 60d e s c r i p t i o n =( 61" Whether code i s o r g a n i z e d i n t o modular , r e u s a b l e components ( f u n c t i o n s / c l a s s e s / modules ) " 62" r a t h e r t h a n a few l o n g s c r i p t s . "
-
[59]
, 64) 65 66# T e s t i n g 67i m p l e m e n t s _ t e s t s : b o o l = F i e l d (
-
[60]
, 69d e s c r i p t i o n =( 70" Whether t h e r e p o s i t o r y i n c l u d e s t e s t s ( u n i t / f u n c t i o n a l ) , t e s t f i l e s / s c r i p t s , o r m e a n i n g f u l " 71" a s s e r t i o n s v e r i f y i n g e x p e c t e d b e h a v i o r . "
-
[61]
, 73) 74 75# R e p r o d u c i b i l i t y 76f i x e s _ s e e d _ i f _ s t o c h a s t i c : O p t i o n a l [ b o o l ] = F i e l d (
-
[62]
, 78d e s c r i p t i o n =( 79" I f t h e r e p o s i t o r y u s e s s t o c h a s t i c p r o c e s s e s ( e . g . , random sampling , ML t r a i n i n g ) , w h e t h e r i t " 80" s e t s f i x e d random s e e d s f o r r e p r o d u c i b i l i t y . Do n o t r e t u r n a n y t h i n g i f s t o c h a s t i c i t y i s n o t a p p l i c a b l e . " 19
-
[63]
, 82) 83l i s t s _ h a r d w a r e _ r e q u i r e m e n t s : b o o l = F i e l d (
-
[64]
, 85d e s c r i p t i o n =" Whether h a r d w a r e r e q u i r e m e n t s ( e . g . , GPU/ CPU /R A M) a r e s t a t e d anywhere i n t h e r e p o s i t o r y . " , 86) 87 88# C i t a t i o n and L i n k i n g 89c o n t a i n s _ l i n k _ t o _ p a p e r : b o o l = F i e l d (
-
[65]
, 91d e s c r i p t i o n =" Whether t h e r e p o s i t o r y i n c l u d e s a l i n k (URL/ DOI / a r X i v / PubMed ) t o t h e a s s o c i a t e d p a p e r . " , 92) 93c o n t a i n s _ c i t a t i o n : b o o l = F i e l d (
-
[66]
, 95d e s c r i p t i o n =( 96" Whether t h e r e p o s i t o r y p r o v i d e s a c i t a t i o n f o r t h e p a p e r ( e . g . , p l a i n t e x t c i t a t i o n , BibTeX e n t r y , " 97"CITATION . c f f , o r a LaTeX c i t a t i o n key ) . "
-
[67]
, 99) 100 101# Data 102i n c l u d e s _ d a t a _ o r _ s a m p l e : b o o l = F i e l d (
-
[68]
, 104d e s c r i p t i o n =( 105" Whether t h e r e p o s i t o r y i n c l u d e s t h e o r i g i n a l d a t a s e t o r a sample / demo d a t a s e t s u f f i c i e n t t o run " 106" o r d e m o n s t r a t e t h e code . "
-
[69]
, 108) 109 110# Free − t e x t n o t e s 111c o m m e n t s _ a n d _ e x p l a n a t i o n s : O p t i o n a l [ s t r ] = F i e l d (
-
[70]
, 113d e s c r i p t i o n =( 114" A d d i t i o n a l comments a b o u t r e p o s i t o r y q u a l i t y , s t r e n g t h s / weaknesses , and n o t a b l e a s p e c t s n o t f u l l y " 115" c a p t u r e d by t h e b o o l e a n f i e l d s . "
-
[71]
, 117) 118 119# Languages 120c o d i n g _ l a n g u a g e s : O p t i o n a l [ L i s t [ s t r ] ] = F i e l d (
-
[72]
, 122d e s c r i p t i o n =( 123" I f t h e r e p o s i t o r y c o n t a i n s code , r e t u r n a l l programming l a n g u a g e s used . I n a l i s t " 124" For example , [ ' python ' , ' r ' , ' s q l ' ] . " 125"Do n o t r e t u r n a n y t h i n g i f t h e r e i s no code i n t h e r e p o s i t o r y . "
-
[73]
, 127) D Annotation codebook Article annotation guidelines Background This document provides detailed instructions for annotators involved in the TRIPOD-Code project. The goal of this annotation task is to evaluate the availability and quality of code repositories linked to studies that develop, update, or validate multivariable prediction models. Annotat...