Advancing Trustworthy AI in Healthcare Through Meta-Research: Results of an Interdisciplinary Design-Thinking Workshop
Pith reviewed 2026-05-15 17:51 UTC · model grok-4.3
The pith
Meta-research can concretely address challenges in translating AI ethics principles into healthcare practice.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our results demonstrate how meta-research can offer concrete contributions to address pressing challenges of TAI in healthcare. These challenges include the dynamic and complex nature of TAI ethical requirements and principles, common terminology and understanding of TAI, ensuring robustness, replicability, and reproducibility, choosing adequate evaluation metrics, lack of transparency, advancing preclinical biomedical research, and validation in real-world clinical environments.
What carries the argument
A design-thinking-informed co-creation workshop, followed by an inductive descriptive analysis of its outputs that synthesized the interconnections between meta-research and TAI challenges.
If this is right
- Meta-research methods can help establish shared terminology and understanding of trustworthy AI requirements.
- Meta-research approaches can improve robustness, replicability, and reproducibility of AI systems in healthcare settings.
- Meta-research can guide the selection of appropriate evaluation metrics for AI performance and ethics.
- Transparency in AI model development and reporting can be advanced by drawing on meta-research practices.
- Meta-research strategies can support better validation of AI tools in both preclinical studies and real-world clinical environments.
Where Pith is reading between the lines
- The roadmap could be tested by applying specific meta-research interventions to ongoing healthcare AI projects and measuring improvements in robustness or transparency.
- Similar workshops in other high-stakes AI domains such as autonomous systems or finance might reveal parallel opportunities for meta-research contributions.
- Future efforts could expand participant diversity to check whether additional challenges emerge beyond those identified here.
Load-bearing premise
The particular group of interdisciplinary participants and the facilitation style of the workshop captured the most important challenges and produced actionable solutions without significant bias.
What would settle it
The claim would be undermined if a subsequent workshop with a substantially different participant group, or a systematic literature review, found no usable meta-research methods for the listed TAI challenges.
original abstract
Meta-research and Trustworthy AI (TAI) share common goals, namely improving evidence, robustness, and transparency, yet there is very little interplay between the two fields. To investigate the potential benefits of closer collaboration between the domains of TAI in healthcare and meta-research, we convened an interdisciplinary workshop funded by the Volkswagen Foundation in February 2025. The workshop aimed to collaboratively examine key challenges in translating AI ethics principles into practice and to identify potential solutions informed by meta-research approaches. A Design Thinking-informed co-creation approach was followed by an inductive descriptive analysis of the outputs. Our results demonstrate how meta-research can offer concrete contributions to address pressing challenges of TAI in healthcare. These challenges include the dynamic and complex nature of TAI ethical requirements and principles, common terminology and understanding of TAI, ensuring robustness, replicability, and reproducibility, choosing adequate evaluation metrics, lack of transparency, advancing preclinical biomedical research, and validation in real-world clinical environments. We present a catalog of ideas and a roadmap for future research, which synthesize existing interconnections and identify concrete next steps and open research gaps, thereby serving as a foundation for future interdisciplinary efforts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports results from an interdisciplinary design-thinking workshop (February 2025) that brought together experts to examine challenges in translating AI ethics principles into practice for Trustworthy AI (TAI) in healthcare. Using a co-creation approach followed by inductive descriptive analysis, the authors identify seven core challenges (dynamic ethics, terminology, robustness/replicability, metrics, transparency, preclinical research, and real-world validation) and synthesize a catalog of ideas plus a roadmap for future meta-research contributions to TAI.
Significance. If the workshop outputs prove representative and actionable, the catalog and roadmap could usefully seed interdisciplinary projects that apply meta-research tools (e.g., reproducibility audits, standardized reporting) to TAI systems in healthcare. The work's primary value lies in its explicit bridging of two communities that share overlapping goals but currently interact little; however, because the manuscript presents only proposed ideas rather than executed meta-research studies or measured outcomes, its immediate empirical significance remains prospective.
major comments (2)
- [Abstract] The sentence 'Our results demonstrate how meta-research can offer concrete contributions to address pressing challenges of TAI in healthcare' is not supported by the reported outputs. The workshop produced an inductive list of challenges plus a catalog of ideas and a roadmap; no meta-research method was applied to an existing TAI system, and no before/after metrics on reproducibility, robustness, or clinical impact are provided.
- [Workshop and Analysis] The manuscript supplies insufficient detail on participant selection criteria, exact design-thinking process steps, how outputs were validated or triangulated, and steps taken to reduce bias arising from group composition or facilitation style. These omissions make it difficult to assess whether the catalog reliably captures field-wide priorities.
minor comments (1)
- [Results] Ensure that all challenges listed in the abstract are explicitly cross-referenced to the corresponding entries in the catalog and roadmap so readers can trace each idea back to a specific workshop output.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight opportunities to strengthen the clarity and methodological transparency of our manuscript. We address each major comment below and will revise the paper accordingly to better reflect the prospective nature of the workshop outputs.
point-by-point responses
- Referee: [Abstract] The sentence 'Our results demonstrate how meta-research can offer concrete contributions to address pressing challenges of TAI in healthcare' is not supported by the reported outputs. The workshop produced an inductive list of challenges plus a catalog of ideas and a roadmap; no meta-research method was applied to an existing TAI system, and no before/after metrics on reproducibility, robustness, or clinical impact are provided.
  Authors: We agree that the phrasing 'demonstrate' implies a stronger empirical claim than the workshop outputs support. The results consist of an inductively derived catalog of ideas and a roadmap rather than direct application of meta-research methods or measured outcomes. We will revise the abstract to state that the results 'identify potential concrete contributions' meta-research can offer, aligning with the referee's assessment of the work's prospective significance.
  revision: yes
- Referee: [Workshop and Analysis] The manuscript supplies insufficient detail on participant selection criteria, exact design-thinking process steps, how outputs were validated or triangulated, and steps taken to reduce bias arising from group composition or facilitation style. These omissions make it difficult to assess whether the catalog reliably captures field-wide priorities.
  Authors: We acknowledge the need for expanded methodological detail. In the revised manuscript we will add: explicit participant selection criteria (expertise in AI ethics, meta-research, healthcare informatics, and interdisciplinary balance); a step-by-step account of the design-thinking activities and timeline; description of the inductive analysis, including iterative synthesis and team cross-validation of themes; and bias-mitigation steps such as structured facilitation protocols and post-workshop review. These additions will allow readers to better evaluate the reliability of the identified priorities.
  revision: yes
Circularity Check
No circularity: inductive workshop report with direct outputs
full rationale
The paper reports results from an interdisciplinary design-thinking workshop followed by inductive descriptive analysis. The central claim rests on the workshop outputs themselves (identified challenges and catalog of ideas), which are presented as the direct product of the co-creation process rather than derived from any fitted parameters, self-referential equations, or load-bearing self-citations. No mathematical derivations, predictions, or uniqueness theorems appear. The analysis is self-contained against the external workshop event and participant contributions, with no reduction of results to the paper's own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Design-thinking-informed co-creation followed by inductive analysis produces reliable identification of challenges and solutions in interdisciplinary settings.