Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration
Pith reviewed 2026-06-29 11:51 UTC · model grok-4.3
The pith
AI-related clinical trials have increased markedly over time, and hybrid human-AI screening of records appears viable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper reports that AI-related trials show a marked increase over time, with recent growth in terms such as machine learning, deep learning, chatbots, GPTs, and large language models. China and the United States account for the largest shares, with notable recent rises in several other countries. In a random sample of 100 records, human and AI classifiers agreed well on studies not using AI substantively but showed lower agreement on classifying types of human-AI interaction, especially when descriptions were ambiguous. The results indicate that a hybrid human-AI workflow for screening trial records is potentially viable, though clearer reporting and more precise definitions of interaction would benefit the process.
What carries the argument
The hybrid workflow that pairs a frontier generative AI model with human review to screen and categorize records returned by an AI-focused search of the registry.
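A minimal sketch of how a workflow of this kind could be organised, assuming the acceptance-sampling framing mentioned in the paper's stated aims: the AI labels every record and a human re-labels a random subset. This is not the authors' implementation. The names `hybrid_screen`, `ai_classify`, `human_classify`, `sample_size`, and `max_disagreements` are hypothetical, and the toy classifiers merely stand in for GPT-5.5 and a human reviewer.

```python
import random
from typing import Callable, Sequence


def hybrid_screen(
    records: Sequence[str],
    ai_classify: Callable[[str], str],
    human_classify: Callable[[str], str],
    sample_size: int = 100,
    max_disagreements: int = 10,
    seed: int = 0,
) -> dict:
    """AI labels every record; a human audits a random sample of them."""
    ai_labels = [ai_classify(r) for r in records]
    rng = random.Random(seed)  # fixed seed keeps the audit sample reproducible
    audit_idx = rng.sample(range(len(records)), min(sample_size, len(records)))
    disagreements = [
        i for i in audit_idx if human_classify(records[i]) != ai_labels[i]
    ]
    return {
        "ai_labels": ai_labels,
        "audited": len(audit_idx),
        "disagreements": sorted(disagreements),
        # Accept the AI labels for the batch only if the audit is clean enough.
        "accepted": len(disagreements) <= max_disagreements,
    }


# Toy stand-ins for the two classifiers (assumptions for illustration only).
def toy_ai(record: str) -> str:
    return "AI" if "learning" in record.lower() else "No AI"


def toy_human(record: str) -> str:
    return "AI" if "deep learning" in record.lower() else "No AI"


toy_records = [
    "Deep learning for retinal imaging",
    "Aspirin dosing study",
    "Learning outcomes in nursing students",
]
result = hybrid_screen(
    toy_records, toy_ai, toy_human, sample_size=3, max_disagreements=0
)
```

In this toy run the keyword-based AI stub over-labels the third record, so the audit flags one disagreement and the batch is not accepted. The acceptance threshold is a design choice that a real study would need to justify.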
If this is right
- AI terminology appears in clinical trials at increasing rates, especially references to advanced techniques in recent years.
- China and the United States lead in the number of AI-related trials, with several other countries showing recent growth.
- Human and AI classifiers reach good agreement on identifying trials that do not use AI, but agreement drops when classifying the details of human-AI interaction.
- Clearer trial reporting and more exact definitions of human-AI interaction would make hybrid screening more reliable.
Where Pith is reading between the lines
- Future registry policies could require explicit fields for AI methods and interaction details to reduce ambiguity in trend analyses.
- The hybrid approach might extend to monitoring other emerging technologies in clinical research if agreement rates improve with refined categories.
- Accelerating AI adoption in trials could eventually influence regulatory expectations around documentation of technology use.
- Researchers planning new trials may benefit from anticipating that ambiguous descriptions limit the usefulness of automated or hybrid meta-studies.
Load-bearing premise
The selected AI search terms and the random sample of 100 records are enough to represent the true prevalence of AI use and the reliability of hybrid classification without systematic bias from registry descriptions.
What would settle it
A full manual audit of the retrieved records that finds many AI-using trials omitted from the search terms or that shows consistent disagreement between human and AI classifiers on interaction types across a much larger sample.
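To illustrate why a much larger sample matters, the sketch below computes a Wilson score interval for an agreement proportion. The figure of 80 agreements out of 100 is a hypothetical input chosen for illustration; the paper, as summarised here, does not report a raw agreement count. The function name `wilson_interval` is likewise an assumption.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical input: 80 of 100 sampled records classified identically.
low, high = wilson_interval(80, 100)
# Same hypothetical proportion on a sample ten times larger.
low_big, high_big = wilson_interval(800, 1000)
```

With these hypothetical inputs the interval for 100 records spans roughly 0.71 to 0.87, while the interval for 1,000 records is markedly narrower, which is the sense in which a larger audit could settle the question.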
Original abstract
This paper examines records retrieved from the ClinicalTrials.gov registry to characterize temporal trends in AI terminology and the geographical distribution of AI trials. The work also reports on an exploratory hybrid human-AI approach to analyzing human-AI interaction trends in registered clinical trials. The hybrid workflow comprised a frontier generative AI model (GPT-5.5) and human review to screen and categorize records returned by an AI-focused search. The findings indicate a marked increase in AI-related trials over time, with recent growth in references to machine learning, deep learning, chatbots, GPTs, and large language models. Geographically, China and the United States accounted for the largest numbers of AI-related trials, with notable recent increases in several other countries including Italy, France, Spain, the UK and Turkey (T\"urkiye). In a random sample of 100 records, human and AI classifiers showed good agreement in identifying studies not substantively using AI, but lower agreement in classifying human-AI interaction, particularly where health professional interaction was ambiguous or insufficiently described. Overall, the results suggest that hybrid human-AI screening of clinical trial records is potentially viable, but clearer trial reporting and more precise interaction definitions will benefit the process.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes ClinicalTrials.gov records retrieved via an AI-focused search to characterize temporal trends in AI terminology (e.g., marked increases in machine learning, deep learning, chatbots, GPTs, and LLMs) and geographic distributions (China and US leading, with recent growth in Italy, France, Spain, UK, and Turkey). It also reports an exploratory hybrid human-AI workflow (GPT-5.5 plus human review) applied to screen and categorize human-AI interactions, concluding from a random sample of 100 records that hybrid screening is potentially viable despite lower agreement on interaction categories.
Significance. If the trends hold, the work supplies useful observational counts on AI adoption in registered trials and flags practical issues in registry-based classification of human-AI roles. The hybrid workflow exploration is a modest methodological contribution that could motivate clearer trial reporting standards, though the study remains purely descriptive with no statistical modeling or predictions.
major comments (2)
- [Results section on hybrid classification] The central suggestion that hybrid human-AI screening is potentially viable rests on agreement observed in the random sample of 100 records (Results section on hybrid classification). The manuscript states only that agreement was 'good' for non-substantive AI use and 'lower' for interaction categories without reporting any quantitative metrics (Cohen's kappa, raw percent agreement, or disagreement analysis), leaving the viability inference unsupported by standard reliability statistics.
- [Methods section on search and sampling] The assumption that the AI-term search and 100-record sample suffice to characterize prevalence and interaction reliability (Methods and Results) is load-bearing for both the trend claims and the viability conclusion, yet the paper provides no details on query construction, inter-rater procedures, handling of ambiguous entries, or checks for registry-entry bias.
minor comments (2)
- [Abstract] The abstract contains the string 'T\"urkiye', a LaTeX artifact that should be corrected to Türkiye for readability.
- [Discussion] The paper would benefit from explicit comparison of the observed growth rates to prior bibliometric or registry studies on AI in medicine to contextualize the 'marked increase' claim.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We agree that the points raised identify areas where additional detail and quantitative support will improve clarity and rigor, and we will revise the manuscript to address them.
Point-by-point responses
- Referee: [Results section on hybrid classification] The central suggestion that hybrid human-AI screening is potentially viable rests on agreement observed in the random sample of 100 records (Results section on hybrid classification). The manuscript states only that agreement was 'good' for non-substantive AI use and 'lower' for interaction categories without reporting any quantitative metrics (Cohen's kappa, raw percent agreement, or disagreement analysis), leaving the viability inference unsupported by standard reliability statistics.
  Authors: We agree that the viability claim would be better supported by quantitative reliability statistics. In the revised manuscript we will add Cohen's kappa, raw percent agreement, and a short disagreement analysis for the 100-record sample, computed from the existing human and GPT-5.5 classifications.
  Revision: yes
- Referee: [Methods section on search and sampling] The assumption that the AI-term search and 100-record sample suffice to characterize prevalence and interaction reliability (Methods and Results) is load-bearing for both the trend claims and the viability conclusion, yet the paper provides no details on query construction, inter-rater procedures, handling of ambiguous entries, or checks for registry-entry bias.
  Authors: We accept this critique. The revised Methods section will include the precise search terms and Boolean structure used on ClinicalTrials.gov, the random-sampling protocol for the 100 records, the human-review workflow for resolving ambiguities, and an explicit limitations paragraph addressing potential registry-entry biases.
  Revision: yes
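The reliability statistics named in the first response can be sketched as follows. This is a minimal illustration of raw percent agreement, Cohen's kappa, and a simple disagreement tally for two raters; the label lists are invented toy data, not the study's classifications, and the function names are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two raters gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / n**2
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical): 10 records rated by a human and by the AI.
human = ["none", "none", "AI", "AI", "none", "AI", "none", "none", "AI", "none"]
ai = ["none", "none", "AI", "none", "none", "AI", "none", "AI", "AI", "none"]

p_o = percent_agreement(human, ai)
kappa = cohens_kappa(human, ai)
# Disagreement analysis: which (human, AI) label pairs occur when they differ.
disagreements = Counter((h, m) for h, m in zip(human, ai) if h != m)
```

On this toy data the raters agree on 8 of 10 records, but kappa is about 0.58 because half of that agreement is expected by chance. The gap between the two numbers is why reporting percent agreement alone can overstate reliability.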
Circularity Check
No circularity: purely observational counts and agreement metrics
The paper reports temporal trends via direct registry search counts, geographical distributions, and raw human-AI agreement percentages on a 100-record sample. No equations, fitted parameters, predictions, or derivations appear. No self-citations are invoked as load-bearing premises, and no ansatz or uniqueness claims are present. The viability suggestion is an informal inference from observed agreement, not a constructed result equivalent to its inputs.
Reference graph
Works this paper leans on
-
[1]
Introduction Artificial Intelligence (AI) is increasingly used in clinical research and healthcare practice [1,2]. In clinical trials, AI use can be observed in the publicly available records ... [Footnote 1: Corresponding Author: Sandra Woolley, s.i.woolley@keele.ac.uk] [Page header: PREPRINT Workshop on Health, Wellbeing and Human-AI Interaction Hybrid Human-Artificial Intelligence ...]
-
[2]
Maru et al
Related Work Previous studies have identified growth in the registration and reporting of AI-related clinical research, while also observing variations in terminology, application areas and reporting practices. Maru et al. [18] examined AI and machine learning studies registered on ClinicalTrials.gov between 2010 and 2023 and observed a substantial incre...
-
[3]
Study Aims and Research Questions The aims of the study were i) to analyze trends in the use of AI and human-AI interaction in clinical trials, and ii) to explore the feasibility of a hybrid human-AI trial screening approach where, prospectively, AI assists the screening process and humans adopt an acceptance-sampling approach. More specifically, the st...
-
[4]
artificial intelligence
Methodology The ClinicalTrials.gov repository was searched using the AI-focused search string below and applying the inclusion and exclusion criteria. The search string was developed by collating and pruning search terms used in the related literature and by substantial test searches based on names of AI models and methods. Search String: AI OR "artificia...
2026
-
[5]
expert system
Results The AI search string returned 5,828 records for the search conducted on 23 April 2026 for trials first posted on or before 1 April 2026. At the time of writing, this represents slightly over 1% of all ClinicalTrials.gov records. Of the returned records, 3,019 were interventional studies, 2,807 were observational and two were ‘expanded access’ stu...
2026
-
[6]
no AI use) classification proved challenging included: i) trials where there was conflicting or ambiguous use of terms across fields, e.g
Discussion Trials where AI use (vs. no AI use) classification proved challenging included: i) trials where there was conflicting or ambiguous use of terms across fields, e.g. AI terms such as ‘Deep learning technique’ included in ‘Terms related to this study’ but in-documentation references limited to ‘automated software’ (NCT03206333); ii) trials where ...
-
[7]
Specify whether there was human-AI interaction in the handling of the input data, and what level of expertise was required of users
Conclusions and Further Work Frontier large language models (such as GPT-5.5) demonstrate potential for screening clinical trials records. A hybrid human-AI approach is, therefore, a potential alternative to the time-intensive human processes involved in systematic reviews of clinical trials and literature, though not without a carbon-footprint. Atte...
-
[8]
For the purposes of open access, the authors have applied a Creative Commons Attribution (CC-BY) license to any Accepted Author Manuscript version arising from this submission
Acknowledgements Authors gratefully acknowledge support of the Digital Society Institute of Keele University, UK, that underpins efforts towards the publication of this work. For the purposes of open access, the authors have applied a Creative Commons Attribution (CC-BY) license to any Accepted Author Manuscript version arising from this submission. PRE...
-
[9]
Olawade DB, Fidelis SC, Marinze S, Egbon E, Osunmakinde A, Osborne A. Artificial intelligence in clinical trials: a comprehensive review of opportunities, challenges, and future directions. International Journal of Medical Informatics. 2026; 206, 106141, doi:10.1016/j.ijmedinf.2025.106141
-
[10]
Wang D, Zhang S. Large language models in medical and healthcare fields: applications, advances, and challenges, Artificial Intelligence Review. 2024; 57, article 299, doi:10.1007/s10462-024-10921-0
-
[11]
National Library of Medicine
ClinicalTrials, 2026, Trends and charts on registered studies | ClinicalTrials.gov, Bethesda, MD: U.S. National Library of Medicine. Available at: https://clinicaltrials.gov/about-site/trends-charts (Accessed: 3 May 2026)
2026
-
[12]
Cruz Rivera S, Liu X, Chan A-W, Denniston AK, Calvert MJ. SPIRIT-AI and CONSORT-AI Working Group, SPIRIT-AI and CONSORT-AI Steering Group and SPIRIT-AI and CONSORT-AI Consensus Group (2020). Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nature Medicine; 2020; 26(9), pp. 1351–1363,...
-
[13]
Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK and the SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nature Medicine, 2020; 26(9), pp. 1364–1374, doi:10.1038/s41591-020-1034-x
-
[14]
Martindale APL, Llewellyn CD, de Visser RO, Ng B, Ngai V, Kale AU, di Ruffano LF, Golub RM, Collins GS, Moher D, McCradden MD, Oakden-Rayner L, Rivera SC, Calvert M, Kelly CJ, Lee CS, Yau C, Chan A-W, Keane P, Beam AL, Liu X. Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines...
-
[15]
Wearables, healthcare-computer interaction and the internet of obscure medical things
Khattak KA, Woolley SI, Collins T. Wearables, healthcare-computer interaction and the internet of obscure medical things. Proceedings of the 37th International BCS Human-Computer Interaction Conference. Swindon: BCS Learning and Development. 2024; pp. 225–229, doi:10.14236/ewic/BCSHCI2024.22
-
[16]
Compounding barriers to fairness in the digital technology ecosystem
Woolley SI, Collins T, Andras P, Gardner A, Ortolani M, Pitt J. Compounding barriers to fairness in the digital technology ecosystem. IEEE International Symposium on Technology and Society (ISTAS). 2021; pp. 1-5, doi:10.1109/ISTAS52410.2021.9629166
-
[17]
Version reporting and assessment approaches for new and updated activity and heart rate monitors
Collins T, Woolley SI, Oniani S, Pires IM, Garcia NM, Ledger SJ, Pandyan A. Version reporting and assessment approaches for new and updated activity and heart rate monitors. Sensors. 2019 Apr 10;19(7):1705, doi:10.3390/s19071705
-
[18]
Wearables and connected health futures
Woolley S. Wearables and connected health futures. ITNOW. 2023; 65(1), doi:10.1093/combul/bwad012
-
[19]
Human-AI interaction: intermittent, continuous, and proactive
van Berkel N, Skov MB, Kjeldskov J. Human-AI interaction: intermittent, continuous, and proactive. Interactions. 2021; 28(6), pp. 67-71, doi:10.1145/3486941
-
[20]
Examining human-AI interaction in real-world healthcare beyond the laboratory
Wekenborg MK, Gilbert S, Kather JN. Examining human-AI interaction in real-world healthcare beyond the laboratory. npj Digital Medicine. 2025; 8(1), p. 169, doi:10.1038/s41746-025-01559-5
-
[21]
History of artificial intelligence in medicine
Kaul V, Enslin S, Gross SA. History of artificial intelligence in medicine. Gastrointestinal Endoscopy. 2020; 92(4), pp. 807–812, doi:10.1016/j.gie.2020.06.040
-
[22]
O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Systematic Reviews. 2015; 4, 5, doi:10.1186/2046-4053-4-5
-
[23]
An open source machine learning framework for efficient and transparent systematic reviews
van de Schoot R, de Bruin J, Schram R, Zahedi P, de Boer J, Weijdema F, Kramer B, Huijts M, Hoogerwerf M, Ferdinands G, Harkema A, Willemsen J, Ma Y, Fang Q, Hindriks S, Tummers L, Oberski DL. An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence. 2021; 3(2), pp. 125–133, doi:10.1038/s42...
-
[24]
Ofori-Boateng R, Aceves-Martins M, Wiratunga N, Moreno-Garcia CF. Towards the automation of systematic reviews using natural language processing, machine learning, and deep learning: a comprehensive review. Artificial Intelligence Review. 2024; 57, 200. doi:10.1007/s10462-024-10844-w
-
[25]
Ge L, Agrawal R, Singer M, Kannapiran, P, De Castro Molina, JA, Teow KL, Yap CW, Abisheganaden JA. Leveraging artificial intelligence to enhance systematic reviews in health research: advanced tools and challenges. Systematic Reviews. 2024; 13, 269. doi:10.1186/s13643-024-02682-2
-
[26]
Maru S, Matthias MD, Kuwatsuru R, Simpson RJ Jr. Studies of artificial intelligence/machine learning registered on ClinicalTrials.gov: cross-sectional study with temporal trends, 2010–2023. Journal of Medical Internet Research. 2024; 26, e57750, doi:10.2196/57750
-
[27]
Han R, Acosta JN, Shakeri Z, Ioannidis JP, Topol EJ, Rajpurkar P. Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. The Lancet Digital Health. 2024 May 1;6(5):e367-73, doi:10.1016/S2589-7500(24)00047-5
-
[28]
Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton E, Malin B, Yin Z. Applications and concerns of ChatGPT and other conversational large language models in health care: systematic review. Journal of Medical Internet Research. 2024; 26, e22769, doi:10.2196/22769
-
[29]
Power hungry: How AI will drive energy demand
Bogmans C, Ganpurev G, Gomez-Gonzalez P, Melina G, Pescatori A, Thube S. Power hungry: How AI will drive energy demand. Energy Economics. 2026 Mar 23, doi:10.1016/j.eneco.2026.109278
-
[30]
This is a record of a clinical trial taken from the clinicaltrials.gov repository
Appendix AI prompt (markdown format): You are an expert systematic reviewer in clinical applications of AI. This is a record of a clinical trial taken from the clinicaltrials.gov repository. Inspect the record and classify the role of artificial intelligence in the clinical trial using the following categories: * No use of Artificial Intelligence; * No hu...