
arxiv: 2605.29096 · v1 · pith:HVFJ357Mnew · submitted 2026-05-27 · 💻 cs.AI

Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration

Pith reviewed 2026-06-29 11:51 UTC · model grok-4.3

classification 💻 cs.AI
keywords AI trends · clinical trials · hybrid human-AI analysis · ClinicalTrials.gov · machine learning · human-AI interaction · large language models · trial registry analysis

The pith

AI-related clinical trials have increased markedly over time, and hybrid human-AI screening of records appears viable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines records from a clinical trial registry to track the use of AI terminology and the patterns of human-AI interaction in registered studies. It documents a steady rise in AI mentions, including newer references to machine learning, deep learning, and large language models, along with the leading countries involved. The work tests a combined process in which a generative AI model first screens records and humans then review the results. A sympathetic reader would care because the approach offers a practical way to monitor how AI is entering medical research and to identify where trial descriptions need improvement for better tracking.

Core claim

The paper establishes that AI-related trials show a marked increase over time with recent growth in terms such as machine learning, deep learning, chatbots, GPTs, and large language models. China and the United States account for the largest shares, with notable recent rises in several other countries. In a random sample of 100 records, human and AI classifiers agreed well on studies not using AI substantively but showed lower agreement on classifying types of human-AI interaction, especially when descriptions were ambiguous. The results indicate that a hybrid human-AI workflow for screening trial records is potentially viable, though clearer reporting and more precise definitions of interac

What carries the argument

The hybrid workflow that pairs a frontier generative AI model with human review to screen and categorize records returned by an AI-focused search of the registry.

If this is right

  • AI terminology appears in clinical trials at increasing rates, especially references to advanced techniques in recent years.
  • China and the United States lead in the number of AI-related trials, with several other countries showing recent growth.
  • Human and AI classifiers reach good agreement on identifying trials that do not use AI, but agreement drops when classifying the details of human-AI interaction.
  • Clearer trial reporting and more exact definitions of human-AI interaction would make hybrid screening more reliable.
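
The year-by-year term counts behind the first bullet can be pictured with a small sketch. It is not the authors' code: the record layout (`year`, `text`), the three-term subset, and the toy records are all assumptions made for illustration, and the paper's full search string is not reproduced here.

```python
from collections import defaultdict

# Illustrative subset of AI terms; the paper's actual search string is longer.
TERMS = ("machine learning", "deep learning", "large language model")


def term_counts_by_year(records, terms=TERMS):
    """Count, per posting year, how many records mention each term."""
    counts = defaultdict(lambda: dict.fromkeys(terms, 0))
    for rec in records:
        text = rec["text"].lower()
        for term in terms:
            if term in text:
                counts[rec["year"]][term] += 1
    # Return a plain dict with years in ascending order.
    return {year: counts[year] for year in sorted(counts)}


# Hypothetical toy records, not registry data.
toy_records = [
    {"year": 2023, "text": "A deep learning model for retinal imaging"},
    {"year": 2024, "text": "Large language model chatbot for patient education"},
    {"year": 2024, "text": "Machine learning and deep learning risk scores"},
    {"year": 2022, "text": "Automated software for scheduling"},
]
trend = term_counts_by_year(toy_records)
```

A record that mentions none of the terms contributes nothing, which is one way ambiguous descriptions such as "automated software" drop out of a purely term-based trend.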

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future registry policies could require explicit fields for AI methods and interaction details to reduce ambiguity in trend analyses.
  • The hybrid approach might extend to monitoring other emerging technologies in clinical research if agreement rates improve with refined categories.
  • Accelerating AI adoption in trials could eventually influence regulatory expectations around documentation of technology use.
  • Researchers planning new trials may benefit from anticipating that ambiguous descriptions limit the usefulness of automated or hybrid meta-studies.

Load-bearing premise

The selected AI search terms and the random sample of 100 records are enough to represent the true prevalence of AI use and the reliability of hybrid classification without systematic bias from registry descriptions.
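
One way to see how much a 100-record sample can carry is to put a confidence interval around an agreement proportion. The sketch below uses the standard Wilson score interval; the count of 80 agreements is a hypothetical input, since no agreement figures are quoted on this page.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical: 80 of 100 sampled records classified identically.
low, high = wilson_interval(80, 100)
```

With these assumed numbers the interval runs from roughly 0.71 to 0.87, so a sample of 100 leaves a band of about sixteen percentage points around any single agreement figure.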

What would settle it

A full manual audit of the retrieved records that finds many AI-using trials omitted from the search terms or that shows consistent disagreement between human and AI classifiers on interaction types across a much larger sample.

Figures

Figures reproduced from arXiv: 2605.29096 by Ariane Arevalo, Chris Richardson, Illia Chernomorets, Khalid Khattak, Sandra Woolley, Tim Collins.

Figure 1: Trends in AI terms in AI-classified clinical trials making use of AI.
Figure 2: Country locations of AI-classified clinical trials making use of AI.

Original abstract

This paper examines records retrieved from the ClinicalTrials.gov registry to characterize temporal trends in AI terminology and the geographical distribution of AI trials. The work also reports on an exploratory hybrid human-AI approach to analyzing human-AI interaction trends in registered clinical trials. The hybrid workflow comprised a frontier generative AI model (GPT-5.5) and human review to screen and categorize records returned by an AI-focused search. The findings indicate a marked increase in AI-related trials over time, with recent growth in references to machine learning, deep learning, chatbots, GPTs, and large language models. Geographically, China and the United States accounted for the largest numbers of AI-related trials, with notable recent increases in several other countries including Italy, France, Spain, the UK and Turkey (T\"urkiye). In a random sample of 100 records, human and AI classifiers showed good agreement in identifying studies not substantively using AI, but lower agreement in classifying human-AI interaction, particularly where health professional interaction was ambiguous or insufficiently described. Overall, the results suggest that hybrid human-AI screening of clinical trial records is potentially viable, but clearer trial reporting and more precise interaction definitions will benefit the process.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes ClinicalTrials.gov records retrieved via an AI-focused search to characterize temporal trends in AI terminology (e.g., marked increases in machine learning, deep learning, chatbots, GPTs, and LLMs) and geographic distributions (China and US leading, with recent growth in Italy, France, Spain, UK, and Turkey). It also reports an exploratory hybrid human-AI workflow (GPT-5.5 plus human review) applied to screen and categorize human-AI interactions, concluding from a random sample of 100 records that hybrid screening is potentially viable despite lower agreement on interaction categories.

Significance. If the trends hold, the work supplies useful observational counts on AI adoption in registered trials and flags practical issues in registry-based classification of human-AI roles. The hybrid workflow exploration is a modest methodological contribution that could motivate clearer trial reporting standards, though the study remains purely descriptive with no statistical modeling or predictions.

major comments (2)
  1. [Results section on hybrid classification] The central suggestion that hybrid human-AI screening is potentially viable rests on agreement observed in the random sample of 100 records (Results section on hybrid classification). The manuscript states only that agreement was 'good' for non-substantive AI use and 'lower' for interaction categories without reporting any quantitative metrics (Cohen's kappa, raw percent agreement, or disagreement analysis), leaving the viability inference unsupported by standard reliability statistics.
  2. [Methods section on search and sampling] The assumption that the AI-term search and 100-record sample suffice to characterize prevalence and interaction reliability (Methods and Results) is load-bearing for both the trend claims and the viability conclusion, yet the paper provides no details on query construction, inter-rater procedures, handling of ambiguous entries, or checks for registry-entry bias.
minor comments (2)
  1. [Abstract] Abstract contains the string 'T"urkiye' which is a LaTeX artifact and should be corrected to Türkiye for readability.
  2. [Discussion] The paper would benefit from explicit comparison of the observed growth rates to prior bibliometric or registry studies on AI in medicine to contextualize the 'marked increase' claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We agree that the points raised identify areas where additional detail and quantitative support will improve clarity and rigor, and we will revise the manuscript to address them.

  1. Referee: [Results section on hybrid classification] The central suggestion that hybrid human-AI screening is potentially viable rests on agreement observed in the random sample of 100 records (Results section on hybrid classification). The manuscript states only that agreement was 'good' for non-substantive AI use and 'lower' for interaction categories without reporting any quantitative metrics (Cohen's kappa, raw percent agreement, or disagreement analysis), leaving the viability inference unsupported by standard reliability statistics.

    Authors: We agree that the viability claim would be better supported by quantitative reliability statistics. In the revised manuscript we will add Cohen's kappa, raw percent agreement, and a short disagreement analysis for the 100-record sample, computed from the existing human and GPT-5.5 classifications. revision: yes

  2. Referee: [Methods section on search and sampling] The assumption that the AI-term search and 100-record sample suffice to characterize prevalence and interaction reliability (Methods and Results) is load-bearing for both the trend claims and the viability conclusion, yet the paper provides no details on query construction, inter-rater procedures, handling of ambiguous entries, or checks for registry-entry bias.

    Authors: We accept this critique. The revised Methods section will include the precise search terms and Boolean structure used on ClinicalTrials.gov, the random-sampling protocol for the 100 records, the human-review workflow for resolving ambiguities, and an explicit limitations paragraph addressing potential registry-entry biases. revision: yes
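
The statistics named in the first exchange, raw percent agreement and Cohen's kappa, can be computed from two parallel label lists. The following is a minimal sketch with no external libraries; the label names and the ten example labels are hypothetical and are not taken from the paper's 100-record sample.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which the two raters gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for ten records (human reviewer vs. AI classifier).
human = ["no_ai", "no_ai", "ai", "ai", "no_ai", "ai", "no_ai", "ai", "ai", "no_ai"]
model = ["no_ai", "no_ai", "ai", "no_ai", "no_ai", "ai", "no_ai", "ai", "ai", "ai"]

agreement = percent_agreement(human, model)
kappa = cohens_kappa(human, model)
```

In this toy case the raters agree on 8 of 10 records, and because each uses the two labels equally often the chance agreement is 0.5, giving a kappa of about 0.6. Reporting both numbers matters because raw agreement alone can look high when one category dominates.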

Circularity Check

0 steps flagged

No circularity: purely observational counts and agreement metrics

The paper reports temporal trends via direct registry search counts, geographical distributions, and raw human-AI agreement percentages on a 100-record sample. No equations, fitted parameters, predictions, or derivations appear. No self-citations are invoked as load-bearing premises, and no ansatz or uniqueness claims are present. The viability suggestion is an informal inference from observed agreement, not a constructed result equivalent to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical registry study with no mathematical model. No free parameters, axioms, or invented entities are required.

pith-pipeline@v0.9.1-grok · 5763 in / 1109 out tokens · 22794 ms · 2026-06-29T11:51:07.692973+00:00 · methodology


Reference graph

Works this paper leans on

30 extracted references · 20 canonical work pages

  1. [1]

    Introduction

    Artificial Intelligence (AI) is increasingly used in clinical research and healthcare practice [1,2]. In clinical trials, AI use can be observed in the publicly available records [footnote 1: Corresponding Author: Sandra Woolley, s.i.woolley@keele.ac.uk] [running header: PREPRINT Workshop on Health, Wellbeing and Human-AI Interaction Hybrid Human-Artificial Intelligence ...]

  2. [2]

    Related Work

    Previous studies have identified growth in the registration and reporting of AI-related clinical research, while also observing variations in terminology, application areas and reporting practices. Maru et al. [18] examined AI and machine learning studies registered on ClinicalTrials.gov between 2010 and 2023 and observed a substantial incre...

  3. [3]

    Study Aims and Research Questions

    The aims of the study were i) to analyze trends in the use of AI and human-AI interaction in clinical trials, and ii) to explore the feasibility of a hybrid human-AI trial screening approach where, prospectively, AI assists the screening process and humans adopted an acceptance-sampling approach. More specifically, the st...

  4. [4]

    Methodology

    The ClinicalTrials.gov repository was searched using the AI-focused search string below and applying the inclusion and exclusion criteria. The search string was developed by collating and pruning search terms used in the related literature and by substantial test searches based on names of AI models and methods. Search String: AI OR "artificia...

  5. [5]

    Results

    The AI search string returned 5,828 records for the search conducted on 23 April 2026 for trials first posted on or before 1 April 2026. At the time of writing, this represents slightly over 1% of all ClinicalTrials.gov records. Of the returned records, 3,019 were interventional studies, 2,807 were observational and two were ‘expanded access’ stu...

  6. [6]

    Discussion

    Trials where AI use (vs. no AI use) classification proved challenging included: i) trials where there was conflicting or ambiguous use of terms across fields, e.g. AI terms such as ‘Deep learning technique’ included in ‘Terms related to this study’ but in-documentation references limited to ‘automated software’ (NCT03206333); ii) trials where ...

  7. [7]

    Conclusions and Further Work

    Frontier large language models (such as GPT-5.5) demonstrate potential for screening clinical trials records. A hybrid-human AI approach is, therefore, a potential alternative to the time-intensive human processes involved in systematic reviews of clinical trials and literature, though not without a carbon-footprint. Atte...

  8. [8]

    Acknowledgements

    Authors gratefully acknowledge support of the Digital Society Institute of Keele University, UK, that underpins efforts towards the publication of this work. For the purposes of open access, the authors have applied a Creative Commons Attribution (CC-BY) license to any Accepted Author Manuscript version arising from this submission. PRE...

  9. [9]

    Artificial intelligence in clinical trials: a comprehensive review of opportunities, challenges, and future directions

    Olawade DB, Fidelis SC, Marinze S, Egbon E, Osunmakinde A, Osborne A. Artificial intelligence in clinical trials: a comprehensive review of opportunities, challenges, and future directions. International Journal of Medical Informatics. 2026; 206, 106141, doi:10.1016/j.ijmedinf.2025.106141

  10. [10]

    Large language models in medical and healthcare fields: applications, advances, and challenges

    Wang D, Zhang S. Large language models in medical and healthcare fields: applications, advances, and challenges. Artificial Intelligence Review. 2024; 57, article 299, doi:10.1007/s10462-024-10921-0

  11. [11]

    National Library of Medicine

    ClinicalTrials, 2026, Trends and charts on registered studies | ClinicalTrials.gov, Bethesda, MD: U.S. National Library of Medicine. Available at: https://clinicaltrials.gov/about-site/trends-charts (Accessed: 3 May 2026)

  12. [12]

    Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension

    Cruz Rivera S, Liu X, Chan A-W, Denniston AK, Calvert MJ, SPIRIT-AI and CONSORT-AI Working Group, SPIRIT-AI and CONSORT-AI Steering Group and SPIRIT-AI and CONSORT-AI Consensus Group. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nature Medicine. 2020; 26(9), pp. 1351–1363,...

  13. [13]

    Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension

    Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK and the SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nature Medicine. 2020; 26(9), pp. 1364–1374, doi:10.1038/s41591-020-1034-x

  14. [14]

    Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines

    Martindale APL, Llewellyn CD, de Visser RO, Ng B, Ngai V, Kale AU, di Ruffano LF, Golub RM, Collins GS, Moher D, McCradden MD, Oakden-Rayner L, Rivera SC, Calvert M, Kelly CJ, Lee CS, Yau C, Chan A-W, Keane P, Beam AL, Liu X. Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines...

  15. [15]

    Wearables, healthcare-computer interaction and the internet of obscure medical things

    Khattak KA, Woolley SI, Collins T. Wearables, healthcare-computer interaction and the internet of obscure medical things. Proceedings of the 37th International BCS Human-Computer Interaction Conference. Swindon: BCS Learning and Development. 2024; pp. 225–229, doi:10.14236/ewic/BCSHCI2024.22

  16. [16]

    Compounding barriers to fairness in the digital technology ecosystem

    Woolley SI, Collins T, Andras P, Gardner A, Ortolani M, Pitt J. Compounding barriers to fairness in the digital technology ecosystem. IEEE International Symposium on Technology and Society (ISTAS). 2021; pp. 1-5, doi:10.1109/ISTAS52410.2021.9629166

  17. [17]

    Version reporting and assessment approaches for new and updated activity and heart rate monitors

    Collins T, Woolley SI, Oniani S, Pires IM, Garcia NM, Ledger SJ, Pandyan A. Version reporting and assessment approaches for new and updated activity and heart rate monitors. Sensors. 2019 Apr 10;19(7):1705, doi:10.3390/s19071705

  18. [18]

    Wearables and connected health futures

    Woolley S. Wearables and connected health futures. ITNOW. 2023; 65(1), doi:10.1093/combul/bwad012

  19. [19]

    Human-AI interaction: intermittent, continuous, and proactive

    van Berkel N, Skov MB, Kjeldskov J. Human-AI interaction: intermittent, continuous, and proactive. Interactions. 2021; 28(6), pp. 67-71, doi:10.1145/3486941

  20. [20]

    Examining human-AI interaction in real-world healthcare beyond the laboratory

    Wekenborg MK, Gilbert S, Kather JN. Examining human-AI interaction in real-world healthcare beyond the laboratory. npj Digital Medicine. 2025; 8(1), p. 169, doi:10.1038/s41746-025-01559-5

  21. [21]

    History of artificial intelligence in medicine

    Kaul V, Enslin S, Gross SA. History of artificial intelligence in medicine. Gastrointestinal Endoscopy. 2020; 92(4), pp. 807–812, doi:10.1016/j.gie.2020.06.040

  22. [22]

    Using text mining for study identification in systematic reviews: a systematic review of current approaches

    O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Systematic Reviews. 2015; 4, 5, doi:10.1186/2046-4053-4-5

  23. [23]

    An open source machine learning framework for efficient and transparent systematic reviews

    van de Schoot R, de Bruin J, Schram R, Zahedi P, de Boer J, Weijdema F, Kramer B, Huijts M, Hoogerwerf M, Ferdinands G, Harkema A, Willemsen J, Ma Y, Fang Q, Hindriks S, Tummers L, Oberski DL. An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence. 2021; 3(2), pp. 125–133, doi:10.1038/s42...

  24. [24]

    Towards the automation of systematic reviews using natural language processing, machine learning, and deep learning: a comprehensive review

    Ofori-Boateng R, Aceves-Martins M, Wiratunga N, Moreno-Garcia CF. Towards the automation of systematic reviews using natural language processing, machine learning, and deep learning: a comprehensive review. Artificial Intelligence Review. 2024; 57, 200, doi:10.1007/s10462-024-10844-w

  25. [25]

    Leveraging artificial intelligence to enhance systematic reviews in health research: advanced tools and challenges

    Ge L, Agrawal R, Singer M, Kannapiran P, De Castro Molina JA, Teow KL, Yap CW, Abisheganaden JA. Leveraging artificial intelligence to enhance systematic reviews in health research: advanced tools and challenges. Systematic Reviews. 2024; 13, 269, doi:10.1186/s13643-024-02682-2

  26. [26]

    Studies of artificial intelligence/machine learning registered on ClinicalTrials.gov: cross-sectional study with temporal trends, 2010–2023

    Maru S, Matthias MD, Kuwatsuru R, Simpson RJ Jr. Studies of artificial intelligence/machine learning registered on ClinicalTrials.gov: cross-sectional study with temporal trends, 2010–2023. Journal of Medical Internet Research. 2024; 26, e57750, doi:10.2196/57750

  27. [27]

    Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review

    Han R, Acosta JN, Shakeri Z, Ioannidis JP, Topol EJ, Rajpurkar P. Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. The Lancet Digital Health. 2024 May 1;6(5):e367-73, doi:10.1016/S2589-7500(24)00047-5

  28. [28]

    Applications and concerns of ChatGPT and other conversational large language models in health care: systematic review

    Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton E, Malin B, Yin Z. Applications and concerns of ChatGPT and other conversational large language models in health care: systematic review. Journal of Medical Internet Research. 2024; 26, e22769, doi:10.2196/22769

  29. [29]

    Power hungry: How AI will drive energy demand

    Bogmans C, Ganpurev G, Gomez-Gonzalez P, Melina G, Pescatori A, Thube S. Power hungry: How AI will drive energy demand. Energy Economics. 2026 Mar 23, doi:10.1016/j.eneco.2026.109278

  30. [30]

    Appendix

    AI prompt (markdown format): You are an expert systematic reviewer in clinical applications of AI. This is a record of a clinical trial taken from the clinicaltrials.gov repository. Inspect the record and classify the role of artificial intelligence in the clinical trial using the following categories: * No use of Artificial Intelligence; * No hu...