arxiv: 2606.02914 · v2 · pith:ESC2VE4Knew · submitted 2026-06-01 · 💻 cs.AI · cs.CL

Large AI Models in Dental Healthcare: From General-Purpose Systems to Domain-Specific Foundation Models

Pith reviewed 2026-06-28 14:03 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords large AI models · foundation models · dentistry · systematic review · multimodal AI · dental imaging · clinical decision support · oral healthcare

The pith

General-purpose and dental-specific AI models complement each other, with their combination in structured pipelines yielding the best results for dental tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review examines 97 studies on large AI models applied to dentistry between 2020 and 2026. It introduces a two-dimensional framework to classify models according to their architecture and level of dental specialization. The analysis shows that language models handle text tasks effectively, vision models adapted from general sources perform well on image analysis, and dental-specific models lead on complex multimodal problems. Combining these in pipelines outperforms any single approach. This matters because oral diseases impact billions of people, and clarifying how these models interact could guide better AI tools in healthcare while highlighting gaps that must be closed for reliable use.

Core claim

Three distinct model categories have emerged in dentistry: language-generative models, discriminative vision foundation models, and dental-specific foundation models. Using a proposed two-dimensional classification framework based on architectural paradigm and degree of dental specialization, the review finds that general-purpose models excel at text-based tasks but are inconsistent on image diagnostics, adapted general vision models achieve strong results in segmentation and detection, and dental-specific models perform best on complex multimodal tasks. Integrated pipelines outperform single-model approaches, though dental-specific pretraining is heavily skewed toward vision because large-scale dental text corpora are scarce.

What carries the argument

The two-dimensional classification framework that organizes large AI models by architectural paradigm and degree of dental specialization.
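
One way to picture the framework is as a grid keyed by the two axes. The sketch below is illustrative only: the axis values are inferred from the summary above rather than taken from the paper's own labels, and the `ModelEntry` type, the `build_grid` helper, and the three example placements are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Paradigm(Enum):
    # Axis 1: architectural paradigm (values assumed from the summary text).
    LANGUAGE_GENERATIVE = "language-generative"
    DISCRIMINATIVE_VISION = "discriminative vision"


class Specialization(Enum):
    # Axis 2: degree of dental specialization (values assumed, not the paper's labels).
    GENERAL_PURPOSE = "general-purpose"
    ADAPTED = "general model adapted to dental data"
    DENTAL_SPECIFIC = "dental-specific pretraining"


@dataclass(frozen=True)
class ModelEntry:
    name: str
    paradigm: Paradigm
    specialization: Specialization


def build_grid(entries):
    """Group model names by their (paradigm, specialization) cell."""
    grid = defaultdict(list)
    for entry in entries:
        grid[(entry.paradigm, entry.specialization)].append(entry.name)
    return dict(grid)


# Hypothetical placements for illustration; the paper's own binning may differ.
EXAMPLES = [
    ModelEntry("GPT-4o", Paradigm.LANGUAGE_GENERATIVE, Specialization.GENERAL_PURPOSE),
    ModelEntry("SAM (dental adaptation)", Paradigm.DISCRIMINATIVE_VISION, Specialization.ADAPTED),
    ModelEntry("DentVFM", Paradigm.DISCRIMINATIVE_VISION, Specialization.DENTAL_SPECIFIC),
]

grid = build_grid(EXAMPLES)
for (paradigm, specialization), names in grid.items():
    print(f"{paradigm.value} / {specialization.value}: {', '.join(names)}")
```

Keeping the two axes as separate enums makes a binning decision an explicit pair, which is the kind of record the referee's request for inspectable classification labels would need.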

If this is right

  • Integrated pipelines consistently outperform single-model approaches.
  • Dental-specific models demonstrate the strongest performance on complex multimodal tasks.
  • Language-generative models excel at text-based tasks but show inconsistent performance on image-dependent diagnostics.
  • Adapted general vision models like SAM and CLIP variants achieve strong results in tooth segmentation and lesion detection.
  • A data asymmetry exists where dental-specific pretraining concentrates almost entirely in the vision domain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future work could test whether creating more dental text data would reduce the observed asymmetry and improve multimodal performance.
  • Establishing standardized benchmarks might enable clearer comparisons and faster iteration across different model types.
  • Hybrid systems that route tasks to the most suitable model type could become a practical deployment strategy in clinics.
  • Addressing hallucination might involve combining generative models with verification steps from discriminative models.
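
The routing idea in the list above can be sketched as a small dispatch rule. This is a toy illustration under stated assumptions: the function name `route_task`, the two boolean inputs, and the returned category strings are all hypothetical, and the rule only mirrors the review's reported pattern (text to language models, images to adapted vision models, mixed inputs to dental-specific multimodal models). It is not a system described in the paper.

```python
def route_task(has_image: bool, has_text: bool) -> str:
    """Pick a model family for a task based on which input modalities it carries.

    The mapping follows the pattern summarized in the review; the category
    names are placeholders, not identifiers of real deployed models.
    """
    if not has_image and not has_text:
        raise ValueError("a task needs at least one input modality")
    if has_image and has_text:
        # Complex multimodal tasks: dental-specific models reportedly lead here.
        return "dental_specific_multimodal"
    if has_image:
        # Image-only tasks such as segmentation or lesion detection.
        return "adapted_vision"
    # Text-only tasks such as exam questions or patient communication.
    return "language_generative"


print(route_task(has_image=True, has_text=False))
print(route_task(has_image=True, has_text=True))
```

A real deployment would need a richer task description than two flags, plus the verification step mentioned in the last bullet, but the sketch shows where such a router would sit in a pipeline.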

Load-bearing premise

The 97 studies identified through database searches with dual screening form a comprehensive and unbiased sample of the literature.

What would settle it

Discovery of many additional relevant studies from 2020-2026 not captured by the searches in PubMed, Google Scholar, Scopus, and arXiv, or demonstration that models do not fit the proposed two-dimensional classification framework.

Figures

Figures reproduced from arXiv: 2606.02914 by Alaa Abd-Alrazaq, Faleh Tamimi, Lina Abu Nada, Rafat Damseh, Sausan Al Kawas, Sema Helali.

Figure 1: The evolution of AI technology in dental practice, illustrating the progression …
Figure 2: PRISMA 2020 flow diagram summarizing the systematic process of study identi…
Figure 3: Large AI models in dental healthcare organized by architectural paradigm (left) …
Figure 4: Summary of the main themes across clinical, educational, and patient commu…
Figure 5: 5.2. Detection and Diagnosis. Beyond segmentation, discriminative vision foundation models have been applied to two further task types: open-vocabulary abnormality detection and contrastive diagnosis. Du et al. applied GroundingDINO for dental abnormality detection using text prompts of abnormality class names [108]. The system enhances detection through FDI-based tooth notation and a multi-level strategy …
Figure 5: SAM adaptation strategies and adapted models for dental image segmentation, …
Figure 6: Transition from narrow to foundation models and key model categories with …
Figure 7: Performance heatmap of large AI models across dental task categories. Ratings (Moderate, High, Very High) reflect …
Original abstract

Background: Oral diseases affect nearly 3.5 billion people worldwide, yet the comparative clinical potential of large-scale AI models in dentistry remains poorly understood. Three distinct model categories have emerged: language-generative models, discriminative vision foundation models, and dental-specific foundation models, with no unified review examining their relationships and collective limitations.

Methods: Following PRISMA-ScR guidelines, we systematically searched four databases (PubMed, Google Scholar, Scopus, arXiv), screened independently by two reviewers. After applying inclusion/exclusion criteria, 97 studies (2020-2026) were included. We propose a two-dimensional classification framework organizing models by architectural paradigm and dental specialization degree.

Results: Language-generative models excel at text-based tasks (clinical reasoning, licensing exams, patient communication) but show inconsistent performance on image-dependent diagnostics. Adapted SAM and CLIP variants achieve strong tooth segmentation and lesion detection results. Dental-specific models (DentVFM, DentVLM, OralGPT) demonstrate strongest performance on complex multimodal tasks. Integrated pipelines consistently outperform single-model approaches. A data asymmetry is observed: dental-specific pretraining concentrates almost entirely in the vision domain, reflecting scarce large-scale dental text corpora.

Conclusions: General-purpose and dental-specific models play complementary roles; the most effective systems combine both within structured pipelines. Safe autonomous deployment requires resolving three persistent barriers: hallucination in generative models, limited annotated dental datasets, and absent standardized clinical evaluation benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper is a PRISMA-ScR scoping review that searched PubMed, Google Scholar, Scopus, and arXiv, applied dual independent screening, and included 97 studies (2020-2026) on large AI models in dentistry. It introduces a two-dimensional classification framework (architectural paradigm × dental specialization degree) to organize language-generative models, discriminative vision foundation models (e.g., adapted SAM/CLIP), and dental-specific models (DentVFM, DentVLM, OralGPT). The central claims are that general-purpose and domain-specific models are complementary, integrated pipelines outperform single models, a vision-text data asymmetry exists, and three barriers (hallucination, limited annotated datasets, absent clinical benchmarks) must be resolved for safe deployment.

Significance. If the 97-study sample proves representative and the framework reproducible, the review would usefully synthesize an emerging literature and supply a practical organizing lens for multimodal dental AI. The explicit ranking of three deployment barriers and the complementarity finding could guide both model development and regulatory discussion; the data-asymmetry finding is concrete and falsifiable, so future work can test it.

major comments (2)
  1. [Methods] Methods: the description of the search strategy supplies only the four databases and the PRISMA-ScR label; no Boolean strings, date filters, or exact inclusion/exclusion criteria are reproduced. Without these, it is impossible to verify whether the 97-study corpus is unbiased or complete, directly affecting the reliability of the complementarity and barrier-ranking conclusions.
  2. [Results] Results / Framework: the two-dimensional taxonomy is introduced without reported inter-rater reliability for model binning, without a sensitivity analysis on alternative categorizations, and without comparison to existing AI taxonomies. The claim that dental-specific models show “strongest performance on complex multimodal tasks” therefore rests on an unvalidated partitioning whose stability is unknown.
minor comments (2)
  1. [Abstract / Methods] The date range “2020-2026” in the abstract and methods appears to project into the future; clarify whether this is a typographical error or an intended forward-looking inclusion rule.
  2. [Results] Table or figure presenting the 97 studies should include a column for the two-dimensional classification labels so readers can inspect the binning decisions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our PRISMA-ScR scoping review. We address each major comment below, indicating planned revisions to improve transparency and reproducibility.

point-by-point responses
  1. Referee: [Methods] Methods: the description of the search strategy supplies only the four databases and the PRISMA-ScR label; no Boolean strings, date filters, or exact inclusion/exclusion criteria are reproduced. Without these, it is impossible to verify whether the 97-study corpus is unbiased or complete, directly affecting the reliability of the complementarity and barrier-ranking conclusions.

    Authors: We agree that the Methods section requires greater specificity for full reproducibility. The current text summarizes the approach at a high level to meet journal constraints, but the full Boolean strings, date filters (2020-2026), and exact inclusion/exclusion criteria were applied during screening by two independent reviewers. In the revised manuscript we will expand the Methods section (or add a supplementary table) to reproduce these details verbatim. This will allow direct verification of corpus selection and strengthen confidence in the synthesized findings on model complementarity and barriers. revision: yes

  2. Referee: [Results] Results / Framework: the two-dimensional taxonomy is introduced without reported inter-rater reliability for model binning, without a sensitivity analysis on alternative categorizations, and without comparison to existing AI taxonomies. The claim that dental-specific models show “strongest performance on complex multimodal tasks” therefore rests on an unvalidated partitioning whose stability is unknown.

    Authors: The two-dimensional framework (architectural paradigm × dental specialization degree) is offered as a novel descriptive lens derived directly from patterns in the 97 included studies, with explicit categorization rules stated in the Methods. Because the binning was performed by the author team rather than as an independent multi-rater annotation task, inter-rater reliability statistics were not computed. We will revise the manuscript to (1) add an explicit comparison to prior AI taxonomies in the Discussion and (2) elaborate the categorization criteria for greater transparency. A post-hoc sensitivity analysis on alternative partitions is not feasible without re-screening all studies and would not change the primary descriptive observations; we will instead qualify the framework as exploratory and note this as a limitation. The performance statements are aggregated from the primary studies’ own reported results and will be rephrased to reflect this source. revision: partial
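
The inter-rater reliability statistic raised in this exchange is commonly reported as Cohen's kappa, which corrects raw agreement between two raters for agreement expected by chance. The sketch below is a minimal pure-Python version for illustration; the function name, the category labels, and the two rating lists are hypothetical and do not come from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical binning of six models into specialization categories by two raters.
rater_a = ["general", "general", "adapted", "dental", "dental", "adapted"]
rater_b = ["general", "adapted", "adapted", "dental", "dental", "adapted"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))  # 0.75
```

In this toy case the raters agree on five of six items (observed agreement 5/6) against a chance agreement of 1/3, giving kappa = 0.75.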

Circularity Check

0 steps flagged

No circularity: scoping review synthesizes external literature

full rationale

This scoping review follows PRISMA-ScR to identify and classify 97 external studies from 2020-2026. The proposed two-dimensional framework (architectural paradigm × specialization degree) is an organizational taxonomy, not a derivation that reduces to fitted inputs or self-definitions. Central claims about model complementarity and three barriers are synthesized from the reviewed papers rather than generated by internal equations, predictions, or self-citation chains. No load-bearing step equates outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard systematic review methodology (PRISMA-ScR) rather than introducing new free parameters, domain-specific axioms beyond background assumptions, or invented entities.

axioms (1)
  • [standard math] PRISMA-ScR guidelines constitute an appropriate and sufficient standard for conducting and reporting scoping reviews in health sciences.
    The paper states it followed PRISMA-ScR guidelines for the systematic search, screening, and inclusion process.



Reference graph

Works this paper leans on

122 extracted references · 89 canonical work pages

  1. [1] World Health Organization, Global oral health status report: Towards universal health coverage for oral health by 2030, Tech. rep., World Health Organization, Geneva (2022). URL https://www.who.int/publications/i/item/9789240061484

  2. [2] S. Lee, S. I. Oh, J. Jo, S. Kang, Y. Shin, J. W. Park, Deep learning for early dental caries detection in bitewing radiographs, Scientific Reports 11 (1) (2021) 16807. doi:10.1038/s41598-021-96368-7

  3. [3] G. Virupaiah, A. Sathyanarayana, Analysis of image enhancement techniques for dental caries detection using texture analysis and support vector machine, International Journal of Applied Science and Engineering 17 (2020) 75–86. URL https://api.semanticscholar.org/CorpusID:231800088

  4. [4] N. Bashir, Z. Ur Rahman, S. Chen, Systematic comparison of machine learning algorithms to develop and validate predictive models for periodontitis, Journal of Clinical Periodontology 49 (07 2022). doi:10.1111/jcpe.13692

  5. [5] D. V. Tuzoff, et al., Tooth detection and numbering in panoramic radiographs using convolutional neural networks, Dento Maxillofacial Radiology 48 (4) (2019) 20180051. doi:10.1259/dmfr.20180051

  6. [6] C.-W. Li, et al., Detection of dental apical lesions using cnns on periapical radiograph, Sensors 21 (21) (2021) 7049. doi:10.3390/s21217049

  7. [7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017)

  8. [8] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems 33 (2020) 1877–1901

  9. [9] F. Fanelli, M. Saleh, P. Santamaria, K. Zhurakivska, L. Nibali, G. Troiano, Development and comparative evaluation of a reinstructed gpt-4o model specialized in periodontology, Journal of Clinical Periodontology 52 (5) (2025) 707–716. doi:10.1111/jcpe.14101

  10. [10] X. Huang, F. Xiao, D. He, A. Gao, D. Li, X. Zhang, S. Zhang, X. Wang, Towards generalist intelligence in dentistry: Vision foundation models for oral and maxillofacial radiology, arXiv preprint arXiv:2510.14532 (2025)

  11. [11] J. Zhang, M. Lin, H. Hou, B. Sun, F. Hu, Y. Yu, M. Li, Easam: an edge-aware sam-based paradigm for tooth segmentation, Signal, Image and Video Processing 19 (2025) 673. doi:10.1007/s11760-025-04208-2. URL https://doi.org/10.1007/s11760-025-04208-2

  12. [12] Z. Meng, J. Hao, X. Dai, Y. Feng, J. Liu, B. Feng, et al., Dentvlm: A multimodal vision-language model for comprehensive dental diagnosis and enhanced clinical practice, arXiv preprint arXiv:2509.23344 (2025)

  13. [13] J. Zhang, B. Du, Y. Miao, D. Sun, X. Cao, Oralgpt: A two-stage vision-language model for oral mucosal disease diagnosis and description, arXiv preprint arXiv:2510.13911 (2025)

  14. [14] M. J. Page, et al., The prisma 2020 statement: an updated guideline for reporting systematic reviews, BMJ 372 (2021) n71. doi:10.1136/bmj.n71

  15. [15] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, N. Houlsby, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020)

  16. [16] H. Liu, C. Li, Q. Wu, Y. J. Lee, Visual instruction tuning, Advances in neural information processing systems 36 (2023) 34892–34916

  17. [17] A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford, et al., Gpt-4o system card, arXiv preprint arXiv:2410.21276 (2024)

  18. [18] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning transferable visual models from natural language supervision, in: M. Meila, T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, Vol. 139 of Proceedings of Machine Learning Resea...

  19. [19] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, R. Girshick, Segment anything, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2023) 3992–4003. doi:10.1109/ICCV51070.2023.00371

  20. [20] S. Liu, Z. Zeng, T. Ren, F. Li, H. Zhang, J. Yang, others, L. Zhang, Grounding dino: Marrying dino with grounded pre-training for open-set object detection, in: European Conference on Computer Vision, Springer Nature Switzerland, 2024, pp. 38–55

  21. [21] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al., Chain-of-thought prompting elicits reasoning in large language models, Advances in neural information processing systems 35 (2022) 24824–24837

  22. [22] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive nlp tasks, Advances in neural information processing systems 33 (2020) 9459–9474

  23. [23] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly, Parameter-efficient transfer learning for nlp, in: International conference on machine learning, PMLR, 2019, pp. 2790–2799

  24. [24] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al., Lora: Low-rank adaptation of large language models, Iclr 1 (2) (2022) 3

  25. [25] P. Wang, H. Gu, Y. Sun, Tooth segmentation on multimodal images using adapted segment anything model, Scientific Reports 15 (2025) 13874. doi:10.1038/s41598-025-96301-2

  26. [26] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick, Masked autoencoders are scalable vision learners, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 16000–16009

  27. [27] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al., Training language models to follow instructions with human feedback, Advances in neural information processing systems 35 (2022) 27730–27744

  28. [28] S. Tomo, J. R. Lechien, H. S. Bueno, D. F. Cantieri-Debortoli, L. E. Simonato, Accuracy and consistency of chatgpt-3.5 and -4 in providing differential diagnoses in oral and maxillofacial diseases: a comparative diagnostic performance analysis, Clinical Oral Investigations 28 (10) (2024) 544. doi:10.1007/s00784-024-05939-1

  29. [29] A. Suárez, Y. Freire, M. Suárez, V. Díaz-Flores García, C. Andreu-Vázquez, I. J. Thuissard Vasallo, A. I. Castillo Varón, C. Martín, Diagnostic performance of multimodal large language models in the analysis of oral pathology, Oral Diseases 31 (12) (2025) 3344–3354. doi:10.1111/odi.70009

  30. [30] P. Rewthamrongsris, J. Burapacheep, E. Phattarataratip, P. Kulthanaamondhita, A. Tichy, F. Schwendicke, T. Osathanon, K. Sappayatosok, Image-based diagnostic performance of llms vs cnns for oral lichen planus: Example-guided and differential diagnosis, International Dental Journal 75 (4) (2025) 100848. doi:10.1016/j.identj.2025.100848

  31. [31] D. P. Bubna, N. H. R. Mattos, L. B. D. P. Luiz, F. Baratto-Filho, M. T. Mattos-Calil, Y. T. C. Silva-Sousa, E. C. Küchler, Â. G. D. Schroder, C. M. Araujo, B. M. M. Araujo, Can large language models detect periapical lesions in anterior teeth? a comparative study, Brazilian Dental Journal 36 (2026) e256861. doi:10.1590/0103-644020256861

  32. [32] M. Büker, M. Sümbüllü, H. Arslan, Comparative performance of chatbots in endodontic clinical decision support: A 4-day accuracy and consistency study, International Dental Journal 75 (5) (2025) 100920. doi:10.1016/j.identj.2025.100920

  33. [33] Y. Özbay, D. Erdoğan, G. A. Dinçer, Evaluation of the performance of large language models in clinical decision-making in endodontics, BMC Oral Health 25 (2025) 648. doi:10.1186/s12903-025-06050-x

  34. [34] L. P. de Araújo, L. B. Moreno, B. C. C. de Araújo, E. T. Chaves, T. M. Botero, V. H. D. Romero, From evidence-based endodontics to generative ai: A comparative study of 11 large language models, Journal of Endodontics (2026) S0099–2399(26)00010–5, advance online publication. doi:10.1016/j.joen.2026.01.009

  35. [35] A. Qutieshat, A. Al Rusheidi, S. Al Ghammari, A. Alarabi, A. Salem, M. Zelihic, Comparative analysis of diagnostic accuracy in endodontic assessments: dental students vs. artificial intelligence, Diagnosis 11 (3) (2024) 259–265. doi:10.1515/dx-2024-0034

  36. [36] I. Amador Barbosa, M. Sergio Almeida Alves, P. Rayse Zagalo de Almeida, P. de Almeida Rodrigues, R. Pimentel de Oliveira, S. Augusto Fernades de Menezes, J. D. Mendonça de Moura, R. Roberto de Souza Fonseca, Assessing the diagnostic and treatment accuracy of large language models (llms) in peri-implant diseases: A clinical experimental study, Journ...

  37. [37] G. S. Chatzopoulos, V. P. Koidou, L. Tsalikis, E. G. Kaklamanos, Large language models in periodontology: Assessing their performance in clinically relevant questions, The Journal of Prosthetic Dentistry 134 (6) (2025) 2328–2336. doi:10.1016/j.prosdent.2024.10.020

  38. [38] E. M. Aşar, İ. İpek, K. Bilge, Customized gpt-4v(ision) for radiographic diagnosis: can large language model detect supernumerary teeth?, BMC Oral Health 25 (1) (2025) 756. doi:10.1186/s12903-025-06163-3

  39. [39] M. G. Kanmaz, G. Agani Sabah, Diagnostic accuracy of large language models in the classification of superior labial frenulum attachments, Odontology (2025). doi:10.1007/s10266-025-01283-2

  40. [40] B. Sezer, T. Aydoğdu, Performance of advanced artificial intelligence models in traumatic dental injuries in primary dentition: A comparative evaluation of chatgpt-4 omni, deepseek, gemini advanced, and claude 3.7 in terms of accuracy, completeness, response time, and readability, Applied Sciences 15 (14) (2025). doi:10.3390/app15147778. URL https://www...

  41. [41] Ö. Küçük Keleş, Z. B. Arslan, Performance of artificial intelligence chatbots in the diagnosis and management of simulated dental trauma cases: an evaluation based on iadt guidelines, Clinical Oral Investigations, published online 23 December 2025 (2026). doi:10.1007/s00784-025-06716-4

  42. [42] K. Termteerapornpimol, S. Kulvitit, S. Prommanee, Z. Khurshid, T. Porntaveetus, Comparative benchmark of seven large language models for traumatic dental injury knowledge, European Journal of Dentistry, advance online publication (2025). doi:10.1055/s-0045-1812064

  43. [43] X. Wu, G. Cai, B. Guo, L. Ma, S. Shao, J. Yu, Y. Zheng, L. Wang, F. Yang, A multi-dimensional performance evaluation of large language models in dental implantology: comparison of chatgpt, deepseek, grok, gemini and qwen across diverse clinical scenarios, BMC Oral Health 25 (1) (2025) 1272. doi:10.1186/s12903-025-06619-6. URL https://doi.org/10.1186/s129...

  44. [44] Y. Wu, Y. Zhang, M. Xu, C. Jinzhi, Y. Xue, Y. Zheng, Effectiveness of various general large language models in clinical consensus and case analysis in dental implantology: A comparative study, BMC Medical Informatics and Decision Making 25 (1) (2025) 147. doi:10.1186/s12911-025-02972-2. URL https://doi.org/10.1186/s12911-025-02972-2

  45. [45] Y. Mine, T. Taji, S. Takeda, S. Okazaki, T. Y. Peng, N. Kakimoto, T. Murayama, Assessing multimodal large language models for localizing dental implant fixtures on panoramic radiographs, Journal of Dentistry 168 (2026) 106580, advance online publication. doi:10.1016/j.jdent.2026.106580. URL https://doi.org/10.1016/j.jdent.2026.106580

  46. [46] M. B. Erden, M. G. Kanmaz, G. A. Sabah, Can chatbots replace experts? diagnostic accuracy of ai models in classifying impacted mandibular third molars, Odontology, advance online publication (2025). doi:10.1007/s10266-025-01214-1. URL https://doi.org/10.1007/s10266-025-01214-1

  47. [47] K. Ji, Z. Wu, J. Han, G. Zhai, J. Liu, Evaluating chatgpt-4’s performance on oral and maxillofacial queries: Chain of thought and standard method, Frontiers in Oral Health 6 (2025) 1541976. doi:10.3389/froh.2025.1541976. URL https://doi.org/10.3389/froh.2025.1541976

  48. [48] O. B. Kandaz, T. Teksoz, C. Avlayici, et al., Using ai large language models to assess dental history in systemic conditions, Discover Artificial Intelligence 6 (2026) 103. doi:10.1007/s44163-025-00816-6

  49. [49] S. Tayeb, C. Barausse, G. Pellegrino, M. Sansavini, R. Pistilli, P. Felice, Comparing artificial intelligence (chatgpt, gemini, deepseek) and oral surgeons in detecting clinically relevant drug–drug interactions in dental therapy, Applied Sciences 15 (23) (2025) 12851. doi:10.3390/app152312851

  50. [50] P. Rewthamrongsris, V. Thongchotchat, J. Burapacheep, V. Trachoo, Z. Khurshid, T. Porntaveetus, Evaluating retrieval-augmented generation-large language models for infective endocarditis prophylaxis: Clinical accuracy and efficiency, International Dental Journal 76 (1) (2026) 109344. doi:10.1016/j.identj.2025.109344

  51. [51] B. Tosun, Z. Öztürk, Performance of five large language models in managing acute dental pain: A comprehensive analysis, Turk Endod J 10 (1) (2025) 39–49. doi:10.14744/TEJ.2025.27147. URL https://dx.doi.org/10.14744/TEJ.2025.27147

  52. [52] M. Shirani, M. Emami, Performance comparison of large language models in treatment planning for the restoration of endodontically treated teeth over time, Journal of Dentistry 161 (2025) 105998. doi:10.1016/j.jdent.2025.105998

  53. [53] E. S. Song, G. H. Kim, S.-P. Lee, Evaluation of gpt-4o and gemini advanced on the korean national dental licensing examination: Accuracy, consistency, and question generation, Journal of Dental Sciences 21 (1) (2026) 96–102. doi:10.1016/j.jds.2025.07.020

  54. [54] M. Dashti, S. Ghasemi, N. Ghadimi, D. Hefzi, A. Karimian, N. Zare, A. Fahimipour, Z. Khurshid, M. M. Chafjiri, S. Ghaedsharaf, Performance of chatgpt 3.5 and 4 on u.s. dental examinations: the inbde, adat, and dat, Imaging science in dentistry 54 (3) (2024) 271–275. doi:10.5624/isd.20240037

  55. [55] H. C. Nguyen, H. P. Dang, T. L. Nguyen, V. Hoang, V. A. Nguyen, Accuracy of latest large language models in answering multiple choice questions in dentistry: A comparative study, PLOS ONE 20 (1) (2025) e0317423. doi:10.1371/journal.pone.0317423

  56. [56] C. C.-C. Lin, J.-S. Sun, C.-H. Chang, Y.-H. Chang, J. Z.-C. Chang, Performance of artificial intelligence chatbots in national dental licensing examination, Journal of Dental Sciences 20 (4) (2025) 2307–2314. doi:10.1016/j.jds.2025.05.012

  57. [57] Y. Mine, S. Okazaki, T. Taji, H. Kawaguchi, N. Kakimoto, T. Murayama, Benchmarking multimodal large language models on the dental licensing examination: Challenges with clinical image interpretation, Journal of Dental Sciences 20 (4) (2025) 2427–2435. doi:10.1016/j.jds.2025.03.018

  58. [58] H. Watanabe, O. Uehara, T. Morikawa, T. Kojima, T. Suga, A. Toyofuku, S. Takada, Y. Abiko, Performance of large language models on image-based oral pathology questions from the japanese national dental examination, Journal of Dental Sciences (2025). doi:10.1016/j.jds.2025.08.037. URL https://www.sciencedirect.com/science/article/pii/S1991790225003113

  59. [59] M. B. Dundar Sari, B. Sezer, Comparative performance evaluation of chatgpt-4 omni and gemini advanced in the turkish dentistry specialization exam, BMC Medical Education 26 (2026) 251. doi:10.1186/s12909-026-08621-0. URL https://doi.org/10.1186/s12909-026-08621-0

  60. [60] M. Haberal, D. Hançerlioğulları, Can artificial intelligence chatbots think like dentists? a comparative analysis based on dental specialty examination questions in restorative dentistry, BMC Oral Health 26 (1) (2026) 231. doi:10.1186/s12903-025-07612-9. URL https://doi.org/10.1186/s12903-025-07612-9

  61. [61] B. E. Yilmaz, B. N. Gokkurt Yilmaz, F. Ozbey, Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis, BMC Oral Health 25 (1) (2025) 573. doi:10.1186/s12903-025-05926-2. URL https://doi.org/10.1186/s12903-025-05926-2

  62. [62]

    Çakmak, T

    B. Çakmak, T. Sökmen, B. Baloş Tuncer, Artificial intelligence- powered chatbots’ responses to orthodontic questions from the den- tistry specialization examination: Accuracy and source evaluation, Journal of Dental Sciences (2025).doi:10.1016/j.jds.2025.11.027. URLhttps://doi.org/10.1016/j.jds.2025.11.027

  63. [63]

    M. Tassoker, Chatgpt-4 omni’s superiority in answering multiple-choice oral radiology questions, BMC Oral Health 25 (1) (2025) 173. doi:10.1186/s12903-025-05554-w

  64. [64]

    Y.-H. Wu, K.-Y. Tso, C.-P. Chiang, Performance of chatgpt in answering the oral pathology questions of various types or subjects from taiwan national dental licensing examinations, Journal of Dental Sciences 20 (3) (2025) 1709–1715. doi:10.1016/j.jds.2025.03.030

  65. [65]

    F. Akkoca, M. Özdede, G. İlhan, E. Koyuncu, H. Ellidokuz, Assessing the success of chatgpt-4o in oral radiology education and practice: A pioneering research, Cumhuriyet Dental Journal 28 (2) (2025) 210–215

  66. [66]

    C.-Y. Huang, Y.-P. Lee, A. Sun, C.-P. Chiang, Performance of chatgpt-4, gemini, and deepseek-v3 on answering the multiple choice questions from taiwan national dental technician licensing examinations and their self-learning abilities over a three-week period, Journal of Dental Sciences 20 (4) (2025) 2154–2162. doi:10.1016/j.jds.2025.07.011

  67. [67]

    H. Fukuda, M. Morishita, K. Muraoka, S. Yamaguchi, T. Nakamura, M. Habu, I. Yoshioka, S. Awano, K. Ono, Evaluating the accuracy and performance of chatgpt-4o in solving japanese national dental technician examination, International Dental Journal 75 (4) (2025) 100847. doi:10.1016/j.identj.2025.100847

  68. [68]

    S. Sismanoglu, B. S. Capan, Performance of artificial intelligence on turkish dental specialization exam: can chatgpt-4.0 and gemini advanced achieve comparable results to humans?, BMC Medical Education 25 (1) (2025) 214. doi:10.1186/s12909-024-06389-9

  69. [69]

    H. Alqahtani, Assessment of artificial intelligence chatbots in responding to dental occlusion questions: a comparative study, BMC Oral Health 26 (1) (2025) 201. doi:10.1186/s12903-025-07573-z

  70. [70]

    E. Arılı Öztürk, C. Turan Gökduman, B. C. Çanakçi, Evaluation of the performance of chatgpt-4 and chatgpt-4o as a learning tool in endodontics, International Endodontic Journal, Advance online publication (2025). doi:10.1111/iej.14217

  71. [71]

    P. M. Durmazpinar, E. Ekmekci, Comparing diagnostic skills in endodontic cases: dental students versus chatgpt-4o, BMC Oral Health 25 (1) (2025) 457. doi:10.1186/s12903-025-05857-y.

  72. [72]

    A. A. Azhari, W. M. Ahmed, A. Alhamadani, A. Alfaraj, M. Zhang, C. T. Lu, Assessing the efficacy of artificial intelligence platforms in answering dental caries multiple-choice questions: A comparative study of chatgpt and google gemini language models, Dentistry Journal 14 (2) (2026) 72. doi:10.3390/dj14020072

  73. [73]

    M. Llorente de Pedro, A. Suárez, J. Algar, V. Díaz-Flores García, C. Andreu-Vázquez, Y. Freire, Assessing chatgpt’s reliability in endodontics: Implications for ai-enhanced clinical learning, Applied Sciences 15 (10) (2025) 5231. doi:10.3390/app15105231

  74. [74]

    Ö. Kurt, E. Şimsek, Knowledge-level comparison in pulpal and periapical diseases: dental students versus artificial intelligence models (gemini, microsoft copilot, chatgpt-3.5, chatgpt-4o): cross-sectional study, BMC Medical Education 25 (1) (2025) 1657. doi:10.1186/s12909-025-08263-8

  75. [75]

    H. Sağlam, G. P. Sezgin, T. Kaplan, S. S. Kaplan, Artificial intelligence chatbots versus dentists: a comparative knowledge assessment on traumatic dental injury management, BMC Oral Health 26 (1) (2026) 313. doi:10.1186/s12903-026-07728-6

  76. [76]

    H. E. Kuru, A. Aşık, D. M. Demir, Can artificial intelligence language models effectively address dental trauma questions?, Dental Traumatology 41 (5) (2025) 567–580. doi:10.1111/edt.13063

  77. [77]

    P. Rodrigues-Pereira, M. A. P. Dias-Calças, A. Moreira Mélo, M. O. Melchior, L. Gaspar Ribeiro, A. Pazin-Filho, J. F. Mazzi-Chaves, L. V. Magri, Generative artificial intelligence-driven clinical case simulation in temporomandibular disorder education: Chatgpt versus real patients, Journal of Dental Education, Advance online publication (2025). doi:10.100...

  78. [78]

    M. Brondani, C. Alves, C. Ribeiro, M. M. Braga, R. C. M. Garcia, T. Ardenghi, K. Pattanaporn, Artificial intelligence, chatgpt, and dental education: Implications for reflective assignments and qualitative research, Journal of Dental Education 88 (12) (2024) 1671–1680. doi:10.1002/jdd.13663.

  79. [79]

    A. Dermata, A. Arhakis, M. A. Makrygiannakis, K. Giannakopoulos, E. G. Kaklamanos, Evaluating the evidence-based potential of six large language models in paediatric dentistry: a comparative study on generative artificial intelligence, European Archives of Paediatric Dentistry 26 (3) (2025) 527–535. doi:10.1007/s40368-025-01012-x

  80. [80]

    Z. Hakami, S. A. K. Saheb, O. A. Bawazeer, Orthodontic knowledge assessment: A comparison of five ai chatbots, Saudi Dental Journal 38 (2026) 20. doi:10.1007/s44445-025-00091-2
