Caries-DETR: Tooth Structure-aware Prior and Lesion-aware Dynamic Loss Refinement for DETR-Based Caries Detection
Pith reviewed 2026-05-08 06:40 UTC · model grok-4.3
The pith
Caries-DETR improves detection of subtle low-contrast dental lesions by injecting tooth structure priors into DETR queries and dynamically reweighting losses for hard examples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Caries-DETR reaches state-of-the-art results on the AlphaDent and DentalAI datasets by guiding query initialization with high-frequency tooth structure priors and by adaptively reweighting the loss for subtle lesions according to their size, anatomical relevance, and current prediction quality.
What carries the argument
Tooth Structure-aware Query Initialization (TSQI), which combines pre-trained intraoral features with a structure perception branch to embed anatomical priors into the DETR queries, together with Lesion-aware Dynamic Loss Refinement (LDLR), which performs quality-driven hard mining by reweighting losses on the fly.
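The query-initialization idea can be illustrated with a small sketch. It assumes a two-stage selection step, similar in spirit to the top-k encoder-proposal selection used by Deformable DETR and DINO, in which encoder tokens are ranked and the best k become query anchors. Here each token's confidence is multiplied by a structure-prior score, for instance one produced by a structure perception branch. The function name `select_queries`, the multiplicative fusion, and the `alpha` exponent are hypothetical illustrations and are not the paper's TSQI.

```python
from typing import List, Tuple


def select_queries(
    token_scores: List[float],
    structure_prior: List[float],
    positions: List[Tuple[float, float]],
    k: int,
    alpha: float = 1.0,
) -> List[Tuple[float, float]]:
    """Pick k query anchor positions from encoder tokens (hypothetical sketch).

    token_scores: per-token objectness/class confidence in [0, 1].
    structure_prior: per-token structure score in [0, 1] (assumed input).
    positions: normalized (x, y) centre of each token.
    alpha: hypothetical exponent; alpha=0 disables the prior entirely.
    """
    if not (len(token_scores) == len(structure_prior) == len(positions)):
        raise ValueError("inputs must have the same length")
    # Fuse confidence with the prior: tokens on structurally salient regions
    # are promoted, and tokens on flat background are demoted.
    fused = [s * (p ** alpha) for s, p in zip(token_scores, structure_prior)]
    # Rank tokens by fused score and keep the top-k as query anchors.
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return [positions[i] for i in order[:k]]


scores = [0.9, 0.6, 0.8, 0.3]
prior = [0.1, 0.9, 0.7, 0.95]
positions = [(0.1, 0.1), (0.5, 0.5), (0.7, 0.2), (0.9, 0.9)]
anchors = select_queries(scores, prior, positions, k=2)
```

In this toy input, the highest-confidence token (index 0) sits on a low-prior region. It is therefore dropped in favour of tokens that are both confident and structurally salient. With `alpha=0.0` the selection falls back to plain confidence ranking.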
If this is right
- Standard DETR-style detectors can be specialized for low-contrast medical lesions without redesigning the entire backbone.
- Pre-training on large intraoral photo collections supplies reusable structural priors that improve localization of early-stage pathology.
- Adaptive per-instance loss reweighting based on lesion size and anatomical score offers a general way to handle class imbalance and difficulty variation in detection tasks.
- The resulting model shows improved generalization across two independent public dental datasets, suggesting robustness to variations in imaging conditions.
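As a rough illustration of adaptive per-instance loss reweighting, the sketch below combines three signals into one weight per matched lesion. The first is a small-lesion boost derived from box area. The second is an anatomical-relevance score. The third is a hardness term from prediction quality, measured here as IoU. The weighting form, the `size_ref`, `beta`, and `gamma` parameters, and the normalization to mean 1 are assumptions chosen for illustration, since the exact LDLR formula is not given in this review. In a DETR-style detector, such weights would multiply the per-query losses of Hungarian-matched pairs.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in normalized image coords


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def lesion_weights(
    gt_boxes: List[Box],
    pred_boxes: List[Box],
    relevance: List[float],
    size_ref: float = 0.01,  # hypothetical reference area (fraction of image)
    beta: float = 0.5,       # hypothetical strength of the small-lesion boost
    gamma: float = 2.0,      # hypothetical focusing exponent on quality
) -> List[float]:
    """Hypothetical per-instance loss weights, rescaled to mean 1.

    Weights grow for small lesions, anatomically relevant lesions
    (relevance in [0, 1]), and matched predictions with low IoU.
    """
    if not (len(gt_boxes) == len(pred_boxes) == len(relevance)):
        raise ValueError("inputs must have the same length")
    raw = []
    for gt, pred, rel in zip(gt_boxes, pred_boxes, relevance):
        area = (gt[2] - gt[0]) * (gt[3] - gt[1])
        size_term = (size_ref / max(area, 1e-8)) ** beta  # > 1 when smaller than size_ref
        relevance_term = 1.0 + rel                          # in [1, 2]
        hard_term = (1.0 - iou(gt, pred)) ** gamma          # focal-style: poor fit -> large
        raw.append(size_term * relevance_term * hard_term)
    mean = sum(raw) / len(raw) if raw else 1.0
    # Rescale to mean 1 so the overall loss magnitude is unchanged; fall back
    # to uniform weights if every instance is perfectly localized.
    return [w / mean for w in raw] if mean > 0 else [1.0] * len(raw)


gts = [(0.0, 0.0, 0.05, 0.05), (0.0, 0.0, 0.2, 0.2)]     # small and large lesion
preds = [(0.0, 0.0, 0.05, 0.025), (0.0, 0.0, 0.2, 0.1)]  # both at IoU 0.5
weights = lesion_weights(gts, preds, relevance=[0.5, 0.5])
```

Both predictions have the same IoU and relevance, so the split of roughly 1.6 to 0.4 comes entirely from the size term. As written, a perfectly localized instance receives weight 0. A real implementation might soften this with a floor.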
Where Pith is reading between the lines
- Similar query-initialization priors could be derived for other anatomical sites where lesions are subtle and location-specific, such as early retinal or skin lesions.
- The dynamic loss scheme might transfer to non-medical domains that also feature small, low-contrast targets, such as defect detection in manufacturing imagery.
- If the pre-training corpus size proves critical, smaller medical domains could benefit from synthetic structure priors generated from anatomical models rather than real photos.
Load-bearing premise
The reported gains on the two public datasets arise from the TSQI and LDLR modules rather than from unstated differences in training length, hyperparameters, or dataset-specific tuning.
What would settle it
An ablation in which the model with TSQI and LDLR removed, trained with all other details held fixed, still matches or exceeds the full model's average precision on both test sets.
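This settling experiment amounts to a controlled 2×2 factorial ablation. Below is a minimal sketch of how such a run grid could be generated and checked. It assumes a dictionary-based training config with hypothetical switches `use_tsqi` and `use_ldlr`. All other keys and values are placeholders, not the paper's settings. The (False, False) cell is the baseline the criterion refers to, and the two mixed cells isolate each module.

```python
import copy
import itertools
from typing import Dict, List

# Hypothetical base settings; the values are placeholders, not the paper's.
BASE_CONFIG: Dict[str, object] = {
    "backbone": "same-backbone-for-all-runs",
    "pretrain_corpus": "same-corpus-for-all-runs",
    "optimizer": "adamw",
    "lr": 1e-4,
    "epochs": 36,
    "augmentation": "same-pipeline-for-all-runs",
    "seed": 0,
}
TOGGLES = ("use_tsqi", "use_ldlr")


def ablation_grid(base: Dict[str, object]) -> List[Dict[str, object]]:
    """Build the 2x2 factorial grid that switches only TSQI and LDLR."""
    runs = []
    for use_tsqi, use_ldlr in itertools.product([False, True], repeat=2):
        cfg = copy.deepcopy(base)
        cfg["use_tsqi"] = use_tsqi
        cfg["use_ldlr"] = use_ldlr
        runs.append(cfg)
    return runs


def differs_only_in_toggles(runs: List[Dict[str, object]]) -> bool:
    """True if every run shares all settings except the module switches."""
    frozen = [{k: v for k, v in r.items() if k not in TOGGLES} for r in runs]
    return all(f == frozen[0] for f in frozen)


runs = ablation_grid(BASE_CONFIG)
```

In practice, each cell would also be repeated over several seeds. That way, AP differences between cells can be compared against run-to-run variance.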
Original abstract
As dental caries appear as subtle, low-contrast lesions in intraoral imaging, existing deep learning models face significant challenges in the early detection of caries. While recent Transformer-based detectors have shown promising results on natural images, they often fail to capture the domain-specific anatomical priors crucial for dental caries detection. In this paper, we propose Caries-DETR, a specialized Transformer framework for caries detection in intraoral images. A Tooth Structure-aware Query Initialization (TSQI) is designed that leverages large-scale intraoral photograph pre-training and a structure perception branch (SPB) to integrate high-frequency structural priors, guiding the model to focus on anatomically significant lesion areas. Furthermore, we design a Lesion-aware Dynamic Loss Refinement (LDLR) that implements quality-driven hard mining through adaptive loss reweighting based on lesion size, anatomical relevance, and prediction quality, optimizing detection for subtle lesions. Extensive experiments on two public datasets (i.e., AlphaDent and DentalAI) demonstrate that Caries-DETR achieves state-of-the-art performance compared to existing methods and exhibits good generalization and robustness. Code and data are available at https://github.com/XuefenLiu-SZU/Caries-DETR.
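The abstract does not say how the structure perception branch (SPB) extracts high-frequency priors. As a labeled stand-in only, the sketch below computes a classical high-pass response, the magnitude of a 4-neighbour Laplacian, and rescales it to [0, 1]. The result is a per-pixel structure map that is large along edges and fine texture and zero on flat regions. The function name `high_frequency_map` and the choice of a fixed Laplacian are assumptions. This is not the SPB.

```python
from typing import List

Image = List[List[float]]

# 4-neighbour Laplacian kernel: a fixed high-pass filter (symmetric, so
# correlation and convolution give the same result).
LAPLACIAN = [[0.0, 1.0, 0.0],
             [1.0, -4.0, 1.0],
             [0.0, 1.0, 0.0]]


def high_frequency_map(img: Image) -> Image:
    """Return |Laplacian(img)| rescaled to [0, 1]; borders use edge replication."""
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> float:
        # Clamp coordinates so border pixels are replicated.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    resp = [[abs(sum(LAPLACIAN[dy + 1][dx + 1] * px(y + dy, x + dx)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
             for x in range(w)]
            for y in range(h)]
    peak = max(max(row) for row in resp)
    # Normalize to [0, 1]; a perfectly flat image yields an all-zero map.
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in resp]


# A vertical step edge between columns 1 and 2 of a 5x5 image.
step = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
hf = high_frequency_map(step)
```

On the step image, only the two columns adjacent to the edge respond. A learned branch would replace the fixed kernel with trainable filters.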
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Caries-DETR, a DETR variant for caries detection in intraoral images. It proposes Tooth Structure-aware Query Initialization (TSQI) that leverages large-scale intraoral pre-training and a structure perception branch to inject anatomical priors, plus Lesion-aware Dynamic Loss Refinement (LDLR) that performs adaptive loss reweighting based on lesion size, anatomical relevance, and prediction quality. Experiments on AlphaDent and DentalAI are reported to achieve state-of-the-art detection performance with improved generalization and robustness.
Significance. If the reported gains can be isolated to TSQI and LDLR, the work would offer a concrete example of embedding domain-specific anatomical priors and quality-aware loss modulation into Transformer detectors for low-contrast medical lesions. This could be useful for other subtle-lesion tasks where standard DETR variants underperform due to lack of structural guidance.
major comments (2)
- [§4 (Experiments)] The manuscript reports SOTA results on AlphaDent and DentalAI but supplies no ablation tables that hold training schedule, optimizer, data augmentation, pre-training data, and backbone fixed while toggling only TSQI and LDLR. Without these controls, the headline improvements cannot be attributed to the proposed modules rather than confounding factors such as longer training or dataset-specific hyperparameter search.
- [§3.2 (LDLR)] The dynamic loss formulation is described at a high level but lacks the precise mathematical definition of the reweighting function (e.g., how lesion size, anatomical relevance, and quality scores are normalized and combined into per-sample weights). This prevents verification that LDLR is more than a re-packaging of standard hard-mining or focal-loss variants.
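To make the referee's request concrete, below is one hypothetical explicit form, not the paper's, for a per-instance weight over the $N$ matched instances in a batch. Here $a_i$ is the normalized box area, $r_i \in [0,1]$ an anatomical-relevance score, $q_i \in [0,1]$ a quality score such as IoU, $\ell_i$ the unweighted per-instance loss, and $a_0$, $\beta$, $\gamma$ assumed hyperparameters. Setting $\beta = 0$ and holding $r_i$ constant makes the relevance factor cancel under normalization. Only a focal-style modulation on localization quality then remains, analogous to the focal-loss factor $(1-p_t)^{\gamma}$. This is exactly the reduction the referee asks the authors to rule out.

```latex
\tilde{w}_i = \left(\frac{a_0}{a_i}\right)^{\beta} (1 + r_i)\,(1 - q_i)^{\gamma},
\qquad
w_i = \frac{N\,\tilde{w}_i}{\sum_{j=1}^{N} \tilde{w}_j},
\qquad
\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} w_i\,\ell_i,
\qquad
\text{cf.}\quad \mathrm{FL}(p_t) = -(1 - p_t)^{\gamma}\log p_t .
```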
minor comments (2)
- [Abstract] The abstract claims 'extensive experiments' yet contains no numerical results, confidence intervals, or even a single mAP value; readers must reach the experimental section to evaluate the strength of the SOTA claim.
- [Figures 2-3] Figure captions and the structure-perception-branch diagram would benefit from explicit annotation of which feature maps are passed to the query initialization versus the detection head.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's potential. We address the two major comments below and will revise the manuscript to strengthen the experimental controls and mathematical clarity.
- Referee: [§4 (Experiments)] The manuscript reports SOTA results on AlphaDent and DentalAI but supplies no ablation tables that hold training schedule, optimizer, data augmentation, pre-training data, and backbone fixed while toggling only TSQI and LDLR. Without these controls, the headline improvements cannot be attributed to the proposed modules rather than confounding factors such as longer training or dataset-specific hyperparameter search.
  Authors: We agree that isolating the contributions of TSQI and LDLR requires strictly controlled ablations. The current experiments include component-wise ablations, but they do not hold every hyperparameter and training detail fixed across all variants. In the revised manuscript we will add a dedicated ablation table in Section 4 that fixes training schedule, optimizer, data augmentation, pre-training corpus, and backbone, toggling only the presence of TSQI and LDLR. This will allow direct attribution of gains to the proposed modules.
  Revision: yes
- Referee: [§3.2 (LDLR)] The dynamic loss formulation is described at a high level but lacks the precise mathematical definition of the reweighting function (e.g., how lesion size, anatomical relevance, and quality scores are normalized and combined into per-sample weights). This prevents verification that LDLR is more than a re-packaging of standard hard-mining or focal-loss variants.
  Authors: We acknowledge that the current description of LDLR is at a conceptual level. In the revision we will insert the exact mathematical formulation of the reweighting function in Section 3.2, including the normalization steps for lesion size, anatomical relevance score, and prediction quality, together with the formula that combines them into the per-sample loss weight. The added equations will make explicit how domain-specific dental priors are incorporated, distinguishing LDLR from generic hard-mining or focal-loss approaches.
  Revision: yes
Circularity Check
No circularity: architectural modules and empirical results are independent of inputs
The paper introduces TSQI (using pre-training and SPB for structural priors) and LDLR (adaptive loss reweighting) as design choices motivated by domain characteristics of dental images. These are not derived from equations that reduce to fitted parameters or self-citations. Performance claims are measured on held-out public datasets (AlphaDent, DentalAI) against baselines; no self-definitional loops, no 'prediction' that is the fit itself, and no load-bearing uniqueness theorems from the same authors. The derivation chain consists of standard DETR extensions plus empirical validation, which remains falsifiable and non-circular.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Intraoral images contain detectable high-frequency structural priors that can be transferred from large-scale pre-training to guide lesion localization.
- Domain assumption: Lesion size, anatomical relevance, and prediction quality are reliable signals for adaptive loss reweighting that improves detection of subtle caries.
invented entities (2)
- Tooth Structure-aware Query Initialization (TSQI): no independent evidence
- Lesion-aware Dynamic Loss Refinement (LDLR): no independent evidence