
Schwärzler, Julia ORCID: https://orcid.org/0009-0003-9304-2954; Tolstaya, Ekaterina ORCID: https://orcid.org/0000-0002-8893-2683; Tichy, Antonin ORCID: https://orcid.org/0000-0002-6260-9992; Paris, Sebastian ORCID: https://orcid.org/0000-0002-1302-8761; Aarabi, Ghazal ORCID: https://orcid.org/0000-0001-5484-2594; Chaurasia, Akhilanand ORCID: https://orcid.org/0000-0002-8356-9512; Malenova, Yoana; Steybe, David and Schwendicke, Falk ORCID: https://orcid.org/0000-0003-1223-1669 (2025): Machine learning versus clinicians for detection and classification of oral mucosal lesions. In: Journal of Dentistry, Vol. 161, 105992 [PDF, 10MB]

Creative Commons: Attribution 4.0 (CC-BY)
Published version

Abstract

Objectives

The detection and classification of oral mucosal lesions is a challenging task due to high heterogeneity and overlap in clinical appearance. Nevertheless, differentiating benign from potentially malignant lesions is essential for appropriate management. This study evaluated whether a deep learning model trained to discriminate 11 classes of oral mucosal lesions could exceed the performance of general dentists.

Methods

A total of 4079 intraoral photographs of benign, potentially malignant and malignant oral lesions were labeled using bounding boxes and classified into 11 classes. After an independent test set (n = 282) was set aside, the remaining data were split 80:20 into training (n = 3031) and validation (n = 766) sets. The YOLOv8 computer vision model was implemented for image classification and object detection. Model performance was evaluated on the test set, which was also assessed by six general dentists and three specialists in oral surgery. Evaluation metrics included sensitivity, specificity, F1-score, precision, area under the receiver operating characteristic curve (AUROC), and average precision (AP) at multiple thresholds of intersection over union.
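As an illustration of the classification metrics named above, the following minimal sketch computes sensitivity, specificity, precision and F1-score for one class in a one-vs-rest fashion. The function name, the label strings and the one-vs-rest treatment are assumptions made for illustration; the abstract does not specify how the per-class metrics were implemented.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Per-class metrics, treating `positive` as the target class
    and every other label as negative (illustrative assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a class has no samples.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)   # also called recall
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical labels, not data from the study.
y_true = ["hpv", "hpv", "keratosis", "leukoplakia"]
y_pred = ["hpv", "keratosis", "keratosis", "hpv"]
metrics = one_vs_rest_metrics(y_true, y_pred, positive="hpv")
```

Repeating the call for each of the 11 classes would yield one set of metrics per class, which is the form in which the results are reported.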

Results

In terms of classification, the highest F1-score (0.80) and AUROC (0.96) were observed for human papillomavirus (HPV)-related lesions, whereas the lowest F1-score (0.43) and AUROC (0.78) were obtained for keratosis. In terms of object detection, the best results were achieved for HPV-related lesions (AP25 = 0.82) and proliferative verrucous leukoplakia (AP25 = 0.80; AP50 = 0.76), while the lowest values were noted for leukoplakia (AP25 = 0.36; AP50 = 0.20). Overall, the model performed comparably to specialists (p = 0.93) and significantly better than general dentists (p < 0.01).
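AP25 and AP50 denote average precision at intersection-over-union (IoU) thresholds of 0.25 and 0.50. The sketch below shows only the underlying overlap criterion, assuming boxes in `(x_min, y_min, x_max, y_max)` format and a "greater than or equal" matching rule; both are illustrative assumptions, and the full AP computation over ranked detections is not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as
    (x_min, y_min, x_max, y_max). The box format is an assumption."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold):
    """A prediction counts as a hit if its IoU with the ground-truth
    box reaches the threshold (assumed matching rule)."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: the prediction is shifted by half the box width.
gt = (0, 0, 10, 10)
pred = (5, 0, 15, 10)
overlap = iou(pred, gt)               # intersection 50, union 150
hit_at_25 = is_match(pred, gt, 0.25)  # counts as a detection for AP25
hit_at_50 = is_match(pred, gt, 0.50)  # does not count for AP50
```

The same predicted box can therefore be a hit at the looser threshold and a miss at the stricter one, which is why AP25 is never lower than AP50 for a given class in the reported results.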

Conclusion

The developed model performed as well as specialists in oral surgery, highlighting its potential as a valuable tool for oral lesion assessment.

Clinical significance

By providing performance comparable to oral surgeons and superior to general dentists, the developed multi-class model could support the clinical evaluation of oral lesions, which may enable earlier diagnosis of potentially malignant disorders, enhance patient management and improve patient prognosis.
