Machine learning versus clinicians for detection and classification of oral mucosal lesions

Schwärzler, Julia ORCID: https://orcid.org/0009-0003-9304-2954; Tolstaya, Ekaterina ORCID: https://orcid.org/0000-0002-8893-2683; Tichy, Antonin ORCID: https://orcid.org/0000-0002-6260-9992; Paris, Sebastian ORCID: https://orcid.org/0000-0002-1302-8761; Aarabi, Ghazal ORCID: https://orcid.org/0000-0001-5484-2594; Chaurasia, Akhilanand ORCID: https://orcid.org/0000-0002-8356-9512; Malenova, Yoana; Steybe, David and Schwendicke, Falk ORCID: https://orcid.org/0000-0003-1223-1669 (2025): Machine learning versus clinicians for detection and classification of oral mucosal lesions. In: Journal of Dentistry, Vol. 161, 105992

Creative Commons: Attribution 4.0 (CC-BY)

Published version

DOI: 10.1016/j.jdent.2025.105992

Abstract

Objectives

The detection and classification of oral mucosal lesions is a challenging task due to high heterogeneity and overlap in clinical appearance. Nevertheless, differentiating benign from potentially malignant lesions is essential for appropriate management. This study evaluated whether a deep learning model trained to discriminate 11 classes of oral mucosal lesions could exceed the performance of general dentists.

Methods

A total of 4079 intraoral photographs of benign, potentially malignant and malignant oral lesions were labeled using bounding boxes and classified into 11 classes. An independent test set (n = 282) was held out, and the remaining data were split 80:20 into training (n = 3031) and validation (n = 766) sets. The YOLOv8 computer vision model was implemented for image classification and object detection. Model performance was evaluated on the test set, which was also assessed by six general dentists and three specialists in oral surgery. Evaluation metrics included sensitivity, specificity, F1-score, precision, area under the receiver operating characteristic curve (AUROC), and average precision (AP) at multiple thresholds of intersection over union.
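As an illustration of the classification metrics named above, the following minimal Python sketch computes sensitivity, specificity, precision and F1-score for one class in a one-vs-rest fashion. It is not the authors' evaluation code: the function name `one_vs_rest_metrics`, the placeholder labels "A", "B" and "C", and the toy predictions are all hypothetical, and the one-vs-rest treatment of the 11 classes is an assumption.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, precision and F1 for one class vs. all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall: share of true lesions found
    specificity = ratio(tn, tn + fp)   # share of other classes correctly rejected
    precision = ratio(tp, tp + fp)     # share of positive calls that are correct
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Toy data with hypothetical labels; unrelated to the study's results.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
metrics_a = one_vs_rest_metrics(y_true, y_pred, positive="A")
print(metrics_a)
```

In this toy case, class "A" has two true positives, one false negative and no false positives, so its precision and specificity are 1.0 while its sensitivity is 2/3.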

Results

In terms of classification, the highest F1-score (0.80) and AUROC (0.96) were observed for human papillomavirus (HPV)-related lesions, whereas the lowest F1-score (0.43) and AUROC (0.78) were obtained for keratosis. In terms of object detection, the best results were achieved for HPV-related lesions (AP25 = 0.82) and proliferative verrucous leukoplakia (AP25 = 0.80; AP50 = 0.76), while the lowest values were noted for leukoplakia (AP25 = 0.36; AP50 = 0.20). Overall, the model performed comparably to specialists (p = 0.93) and significantly better than general dentists (p < 0.01).
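The AP25 and AP50 values can be read as average precision at intersection-over-union (IoU) thresholds of 0.25 and 0.50; this reading is an assumption based on common convention, since the abstract only mentions "multiple thresholds". The sketch below shows how IoU is computed for two axis-aligned boxes and why a prediction can count as a match at the looser threshold but not at the stricter one. The box format `(x_min, y_min, x_max, y_max)`, the function names and the example coordinates are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(truth_box, pred_box, threshold):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return iou(truth_box, pred_box) >= threshold


# Hypothetical annotation and prediction, shifted by half a box width.
truth = (0, 0, 100, 100)
pred = (50, 0, 150, 100)

overlap = iou(truth, pred)
hit_at_25 = is_match(truth, pred, 0.25)
hit_at_50 = is_match(truth, pred, 0.50)
print(overlap, hit_at_25, hit_at_50)
```

Here the overlap is 5000 of a 15000 union, an IoU of one third, so the prediction is a hit at 0.25 but a miss at 0.50. This is why AP50 cannot exceed AP25 for the same set of predictions.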

Conclusion

The developed model performed as well as specialists in oral surgery, highlighting its potential as a valuable tool for oral lesion assessment.

Clinical significance

By providing performance comparable to oral surgeons and superior to general dentists, the developed multi-class model could support the clinical evaluation of oral lesions, possibly enabling earlier diagnosis of potentially malignant disorders, enhancing patient management and improving patient prognosis.

Document type:	Journal article
Faculty:	Medicine > Klinikum der LMU München > Poliklinik für Zahnerhaltung und Parodontologie
Subjects:	600 Technology, Medicine, Applied Sciences > 610 Medicine and Health
URN:	urn:nbn:de:bvb:19-epub-128968-8
ISSN:	03005712
Language:	English
Document ID:	128968
Date published on Open Access LMU:	22 Oct 2025 12:45
Last modified:	22 Oct 2025 12:45
