From inconsistent annotations to ground truth: Aggregation strategies for annotations of proximal carious lesions in dental imagery

Klein, Vanessa ORCID: https://orcid.org/0009-0000-0706-8794; Büttner, Martha; Göstemeyer, Gerd ORCID: https://orcid.org/0000-0003-3128-3616; Rolle, Sarina; Tichy, Antonin ORCID: https://orcid.org/0000-0002-6260-9992; Schwendicke, Falk ORCID: https://orcid.org/0000-0003-1223-1669 and Nordblom, Noah F. ORCID: https://orcid.org/0000-0002-2705-4194 (June 2025): From inconsistent annotations to ground truth: Aggregation strategies for annotations of proximal carious lesions in dental imagery. In: Journal of Dentistry, Vol. 157, 105728

Creative Commons: Attribution 4.0 (CC-BY)

Published version

DOI: 10.1016/j.jdent.2025.105728

Abstract

Objectives

Annotating carious lesions on images is challenging. For artificial intelligence (AI) applications, the aggregation of heterogeneous multi-examiner annotations into one single annotation (e.g. via majority voting, MV) is usually needed. We assessed different aggregation strategies for multi-examiner annotations of primary proximal carious lesions on orthoradial radiographs and Near-Infrared Light Transillumination (NILT) images.

Methods

A total of 1007 proximal surfaces from 522 extracted posterior teeth were assessed by five dentists. Histological analysis provided the gold standard. Surfaces were classified as (1) sound, (2) enamel lesion or (3) dentin lesion. Four label aggregation strategies - MV, Weighted Majority Voting (WMV), Dawid-Skene (DS), and multi-annotator competence estimation (MACE) - were applied to unimodal (radiographs, NILT) and multimodal (combined) datasets. The area under the receiver operating characteristic curve (AUROC) was the primary outcome metric.
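The two simplest of these strategies, MV and WMV, can be illustrated with a minimal Python sketch. It is not the study's implementation: the abstract does not say how examiner weights were derived or how ties were resolved, so the weights below are hypothetical and ties are broken toward the lower class code purely as an assumption. Labels follow the coding above (1 = sound, 2 = enamel lesion, 3 = dentin lesion).

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label among the examiners.

    Ties are broken toward the lowest class code (an assumption made
    for this sketch, not a rule taken from the study).
    """
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def weighted_majority_vote(labels, weights):
    """Return the label with the largest summed examiner weight.

    `weights` holds one non-negative number per examiner, for example
    an estimate of that examiner's reliability (hypothetical here).
    """
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    top = max(scores.values())
    return min(label for label, s in scores.items() if s == top)


# Hypothetical annotations of one surface by five examiners.
votes = [1, 2, 2, 1, 2]
# Hypothetical weights: examiners 1 and 4 are trusted more than the rest.
weights = [0.9, 0.3, 0.3, 0.9, 0.3]

mv_label = majority_vote(votes)                     # three of five chose 2
wmv_label = weighted_majority_vote(votes, weights)  # weight 1.8 on 1 vs 0.9 on 2
print(mv_label, wmv_label)
```

With these made-up numbers the two strategies disagree: MV follows the three examiners who chose an enamel lesion, while WMV follows the two more heavily weighted examiners who chose sound. DS and MACE go further by estimating examiner reliability from the annotations themselves and are not covered by this sketch.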

Results

According to the gold standard, 637 (63 %) surfaces were sound, 280 (28 %) showed carious lesions limited to the enamel, and 90 (9 %) showed lesions extending into the dentin. For radiographs, aggregation using MACE outperformed MV, WMV and DS significantly across all lesion depths (p < 0.002). For NILT, MACE significantly outperformed MV across all lesion depths (p < 0.001) and DS for enamel and dentin lesions (p ≤ 0.002). In the multimodal dataset, DS outperformed the other label aggregation strategies across all lesion depths significantly (p < 0.05).
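A per-lesion-depth AUROC of the kind reported here can be sketched as a one-vs-rest comparison between the gold standard and the aggregated labels. The abstract does not describe how scores were derived for the ROC analysis, so this sketch assumes the simplest case: each aggregated label is turned into a 0/1 score for the class of interest. The function names and the six example surfaces are hypothetical.

```python
def auroc(is_positive, scores):
    """AUROC as the probability that a random positive outranks a
    random negative, counting tied scores as half a win."""
    pos = [s for p, s in zip(is_positive, scores) if p]
    neg = [s for p, s in zip(is_positive, scores) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(gold, aggregated, target):
    """AUROC for one class against all others, using hard labels.

    Assumption of this sketch: the aggregated label is converted to a
    binary score (1.0 if it equals `target`, else 0.0).
    """
    is_positive = [g == target for g in gold]
    scores = [1.0 if a == target else 0.0 for a in aggregated]
    return auroc(is_positive, scores)


# Hypothetical data: 1 = sound, 2 = enamel lesion, 3 = dentin lesion.
gold = [1, 1, 2, 2, 3, 1]        # histological gold standard
aggregated = [1, 2, 2, 2, 3, 1]  # labels after aggregation

for target in (1, 2, 3):
    print(target, one_vs_rest_auroc(gold, aggregated, target))
```

Because hard labels yield only two distinct scores per class, each ROC curve in this sketch has a single operating point. An aggregation strategy that outputs class probabilities could be passed to `auroc` directly as `scores`.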

Conclusions

The commonly applied MV may be suboptimal. There is a need for informed application of specific aggregation strategies, depending on the dataset characteristics.

Clinical significance

Most AI applications for dental image analysis are trained on a single annotation, usually resulting from aggregated multi-examiner annotations of each image. However, since these annotations are usually aggregated in an in vivo setting where no definitive ground truth is available, the choice of aggregation strategy plays a crucial role.

Document type:	Journal article
Faculty:	Medicine > Klinikum der LMU München > Poliklinik für Zahnerhaltung und Parodontologie
Subjects:	600 Technology, medicine, applied sciences > 610 Medicine and health
URN:	urn:nbn:de:bvb:19-epub-126751-5
ISSN:	03005712
Language:	English
Document ID:	126751
Date published on Open Access LMU:	12 Jun 2025 05:21
Last modified:	12 Jun 2025 05:21
