ORCID: https://orcid.org/0009-0000-0706-8794; Büttner, Martha; Göstemeyer, Gerd
ORCID: https://orcid.org/0000-0003-3128-3616; Rolle, Sarina; Tichy, Antonin
ORCID: https://orcid.org/0000-0002-6260-9992; Schwendicke, Falk
ORCID: https://orcid.org/0000-0003-1223-1669 and Nordblom, Noah F.
ORCID: https://orcid.org/0000-0002-2705-4194
(June 2025):
From inconsistent annotations to ground truth: Aggregation strategies for annotations of proximal carious lesions in dental imagery.
In: Journal of Dentistry, Vol. 157, 105728

Abstract
Objectives
Annotating carious lesions on images is challenging. For artificial intelligence (AI) applications, the aggregation of heterogeneous multi-examiner annotations into one single annotation (e.g. via majority voting, MV) is usually needed. We assessed different aggregation strategies for multi-examiner annotations of primary proximal carious lesions on orthoradial radiographs and Near-Infrared Light Transillumination (NILT) images.
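As a rough illustration of the majority voting mentioned above, the following sketch aggregates the labels of several examiners for one surface. The function name, the label strings, and the tie handling are assumptions made for this example; the record does not state how the authors resolved ties.

```python
from collections import Counter


def majority_vote(votes, tie_break=None):
    """Return the most frequent label, or `tie_break` if the top count is shared.

    `votes` holds one label per examiner for a single surface.
    The tie handling is a hypothetical choice, not taken from the study.
    """
    counts = Counter(votes)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    if len(winners) == 1:
        return winners[0]
    return tie_break  # unresolved tie, e.g. a 2-2-1 split among five examiners


# Five hypothetical examiners rating one proximal surface.
clear_case = ["enamel", "enamel", "sound", "dentin", "enamel"]
tied_case = ["sound", "sound", "enamel", "enamel", "dentin"]
print(majority_vote(clear_case))
print(majority_vote(tied_case))
```

With five examiners and three classes, a 2-2-1 split has no majority, so any practical implementation needs an explicit tie rule.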
Methods
A total of 1007 proximal surfaces from 522 extracted posterior teeth were assessed by five dentists. Histological analysis provided the gold standard. Surfaces were classified as (1) sound, (2) enamel lesion or (3) dentin lesion. Four label aggregation strategies - MV, Weighted Majority Voting (WMV), Dawid-Skene (DS), and multi-annotator competence estimation (MACE) - were applied to unimodal (radiographs, NILT) and multimodal (combined) datasets. The area under the receiver operating characteristic curve (AUROC) was the primary outcome metric.
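To make the Dawid-Skene idea more concrete, here is a minimal sketch of its expectation-maximisation loop for the three classes named above (0 = sound, 1 = enamel lesion, 2 = dentin lesion). It assumes every examiner labels every surface; the function name, the smoothing constant, the iteration count, and the toy labels are illustrative assumptions and do not reproduce the authors' implementation or data.

```python
from math import exp, log


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Minimal Dawid-Skene EM sketch.

    labels[i][j] is the class (0..n_classes-1) that examiner j gave surface i.
    Returns (posteriors, confusion), where posteriors[i][k] is the estimated
    probability that surface i truly belongs to class k, and
    confusion[j][k][obs] is examiner j's estimated P(reports obs | true k).
    """
    n_items = len(labels)
    n_annot = len(labels[0])
    # Initialise with per-surface vote fractions (a soft majority vote).
    post = [[sum(1 for obs in row if obs == k) / n_annot
             for k in range(n_classes)] for row in labels]
    conf = []
    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per examiner.
        prior = [sum(post[i][k] for i in range(n_items)) / n_items
                 for k in range(n_classes)]
        conf = [[[smoothing] * n_classes for _ in range(n_classes)]
                for _ in range(n_annot)]
        for i, row in enumerate(labels):
            for j, obs in enumerate(row):
                for k in range(n_classes):
                    conf[j][k][obs] += post[i][k]
        for j in range(n_annot):
            for k in range(n_classes):
                total = sum(conf[j][k])
                conf[j][k] = [c / total for c in conf[j][k]]
        # E-step: update each surface's class posterior in log space.
        for i, row in enumerate(labels):
            logp = [log(prior[k] + 1e-12)
                    + sum(log(conf[j][k][obs]) for j, obs in enumerate(row))
                    for k in range(n_classes)]
            peak = max(logp)
            weights = [exp(v - peak) for v in logp]
            norm = sum(weights)
            post[i] = [w / norm for w in weights]
    return post, conf


# Toy data: six surfaces, five hypothetical examiners.
toy_labels = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [2, 2, 2, 2, 2],
    [2, 2, 2, 2, 1],
]
posteriors, confusion = dawid_skene(toy_labels, n_classes=3)
aggregated = [max(range(3), key=lambda k: p[k]) for p in posteriors]
print(aggregated)
```

Unlike majority voting, this approach weights each examiner by an estimated confusion matrix, and the class posteriors can serve directly as scores for a threshold-free metric such as the AUROC.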
Results
According to the gold standard, 637 (63%) surfaces were sound, 280 (28%) showed carious lesions limited to the enamel, and 90 (9%) showed lesions extending into the dentin. For radiographs, aggregation using MACE significantly outperformed MV, WMV and DS across all lesion depths (p < 0.002). For NILT, MACE significantly outperformed MV across all lesion depths (p < 0.001) and DS for enamel and dentin lesions (p ≤ 0.002). In the multimodal dataset, DS significantly outperformed the other label aggregation strategies across all lesion depths (p < 0.05).
Conclusions
The commonly applied MV may be suboptimal. There is a need for informed application of specific aggregation strategies, depending on the dataset characteristics.
Clinical significance
Most AI applications for dental image analysis are trained on a single annotation, usually resulting from aggregated multi-examiner annotations of each image. However, since these annotations are usually aggregated in an in vivo setting where no definitive ground truth is available, the choice of aggregation strategy plays a crucial role.
Document type: | Journal article |
---|---|
Faculty: | Medicine > Klinikum der LMU München > Poliklinik für Zahnerhaltung und Parodontologie |
Subject areas: | 600 Technology, Medicine, Applied Sciences > 610 Medicine and Health |
URN: | urn:nbn:de:bvb:19-epub-126751-5 |
ISSN: | 03005712 |
Language: | English |
Document ID: | 126751 |
Date published on Open Access LMU: | 12 Jun 2025 05:21 |
Last modified: | 12 Jun 2025 05:21 |