
Klein, Vanessa (ORCID: https://orcid.org/0009-0000-0706-8794); Büttner, Martha; Göstemeyer, Gerd (ORCID: https://orcid.org/0000-0003-3128-3616); Rolle, Sarina; Tichy, Antonin (ORCID: https://orcid.org/0000-0002-6260-9992); Schwendicke, Falk (ORCID: https://orcid.org/0000-0003-1223-1669) and Nordblom, Noah F. (ORCID: https://orcid.org/0000-0002-2705-4194) (June 2025): From inconsistent annotations to ground truth: Aggregation strategies for annotations of proximal carious lesions in dental imagery. In: Journal of Dentistry, Vol. 157, 105728

Abstract

Objectives

Annotating carious lesions on images is challenging. For artificial intelligence (AI) applications, the aggregation of heterogeneous multi-examiner annotations into one single annotation (e.g. via majority voting, MV) is usually needed. We assessed different aggregation strategies for multi-examiner annotations of primary proximal carious lesions on orthoradial radiographs and Near-Infrared Light Transillumination (NILT) images.
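As an illustration of what aggregating several examiners' labels into one annotation means, the following is a minimal sketch of majority voting. It is not the authors' implementation: the example ratings are hypothetical, and the tie-breaking rule (choose the lowest label) is an assumption made only so that the function is deterministic.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the examiners.

    Ties are broken by taking the lowest label value; this rule is an
    assumption for the sketch, not something stated in the abstract.
    """
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)

# Hypothetical ratings: one row per surface, one column per examiner.
# Label coding follows the abstract: 1 = sound, 2 = enamel, 3 = dentin.
ratings = [
    [1, 1, 2, 1, 1],
    [2, 2, 3, 3, 2],
    [1, 2, 2, 3, 3],  # tie between 2 and 3
]
aggregated = [majority_vote(row) for row in ratings]
```

The third row shows the weakness of plain voting: a two-against-two split carries no information about which examiners are more reliable, which is what the weighted and model-based strategies try to exploit.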

Methods

A total of 1007 proximal surfaces from 522 extracted posterior teeth were assessed by five dentists. Histological analysis provided the gold standard. Surfaces were classified as (1) sound, (2) enamel lesion or (3) dentin lesion. Four label aggregation strategies - MV, Weighted Majority Voting (WMV), Dawid-Skene (DS), and multi-annotator competence estimation (MACE) - were applied to unimodal (radiographs, NILT) and multimodal (combined) datasets. The area under the receiver operating characteristic curve (AUROC) was the primary outcome metric.
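To make the difference between voting and a model-based strategy concrete, here is a compact sketch of the classic Dawid-Skene expectation-maximisation scheme in plain Python. It is a simplified illustration, not the implementation used in the study: labels are recoded as 0, 1, 2 (sound, enamel, dentin), and the `smoothing` constant, the fixed iteration count and the example ratings are all assumptions.

```python
def dawid_skene(ratings, n_classes, n_iter=50, smoothing=0.01):
    """Estimate per-item class posteriors from multi-rater labels.

    ratings[i][j] is the label (0..n_classes-1) rater j gave to item i.
    smoothing and n_iter are illustrative choices, not tuned values.
    """
    n_items, n_raters = len(ratings), len(ratings[0])
    # Initialise posteriors with each item's vote shares (soft majority vote).
    post = [[sum(1 for r in row if r == k) / n_raters
             for k in range(n_classes)] for row in ratings]
    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per rater.
        prior = [sum(p[k] for p in post) / n_items for k in range(n_classes)]
        conf = []
        for j in range(n_raters):
            mat = [[smoothing] * n_classes for _ in range(n_classes)]
            for i, row in enumerate(ratings):
                for k in range(n_classes):
                    mat[k][row[j]] += post[i][k]
            conf.append([[v / sum(r) for v in r] for r in mat])
        # E-step: posterior of each true class given all raters' labels.
        for i, row in enumerate(ratings):
            like = []
            for k in range(n_classes):
                p = prior[k]
                for j, lab in enumerate(row):
                    p *= conf[j][k][lab]
                like.append(p)
            z = sum(like)
            post[i] = [v / z for v in like]
    return post

# Hypothetical ratings from five raters for six surfaces.
ratings = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [2, 2, 2, 2, 2],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [2, 2, 1, 2, 2],
]
posteriors = dawid_skene(ratings, n_classes=3)
hard_labels = [max(range(3), key=lambda k: p[k]) for p in posteriors]
```

Unlike voting, the procedure returns a probability per class for every surface, so the aggregated output can be used either as a hard label or as a score.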

Results

According to the gold standard, 637 (63 %) surfaces were sound, 280 (28 %) showed carious lesions limited to the enamel, and 90 (9 %) showed lesions extending into the dentin. For radiographs, aggregation using MACE outperformed MV, WMV and DS significantly across all lesion depths (p < 0.002). For NILT, MACE significantly outperformed MV across all lesion depths (p < 0.001) and DS for enamel and dentin lesions (p ≤ 0.002). In the multimodal dataset, DS outperformed the other label aggregation strategies across all lesion depths significantly (p < 0.05).
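Since the comparisons above are reported per lesion depth in terms of AUROC, a small sketch of a one-vs-rest AUROC may help in reading them. It uses the rank-based definition (the probability that a randomly chosen positive surface scores higher than a randomly chosen negative one, with ties counted as one half). How the study derived scores from each aggregation strategy is not stated in the abstract, so the scores and gold labels below are purely hypothetical.

```python
def auroc(scores, is_positive):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical gold-standard classes (1 = sound, 2 = enamel, 3 = dentin)
# and hypothetical aggregated scores for the class "dentin lesion".
gold = [1, 1, 3, 3]
dentin_scores = [0.10, 0.40, 0.35, 0.80]

# One-vs-rest: dentin lesions are positives, all other surfaces negatives.
dentin_auroc = auroc(dentin_scores, [g == 3 for g in gold])
```

The pairwise form is quadratic in the number of surfaces, which is unproblematic at the scale of roughly a thousand surfaces; a rank-based implementation would be preferable for much larger datasets.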

Conclusions

The commonly applied MV may be suboptimal. There is a need for informed application of specific aggregation strategies, depending on the dataset characteristics.

Clinical significance

Most AI applications for dental image analysis are trained on a single annotation, usually resulting from aggregated multi-examiner annotations of each image. However, since these annotations are usually aggregated in an in vivo setting where no definitive ground truth is available, the choice of aggregation strategy plays a crucial role.
