ORCID: https://orcid.org/0000-0001-7363-4299; Chew, Rob; Plank, Barbara and Kreuter, Frauke
ORCID: https://orcid.org/0000-0002-7339-2645
(2025):
Aligning NLP Models with Target Population Perspectives using PAIR: Population-Aligned Instance Replication.
4th Workshop on Perspectivist Approaches to NLP, Suzhou, China, 8 November 2025.
Abercrombie, Gavin; Basile, Valerio; Frenda, Simona; Tonelli, Sara and Dudy, Shiran (eds.):
In: Proceedings of the 4th Workshop on Perspectivist Approaches to NLP,
Kerrville: Association for Computational Linguistics. pp. 100-110
This is the latest version of this item.
Abstract
Models trained on crowdsourced annotations may not reflect population views if those who work as annotators do not represent the broader population. In this paper, we propose PAIR: Population-Aligned Instance Replication, a post-processing method that adjusts training data to better reflect target population characteristics without collecting additional annotations. Using simulation studies on offensive language and hate speech detection with varying annotator compositions, we show that non-representative pools degrade model calibration while leaving accuracy largely unchanged. PAIR corrects these calibration problems by replicating annotations from underrepresented annotator groups to match population proportions. We conclude with recommendations for improving the representativity of training data and model performance.
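The replication idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pair_replicate`, the `group` field, the dictionary layout of an annotation, and the rule of rounding each group's replication factor to an integer are all assumptions made for illustration only.

```python
from collections import Counter


def pair_replicate(annotations, target_shares, group_key="group"):
    """Replicate annotations so group shares approximate target_shares.

    Hypothetical helper, not taken from the paper. `annotations` is a list
    of dicts that each carry an annotator-group label under `group_key`.
    `target_shares` maps every group present in the data to its share of
    the target population. Groups absent from the data cannot be
    replicated and are ignored.
    """
    counts = Counter(a[group_key] for a in annotations)
    total = sum(counts.values())

    # Ratio of target share to observed share: above 1 means the group
    # is underrepresented in the annotator pool.
    ratios = {g: target_shares[g] / (counts[g] / total) for g in counts}

    # Scale so the most overrepresented group keeps exactly one copy of
    # each annotation (nothing is dropped), then round to whole copies.
    base = min(ratios.values())
    factors = {g: max(1, round(r / base)) for g, r in ratios.items()}

    replicated = []
    for a in annotations:
        replicated.extend([a] * factors[a[group_key]])
    return replicated, factors


# Toy pool: group A supplies 8 annotations and group B supplies 2,
# while the assumed target population is split evenly.
pool = [{"group": "A", "label": 0}] * 8 + [{"group": "B", "label": 1}] * 2
adjusted, factors = pair_replicate(pool, {"A": 0.5, "B": 0.5})
```

In this toy pool each annotation from group B is copied four times and group A is left unchanged, so both groups contribute eight annotations to the adjusted training data. Integer rounding means the match to the target shares is only approximate in general.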
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| EU Funded Grant Agreement Number: | 101043235 |
| EU Projects: | Horizon Europe > ERC Grants |
| Faculties: | Mathematics, Computer Science and Statistics > Statistics > Chairs/Working Groups > Chair for Statistics and Data Science in Social Sciences and the Humanities |
| Subjects: | 300 Social sciences > 310 Statistics |
| URN: | urn:nbn:de:bvb:19-epub-130361-2 |
| Place of Publication: | Kerrville |
| Language: | English |
| Item ID: | 130361 |
| Date Deposited: | 20. Jan 2026 11:02 |
| Last Modified: | 20. Jan 2026 11:02 |
Available Versions of this Item
- Aligning NLP Models with Target Population Perspectives using PAIR: Population-Aligned Instance Replication. (deposited 03. Dec 2025 08:40)
- Aligning NLP Models with Target Population Perspectives using PAIR: Population-Aligned Instance Replication. (deposited 20. Jan 2026 11:02) [Currently Displayed]
