Abstract
The rise of multimodal content on social media underscores the need for automated text-image clustering in the social sciences (Peng et al., 2023). As global issues are discussed in multiple languages simultaneously, applying multimodal models such as CLIP beyond single-language contexts becomes important. However, using these models for multilingual tasks raises the problem of language bias, which remains underexplored in text-image settings. This study examines the applicability of multimodal models to multilingual social media data, focusing on language bias in text-image clustering. We apply the CLIP model to a multilingual dataset of 65,685 Instagram images with OCR-generated texts collected during COP28. Building on prior research in multilingual analysis (e.g., Reber, 2019), we combine image embeddings with text embeddings from either (a) the original texts or (b) their English machine translations by Google. We then compare the resulting k-means clusterings. The analysis reveals biases in the multimodal clustering approach using CLIP, particularly its tendency to cluster data by language rather than by content. Contrary to expectations, machine translation only marginally mitigates these biases, which become more pronounced when images are weighted more heavily for classification. This suggests that the biases are rooted not only in textual features but also in image features, particularly in how the visual representation of text differs across languages. We explore potential techniques for mitigating language bias, such as inpainting to neutralize text-based visual elements and vision-language models to generate more language-agnostic textual descriptions of images. The findings highlight the potential and limitations of these approaches, offering new avenues for multimodal analysis.
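
The pipeline described in the abstract (combine image and text embeddings, weight the image side, cluster with k-means, then check whether clusters follow language) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say how the embeddings are combined or how language bias is measured, so the weighted concatenation, the `image_weight` parameter, the majority-language purity score, the function names, and the toy vectors are all hypothetical. Real inputs would be CLIP embeddings rather than hand-written numbers.

```python
import math
import random
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(image_vec, text_vec, image_weight=0.5):
    """Hypothetical fusion: weighted concatenation of unit-normalized embeddings."""
    img = l2_normalize(image_vec)
    txt = l2_normalize(text_vec)
    return [image_weight * x for x in img] + [(1.0 - image_weight) * x for x in txt]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def language_purity(labels, languages):
    """Share of posts whose language is the majority language of their cluster.

    Assumed bias indicator: values near 1.0 mean clusters align with language.
    """
    by_cluster = {}
    for lab, lang in zip(labels, languages):
        by_cluster.setdefault(lab, Counter())[lang] += 1
    majority = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority / len(labels)


if __name__ == "__main__":
    # Invented toy embeddings: (image_vec, text_vec, language) per post.
    posts = [
        ([1.0, 0.1], [0.9, 0.2], "en"),
        ([0.9, 0.2], [0.1, 0.9], "en"),
        ([0.1, 1.0], [0.8, 0.3], "de"),
        ([0.2, 0.9], [0.2, 0.8], "de"),
    ]
    languages = [lang for _, _, lang in posts]
    for w in (0.2, 0.5, 0.8):
        features = [combine(img, txt, image_weight=w) for img, txt, _ in posts]
        labels = kmeans(features, k=2)
        print(w, language_purity(labels, languages))
```

Running the comparison once on embeddings of the original texts and once on embeddings of the translated texts, at several values of `image_weight`, would mirror the contrast the abstract describes, under the assumptions stated above.
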
| Document type: | Conference contribution (talk) |
|---|---|
| Keywords: | multimodality; language bias; text-image clustering |
| Faculty: | Social Sciences > Institut für Kommunikationswissenschaft und Medienforschung (IfKW) |
| Subjects: | 300 Social sciences > 380 Commerce, communications, transportation |
| Language: | English |
| Document ID: | 131655 |
| Date published on Open Access LMU: | 28 Jan 2026 08:21 |
| Last modified: | 29 Jan 2026 06:18 |
