Relationformer: A Unified Framework for Image-to-Graph Generation

Shit, Suprosanna ORCID: https://orcid.org/0000-0003-4435-7207; Koner, Rajat ORCID: https://orcid.org/0000-0003-3441-8192; Wittmann, Bastian; Paetzold, Johannes; Ezhov, Ivan; Li, Hongwei; Pan, Jiazhen; Sharifzadeh, Sahand; Kaissis, Georgios; Tresp, Volker and Menze, Bjoern (2022): Relationformer: A Unified Framework for Image-to-Graph Generation. 17th European Conference on Computer Vision (ECCV 2022), Tel Aviv, Israel, October 23–27, 2022. In: Avidan, Shai; Brostow, Gabriel; Cissé, Moustapha; Farinella, Giovanni Maria and Hassner, Tal (eds.): Computer Vision – ECCV 2022, Lecture Notes in Computer Science vol. 13697. Cham, Switzerland: Springer. pp. 422-439

Full text not available on 'Open Access LMU'.

DOI: 10.1007/978-3-031-19836-6_24

Abstract

A comprehensive representation of an image requires understanding objects and their mutual relationships, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly. In addition to existing [obj]-tokens, we propose a novel learnable token, namely the [rln]-token. Together with the [obj]-tokens, the [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. In combination with the pair-wise [obj]-tokens, the [rln]-token contributes to a computationally efficient relation prediction. We achieve state-of-the-art performance on multiple diverse, multi-domain datasets, which demonstrates our approach’s effectiveness and generalizability. (Code is available at https://github.com/suprosanna/relationformer.)
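
The pair-wise use of [obj]-tokens together with a single [rln]-token can be illustrated with a small sketch. The abstract does not state how the tokens are combined, so the concatenation used here, the single linear scoring layer, and all function and parameter names (`pairwise_relation_inputs`, `predict_relations`, `weights`, `bias`) are assumptions made for illustration only; they are not taken from the authors' code.

```python
from itertools import permutations


def pairwise_relation_inputs(obj_tokens, rln_token):
    """Build one feature vector per ordered object pair (i, j), i != j.

    Assumption: each pair is represented by concatenating the two
    [obj]-token vectors with the shared [rln]-token vector.
    """
    inputs = {}
    for i, j in permutations(range(len(obj_tokens)), 2):
        inputs[(i, j)] = list(obj_tokens[i]) + list(obj_tokens[j]) + list(rln_token)
    return inputs


def linear_scores(features, weights, bias):
    """Score each relation class with a plain linear layer (hypothetical head)."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def predict_relations(obj_tokens, rln_token, weights, bias):
    """Return the highest-scoring relation class index for every ordered pair."""
    predictions = {}
    for pair, features in pairwise_relation_inputs(obj_tokens, rln_token).items():
        scores = linear_scores(features, weights, bias)
        predictions[pair] = max(range(len(scores)), key=scores.__getitem__)
    return predictions


if __name__ == "__main__":
    # Toy values: three 2-dimensional [obj]-tokens and one [rln]-token.
    objs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    rln = [0.5, 0.5]
    # Two relation classes over the 6-dimensional concatenated feature.
    w = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]]
    b = [0.0, 0.0]
    print(predict_relations(objs, rln, w, b))
```

In this sketch the same [rln]-token vector is reused for every pair, so the per-pair work is limited to one concatenation and one scoring step over the N(N-1) ordered pairs.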

Document type:	Conference contribution (Paper)
Faculty:	Mathematics, Computer Science and Statistics > Computer Science
Subjects:	000 Computer science, information science, general works > 004 Computer science
ISSN:	0302-9743
Place of publication:	Cham, Switzerland
Language:	English
Document ID:	110125
Date published on Open Access LMU:	26 Mar 2024 08:49
Last modified:	26 Mar 2024 08:49
