Abstract
Transformers have advanced the state of the art across numerous sequence-modeling tasks. However, beyond its quadratic computational and memory complexity with respect to sequence length, the self-attention mechanism processes information at only a single scale, i.e., all attention heads operate at the same resolution, which limits the power of the Transformer. To remedy this, we propose a novel and efficient structure named Adaptive Multi-Resolution Attention (AdaMRA for short), which scales linearly with sequence length in both time and space. Specifically, we leverage a multi-resolution multi-head attention mechanism that enables attention heads to capture long-range contextual information in a coarse-to-fine fashion. Moreover, to capture the potential relations between the query representation and clues at different attention granularities, we leave the decision of which attention resolution to use to the query, which further improves the model's capacity compared to the vanilla Transformer. To reduce complexity, we adopt kernel attention without degrading performance. Extensive experiments demonstrate the effectiveness and efficiency of our model, which achieves a state-of-the-art speed-memory-accuracy trade-off. To facilitate the use of AdaMRA by the scientific community, the implementation will be made publicly available.
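To make the mechanism sketched in the abstract concrete, the following is a minimal, illustrative sketch (not the authors' released implementation) of multi-resolution kernel attention with query-dependent resolution routing: keys and values are pooled to coarser resolutions per head, each query is softly routed to a resolution, and a kernel feature map keeps the cost linear in sequence length. All module and parameter names, the average-pooling scheme, the `elu(x)+1` feature map, and the soft `router` are assumptions chosen for illustration; the sketch is non-causal and omits details such as normalization and masking.

```python
# Hypothetical sketch of multi-resolution kernel attention with per-query
# resolution routing. Not the AdaMRA reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def kernel_feature(x):
    """Positive feature map commonly used in kernel/linear attention (assumption)."""
    return F.elu(x) + 1.0


class MultiResolutionKernelAttention(nn.Module):
    def __init__(self, dim, num_heads=4, pool_sizes=(1, 2, 4, 8)):
        super().__init__()
        assert num_heads == len(pool_sizes)
        self.h, self.d = num_heads, dim // num_heads
        self.pool_sizes = pool_sizes
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)
        # Router that lets each query choose among resolutions (soft assignment here).
        self.router = nn.Linear(dim, num_heads)

    def forward(self, x):
        B, N, D = x.shape
        q = kernel_feature(self.q_proj(x)).view(B, N, self.h, self.d)
        gates = self.router(x).softmax(dim=-1)  # (B, N, h): query-dependent routing

        head_outs = []
        for i, p in enumerate(self.pool_sizes):
            # Coarsen keys/values by average pooling with stride p (coarse-to-fine heads).
            kv_in = F.avg_pool1d(x.transpose(1, 2), p, stride=p, ceil_mode=True).transpose(1, 2)
            k = kernel_feature(self.k_proj(kv_in))[..., i * self.d:(i + 1) * self.d]
            v = self.v_proj(kv_in)[..., i * self.d:(i + 1) * self.d]
            qi = q[:, :, i]                                # (B, N, d)
            # Linear attention: phi(Q) (phi(K)^T V), normalized by phi(Q) sum phi(K).
            kv = torch.einsum('bmd,bme->bde', k, v)        # (B, d, d), independent of N
            z = 1.0 / (torch.einsum('bnd,bd->bn', qi, k.sum(dim=1)) + 1e-6)
            head_outs.append(torch.einsum('bnd,bde,bn->bne', qi, kv, z) * gates[..., i:i + 1])

        return self.out(torch.cat(head_outs, dim=-1))


if __name__ == "__main__":
    x = torch.randn(2, 128, 64)
    attn = MultiResolutionKernelAttention(dim=64, num_heads=4)
    print(attn(x).shape)  # torch.Size([2, 128, 64])
```

Because the per-head key-value summary `kv` has a fixed size regardless of sequence length, time and memory grow linearly with the number of tokens, which is the trade-off the abstract refers to.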
| Document type: | Conference paper |
|---|---|
| Faculty: | Mathematics, Informatics and Statistics > Informatics; Mathematics, Informatics and Statistics > Informatics > Artificial Intelligence and Machine Learning |
| Subject areas: | 000 Computer science, information and general works > 004 Computer science |
| ISBN: | 978-1-6654-8867-9; 978-1-6654-8868-6 |
| Place: | Piscataway, NJ, USA |
| Language: | English |
| Document ID: | 121946 |
| Date deposited on Open Access LMU: | 29 Oct 2024 15:12 |
| Last modified: | 29 Oct 2024 15:12 |