Logo Logo
Help
Contact
Switch Language to German

Böhm, Christian; Perdacher, Martin and Plant, Claudia (2016): Cache-oblivious Loops Based on a Novel Space-filling Curve. 2016 IEEE International Conference on Big Data (Big Data), Washington D.C., USA, 5-8 December 2016. IEEE Computer Society. pp. 17-26

Full text not available from 'Open Access LMU'.

Abstract

Modern microprocessors offer a rich memory hierarchy including various levels of cache and registers. Some of these memories (like main memory, L3 cache) are big but slow and shared among all cores. Others (registers, L1 cache) are fast and exclusively assigned to a single core but small. Only if the data accesses have a high locality, we can avoid excessive data transfers between the memory hierarchy. In this paper we consider fundamental algorithms like matrix multiplication and decomposition as well as K-means clustering typically operating in two or three nested loops. We propose to traverse these loops whenever possible not in the canonical order but in an order defined by a space-filling curve. This traversal order dramatically improves data locality over a wide granularity allowing not only to efficiently support a cache of a single, known size (cache conscious) but also a hierarchy of various caches where the effective size available to our algorithms may even be unknown (cache oblivious). We propose a new space-filling curve called Fast Unrestricted (FUR) Hilbert with the following advantages: (1) we overcome the usual limitation to square-like grid sizes where the side-length is a power of 2 or 3. Instead, our approach allows arbitrary loop boundaries for all variables. (2) FUR-Hilbert is non-recursive with a guaranteed constant worst case time complexity per loop iteration (in contrast to O(log(grid-size)) for previous methods). (3) Our non-recursive approach makes the application of our cacheoblivious loops in any host algorithm as easy as conventional loops and facilitates automatic optimization by the compiler. We believe that future compilers could translate nested loops into cache-oblivious loops either fully automatic or by a user-guided analysis of the data dependency.

Actions (login required)

View Item View Item