Logo Logo
Switch Language to German
Perdacher, Martin; Plant, Claudia; Boehm, Christian (2019): Cache-oblivious High-performance Similarity Join. In: Sigmod '19: Proceedings of the 2019 International Conference on Management of Data: pp. 87-104
Full text not available from 'Open Access LMU'.


A similarity join combines vectors based on a distance condition. Typically, such algorithms apply a filter step (by indexing or sorting) and then refine pairs of candidate vectors. In this paper, we propose to refine the pairs in an order defined by a space-filling curve which dramatically improves data locality. Modern multi-core processors are supported by a deep memory hierarchy including RAM, various levels of cache, and registers. The space-filling curve makes our proposed algorithm cache-oblivious to fully exploit this memory hierarchy. Our novel Fast General Form (FGF) Hilbert-curve solves a number of limitations of well-known space-filling curves: it is non-recursive, not restricted to traverse squares, and has a constant time and space complexity per loop iteration. As we demonstrate the easy transformation from conventional into cache-oblivious loops we believe that many complex database operators can be transformed systematically into cache-oblivious parallel algorithms.