Abstract
Sorting is one of the most critical non-numerical algorithms and covers use cases in a wide spectrum of scientific applications. Although we can build upon excellent research over the last decades, scaling to thousands of processing units on modern many-core architectures reveals a gap between theory and practice. We adopt ideas of the well-known quickselect and sample sort algorithms to minimize data movement. Our evaluation demonstrates that we can keep up with recently proposed distribution sort algorithms in large-scale experiments, without any assumptions on the input keys. Additionally, our implementation outperforms an efficient multi-threaded merge sort on a single node. Our implementation is based on a C++ PGAS approach with an STL-like interface and can easily be integrated into many application codes. As part of the presented experiments, we further reveal challenges with multi-threaded MPI and one-sided communication.
| Item Type: | Journal article |
|---|---|
| Faculties: | Mathematics, Computer Science and Statistics > Computer Science |
| Subjects: | 000 Computer science, information and general works > 004 Data processing computer science |
| ISSN: | 1552-5244 |
| Language: | English |
| Item ID: | 82278 |
| Date Deposited: | 15. Dec 2021 15:01 |
| Last Modified: | 15. Dec 2021 15:01 |
