Thanh-Dang, Diep; Kien Trung, Pham; Fürlinger, Karl; Nam, Thoai (2019): A time-stamping system to detect memory consistency errors in MPI one-sided applications. In: Parallel Computing, Vol. 86: pp. 36-44
Full text not available from 'Open Access LMU'.

Abstract

Many high-performance computing applications have been developed using MPI one-sided communication. The separation between data movement and synchronization makes it very difficult for programmers to preserve the reliability of their programs. One of the challenges is detecting memory consistency errors, a notorious class of bug that degrades both the reliability and the performance of programs. Even an MPI expert can easily make these mistakes. The lockopts bug, which occurred in an RMA test case of the MPICH MPI implementation, is one example. MC-Checker is the most effective debugger for memory consistency errors. However, MC-Checker ignores the transitive ordering of the happened-before relation in order to keep its time overhead acceptable. Consequently, MC-Checker can report false positives. To address this issue, we propose a time-stamping system based on the encoded vector clock that preserves the full happened-before relation with reasonable overhead. The system is implemented in MC-CChecker, an enhancement of MC-Checker. The experimental results show that MC-CChecker not only detects memory consistency errors as effectively as MC-Checker, but also completely eliminates this potential source of false positives, a major limitation of MC-Checker, while still retaining acceptable overheads in execution time and memory usage. In particular, MC-CChecker is fairly scalable when processing a large number of trace files generated by running lockopts with up to 8192 processes. © 2019 Elsevier B.V. All rights reserved.
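
The abstract does not give the details of the time-stamping system, so the following is only a minimal sketch of one common formulation of an encoded vector clock, not the MC-CChecker implementation. It assumes each process owns a distinct prime, a local event multiplies the timestamp by that prime, a receive merges timestamps with the least common multiple, and happened-before is tested by divisibility. The names `EncodedClock`, `happened_before`, and `concurrent` are hypothetical. The example shows that transitive ordering across three processes is preserved.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


class EncodedClock:
    """Illustrative encoded vector clock: one integer per process.

    Assumption: each process is assigned its own distinct prime.
    """

    def __init__(self, prime):
        self.prime = prime
        self.value = 1  # no events seen yet

    def tick(self):
        """Record a local event and return its timestamp."""
        self.value *= self.prime
        return self.value

    def send(self):
        """A send is a local event; its timestamp travels with the message."""
        return self.tick()

    def receive(self, timestamp):
        """Merge the sender's timestamp, then record the receive event."""
        self.value = lcm(self.value, timestamp)
        return self.tick()


def happened_before(a, b):
    """True if the event stamped a precedes the event stamped b."""
    return a != b and b % a == 0


def concurrent(a, b):
    """True if neither event precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Three processes with primes 2, 3 and 5 (an arbitrary choice for the sketch).
p0, p1, p2 = EncodedClock(2), EncodedClock(3), EncodedClock(5)

a = p0.send()        # event on p0
b = p1.receive(a)    # p1 receives from p0
c = p1.send()        # p1 forwards to p2
d = p2.receive(c)    # p2 receives from p1
e = p0.tick()        # later local event on p0, unrelated to d

transitive = happened_before(a, d)  # a -> b -> c -> d
unrelated = concurrent(e, d)
```

Because the timestamp is a product of primes, it grows with every event, so a practical implementation would need arbitrary-precision integers or some form of reset; how the paper handles this is not stated in the abstract.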