Abstract
We investigate the applicability of single-precision (fp32) floating-point operations within our linear-scaling, seminumerical exchange method sn-LinK [Laqua et al., J. Chem. Theory Comput. 16, 1456 (2020)] and find that the vast majority of the three-center-one-electron (3c1e) integrals can be computed with reduced numerical precision with virtually no loss in overall accuracy. This leads to a near doubling in performance on central processing units (CPUs) compared to pure fp64 evaluation. Since the cost of evaluating the 3c1e integrals is less significant on graphics processing units (GPUs) than on CPUs, the performance gains from accelerating the 3c1e integrals alone are less impressive on GPUs. Therefore, we also investigate the possibility of employing only fp32 operations to evaluate the exchange matrix within the self-consistent-field (SCF) procedure, followed by an accurate one-shot evaluation of the exchange energy using mixed fp32/fp64 precision. This still gives very accurate results (1.8 μE_h maximal error) while delivering a sevenfold speedup on a typical gaming GPU (GTX 1080Ti). We also propose the use of incremental exchange builds to further reduce these errors. The proposed SCF scheme (i-sn-LinK) requires only one mixed-precision exchange-matrix calculation, while all other exchange-matrix builds are performed with only fp32 operations. Compared to pure fp64 evaluation, this leads to 4-7x speedups for the whole SCF procedure without any significant deterioration of the results or the convergence behavior.
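The idea behind incremental exchange builds can be illustrated with a toy model. The following is a minimal sketch, not the sn-LinK implementation: the random four-index tensor `eri`, the helper `build_k`, the matrix size and the magnitude of the density update `dp` are all assumptions made for illustration, and a plain fp64 contraction stands in for the single accurate build. Because the exchange matrix is linear in the density, one accurate build can be combined with fp32 builds of the (small) density change, so the fp32 rounding error scales with the size of the update rather than with the full matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12  # toy basis size (assumption)

# Hypothetical stand-in for the integrals: a random four-index tensor.
eri = rng.standard_normal((n, n, n, n))


def build_k(p, dtype):
    """Toy exchange-like contraction K[m,n] = sum_{l,s} eri[m,l,n,s] * p[l,s],
    carried out entirely in the requested precision."""
    return np.einsum("mlns,ls->mn", eri.astype(dtype), p.astype(dtype))


# Density of the previous iteration and a small symmetric update (assumed sizes).
p0 = rng.standard_normal((n, n))
p0 = p0 + p0.T
dp = 1e-3 * rng.standard_normal((n, n))
dp = dp + dp.T
p1 = p0 + dp

# Reference: full build in fp64.
k_ref = build_k(p1, np.float64)

# Pure fp32 build of the full density.
k_fp32 = build_k(p1, np.float32).astype(np.float64)

# Incremental build: one accurate build for p0, fp32 only for the update dp.
k_inc = build_k(p0, np.float64) + build_k(dp, np.float32).astype(np.float64)

err_fp32 = np.max(np.abs(k_fp32 - k_ref))
err_inc = np.max(np.abs(k_inc - k_ref))
print(f"max error, pure fp32 build:   {err_fp32:.3e}")
print(f"max error, incremental build: {err_inc:.3e}")
```

In this toy setting the incremental result is closer to the fp64 reference than the pure fp32 build, since the single-precision rounding only affects the contribution of `dp`. How the accurate build and the fp32 updates are actually scheduled within i-sn-LinK is described in the article itself and is not reproduced here.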
Document type: | Journal article |
---|---|
Faculty: | Chemistry and Pharmacy > Department of Chemistry |
Subject areas: | 500 Natural sciences and mathematics > 540 Chemistry |
ISSN: | 0021-9606 |
Language: | English |
Document ID: | 99934 |
Date published on Open Access LMU: | 05 Jun 2023, 15:33 |
Last modified: | 05 Jun 2023, 15:33 |