Speaker
Dr Meifeng Lin (Brookhaven National Laboratory)
Description
We present updated strategies and results from combining hand-tuning with the R-Stream source-to-source auto-parallelizing compiler to transform the serial implementation of the domain wall fermion Dslash kernel in CPS into efficient parallel code targeting Intel Xeon CPUs. The R-Stream compiler performs preliminary optimizations of the input Dslash code, including a novel iteration-space compression scheme, while SIMD optimization is implemented through a data layout transformation and compiler intrinsics. Tuning for OpenMP and MPI scaling will also be discussed.
Primary author
Dr Meifeng Lin (Brookhaven National Laboratory)