24-30 July 2016
Highfield Campus, University of Southampton
Europe/London timezone

Optimization of the Domain Wall Dslash Kernel in Columbia Physics System

27 Jul 2016, 11:30
Dr Meifeng Lin (Brookhaven National Laboratory)


We present updated strategies and results for combining hand-tuning with the R-Stream source-to-source auto-parallelizing compiler to transform the serial implementation of the domain wall fermion Dslash kernel in CPS into efficient parallel code targeting Intel Xeon CPUs. The R-Stream compiler performs preliminary optimizations of the input Dslash code, including a novel iteration space compression scheme, while the SIMD optimization is done through a data layout transformation and compiler intrinsics. Tuning for OpenMP and MPI scaling will also be discussed.
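
As a rough illustration of what a data layout transformation for SIMD can look like, the sketch below regroups an array of complex numbers so that the real and imaginary parts of several consecutive sites sit contiguously. This is not the layout used in CPS or the one produced by R-Stream; the names `kLanes`, `BlockedField`, `to_blocked`, `to_aos`, and `multiply` are hypothetical, the lane count of 4 is an assumption (four doubles per 256-bit register), and plain loops stand in for compiler intrinsics.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Assumed SIMD width in doubles (hypothetical; 4 matches a 256-bit register).
constexpr std::size_t kLanes = 4;

// Blocked layout: for each group of kLanes sites, store kLanes real parts
// followed by kLanes imaginary parts.
struct BlockedField {
    std::vector<double> data;  // 2 * n doubles
    std::size_t n;             // number of complex sites
};

// Convert from the usual interleaved (re, im, re, im, ...) ordering.
BlockedField to_blocked(const std::vector<std::complex<double>>& aos) {
    assert(aos.size() % kLanes == 0);  // sketch assumes no remainder block
    BlockedField f{std::vector<double>(2 * aos.size()), aos.size()};
    for (std::size_t i = 0; i < aos.size(); ++i) {
        const std::size_t block = i / kLanes;
        const std::size_t lane = i % kLanes;
        f.data[2 * kLanes * block + lane] = aos[i].real();
        f.data[2 * kLanes * block + kLanes + lane] = aos[i].imag();
    }
    return f;
}

// Convert back to interleaved complex numbers.
std::vector<std::complex<double>> to_aos(const BlockedField& f) {
    std::vector<std::complex<double>> aos(f.n);
    for (std::size_t i = 0; i < f.n; ++i) {
        const std::size_t block = i / kLanes;
        const std::size_t lane = i % kLanes;
        aos[i] = {f.data[2 * kLanes * block + lane],
                  f.data[2 * kLanes * block + kLanes + lane]};
    }
    return aos;
}

// Site-wise complex multiply on the blocked layout. The inner loop reads and
// writes unit-stride runs of kLanes doubles, which is the access pattern that
// vector loads and stores (or an auto-vectorizer) handle well.
BlockedField multiply(const BlockedField& a, const BlockedField& x) {
    assert(a.n == x.n);
    BlockedField out{std::vector<double>(2 * a.n), a.n};
    for (std::size_t b = 0; b < a.n / kLanes; ++b) {
        const double* ar = &a.data[2 * kLanes * b];
        const double* ai = ar + kLanes;
        const double* xr = &x.data[2 * kLanes * b];
        const double* xi = xr + kLanes;
        double* outr = &out.data[2 * kLanes * b];
        double* outi = outr + kLanes;
        for (std::size_t l = 0; l < kLanes; ++l) {
            outr[l] = ar[l] * xr[l] - ai[l] * xi[l];
            outi[l] = ar[l] * xi[l] + ai[l] * xr[l];
        }
    }
    return out;
}
```

With the interleaved ordering, a vectorized complex multiply needs shuffles to separate real and imaginary parts; with the blocked ordering, each lane handles one site and no shuffling is needed in the inner loop. In a hand-tuned kernel the inner loop would typically be replaced by intrinsics operating on whole registers.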
