Speaker
Dr Kate Clark (NVIDIA)
Description
It is well known that computers are no longer getting faster, only more parallel and more hierarchical. Moreover, computations are increasingly bandwidth limited and, with the end of Moore's Law, often power limited as well. This requires us to rethink how we deploy LQCD computations to maximize science throughput. In this talk we discuss the rearchitecting of QUDA for batched computation to expose more parallelism and locality. Performance results are shown for a variety of workloads, including linear solves, multigrid, and contractions.
Primary author
Dr Kate Clark (NVIDIA)
Co-authors
Dr Bálint Joó (NVIDIA)
Dr Evan Weinberg (NVIDIA)
Dr Jiqun Tu (NVIDIA)
Dr Mathias Wagner (NVIDIA)