Conveners
Software development and machines: TR7
- Anthony Kennedy (University of Edinburgh)
- Kate Clark (NVIDIA)
The NVIDIA Grace Hopper Superchip architecture, powering upcoming supercomputers like Alps at CSCS, JUPITER in Jülich, Isambard-AI in the UK, and Venado at Los Alamos National Laboratory, offers significant advancements for Lattice QCD applications. Integrating the ARM-based Grace CPU with the Hopper GPU via NVLink-C2C, it provides 7x the bandwidth of PCIe Gen5, enabling coherent memory and...
The advancement of lattice Quantum Chromodynamics (QCD) simulations demands robust and efficient computational infrastructure. This presentation details the implementation of Continuous Integration and Continuous Deployment (CI/CD) within our research group, specifically tailored to enhance the development and deployment of scientific software on a supercomputing cluster.
Our...
Heterogeneous clusters of GPU-accelerated nodes offer large total memory bandwidth, which can be used to speed up our application, openQxD-1.1. In this work we investigate offloading solves of the Dirac equation from openQxD-1.1 to GPUs using the lattice QCD library QUDA. Our early results demonstrate a significant potential speed-up in the time-to-solution for state-of-the-art...
We present a new GPU-based open-source package, developed in Julia, for performing lattice simulations. The code currently supports generation of SU(2) and SU(3) (pure gauge) configurations with different actions and boundary conditions. It can be used to measure different flow observables (both gluonic and fermionic) as well as different fermionic two-point functions. In the talk we will show...
It is well known that computers are no longer getting faster, only more parallel and hierarchical. Moreover, computations are increasingly bandwidth-limited and, with the end of Moore's Law, often power-limited as well. This requires us to rethink how we deploy LQCD computations to maximize science throughput. In this talk we discuss the rearchitecting of QUDA for batch...
openQCD is a simulation suite for lattice QCD, featuring an efficient implementation of the $\mathcal{O}(a)$-improved Wilson-Dirac fermion operator. Pivotal to the scaling properties of the code is the locally deflated solver. In this presentation, I will report on the status of porting openQCD to the GPU architecture and its scaling performance, specifically for the SAP deflated solver.
Multigrid-preconditioned solvers have proven crucial for the efficient generation of ensembles of gauge configurations at physical quark mass parameters. The QUDA library provides a highly efficient implementation of such a solver for GPUs from different vendors and for different types of Wilson fermions. It includes functionality for updating and evolving the multigrid setup in the Hybrid...
Trace estimation poses a significant challenge in lattice QCD simulations. The statistical error of the Hutchinson method decreases only as the inverse square root of the sample size, so precise estimates are expensive. To alleviate this issue, variance reduction techniques are employed, such as deflating the lowest eigenvectors or singular vectors of the matrix.
This study explores Multigrid Multilevel...
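As a rough illustration of the Hutchinson estimator and of deflation as a variance reduction step, here is a minimal sketch on a small dense toy matrix. It is not the method of the talk: the function names (`hutchinson`, `make_test_matrix`, `deflated_inverse_trace`), the matrix size, and the chosen spectrum are all assumptions made for the example, and the dense inverse is only viable at toy sizes.

```python
import numpy as np

def hutchinson(matvec, n, num_samples, rng):
    """Return (estimate, standard error) of tr(M), using only products M @ z."""
    samples = np.empty(num_samples)
    for s in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe: E[z z^T] = I
        samples[s] = z @ matvec(z)
    return samples.mean(), samples.std(ddof=1) / np.sqrt(num_samples)

def make_test_matrix(n, rng):
    """Hypothetical symmetric positive-definite matrix with three tiny eigenvalues."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    eigenvalues = np.concatenate(([0.01, 0.02, 0.05], np.linspace(1.0, 2.0, n - 3)))
    return (q * eigenvalues) @ q.T  # Q diag(eigenvalues) Q^T

def deflated_inverse_trace(a, k, num_samples, rng):
    """Estimate tr(a^{-1}): exact on the k lowest eigenvectors, stochastic on the rest."""
    w, v = np.linalg.eigh(a)          # eigenvalues in ascending order
    wk, vk = w[:k], v[:, :k]
    exact_part = np.sum(1.0 / wk)     # contribution of the deflated modes
    a_inv = np.linalg.inv(a)          # dense inverse, toy sizes only

    def remainder(z):
        # (a^{-1} - V_k diag(1/w_k) V_k^T) z
        return a_inv @ z - vk @ ((vk.T @ z) / wk)

    estimate, error = hutchinson(remainder, a.shape[0], num_samples, rng)
    return exact_part + estimate, error

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = make_test_matrix(64, rng)
    a_inv = np.linalg.inv(a)
    exact = np.trace(a_inv)
    plain, plain_err = hutchinson(lambda z: a_inv @ z, 64, 200, rng)
    defl, defl_err = deflated_inverse_trace(a, 3, 200, rng)
    print(f"exact    {exact:.3f}")
    print(f"plain    {plain:.3f} +/- {plain_err:.3f}")
    print(f"deflated {defl:.3f} +/- {defl_err:.3f}")
```

Quadrupling `num_samples` should roughly halve the reported standard error, whereas removing the few lowest modes exactly reduces the variance at fixed sample count.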
Solving linear systems is oftentimes the most demanding computation in lattice QCD simulations. The overlap discretization, which allows the implementation of chiral symmetry on the lattice, requires the solution of particularly demanding linear systems. When solving $D_{ov} x = b$ with an iterative method, with $D_{ov}$ the overlap operator, every iteration requires applying the sign function...
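To make concrete what applying the sign function involves, the following is a minimal sketch that applies the matrix sign function of a Hermitian matrix to a vector through a full eigendecomposition. The random Hermitian matrix is only a stand-in for the kernel operator, the helper name `matrix_sign_apply` is hypothetical, and the dense eigendecomposition is an assumption for illustration; at realistic lattice sizes the sign function has to be approximated instead.

```python
import numpy as np

def matrix_sign_apply(h, b):
    """Apply sign(h) to b for Hermitian h via a full eigendecomposition (toy sizes only)."""
    w, v = np.linalg.eigh(h)
    # sign(h) = V sign(Lambda) V^dagger, applied right to left
    return v @ (np.sign(w) * (v.conj().T @ b))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
    h = (m + m.conj().T) / 2  # random Hermitian stand-in for the kernel
    b = rng.standard_normal(8) + 1j * rng.standard_normal(8)
    y = matrix_sign_apply(h, b)
    # sign(h)^2 is the identity when h has no zero eigenvalue,
    # so applying the function twice should reproduce b
    print(np.allclose(matrix_sign_apply(h, y), b))
```

Each such application costs a full decomposition here, which is why an iterative solver that needs one sign-function application per iteration becomes expensive.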