System Grete in Göttingen is online

We’re happy to announce the beginning of regular user operation for our new GPU cluster, “Grete” in Göttingen.

The main part of the cluster is available via the new partition grete, consisting of 33 nodes, each equipped with 4 NVIDIA Tesla A100 40 GB GPUs, 2 AMD Epyc CPUs, and an Infiniband HDR interconnect. The grete:shared partition additionally contains two nodes with 8 A100 80 GB GPUs each. All nodes have 16 CPU cores and 128 GB of memory per GPU. “Grete” has a dedicated new login node, glogin9, which is also reachable via its DNS alias.
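A full-node job on the new partition could be requested with a batch script along the following lines (a sketch; the project account and application binary are placeholders and depend on your setup):

```shell
#!/bin/bash
#SBATCH --partition=grete       # full A100 nodes
#SBATCH --nodes=1
#SBATCH -G 4                    # all 4 A100 GPUs of one node
#SBATCH --time=48:00:00         # within the 2-day default walltime limit
#SBATCH --account=myproject     # placeholder compute project account

srun ./my_gpu_application       # placeholder application binary
```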

Another 3 GPU nodes are available in the grete:interactive partition for interactive usage (limited to 2 jobs per user). The grete:preemptible partition is available for backfilling these nodes. On these nodes, the GPUs are split via Multi-Instance GPU (MIG) into slices with 2 or 3 compute units and 10 or 20 GB of GPU memory, respectively. These slices can be requested like GPUs in Slurm; for example, -G 2g.10gb:1 allocates one slice with 2 compute units and 10 GB of memory. Preemptible jobs are not charged core hours, but a compute project account has to be used, as for the preempt QoS in the CPU partitions.
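For instance, an interactive shell on one of these MIG slices might be started like this (a sketch; site defaults for time and memory may apply):

```shell
# Allocate one MIG slice with 2 compute units and 10 GB of GPU
# memory in the interactive partition and start a shell on it.
srun --partition=grete:interactive -G 2g.10gb:1 --pty bash
```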

The default walltime limit on all grete partitions is 2 days.

Part of “Grete” is a new dedicated flash-based WORK storage system mounted at /scratch on the new GPU nodes and glogin9. Each user and each compute project has a soft (hard) block quota of 3 TB (6 TB) and 1M (2M) inodes. The system is intended for fast access to the active data set required by the currently running jobs. The existing “Emmy” WORK file system is still reachable from the new cluster under /scratch-emmy via a long-distance connection. The HOME and PERM filesystems are shared between “Emmy” and “Grete”.

The default CUDA version is 12.0, and the NVIDIA HPC SDK 23.3 is available via nvhpc/23.3, nvhpc-byo-compiler/23.3, nvhpc-hpcx/23.3 and nvhpc-nompi/23.3 modules.
CUDA-enabled Open MPI is available via the HPC-X Toolkit (nvhpc-hpcx/23.3) and the NVIDIA/Mellanox OFED stack (openmpi-mofed/4.1.5a1). Note that older OpenMPI modules will not provide CUDA support in combination with Infiniband!
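A CUDA-aware MPI application could be built with the HPC-X based toolchain along these lines (the source file name is a placeholder):

```shell
# Load the NVIDIA HPC SDK with the CUDA-enabled HPC-X Open MPI
module load nvhpc-hpcx/23.3

# Compile an MPI program (file name is a placeholder)
mpicc -o mpi_cuda_test mpi_cuda_test.c

# Check that this Open MPI was built with CUDA support
ompi_info --parsable | grep mpi_built_with_cuda_support
```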

More information about using the new GPU system can be found in [1], and the accounting information has been extended to include the GPUs and MIG slices [2]. For example, in accordance with the recent round of compute time proposals, one full GPU node counts for the equivalent of 600 CPU cores.
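As a worked example of this accounting, a 24-hour job on one full GPU node is charged like 600 CPU cores running for 24 hours:

```python
# One full "grete" GPU node is accounted as 600 CPU cores.
node_core_equivalent = 600
walltime_hours = 24

# Core hours charged for a 24-hour full-node job
charged_core_hours = node_core_equivalent * walltime_hours
print(charged_core_hours)  # 14400
```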

Please do not hesitate to contact us if you have questions or need support migrating suitable applications to the GPU system.

The existing GPU nodes ggpu[01-03] with NVIDIA V100 32 GB GPUs will be migrated to the same site (“RZGö”) as “Grete” in mid-May. They will resume operation with the same Rocky Linux 8 based OS image as the new GPU nodes and an Infiniband interconnect, as part of the grete:shared, grete:preemptible, and grete:interactive partitions.


