A hybrid job uses both OpenMP and MPI for parallelization. This requires the number of tasks and CPUs per task to be set up correctly. Here is an example of such a job. We will use Intel MPI in this example, but this also works using OpenMPI.


#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);
    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);
    int nthreads, tid;
    /* Fork a team of threads giving them their own copies of variables */
    #pragma omp parallel private(nthreads, tid)
    /* Obtain thread number */
    tid = omp_get_thread_num();
    printf("Hello World from thread = %d, processor %s, rank %d out of %d processors\n", tid, processor_name, world_rank, world_size);
    /* Only master thread does this */
    if (tid == 0)
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }  /* All threads join master thread and disband */
    // Finalize the MPI environment.


This can be compiles with both the Intel or the GNU compiler.

Compilation With the Intel Compiler
module load intel
module load impi
mpiicc -qopenmp  -Wl,-rpath,$LD_RUN_PATH -o hybrid_hello_world.bin hybrid_hello_world.c
Compilation with the GNU Compiler
module load gcc
module load impi
mpigcc -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hybrid_hello_world.bin hybrid_hello_world.c

Batch Job

Job Script
#SBATCH --time=00:10:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=10
#SBATCH --partition=medium40:test
module load impi
srun ./hybrid_hello_world.bin