
Sample Application of Multiple GPUs for Computer Vision using Horovod

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How do I utilize multiple GPUs on SuperPOD?

Objectives
  • Apply Horovod to drive multiple GPUs when training on CIFAR-100

Multi-GPU training with CIFAR-100

Below is a sample Python script that uses TensorFlow with Horovod to train on the CIFAR-100 dataset:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout
from tensorflow.keras.utils import to_categorical

from tensorflow.keras.layers import Conv2D # convolutional layers to reduce image size
from tensorflow.keras.layers import MaxPooling2D,AveragePooling2D # Max pooling layers to further reduce image size   
from tensorflow.keras.layers import Flatten # flatten data from 2D to column for Dense layer

from tensorflow.keras.datasets import cifar100
import matplotlib.pyplot as plt
# Step 1: import Horovod
import horovod.tensorflow.keras as hvd
# Step 2: initialize Horovod
hvd.init()

# Step 3: pin each process to a single local GPU
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[hvd.local_rank()], True)
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')
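# Optional sanity check (our addition, not part of the original lesson): each
# process reports its global rank, node-local rank, and total worker count.
print(f"Horovod rank={hvd.rank()} local_rank={hvd.local_rank()} size={hvd.size()}")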


def plot_acc_loss(history):
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['training', 'validation'], loc='best')
    plt.savefig("calval_hvod.png") # save accuracy plot as PNG
    plt.show()

# load data
(X_train, y_train), (X_test, y_test) = cifar100.load_data()

# Normalize pixel values to the range [0, 1]:
X_train, X_test = X_train/X_train.max(), X_test/X_test.max()

num_categories = 100
y_train = to_categorical(y_train, num_categories)
y_test = to_categorical(y_test, num_categories)
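# Note (our addition): every worker loads and trains on the full dataset, so
# each epoch below processes the data once per worker. A common alternative
# is to shard the training set by rank, for example:
#   X_train = X_train[hvd.rank()::hvd.size()]
#   y_train = y_train[hvd.rank()::hvd.size()]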

model = Sequential()
model.add(Conv2D(1024, (3, 3), strides=(1, 1), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(.1))
model.add(Conv2D(512, (3, 3), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(.1))
model.add(Conv2D(256, (3, 3), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(.1))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(.1))

# Output layer: one unit per CIFAR-100 class
model.add(Dense(100, activation='softmax'))

model.summary()
# Step 4: scale the learning rate by the number of workers and wrap the
# optimizer with Horovod's DistributedOptimizer to average gradients
opt = tf.keras.optimizers.Adam(learning_rate=0.001 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Step 5: broadcast initial variables from rank 0 so all workers start in sync
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

# Train the model; log progress from rank 0 only
model_CNN = model.fit(X_train, y_train, epochs=40,
                      verbose=1 if hvd.rank() == 0 else 0,
                      callbacks=callbacks,
                      validation_data=(X_test, y_test))

# Plot and save the curves from rank 0 only, to avoid duplicate writes
if hvd.rank() == 0:
    plot_acc_loss(model_CNN)
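If you also want to keep the trained weights, a minimal sketch (the filename is ours, not from the lesson) is to save from rank 0 only so the workers do not write the same file concurrently:

if hvd.rank() == 0:
    model.save("cifar100_hvd.h5")  # hypothetical output path, HDF5 format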

Using SuperPOD to run multi-GPU training

The following batch script submits the training job using 8 GPUs and the NGC TensorFlow 22.02 container:

#!/bin/bash
#SBATCH -J CIFAR100M      # job name to display in squeue
#SBATCH -c 16 --mem=750G      # CPU cores and memory
#SBATCH -o output-%j.txt    # standard output file
#SBATCH -e error-%j.txt     # standard error file
#SBATCH --gres=gpu:8          # request 8 GPUs
#SBATCH -t 1440              # maximum runtime in minutes
#SBATCH -D /work/users/tuev/cv1/cifar100/multi
#SBATCH --exclusive
#SBATCH --mail-user tuev@smu.edu
#SBATCH --mail-type=end

srun --container-image=$WORK/sqsh/nvidia+tensorflow+22.02-tf2-py3.sqsh --container-mounts=$WORK mpirun -np 8 --allow-run-as-root --oversubscribe python /work/users/tuev/cv1/cifar100/multi/cifar100spod-hvod.py
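Here mpirun -np 8 launches one Python process per requested GPU, matching --gres=gpu:8. If horovodrun is available inside the container, a sketch of an equivalent launch (same script path) would be:

horovodrun -np 8 python /work/users/tuev/cv1/cifar100/multi/cifar100spod-hvod.py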

While the job is running, use nvidia-smi on the compute node to confirm that all 8 GPUs are being utilized.
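For example, the following standard nvidia-smi query polls per-GPU utilization and memory every 2 seconds:

nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 2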

[Figure: nvidia-smi output showing all 8 GPUs in use]

Key Points

  • NGC containers, Horovod, and Slurm batch scripts together enable multi-GPU computer vision training on SuperPOD.