This lesson is being piloted (Beta version)

Using NGC Container in SuperPOD

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How do I use NGC containers on SuperPOD?

Objectives
  • Learn how to use NGC containers on SuperPOD

3. Using NVIDIA NGC Container in SuperPOD

What is a Container?

Docker Container

NVIDIA NGC Container

ENROOT

It is very convenient to download Docker and NGC containers to SuperPOD. This section introduces a very effective tool for doing so, named enroot.

Importing docker container to SuperPOD from docker hub

$ enroot import docker://ubuntu
$ enroot create ubuntu.sqsh
$ enroot start ubuntu

# Type ls to see the contents of the container:
# ls

bin   dev  home  lib32  libx32  mnt  proc  run   srv  tmp    usr
boot  etc  lib   lib64  media   opt  root  sbin  sys  users  var
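Once you have imported and created a few containers, you may want to see what exists on disk and clean up ones you no longer need. A quick sketch using the standard enroot subcommands (run on SuperPOD, not locally):

```shell
# List the container root filesystems created with "enroot create"
enroot list

# Remove a container root you no longer need
# (the imported .sqsh image file stays on disk and can be re-created later)
enroot remove ubuntu
```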

Exercise

Go to Docker Hub, search for any container, for example lolcow, then use enroot to construct that container environment:

enroot import docker://godlovedc/lolcow
enroot create godlovedc+lolcow.sqsh
enroot start godlovedc+lolcow


Download the TensorFlow container


The following information is copied to the clipboard when selecting the 22.12-tf2 version:

nvcr.io/nvidia/tensorflow:22.12-tf2-py3
$ cd $WORK/sqsh
$ enroot import docker://nvcr.io#nvidia/tensorflow:22.12-tf2-py3

The sqsh file nvidia+tensorflow+22.12-tf2-py3.sqsh is created.

$ enroot create nvidia+tensorflow+22.12-tf2-py3.sqsh

Working with NGC container in Interactive mode

Once the container has been imported and created in your folder on SuperPOD, you can activate it from the login node when requesting a compute node:

$ srun -N1 -G1 -c10 --mem=64G --time=12:00:00 --container-image $WORK/sqsh/nvidia+tensorflow+22.12-tf2-py3.sqsh --container-mounts=$WORK --pty $SHELL

Here -N1 requests one node, -G1 one GPU, and -c10 ten CPU cores; --container-mounts=$WORK makes your work directory visible inside the container. Check that the GPU is enabled:

$ python
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Exit the container using the exit command.

Working with NGC container in Batch mode

#!/bin/bash
#SBATCH -J Testing       # job name to display in squeue
#SBATCH -o output-%j.txt    # standard output file
#SBATCH -e error-%j.txt     # standard error file
#SBATCH -p batch -c 12 --mem=20G --gres=gpu:1     # requested partition
#SBATCH -t 1440              # maximum runtime in minutes
#SBATCH -D /link-to-your-folder/

srun --container-image=/work/users/tuev/sqsh/nvidia+tensorflow+22.12-tf2-py3.sqsh --container-mounts=$WORK python testing.py

where testing.py contains:

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
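The batch script above is submitted and monitored with standard Slurm commands. A sketch, assuming the script was saved as testing.sbatch (the file name is an assumption):

```shell
# Submit the job; Slurm prints the assigned job ID
sbatch testing.sbatch

# Monitor your queued and running jobs
squeue -u $USER

# Once finished, inspect the files named in the #SBATCH -o and -e lines,
# where %j was expanded to the job ID
cat output-<jobid>.txt
cat error-<jobid>.txt
```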

Working with NGC container in Jupyter Lab

root@bcm-dgxa100-0001:/workspace# jupyter lab --allow-root --no-browser --ip=0.0.0.0

The following URL appears, together with its token:

Or copy and paste this URL:
        http://hostname:8888/?token=fd6495a28350afe11f0d0489755bc3cfd18f8893718555d2

Note that you must replace hostname with the name of the node you are on, in this case bcm-dgxa100-0001.

Therefore, you should change the above address as follows and paste it into Firefox:

http://bcm-dgxa100-0001:8888/?token=fd6495a28350afe11f0d0489755bc3cfd18f8893718555d2
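Rewriting the URL by hand is error-prone; the substitution can be done with a small helper using only the Python standard library (an illustrative sketch; the function name is made up):

```python
from urllib.parse import urlsplit, urlunsplit

def fix_jupyter_url(url: str, node: str) -> str:
    """Replace the placeholder hostname in a Jupyter URL with the real node name."""
    parts = urlsplit(url)
    # Keep the port (e.g. 8888) and the token query string; swap only the host
    netloc = f"{node}:{parts.port}" if parts.port else node
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(fix_jupyter_url(
    "http://hostname:8888/?token=fd6495a28350afe11f0d0489755bc3cfd18f8893718555d2",
    "bcm-dgxa100-0001",
))
# http://bcm-dgxa100-0001:8888/?token=fd6495a28350afe11f0d0489755bc3cfd18f8893718555d2
```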

Note: you should select the default Python 3 (ipykernel) instead of any other kernels for running the container.


Tip: Once forwarded to Jupyter Lab, you are placed in the container’s root directory. It is recommended to create a symlink to your folder so you can navigate to your own files:

$ ln -s $WORK work

Key Points

  • enroot imports Docker and NGC containers to SuperPOD as .sqsh files
  • NGC containers can be used in interactive mode, batch mode, or through Jupyter Lab