Introduction to SMU SuperPOD
Overview
Teaching: 10 min
Exercises: 0 min
Questions
Objectives
Onboarding to SMU SuperPOD
Introduction
- The SMU SuperPOD is a high-performance computing (HPC) cluster, built by NVIDIA, specifically tailored to meet the demands of cutting-edge research.
- This shared resource consists of 20 NVIDIA DGX A100 nodes, each with 8 advanced and powerful graphics processing units (GPUs) to accelerate calculations and train AI models.
- The SMU Office of Information Technology (OIT) and the Center for Research Computing (CRC) jointly manage and provide both access and support for this top-of-the-line machine.
NVIDIA DGX SuperPOD Advantage Specifications
| Specification | Values |
|---|---|
| Computational Ability | 1,644 TFLOPS |
| Number of Nodes | 20 |
| CPU Cores | 2,560 |
| Total Memory | 52.5 TB |
| Node Interconnect Bandwidth | 200 Gb/s InfiniBand connections per node |
| Work Storage | 768 TB (shared) |
| Scratch Storage | 750 TB (raw) |
| Archival Storage | N/A |
| Operating System | Ubuntu 20.04 |
Specification for each compute node:

| Specification | Values |
|---|---|
| CPU cores | 128 |
| GPUs | 8 |
| Memory | 1,910 GB |
| Time Limit | 2 days |
| Home Storage | 200 GB (independent from M3) |
| Scratch Storage | Unlimited (independent from M3) |
| Work Storage | 8 TB (shared with M3) |
Command to check the configuration of all nodes:
$ sinfo --Format="PartitionName,Nodes:10,CPUs:8,Memory:12,Time:15,Features:18,Gres:14"
Storage
Note that:
- SuperPOD’s home and scratch directories are different from M3’s.
- However, both SuperPOD and M3 share the same $WORK storage.
| Variable | Path | Quota | Usage |
|---|---|---|---|
| ${HOME} | /users/${USER} | 200 GB | Home directory, backed up |
| ${WORK} | /work/users/${USER} | 8 TB | Long term storage |
| ${SCRATCH} | /scratch/users/${USER} | None | Temporary scratch space |
| ${JOB_SCRATCH} | /scratch/_tmp/${USER:0:1}/${USER}/${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID} | None | Per job scratch space; ${SLURM_ARRAY_TASK_ID} is zero for standard jobs |
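As a hypothetical sketch of how the per-job scratch space can be used inside a job script (the program and file names below are placeholders, not part of the SuperPOD documentation):
# Hypothetical sketch: stage temporary files in $JOB_SCRATCH during a job
cp $WORK/input.dat $JOB_SCRATCH/        # input.dat is a placeholder input file
cd $JOB_SCRATCH
./my_program input.dat > results.out    # my_program is a placeholder executable
cp results.out $WORK/                   # copy only the results back to long-term storage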
Command to check the available quota of your work storage:
$ lfs quota -h -u $USERNAME /work
Login to SuperPOD
- Make sure you have a SuperPOD account created for you. You can ask your supervisor to request an account by submitting this form.
- There are several ways to log in to SuperPOD, via its two login nodes (you must be on the VPN):
$ ssh username@superpod.smu.edu
$ ssh username@slogin-01.superpod.smu.edu
$ ssh username@slogin-02.superpod.smu.edu
SuperPOD uses the same module system as M3, so nearly all commands are similar.
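For reference, the day-to-day module commands work exactly as they do on M3 (a short sketch; gcc is used here only because gcc/11.2.0 appears in the default module list shown later):
$ module list        # show currently loaded modules
$ module load gcc    # load a module
$ module unload gcc  # unload it again
$ module purge       # unload all loaded modules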
Requesting a compute node
SuperPOD uses SLURM as its scheduler, so requesting an interactive node is no different from M3.
For example, to request a node with 1 GPU, 10 CPUs, and 128 GB of memory for 12 hours:
$ srun -N 1 -G 1 -c 10 --mem=128G --time=12:00:00 --pty $SHELL
$ srun -N 1 -G 1 -c 10 --mem=128G --time=12:00:00 --pty bash
For this on-campus workshop, a workshop queue is available (use the flag -p workshop) to speed up the process of requesting resources:
$ srun -N 1 -G 1 -c 10 --mem=64G -p workshop --time=12:00:00 --pty $SHELL
Transferring data
- Transferring data to and from SuperPOD is no different if you are familiar with M3; you can use scp for regular transfers:
scp /link/fileA username@superpod.smu.edu:/users/username
or use WinSCP on a Windows machine if you don't want to use the CLI.
- Tip: since SuperPOD and M3 share the same $WORK storage, you can use this shared storage for both systems, as sketched below.
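As a hedged sketch (paths and usernames are placeholders), a whole folder can be copied recursively with scp, or synchronized with rsync so that interrupted transfers can resume:
$ scp -r ./my_dataset username@superpod.smu.edu:/work/users/username/
$ rsync -avP ./my_dataset/ username@superpod.smu.edu:/work/users/username/my_dataset/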
Working with module
By default, very few modules are available when using module avail:
$ module avail
------------------------------------------------------------------------- /hpc/mp/module_files/compilers -------------------------------------------------------------------------
amd/aocc/4.1.0 gcc/11.2.0 intel/oneapi/2023.2 nvidia/nvhpc/23.7
--------------------------------------------------------------------------- /hpc/mp/module_files/apps ----------------------------------------------------------------------------
amber/22 apptainer/1.1.9 conda gaussian/g16c02 julia/1.9.2 lammps/may22 spack
Similar to M3, SuperPOD also uses Spack as its module manager. Therefore, you can find all the modules you need after loading spack:
$ module load spack
$ module avail
------------------------------------------------------------------ /hpc/mp/spack_modules/linux-ubuntu22.04-zen2 ------------------------------------------------------------------
aocc-4.1.0/aocl-sparse/4.0-t2kjb3u gcc-11.2.0/aocl-sparse/4.0-zczy7ug gcc-11.2.0/lz4/1.9.4-gtzsc3c
aocc-4.1.0/autoconf-archive/2023.02.20-inwkm6b gcc-11.2.0/autoconf-archive/2023.02.20-r5lazua gcc-11.2.0/lzo/2.10-x6itbky
aocc-4.1.0/autoconf/2.69-x53b2ii gcc-11.2.0/autoconf/2.69-xlmuzvq gcc-11.2.0/m4/1.4.19-sv4d5ah
aocc-4.1.0/automake/1.16.5-hfcjabg gcc-11.2.0/automake/1.16.5-nsy2ron gcc-11.2.0/mbedtls/2.28.2-xvf3rc3
aocc-4.1.0/berkeley-db/18.1.40-5po7n7c gcc-11.2.0/berkeley-db/18.1.40-hlnjdqn gcc-11.2.0/mbedtls/2.28.2-42lnomn (D)
aocc-4.1.0/binutils/2.40-eivqxcw gcc-11.2.0/binutils/2.40-u6hr2wz gcc-11.2.0/meson/1.1.0-teqdfz5
aocc-4.1.0/bzip2/1.0.8-5ag7qmi gcc-11.2.0/bison/3.8.2-tifozqf gcc-11.2.0/metis/5.1.0-coza6f3
aocc-4.1.0/cmake/3.26.3-p6v5a7t gcc-11.2.0/boost/1.82.0-xpmd3v6 gcc-11.2.0/mpfr/4.2.0-meodww2
aocc-4.1.0/diffutils/3.9-bzq7rzo gcc-11.2.0/bzip2/1.0.8-qaxdt7f gcc-11.2.0/msgpack-c/3.1.1-d624eki
aocc-4.1.0/expat/2.5.0-kav5ad4 gcc-11.2.0/cmake/3.26.3-r23mmbq gcc-11.2.0/nasm/2.15.05-mdqravc
aocc-4.1.0/gdbm/1.23-6r6asdl gcc-11.2.0/cmake/3.26.3-utseokk (D) gcc-11.2.0/ncurses/6.4-rfw5ur5
aocc-4.1.0/gettext/0.21.1-dmnukqt gcc-11.2.0/curl/8.0.1-cp7iioq gcc-11.2.0/neovim/0.8.3-mdppjp3
....
Note: Press “q” to quit the module listing.
As we are still in the installation process, if you do not see the modules you need, please inform us so we can install them for you.
Key Points
SuperPOD 101
Working with Conda Environment
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to create a personal conda environment on SuperPOD?
Objectives
Create Conda environments for AI & ML applications
2. Conda Environment
- Besides the Spack module manager installed on SuperPOD, you can also use Conda as your own package manager.
- In many cases, you will want a Conda environment for AI & ML applications, just like you do on M3.
- First things first, load the installed conda module:
$ module load conda
$ conda env list
# conda environments:
#
base /hpc/mp/apps/conda
Create a conda environment for Tensorflow with GPU support
Next, let’s create a conda environment for Tensorflow 2.9. Here are the steps:
(1) Request a compute node with 1 GPU
$ srun -N1 -G1 -c10 --mem=64G --time=12:00:00 --pty $SHELL
(2) Load the cuda and cudnn modules for GPU support
$ module load conda gcc
$ module load cuda
$ module load cudnn
(3) Create the Tensorflow environment with your preferred version of Python
$ conda create --prefix ~/tensorflow_2.9 python=3.8 pip -y
The conda environment named tensorflow_2.9 is created in your home directory.
(4) Activate the conda environment and install Tensorflow 2.9.1 (or your preferred TF version)
$ source activate ~/tensorflow_2.9/
$ pip install tensorflow==2.9.1
Install ipykernel and create the kernel for notebooks
$ pip install ipykernel
$ python3 -m ipykernel install --user --name tensorflow_2.9 --display-name TensorflowGPU29
(5) Once the installation is done, check whether the conda environment can use the GPU
$ python
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Usage of the conda environment manager is no different from running on M3.
Create conda environment for Pytorch with GPUs support
Similar to Tensorflow, one can create a conda environment for Pytorch with GPU support.
Following are the brief steps (3) to (5) to create the environment and install Pytorch, after requesting a node and loading the libraries:
$ conda create --prefix ~/pytorch_1.13 python=3.8 pip -y
$ source activate ~/pytorch_1.13
$ conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia -y
$ python
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
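If you also want to use this environment in Jupyter, you can register a kernel for it, mirroring step (4) of the Tensorflow environment above (the kernel name and display name below are just examples):
$ pip install ipykernel
$ python3 -m ipykernel install --user --name pytorch_1.13 --display-name PytorchGPU113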
Key Points
Conda environment
Using NGC Container in SuperPOD
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to use NGC Container in SuperPOD?
Objectives
Learn how to master NGC Container usage in SuperPOD
3. Using NVIDIA NGC Container in SuperPOD
What is a Container?
- Containers have demonstrated their efficiency for application deployment in HPC.
- Containers can encapsulate complex programs with their dependencies in isolated environments, making applications more portable.
- A container is a portable unit of software that combines the application and all its dependencies into a single package that is agnostic to the underlying host OS.
- They thereby remove the need to build complex environments and simplify the path from application development to deployment.
Docker Container
- Docker is the most popular container system at this time
- It allows applications to be deployed inside a container on Linux systems.
NVIDIA NGC Container
- NGC stands for NVIDIA GPU Cloud.
- NGC provides a complete catalog of GPU-accelerated containers that can be deployed and maintained for artificial intelligence applications.
- It enables users to run their projects on a reliable and efficient platform that respects confidentiality, reversibility and transparency.
- NVIDIA NGC containers and their comprehensive catalog are an amazing suite of prebuilt software stacks (using the Docker backend) that simplifies the use of complex deep learning and HPC libraries that must leverage some sort of GPU-accelerated computing infrastructure.
- The complete NGC catalog can be found here, where you can find containers for Tensorflow, Pytorch, NeMo, Merlin, TAO, etc.
ENROOT
It is very convenient to download Docker and NGC containers to SuperPOD. Here I would like to introduce a very effective tool named enroot:
- A simple, yet powerful tool to turn traditional container/OS images into unprivileged sandboxes.
- This approach is generally preferred in high-performance or virtualized environments where portability and reproducibility are important, but extra isolation is not warranted.
Importing a Docker container to SuperPOD from Docker Hub
- The following commands import the ubuntu Docker container from https://hub.docker.com/_/ubuntu
- They then create the squash file named ubuntu.sqsh in the same location
- Finally, they start the ubuntu container
$ enroot import docker://ubuntu
$ enroot create ubuntu.sqsh
$ enroot start ubuntu
#Type ls to see the content of container:
# ls
bin dev home lib32 libx32 mnt proc run srv tmp usr
boot etc lib lib64 media opt root sbin sys users var
- Type exit to quit container environment
Exercise
Go to Docker Hub, search for any container, for example lolcow, then use enroot to construct that container environment:
enroot import docker://godlovedc/lolcow
enroot create godlovedc+lolcow.sqsh
enroot start godlovedc+lolcow
Download Tensorflow container
- Now, let’s download the Tensorflow container from NGC. Browsing the NGC Catalog and searching for Tensorflow gives the link: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow
- Copy the image path from the website. The following path was copied to the clipboard when selecting the 22.12-tf2 version:
nvcr.io/nvidia/tensorflow:22.12-tf2-py3
- I will download version 22.12-tf2 to my work location using enroot; pay attention to the syntax difference when pasting (enroot uses # to separate the registry host from the image path):
$ cd $WORK/sqsh
$ enroot import docker://nvcr.io#nvidia/tensorflow:22.12-tf2-py3
The squash file nvidia+tensorflow+22.12-tf2-py3.sqsh is created.
- Next, create the container filesystem from the squash file:
$ enroot create nvidia+tensorflow+22.12-tf2-py3.sqsh
Working with NGC containers in interactive mode:
Once the container has been imported and created in your folder on SuperPOD, you can simply activate it from the login node when requesting a compute node:
$ srun -N1 -G1 -c10 --mem=64G --time=12:00:00 --container-image $WORK/sqsh/nvidia+tensorflow+22.12-tf2-py3.sqsh --container-mounts=$WORK --pty $SHELL
- Once loaded, you are placed into /workspace, which is the container's local storage. You can navigate to your $HOME or $WORK folder freely.
- Note that in this example I mounted only the $WORK location into the container, but you can always mount your own working directory, as sketched below.
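For example (a hedged sketch; /path/to/your/project is a placeholder, and this assumes the SLURM container plugin's SRC:DST mount syntax), you can mount an additional directory into the container alongside $WORK:
$ srun -N1 -G1 -c10 --mem=64G --time=12:00:00 --container-image $WORK/sqsh/nvidia+tensorflow+22.12-tf2-py3.sqsh --container-mounts=$WORK,/path/to/your/project:/workspace/project --pty $SHELL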
Check that the GPU is enabled:
$ python
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Exit the container using the exit command.
Working with NGC container in Batch mode
- Similar to M3, a container can be loaded and executed in batch mode.
- Following is sample content for a batch file named spod_testing.sh, which runs a Python file testing.py:
#!/bin/bash
#SBATCH -J Testing # job name to display in squeue
#SBATCH -o output-%j.txt # standard output file
#SBATCH -e error-%j.txt # standard error file
#SBATCH -p batch -c 12 --mem=20G --gres=gpu:1 # requested partition, CPUs, memory, and GPU
#SBATCH -t 1440 # maximum runtime in minutes
#SBATCH -D /link-to-your-folder/
srun --container-image=/work/users/tuev/sqsh/nvidia+tensorflow+22.12-tf2-py3.sqsh --container-mounts=$WORK python testing.py
- Content of testing.py
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
Working with NGC container in Jupyter Lab
- It is a little bit different if you want to use an NGC container in Jupyter Lab.
- After requesting a node running with your container, you need to run the jupyter command with the additional flag --allow-root:
root@bcm-dgxa100-0001:/workspace# jupyter lab --allow-root --no-browser --ip=0.0.0.0
The following URL appears with its token:
Or copy and paste this URL:
http://hostname:8888/?token=fd6495a28350afe11f0d0489755bc3cfd18f8893718555d2
Note that you must replace hostname with the corresponding node that you are on, in this case bcm-dgxa100-0001.
Therefore, change the above address to the following and paste it into Firefox:
http://bcm-dgxa100-0001:8888/?token=fd6495a28350afe11f0d0489755bc3cfd18f8893718555d2
Note: you should select the default Python 3 (ipykernel) instead of any other kernels for running the container.
Tip: Once forwarded to Jupyter Lab, you are placed in the container’s root. It’s recommended to create a symlink to your folder so you can navigate away:
$ ln -s $WORK work
Key Points
NGC Container
Using Jupyter Lab in SuperPOD
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to use Jupyter Lab in SuperPOD?
Objectives
Learn the port-forwarding technique to enable Jupyter Lab
4. Jupyter Lab on SuperPOD
- There is NO display configuration or Open OnDemand setup on SuperPOD, so it is not quite straightforward to use Jupyter Lab.
- However, it is still possible to use port forwarding on SuperPOD in order to run Jupyter Lab.
- Please download and use VSCode on any OS (Windows/macOS/Linux). From the VSCode terminal, ssh to SuperPOD with a specific port, for example port 8000:
$ ssh -C -D 8000 username@superpod.smu.edu
The -C flag enables compression and -D sets up dynamic port forwarding (SOCKS4/5) on port 8000. Feel free to change the port, and remember to configure it in your browser.
4.1 Set up the browser to enable proxy viewing (similar for macOS/Linux as well)
4.1.1 Using Firefox as browser:
Open Firefox (my version is 104.0.2). Use the key combination Alt+T+S to open the settings tab. Scroll to the bottom and select Settings under Network Settings:
- Select Manual Proxy Configuration
- In the SOCKS Host, enter localhost, Port 8000
- Check SOCKS v5.
- Check Proxy DNS when using SOCKS v5.
- Check Enable DNS over HTTPS.
- Make sure everything else is unchecked, then click OK.
- Your settings should look like the screenshot below:
4.1.2 Using Chrome/Safari as browser:
Search for proxies and set a SOCKS proxy with server localhost and port 8000.
4.2 Test Proxy
4.2.1. Test Proxy using conda environment:
Go back to your terminal (for example MobaXterm) and log in to SuperPOD using regular SSH, then request a compute node:
$ srun -N1 -G1 -c10 --mem=64G --time=12:00:00 --pty $SHELL
Load cuda and cudnn, then activate any of your conda environments, for example tensorflow_2.9 in the home directory:
$ module load conda gcc; module load cuda; module load cudnn
$ conda activate ~/tensorflow_2.9
Make sure JupyterLab is installed:
$ pip install jupyterlab
Next, run one of the following commands:
$ jupyter notebook --ip=0.0.0.0 --no-browser
# or
$ jupyter lab --ip=0.0.0.0 --no-browser
The following screen appears.
Copy the highlighted URL into Firefox; you will see the Jupyter Notebook forwarded to your browser.
Select the TensorflowGPU29 kernel notebook and check the GPU device:
4.2.2. Test Proxy using a docker container:
For a docker container, the command needs one additional flag:
$ jupyter lab --ip=0.0.0.0 --no-browser --allow-root
You will need to replace the hostname with the name of the node you are on.
For example, for the previous command, you would copy and paste the following line into the Firefox browser:
http://bcm-dgxa100-0016:8888/?token=daefb1c3e2754b37b6b94b619387cb3fd9710608e0152182
Troubleshooting a notebook that requests a password
In certain cases, your Jupyter Notebook requires a password. You can set the password using the command below prior to launching the Jupyter Lab instance:
$ jupyter notebook password
If changing the password does not help, the forwarded port may have a problem. In that case you should either:
(1) change the default port 8888 to another one (8889, for example), or
(2) change the local port used when you first log in to SuperPOD (8000 in this case) to another local port (5000, for example), as shown in the sketch below.
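For example (the port numbers here are arbitrary choices, not required values):
# option (1): start Jupyter on a different port on the compute node
$ jupyter lab --ip=0.0.0.0 --no-browser --port=8889
# option (2): open the SSH tunnel on a different local port, then use 5000 in the browser proxy settings
$ ssh -C -D 5000 username@superpod.smu.edu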
Key Points
Jupyter Lab, Port-Forwarding
Using Batch script in SuperPOD
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to run a Batch script in SuperPOD?
Objectives
Running a batch script using a CIFAR10 template model
5. Using Batch script in SuperPOD
SuperPOD uses SLURM as its scheduler, so running a batch script is no different from ManeFrame 3 (M3). However, there are some commands you need to pay attention to when running a batch script that uses a container.
Following are the instructions on how to run a batch script for a computer vision sample using CIFAR10 data. Here, I use a Python file called model_CNN_CIFAR10.py.
The file can be downloaded from here to your $WORK folder:
$ cd $WORK
$ wget https://raw.githubusercontent.com/SouthernMethodistUniversity/SMU_SuperPOD_101/e6315c29ca0542351b79233729708dfa16161cdf/files/model_CNN_CIFAR10.py
5.1 Running Batch script with conda environment
Prepare the batch script with name: modelCNN.sh using the following content:
#!/bin/bash
#SBATCH -J CNN_CIFAR10_SPOD # job name to display in squeue
#SBATCH -t 60 # maximum runtime in minutes
#SBATCH -c 2 # request 2 cpus
#SBATCH -G 1 # request 1 gpu a100
#SBATCH -p workshop # request queue name workshop (optional)
#SBATCH -D /work/users/tuev # link to your folder
#SBATCH --mem=32gb # request 32gb memory
#SBATCH --mail-user tuev@smu.edu     # send email notifications to this address
#SBATCH --mail-type=end              # send mail when the job ends
module load conda gcc
module load cuda cudnn
conda activate ~/tensorflow_2.9
python model_CNN_CIFAR10.py
From the login node, submit the batch script:
$ sbatch modelCNN.sh
5.2 Running Batch script with container
Prepare the batch script with name: modelCNN_ngc.sh using the following content:
#!/bin/bash
#SBATCH -J CNN_CIFAR10_SPOD # job name to display in squeue
#SBATCH -t 60 # maximum runtime in minutes
#SBATCH -c 2 # request 2 cpus
#SBATCH -G 1 # request 1 gpu a100
#SBATCH -p workshop # request queue name workshop (optional)
#SBATCH --mem=32gb # request 32gb memory
#SBATCH --mail-user tuev@smu.edu     # send email notifications to this address
#SBATCH --mail-type=end              # send mail when the job ends
srun --container-image=/work/users/tuev/sqsh/nvidia+tensorflow+22.12-tf2-py3.sqsh --container-mounts=$WORK python $WORK/model_CNN_CIFAR10.py
From the login node, submit the batch script:
$ sbatch modelCNN_ngc.sh
Key Points
Batch script, Computer Vision
Job queueing and control in SuperPOD
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to control jobs in SuperPOD?
Objectives
Learn the commands for working with jobs in SLURM
The SuperPOD cluster uses the Simple Linux Utility for Resource Management system (SLURM) to manage jobs.
5b. Job Queue and Control
In SLURM there are several useful commands for checking your job:
Lifecycle of a Job
The life of a job begins when you submit the job to the scheduler. If accepted, it will enter the Queued state.
Thereafter, the job may move to other states, as defined below:
- Queued - the job has been accepted by the scheduler and is eligible for execution; waiting for resources.
- Held - the job is not eligible for execution because it was held by user request, administrative action, or job dependency.
- Running - the job is currently executing on the compute node(s).
- Finished - the job finished executing or was canceled/deleted.
The diagram below demonstrates these relationships in graphical form.
Useful Commands
Here are some basic SLURM commands for submitting, querying and deleting jobs in SuperPOD:
| Command | Actions |
|---|---|
| srun -N1 -G1 --pty $SHELL | Submit an interactive job (reserves 1 node, 1 GPU, 1 CPU, 6 GB RAM, 1-hour walltime) |
| sbatch job.sh | Submit the job script job.sh |
| sstat <job id> | Check the status of the job with the given job ID |
| sstat <job id> --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID | Narrow the information shown by sstat |
| squeue -u <username> | Check the status of all jobs submitted by the given username |
| scontrol show job <job id> | Check the detailed information for the job with the given job ID |
| scancel <job id> | Delete the queued or running job with the given job ID |
Check pending and running jobs:
$ squeue -u $USERNAME
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
12345 workshop bash tuev R 39:46 1 bcm-dgxa100-0002
The above job has JOBID=12345, which will be used below.
Check the configuration of any requested job using its JOBID:
$ scontrol show job 12345 | grep ReqTRES
ReqTRES=cpu=5,mem=30G,node=1,billing=5,gres/gpu=1
Delete any job:
$ scancel 12345
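For completed jobs, if job accounting is enabled on the cluster, sacct can report resource usage after the fact (12345 is the example job ID from above):
$ sacct -j 12345 --format=JobID,JobName,Elapsed,State,MaxRSS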
Checking how your job is running on a node
When you know your working node, for example bcm-dgxa100-0001, you can log in to the compute node from the login node and check the processes:
- Command to check working CPUs:
$ ssh bcm-dgxa100-0001
$ top -u $USERNAME
- Command to check working GPUs:
$ ssh bcm-dgxa100-0001
$ nvidia-smi
Or, to refresh the output every 0.2 s:
$ watch -n .2 nvidia-smi
Key Points
Job queue, control
Data Science workflow with GPUs using RAPIDS
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to install and use RAPIDS
Objectives
Using GPUs directly to work with data
RAPIDS
RAPIDS provides unmatched speed with familiar APIs that match the most popular PyData libraries. Built on the shoulders of giants including NVIDIA CUDA and Apache Arrow, it unlocks the speed of GPUs with code you already know.
https://rapids.ai/
Installing RAPIDS
There are several ways to install RAPIDS on HPC systems.
Using Conda Environment
This is the simplest method and is usable on both the M3 and SuperPOD systems. You can install interactively; first, you just need to request a GPU node and load the corresponding module:
- In M3:
$ srun -n1 --gres=gpu:1 -c2 --mem=4gb --time=12:00:00 -p gpu-dev --pty $SHELL
$ module load conda
- In SuperPOD:
$ srun -n1 --gres=gpu:1 -c2 --mem=4gb --time=12:00:00 --pty $SHELL
$ module load conda
Once the necessary module has been loaded, you just need to create the conda environment and install RAPIDS. The following command gets the latest standard version from https://rapids.ai/:
$ conda create -n rapids-23.02 -c rapidsai -c conda-forge -c nvidia rapids=23.02 python=3.10 cudatoolkit=11.8
If you need a more customized version, you can select the corresponding options and copy the command from the RAPIDS website: rapids.ai
Using container
This approach works on SuperPOD only. We will need to download the RAPIDS container from NGC:
$ enroot import docker://nvcr.io#nvidia/rapidsai/rapidsai:cuda11.2-runtime-centos7-py3.10
$ enroot create nvidia+rapidsai+rapidsai+cuda11.2-runtime-centos7-py3.10.sqsh
Once the container has been downloaded to my home/scratch/work directory, I can load it from the login node:
$ srun -N1 -G1 -c10 --mem=64G --time=12:00:00 --container-image $WORK/sqsh/nvidia+rapidsai+rapidsai+cuda11.2-runtime-centos7-py3.10.sqsh --container-mounts=$WORK --pty $SHELL
Your installation is done!
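As a quick sanity check, here is a minimal sketch (assuming the rapids environment or container above is active and a GPU is allocated); cuDF intentionally mirrors the pandas API:
import cudf

# build a small DataFrame on the GPU and run a groupby aggregation there
df = cudf.DataFrame({"category": ["a", "b", "a", "c"],
                     "value": [1.0, 2.0, 3.0, 4.0]})
print(df.groupby("category")["value"].mean())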
Key Points
NGC Container, RAPIDS, cudf, cuDask
Sample Application of NEMO for Sentiment Analysis
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to use NeMo in a container?
Objectives
Apply NeMo to run sentiment analysis
NeMo
- NeMo (Neural Modules) is a toolkit for building new state-of-the-art conversational AI models.
- NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) synthesis models.
- Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures.
Import and Create NeMo sqsh file:
The NGC for NeMo can be found here: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
$ enroot import docker://nvcr.io#nvidia/nemo:22.09
$ enroot create nvidia+nemo+22.09.sqsh
Sentiment Analysis using NeMo
Here we use this sentiment sample from NVIDIA
SST2 data:
We download the Stanford Sentiment Treebank v2 (SST-2) dataset and preprocess it into NeMo format for the training and testing data:
cd $WORK
mkdir nemo && cd nemo
curl -s -O https://dl.fbaipublicfiles.com/glue/data/SST-2.zip\
&& unzip -o SST-2.zip -d ./\
&& sed 1d ./SST-2/train.tsv > ./train_nemo_format.tsv\
&& sed 1d ./SST-2/dev.tsv > ./dev_nemo_format.tsv &
Request a compute node with the NeMo container enabled and one GPU:
srun -N1 -G1 -c10 --mem=64G --time=12:00:00 --container-image $WORK/sqsh/nvidia+nemo+22.09.sqsh --container-mounts=$WORK --pty bash -i
Let’s run Sentiment Analysis using NeMo
- The model is named ‘bert-base-cased’
- Computation uses 2 GPUs and 20 epochs
cd $WORK/nemo/SST-2
python /workspace/nemo/examples/nlp/text_classification/text_classification_with_bert.py \
model.dataset.num_classes=2 \
model.dataset.max_seq_length=256 \
model.train_ds.batch_size=64 \
model.validation_ds.batch_size=64 \
model.language_model.pretrained_model_name='bert-base-cased' \
model.train_ds.file_path=train_nemo_format.tsv \
model.validation_ds.file_path=dev_nemo_format.tsv \
trainer.num_nodes=1 \
trainer.max_epochs=20 \
trainer.precision=16 \
model.optim.name=adam \
model.optim.lr=1e-4
Check the GPU usage with nvidia-smi command
Output of the model training is text_classification_model.nemo
Model Evaluation and Inference
- After saving the model in .nemo format, you can load the model and perform evaluation or inference on it.
- Following is the content of a Python file that loads the trained NeMo model and evaluates it:
from nemo.collections.nlp.models.text_classification import TextClassificationModel
model = TextClassificationModel.restore_from("text_classification_model.nemo")
model.to("cuda")
# define the list of queries for inference
queries = ['legendary irish writer brendan behan memoir , borstal boy',
'demonstrates that the director of such hollywood blockbusters as patriot games can still turn out a small , personal film with an emotional wallop ',
'on the worst revenge-of-the-nerds clichés the filmmakers could dredge up',
'uneasy mishmash of styles and genres']
results = model.classifytext(queries=queries, batch_size=3, max_seq_length=512)
print('The prediction results of some sample queries with the trained model:')
for query, result in zip(queries, results):
    print(f'Query : {query}')
    print(f'Predicted label: {result}')
Key Points
NGC Container, NEMO, Sentiment Analysis
Sample Applications of MultiGPUs for Computer Vision using Horovod
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to utilize multiple GPUs in SuperPOD
Objectives
Apply Horovod to drive multiple GPUs using CIFAR100
Multiple GPUs using CIFAR100
- In the code, Horovod is imported to enable multi-GPU training
- The rest of the code is a regular computer vision model as seen in many other papers
Here is the sample Python code that uses Tensorflow to train on the CIFAR100 dataset:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Conv2D # convolutional layers to reduce image size
from tensorflow.keras.layers import MaxPooling2D,AveragePooling2D # Max pooling layers to further reduce image size
from tensorflow.keras.layers import Flatten # flatten data from 2D to column for Dense layer
from tensorflow.keras.datasets import cifar100
import matplotlib.pyplot as plt
# TODO: Step 1: import Horovod
import horovod.tensorflow.keras as hvd
# TODO: Step 1: initialize Horovod
hvd.init()
# TODO: Step 1: pin to a GPU
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[hvd.local_rank()], True)
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')
def plot_acc_loss(history):
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['training', 'validation'], loc='best')
    plt.savefig("calval_hvod.png")  # save the accuracy plot as png
    plt.show()
# load data
(X_train, y_train), (X_test, y_test) = cifar100.load_data()
# Normalized data to range (0, 1):
X_train, X_test = X_train/X_train.max(), X_test/X_test.max()
num_categories=100
y_train = tf.keras.utils.to_categorical(y_train,num_categories)
y_test = tf.keras.utils.to_categorical(y_test,num_categories)
model = Sequential()
model.add(Conv2D(1024, (3, 3), strides=(1, 1), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(.1))
model.add(Conv2D(512, (3, 3), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(.1))
model.add(Conv2D(256, (3, 3), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(.1))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(.1))
# Output layer contains 100 classes
model.add(Dense(100, activation='softmax'))
model.summary()
# create model
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
#Train the model
model_CNN = model.fit(X_train, y_train, epochs=40, verbose=1,
                      validation_data=(X_test, y_test))
plot_acc_loss(model_CNN)
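Note that the script above only initializes Horovod and pins one GPU per process; a complete multi-GPU setup normally also wraps the optimizer and broadcasts the initial weights. A hedged sketch of those remaining steps (assumptions, not part of the original file) would replace the plain compile/fit above with something like:
# Sketch of the usual remaining Horovod steps (not in the original script)
opt = tf.keras.optimizers.Adam(0.001 * hvd.size())   # scale the learning rate by the number of workers
opt = hvd.DistributedOptimizer(opt)                  # average gradients across all GPUs

model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]  # sync initial weights from rank 0
model_CNN = model.fit(X_train, y_train, epochs=40,
                      callbacks=callbacks,
                      verbose=1 if hvd.rank() == 0 else 0,       # print progress only on rank 0
                      validation_data=(X_test, y_test))
if hvd.rank() == 0:
    plot_acc_loss(model_CNN)                         # save the plot only once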
Using SuperPOD to run with multiple GPUs
The following batch script submits the training job using 8 GPUs and the Tensorflow 22.02 NGC container:
#!/bin/bash
#SBATCH -J CIFAR100M # job name to display in squeue
#SBATCH -c 16 --mem=750G # requested CPUs and memory
#SBATCH -o output-%j.txt # standard output file
#SBATCH -e error-%j.txt # standard error file
#SBATCH --gres=gpu:8
#SBATCH -t 1440 # maximum runtime in minutes
#SBATCH -D /work/users/tuev/cv1/cifar100/multi
#SBATCH --exclusive
#SBATCH --mail-user tuev@smu.edu
#SBATCH --mail-type=end
srun --container-image=$WORK/sqsh/nvidia+tensorflow+22.02-tf2-py3.sqsh --container-mounts=$WORK mpirun -np 8 --allow-run-as-root --oversubscribe python /work/users/tuev/cv1/cifar100/multi/cifar100spod-hvod.py
Make sure to use nvidia-smi to check the usage of all 8 GPUs
Key Points
NGC Container, Horovod, Computer Vision
Using YOLOv5 for object detection
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to train YOLOv5 to detect objects
Objectives
Download a pretrained YOLOv5 model and images, then apply YOLO to detect objects
YOLOv5
YOLOv5 🚀 is the world’s most loved vision AI, representing Ultralytics open-source research into future vision AI methods, incorporating lessons learned and best practices evolved over thousands of hours of research and development.
To download YOLO, simply go to the github page and clone it to your home or work directory:
$ git clone https://github.com/ultralytics/yolov5.git
Suggestion: It is better to use the $WORK directory to store the code and data, to avoid filling up your $HOME directory.
Open a Conda env (or container) and install the requirements
Prior to training the YOLOv5 model, it's better to go to your own conda environment (or a container) and install the missing libraries. For simplicity, I use the NeMo container:
$ srun -n1 --gres=gpu:1 --container-image $WORK/sqsh/nvidia+nemo+22.04.sqsh --container-mounts=$WORK --time=12:00:00 --pty $SHELL
Go to the yolov5 folder and install the missing libraries:
$ cd yolov5
$ pip install -r requirements.txt
Select Pretrained model
Refer to this table for a full comparison of models. Here, let's use yolov5l6 for better performance.
Dataset for training:
YOLOv5 is trained using the COCO (Common Objects in Context) dataset; here we use coco128, a small subset consisting of the first 128 images of the larger COCO dataset.
The dataset is automatically downloaded when using the flag --data coco128.yaml
Train YOLOv5
Let’s train the model with an image size of 1280 pixels, a batch size of 32, and 10 epochs; the data in use is coco128 and the pretrained model is yolov5l6:
$ python train.py --img 1280 --batch 32 --epochs 10 --data coco128.yaml --weights yolov5l6.pt
Tail of the output from model training:
Epoch GPU_mem box_loss obj_loss cls_loss Instances Size
9/9 75.5G 0.02099 0.05281 0.006695 573 1280: 100%|██████████| 4/4 [00:03<00:00, 1.17it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 2/2 [00:01<00:00, 1.01it/s]
all 128 929 0.905 0.805 0.902 0.736
10 epochs completed in 0.031 hours.
Optimizer stripped from runs/train/exp/weights/last.pt, 154.9MB
Optimizer stripped from runs/train/exp/weights/best.pt, 154.9MB
Here we see that two weight files are created by the training process, last.pt and best.pt, in the corresponding output location.
We will use the best.pt weights for model inference.
To validate the model inference, we use the data from Kaggle
The Kaggle dataset can be found here: https://www.kaggle.com/competitions/open-images-2019-object-detection/data#
Using Kaggle API, one can simply download the dataset from CLI:
kaggle competitions download -c open-images-2019-object-detection
Unzip the open-images-2019-object-detection.zip to get the test folder with 100000 images.
Inference using YOLOv5 for object detection with Kaggle data
The weights used are from the trained model best.pt:
$ python detect.py --weights runs/train/exp/weights/best.pt --img 1280 --conf 0.25 --source ../test
The model output can be found in runs/detect/exp.
Sample model result:
Inference using YOLOv5 for object detection with video
We can also use YOLOv5 for video detection. Starting from a sample video like this:
https://user-images.githubusercontent.com/43855029/222778747-b5312f6d-58c9-4f63-9233-93dfa65f8345.mp4
We run the inference with the best pretrained model using the following command:
$ python detect.py --weights runs/train/exp/weights/best.pt --source video.mp4
The output of the inference looks like:
detect: weights=['runs/train/exp/weights/best.pt'], source=../test/before_short.mp4, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 🚀 v7.0-56-gc0ca1d2 Python-3.8.13 torch-1.13.0a0+d0d6b1f CUDA:0 (NVIDIA A100-SXM4-80GB, 81251MiB)
Fusing layers...
Model summary: 157 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
video 1/1 (1/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 2 trains, 156.8ms
video 1/1 (2/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 2 trains, 8.2ms
video 1/1 (3/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 2 trains, 8.2ms
video 1/1 (4/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 2 trains, 8.1ms
video 1/1 (5/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 2 trains, 8.1ms
video 1/1 (6/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 2 trains, 8.1ms
video 1/1 (7/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 3 trains, 8.1ms
video 1/1 (8/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 2 trains, 8.1ms
video 1/1 (9/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 2 trains, 8.2ms
video 1/1 (10/120) /work/users/tuev/YOLO/test/before_short.mp4: 384x640 2 trains, 8.2ms
Speed: 0.3ms pre-process, 9.4ms inference, 2.2ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp2
and the output video is saved in runs/detect/exp2 folder:
https://user-images.githubusercontent.com/43855029/222778650-f68c4a4f-ad51-4237-92a8-bfb0ad37cd54.mp4
Key Points
YOLOv5, object detection, inference
Using Transfer Learning with ResNet50
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to apply transfer learning to detect objects
Objectives
Apply ResNet50 model in transfer learning
The following lecture notes are based on NVIDIA's Fundamental Introduction to Deep Learning course, with different input data.
Transfer Learning
So far, we have trained accurate models on large datasets, and also downloaded a pre-trained model that we used with no training necessary. But what if we cannot find a pre-trained model that does exactly what you need, and what if we do not have a sufficiently large dataset to train a model from scratch? In this case, there is a very helpful technique we can use called transfer learning.
With transfer learning, we take a pre-trained model and retrain it on a task that has some overlap with the original training task. A good analogy for this is an artist who is skilled in one medium, such as painting, who wants to learn to practice in another medium, such as charcoal drawing. We can imagine that the skills they learned while painting would be very valuable in learning how to draw with charcoal.
As an example in deep learning, say we have a pre-trained model that is very good at recognizing different types of cars, and we want to train a model to recognize types of motorcycles. A lot of the learnings of the car model would likely be very useful, for instance the ability to recognize headlights and wheels.
Transfer learning is especially powerful when we do not have a large and varied dataset. In this case, a model trained from scratch would likely memorize the training data quickly, but not be able to generalize well to new data. With transfer learning, you can increase your chances of training an accurate and robust model on a small dataset.
Here we just use a simple tensorflow conda environment or container:
$ srun -n1 -G1 --container-image $WORK/sqsh/nvidia+tensorflow+22.02-tf2-py3.sqsh --container-mounts=$WORK --time=12:00:00 --pty bash -i
Objective
- Prepare a pretrained model for transfer learning
- Perform transfer learning with your own small dataset on a pretrained model
- Further fine tune the model for even better performance
Urban or Rural
In this example, we would like to create a model that recognizes urban and rural scenes. The data can be downloaded from here.
Download the pre-trained model
The ImageNet pre-trained models are often good choices for computer vision transfer learning, as they have learned to classify various different types of images. In doing this, they have learned to detect many different types of features that could be valuable in image recognition.
Let us start by downloading the pre-trained model. Again, this is available directly from the Keras library. As we are downloading, there is going to be an important difference. The last layer of an ImageNet model is a dense layer of 1000 units, representing the 1000 possible classes in the dataset. In our case, we want it to make a different classification: is this urban or rural? Because we want the classification to be different, we are going to remove the last layer of the model. We can do this by setting the flag include_top=False
when downloading the model. After removing this top layer, we can add new layers that will yield the type of classification that we want:
from tensorflow.keras.applications.resnet50 import ResNet50
base_model = ResNet50(
weights='imagenet', # Load weights pre-trained on ImageNet.
input_shape=(224, 224, 3),
include_top=False)
base_model.summary()
Freezing the Base Model
Before we add our new layers onto the pre-trained model, we should take an important step: freezing the model’s pre-trained layers. This means that when we train, we will not update the base layers from the pre-trained model. Instead we will only update the new layers that we add on the end for our new classification. We freeze the initial layers because we want to retain the learning achieved from training on the ImageNet dataset. If they were unfrozen at this stage, we would likely destroy this valuable information. There will be an option to unfreeze and train these layers later, in a process called fine-tuning.
Freezing the base layers is as simple as setting trainable on the model to False
.
base_model.trainable = False
Adding new layer
We can now add the new trainable layers to the pre-trained model. They will take the features from the pre-trained layers and turn them into predictions on the new dataset. We will add two layers to the model. First will be a pooling layer like we saw in our earlier convolutional neural network. (If you want a more thorough understanding of the role of pooling layers in CNNs, please read this detailed blog post). We then need to add our final layer, which will classify urban or rural. This will be a densely connected layer with one output.
from tensorflow import keras
inputs = keras.Input(shape=(224, 224, 3))
# Separately from setting trainable on the model, we set training to False
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
# A Dense classifier with a single unit (binary classification)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
model.summary()
Keras gives us a nice summary here, as it shows the ResNet50 pre-trained model as one unit, rather than showing all of its internal layers. It is also worth noting that we have many non-trainable parameters as we have frozen the pre-trained model.
Compile the model
As with our previous exercises, we need to compile the model with loss and metrics options. We have to make some different choices here. In previous cases we had many categories in our classification problem. As a result, we picked categorical crossentropy for the calculation of our loss. In this case we only have a binary classification problem (Urban or Rural), and so we will use binary crossentropy. Further detail about the differences between the two can be found here. We will also use binary accuracy instead of traditional accuracy.
By setting from_logits=True
we inform the loss function that the output values are not normalized (e.g. with softmax).
# Important to use binary crossentropy and binary accuracy as we now have a binary classification problem
model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=True), metrics=[keras.metrics.BinaryAccuracy()])
Augmenting the data
Now that we are dealing with a very small dataset, it is especially important that we augment our data. As before, we will make small modifications to the existing images, which will allow the model to see a wider variety of images to learn from. This will help it learn to recognize new pictures of Urban/Rural instead of just memorizing the pictures it trains on.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# create a data generator
datagen = ImageDataGenerator(
samplewise_center=True, # set each sample mean to 0
rotation_range=10, # randomly rotate images in the range (degrees, 0 to 180)
zoom_range = 0.1, # Randomly zoom image
width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
horizontal_flip=True, # randomly flip images
vertical_flip=False) # we don't expect the image to be taken upside-down
Loading the data
We have seen datasets in a couple different formats so far. In the MNIST exercise, we were able to download the dataset directly from within the Keras library. For the sign language dataset, the data was in CSV files. For this exercise, we are going to load images directly from folders using Keras’ flow_from_directory
function. We have set up our directories to help this process go smoothly as our labels are inferred from the folder names. In the data
directory, we have train and validation directories, which each have folders for images of Urban or Rural. Feel free to explore the images to get a sense of our dataset.
Note that flow_from_directory will also allow us to size our images to match the model: 224x224 pixels with 3 channels.
# load and iterate training dataset
train_it = datagen.flow_from_directory('data/train/',
target_size=(224, 224),
color_mode='rgb',
class_mode='binary',
batch_size=8)
# load and iterate validation dataset
valid_it = datagen.flow_from_directory('data/val/',
target_size=(224, 224),
color_mode='rgb',
class_mode='binary',
batch_size=8)
Training the model
Time to train our model and see how it does. Recall that when using a data generator, we have to explicitly set the number of steps_per_epoch
:
model.fit(train_it, steps_per_epoch=12, validation_data=valid_it, validation_steps=4, epochs=20)
Discussion of Results
Both the training and validation accuracy should be quite high. This is a pretty awesome result! We were able to train on a small dataset, but because of the knowledge transferred from the ImageNet model, it was able to achieve high accuracy and generalize well. This means it has a very good sense of Urban and Rural.
If you saw some fluctuation in the validation accuracy, that is okay too. We have a technique for improving our model in the next section.
Fine tuning the model
Now that the new layers of the model are trained, we have the option to apply a final trick to improve the model, called fine-tuning. To do this we unfreeze the entire model, and train it again with a very small learning rate. This will cause the base pre-trained layers to take very small steps and adjust slightly, improving the model by a small amount.
Note that it is important to only do this step after the model with frozen layers has been fully trained. The untrained pooling and classification layers that we added to the model earlier were randomly initialized. This means they needed to be updated quite a lot to correctly classify the images. Through the process of backpropagation, large initial updates in the last layers would have caused potentially large updates in the pre-trained layers as well. These updates would have destroyed those important pre-trained features. However, now that those final layers are trained and have converged, any updates to the model as a whole will be much smaller (especially with a very small learning rate) and will not destroy the features of the earlier layers.
Let’s try unfreezing the pre-trained layers, and then fine tuning the model:
# Unfreeze the base model
base_model.trainable = True
# It's important to recompile your model after you make any changes
# to the `trainable` attribute of any inner layer, so that your changes
# are taken into account
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate = .00001), # Very low learning rate
loss=keras.losses.BinaryCrossentropy(from_logits=True),
metrics=[keras.metrics.BinaryAccuracy()])
model.fit(train_it, steps_per_epoch=12, validation_data=valid_it, validation_steps=4, epochs=10)
Examine the Prediction
Now that we have a well-trained model, it is time to use it to detect Urban or Rural. We can start by looking at the predictions that come from the model.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from tensorflow.keras.preprocessing import image as image_utils
from tensorflow.keras.applications.imagenet_utils import preprocess_input
def show_image(image_path):
    image = mpimg.imread(image_path)
    plt.imshow(image)

def make_predictions(image_path):
    show_image(image_path)
    image = image_utils.load_img(image_path, target_size=(224, 224))
    image = image_utils.img_to_array(image)
    image = image.reshape(1, 224, 224, 3)
    image = preprocess_input(image)
    preds = model.predict(image)
    return preds
make_predictions('data/val/urban/urban_20.jpeg')
make_predictions('data/val/rural/rural5.jpeg')
It looks like a negative prediction means the image is Rural and a positive prediction means it is Urban. We can use this information to differentiate these scenes.
def detect_img(image_path):
    preds = make_predictions(image_path)
    if preds[0] < 0:
        print("It's Rural! So freshy")
    else:
        print("It's Urban! So developed!")
import numpy as np
detect_img('data/val/rural/rural15.jpeg')
detect_img('data/val/urban/urban_40.jpeg')
Summary
Great work! With transfer learning, you have built a highly accurate model using a very small dataset. This can be an extremely powerful technique, and be the difference between a successful project and one that cannot get off the ground. We hope these techniques can help you out in similar situations in the future!
There is a wealth of helpful resources for transfer learning in the NVIDIA Transfer Learning Toolkit.
Key Points
ResNet50, object detection, transfer learning
Using Stable Diffusion with SuperPOD
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to use the Stable Diffusion model
Objectives
Learn how to download and install Stable Diffusion from HuggingFace
Users can now access Stable Diffusion from Hugging Face while still utilizing the power of SuperPOD's A100 GPUs to run inference on any incoming prompt. The following example uses the Stable Diffusion XL model from Hugging Face.
- First of all, download the library:
pip install diffusers --upgrade
- Then use the following code with a prompt to generate an image:
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")
# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()
prompt = "An astronaut riding a green horse"
images = pipe(prompt=prompt).images[0]
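The pipeline returns a PIL image, so you can save it to disk to view the result (the file name below is arbitrary):
# save the generated PIL image to disk
images.save("astronaut_green_horse.png")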
Key Points
Stable Diffusion, Prompt, HuggingFace
Using Pre-trained model from HuggingFace
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How to use pre-trained models already available from the Hugging Face hub
Objectives
To master the usage of pre-trained deep learning models from Hugging Face
Hugging Face hub
- The Hugging Face hub is considered the GitHub of machine learning
- It is a platform with over 120k models, 20k datasets, and 50k demo apps, all open source and publicly available
- It is an all-in-one platform where people can easily deploy, collaborate, and build ML models
Transformers library
- The Transformers library, developed by Hugging Face, has played a significant role in making state-of-the-art NLP models more accessible to researchers and developers.
- It includes pre-trained models like BERT, GPT, RoBERTa, and more, which can be fine-tuned for specific tasks such as text classification, language generation, question answering, and more.
- The library offers a consistent API for various NLP tasks, making it easier for practitioners to experiment with and deploy these models.
Model task
The screenshot below describes the model tasks from Hugging Face, covering many different aspects from computer vision to NLP, audio, and reinforcement learning.
Pipeline for inference
- The pipeline() makes it simple to use any model from the Hub for inference on any language, computer vision, speech, and multimodal task.
- Even if you don’t have experience with a specific modality or aren’t familiar with the underlying code behind the models, you can still use them for inference with the pipeline()!
Pipeline for NLP Sentiment Analysis
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("I am so excited to use the new SuperPOD from NVIDIA")
[{'label': 'POSITIVE', 'score': 0.9995261430740356}]
classifier(
["I am so excited to use the new SuperPOD from NVIDIA", "I hate running late"])
[{'label': 'POSITIVE', 'score': 0.9995261430740356},
{'label': 'NEGATIVE', 'score': 0.9943193197250366}]
Pipeline Text Generation
from transformers import pipeline
generator = pipeline("text-generation")
generator("Using SMU latest HPC cluster NVIDIA SuperPOD, you will be able to")
[{'generated_text': 'Using SMU latest HPC cluster NVIDIA SuperPOD, you will be able to connect to other SSE nodes such as the following and use them as a HPC node:\n\n[CPU: CPU1, GIGABYTE'}]
Pipeline for Mask filling
from transformers import pipeline
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)
[{'score': 0.19619698822498322,
'token': 30412,
'token_str': ' mathematical',
'sequence': 'This course will teach you all about mathematical models.'},
{'score': 0.04052705690264702,
'token': 38163,
'token_str': ' computational',
'sequence': 'This course will teach you all about computational models.'}]
Pipeline for Named Entity Recognition
from transformers import pipeline
ner = pipeline("ner", grouped_entities=True)
ner("My name is Tue Vu and I work at SMU in Dallas")
[{'entity_group': 'PER',
'score': 0.9868829,
'word': 'Tue Vu',
'start': 11,
'end': 17},
{'entity_group': 'ORG',
'score': 0.9965092,
'word': 'SMU',
'start': 32,
'end': 35},
{'entity_group': 'LOC',
'score': 0.9950755,
'word': 'Dallas',
'start': 39,
'end': 45}]
Pipeline for Question Answering
from transformers import pipeline
question_answerer = pipeline("question-answering")
question_answerer(
question="Where do I work?",
context="My name is Tue Vu and I work at SMU in Dallas",
)
{'score': 0.3651700019836426, 'start': 32, 'end': 35, 'answer': 'SMU'}
Pipeline for Conversational
from transformers import pipeline, Conversation
converse = pipeline("conversational")
conversation_1 = Conversation("What do you think about using HPC SuperPOD")
conversation_2 = Conversation("Do you believe in God?")
converse([conversation_1, conversation_2])
Answer:
[Conversation id: 44cf473c-29f2-4b44-be6c-15352dab13a2
user >> What do you think about using HPC SuperPOD
bot >> I think it's a good idea, but I don't think it's a good idea to use it for a lot of things. ,
Conversation id: 489d923c-f127-4847-8cde-972c77470230
user >> What do you do to optimize the Python workflow?
bot >> I believe in the power of love.]
Pipeline for Computer Vision - Image Classification
from transformers import pipeline
clf = pipeline("image-classification")
Display the image:
import urllib.request
from io import BytesIO
from PIL import Image

url = 'https://t4.ftcdn.net/jpg/02/66/72/41/360_F_266724172_Iy8gdKgMa7XmrhYYxLCxyhx6J7070Pr8.jpg'
with urllib.request.urlopen(url) as response:
    img = Image.open(BytesIO(response.read()))
img
Model Inference
clf(img)
[{'score': 0.49216628074645996, 'label': 'Egyptian cat'},
{'score': 0.41306015849113464, 'label': 'tabby, tabby cat'},
{'score': 0.050162095576524734, 'label': 'tiger cat'},
{'score': 0.012556081637740135, 'label': 'lynx, catamount'},
{'score': 0.00524393143132329, 'label': 'ping-pong ball'}]
Key Points
Hugging Face, pre-trained, pipeline