Welcome to the Snorlax cluster

Getting access

Please contact tcauduro@uwaterloo.ca for an account.

Before making an account request, please upload an SSH public key at https://authman.uwaterloo.ca

Contact

If you require assistance while using Snorlax, contact tcauduro@uwaterloo.ca.

Introduction

Welcome to Snorlax.

This documentation serves as a guide to understanding and utilizing Snorlax, a cluster managed through the Slurm workload manager.

Shared GPU Resources

Snorlax offers shared access to GPUs across six nodes, each equipped with two NVIDIA RTX A6000 GPUs.

Slurm: How it works

Slurm simplifies the user experience by allowing you to submit, monitor, and manage your computational jobs seamlessly. Through straightforward command-line interfaces, you can submit batch jobs, specify resource requirements, and monitor job progress. Slurm ensures fair resource allocation, allowing you to focus on your research without the complexity of manual resource management.

If you're new to shared GPU computing and Slurm, getting started with Snorlax is a breeze! Think of Snorlax as a computing powerhouse whose GPUs you share with other users. To begin, follow these simple steps:

  1. Login: Access snorlax-login.cs.uwaterloo.ca using your credentials.
  2. Submit a Job: Use the sbatch command to submit the script you wish to run. Think of it as asking the server to perform specific computations for you with specific resources (how many GPUs, how much memory, ...).
  3. Monitor Progress: Check the status of your job with the squeue command to see how it is progressing; squeue also shows the rest of the job queue.
  4. Enjoy: Your job will run on the server as soon as the requested resources are available; see the short example after this list.
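For instance, a first session might look like this (the script name and job ID below are illustrative):

# on snorlax-login.cs.uwaterloo.ca
sbatch my_job.sh            # submit your batch script (hypothetical filename)
# Submitted batch job 12345
squeue -u $USER             # check where your job sits in the queue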

For more in-depth information, see the SLURM section below.

Happy Computing!

SSH Key Authentication

An SSH server can authenticate clients using a variety of different methods. The most basic of these is password authentication, which is easy to use, but not the most secure.

Although passwords are sent to the server in a secure manner, they are generally not complex or long enough to be resistant to repeated, persistent attackers.

SSH key pairs are two cryptographically secure keys that can be used to authenticate a client to an SSH server. Each key pair consists of a public key and a private key.

Step 1 — Creating SSH Keys

The first step to configure SSH key authentication to your server is to generate an SSH key pair on your local computer.

ssh-keygen -t ed25519
Generating public/private ed25519 key pair.
Enter file in which to save the key (/Users/username/.ssh/id_ed25519):
Enter passphrase (empty for no passphrase):

The utility will prompt you to select a location for the keys that will be generated. By default, the keys will be stored in the ~/.ssh directory within your user’s home directory. The private key will be called id_ed25519 and the associated public key will be called id_ed25519.pub.

Usually, it is best to stick with the default location at this stage. Doing so will allow your SSH client to automatically find your SSH keys when attempting to authenticate. If you would like to choose a non-standard path, type that in now, otherwise, press ENTER to accept the default.
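For example, if you prefer to keep a dedicated key for Snorlax under a non-default name (the filename here is purely illustrative), you could run the following; note that you will then need to point ssh at the key explicitly, e.g. with -i or an IdentityFile entry in ~/.ssh/config:

ssh-keygen -t ed25519 -f ~/.ssh/snorlax_ed25519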

If you had previously generated an SSH key pair, you may see a prompt that looks like this:

/home/username/.ssh/id_ed25519 already exists.
Overwrite (y/n)?

If you choose to overwrite the key on disk, you will not be able to authenticate using the previous key anymore. Be very careful when selecting yes, as this is a destructive process that cannot be reversed.

Created directory '/home/username/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Next, you will be prompted to enter a passphrase for the key.

A passphrase is optional. If you enter one, you will have to provide it every time you use this key (unless you are running SSH agent software that stores the decrypted key). You can press ENTER to bypass this prompt.

Your identification has been saved in id_ed25519
Your public key has been saved in id_ed25519.pub
The key fingerprint is:
SHA256:94gxE94XEk+6K+P816TJPO93LPuKKa/4NY4DUIIytnw username@localhost.localnet
The key's randomart image is:
+--[ED25519 256]--+
| . . .           |
| + . . . =       |
| o + o. o o      |
| o E .. o o .    |
| . .S + .        |
| .* = .          |
| +.oo+= .        |
|      o +o+B+o +|
|       +o=*=+=*o|
+----[SHA256]-----+
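If you do set a passphrase, an SSH agent can cache the decrypted key for the rest of your terminal session so you only type the passphrase once. A typical sequence, assuming the default key path, looks like this:

eval "$(ssh-agent -s)"       # start an agent for this shell if one is not already running
ssh-add ~/.ssh/id_ed25519    # you will be asked for the passphrase once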

You now have a public and private key that you can use to authenticate. The next step is to place the public key on the server so that you can use SSH key authentication to log in.

Step 2 — Copying an SSH Public Key to Your Server

There are multiple ways to upload your public key to your remote SSH server. The method you use depends largely on the tools you have available and the details of your current configuration.

Copying Your Public Key Using ssh-copy-id

The simplest way to copy your public key to an existing server is to use a utility called ssh-copy-id. Because of its simplicity, this method is recommended if available.

To use the utility, you need to specify the remote host that you would like to connect to, and the user account that you have password-based SSH access to. This is the account where your public SSH key will be copied.

ssh-copy-id username@snorlax-login.cs.uwaterloo.ca
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
username@snorlax-login.cs.uwaterloo.ca's password:

Type in the password (your typing will not be displayed, for security purposes) and press ENTER. The utility will connect to the account on the remote host using the password you provided. It will then copy the contents of your ~/.ssh/id_ed25519.pub key into a file called authorized_keys in the remote account's ~/.ssh directory.

You will see output that looks like this:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'username@snorlax-login.cs.uwaterloo.ca'"
and check to make sure that only the key(s) you wanted were added.

At this point, your id_ed25519.pub key has been uploaded to the remote account. You can continue onto the next section.
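If your key pair is not in the default location, ssh-copy-id also lets you point at a specific public key with the -i option, for example:

ssh-copy-id -i ~/.ssh/id_ed25519.pub username@snorlax-login.cs.uwaterloo.ca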

Copying Your Public Key Using SSH

If you do not have ssh-copy-id available, but you have password-based SSH access to an account on your server, you can upload your keys using a conventional SSH method.

We can do this by outputting the content of our public SSH key on our local computer and piping it through an SSH connection to the remote server. On the other side, we can make sure that the ~/.ssh directory exists under the account we are using and then output the content we piped over into a file called authorized_keys within this directory.

We will use the >> redirect symbol to append the content instead of overwriting it. This will let us add keys without destroying previously added keys.

cat ~/.ssh/id_ed25519.pub | ssh username@snorlax-login.cs.uwaterloo.ca "cat >> ~/.ssh/authorized_keys"
username@snorlax-login.cs.uwaterloo.ca's password:

After entering your password, the content of your id_ed25519.pub key will be copied to the end of the authorized_keys file of the remote user’s account. Continue to the next section if this was successful.
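If the ~/.ssh directory might not exist yet on the remote account, a slightly more defensive variant (a sketch that assumes a POSIX shell on the remote side) creates it with the right permissions first:

cat ~/.ssh/id_ed25519.pub | ssh username@snorlax-login.cs.uwaterloo.ca \
    "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"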

Step 3 — Authenticating to Your Server Using SSH Keys

If you have successfully completed one of the procedures above, you should be able to log into the remote host without the remote account’s password.

The process is mostly the same:

ssh username@snorlax-login.cs.uwaterloo.ca

If you did not supply a passphrase for your private key, you will be logged in immediately. If you supplied a passphrase for the private key when you created the key, you will be required to enter it now. Afterwards, a new shell session will be created for you with the account on the remote system.
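If you connect often, a short entry in your local ~/.ssh/config saves typing (the alias name is arbitrary, and a fuller configuration covering the compute nodes appears in the VSCode section below):

Host snorlax-login
    HostName snorlax-login.cs.uwaterloo.ca
    User <username>
    IdentityFile ~/.ssh/id_ed25519

With this entry in place, running ssh snorlax-login is enough to log in.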

SLURM

SLURM (formerly the Simple Linux Utility for Resource Management) is an application for managing and scheduling jobs on computer systems.

The official SLURM cheat sheet can be found in the SLURM documentation at https://slurm.schedmd.com/

SLURM has two main operating modes: batch and interactive. Batch is the preferred mode for Snorlax: you can start your task and get an email when it is complete. If you require an interactive session (useful for debugging) on Snorlax hardware, see the salloc section below.

Batch mode usage

You can submit jobs using a SLURM job script. Below is an example of a simple script. :warning: #SBATCH is the trigger word for SLURM to take your arguments into account. If you wish to disable a line, use ##SBATCH; do not remove the # at the beginning of the line.

#!/bin/bash
# To be submitted to the SLURM queue with the command:
# sbatch batch-submit.sh

# Set resource requirements: Queues are limited to seven day allocations
# Time format: HH:MM:SS
#SBATCH --time=00:15:00
#SBATCH --mem=10GB
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1

# Set output file destinations (optional)
# By default, output will appear in a file in the submission directory:
# slurm-$job_number.out
# This can be changed:
#SBATCH -o JOB%j.out    # File to which STDOUT will be written
#SBATCH -e JOB%j.out    # File to which STDERR will be written

# email notifications: Get email when your job starts, stops, fails, completes...
# Set email address
#SBATCH --mail-user=(email address where notifications are delivered to)
# Set types of notifications (from the options: BEGIN, END, FAIL, REQUEUE, ALL):
#SBATCH --mail-type=ALL

# Load up your conda environment
# Set up environment on snorlax-login.cs or in interactive session (use `source` keyword instead of `conda`)
source activate <env>

# Task to run
~/cuda-samples/Samples/5_Domain_Specific/nbody/nbody -benchmark -device=0 -numbodies=16777216

You can use SBATCH variables such as --mem to set resource requests; for example, the script above assigns 10GB of RAM to the job.

For CPU core allocation, use --cpus-per-task; the example above assigns 4 cores to the job. The --gres=gpu:1 line assigns one GPU to your job. A variant header is sketched below.
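As a sketch, a header requesting a larger allocation might look like the following (the values are illustrative; stay within the seven-day queue limit):

#SBATCH --time=1-00:00:00      # one day, in D-HH:MM:SS format
#SBATCH --mem=32GB
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:2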

Running the script

To run the script, simply run sbatch your_script.sh on snorlax-login.cs.
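sbatch replies with the ID of the queued job (the number below is illustrative):

$ sbatch your_script.sh
Submitted batch job 12345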

Interactive mode usage

You can book/reserve resources on the cluster using the salloc command. Below is an example:

salloc --gres=gpu:2 --cpus-per-task=4 --mem=16G --time=2:00:00

The example above will reserve 2 GPUs, 4 CPU cores, and 16GB of RAM for 2 hours. Once you run the command, it will output the name of the host like so:

salloc: Nodes snorlax-1 are ready for job

Here, snorlax-1 is the assigned host that the user can SSH to.

Ideally, you should run this command inside a screen or tmux session on snorlax-login.cs.
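A typical interactive workflow, sketched below with an illustrative session name and node, keeps the allocation usable even if your connection to the login node drops:

# on snorlax-login.cs
tmux new -s gpu-session       # or: screen -S gpu-session
salloc --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=2:00:00
# salloc: Nodes snorlax-1 are ready for job
ssh snorlax-1                 # work on the allocated node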

Queues

To look at the current queue of jobs, use squeue.

The scurrent command will also list all the jobs currently running on Snorlax. It is useful for seeing which resources are in use at the moment.

Monitoring jobs

Current jobs

By default, squeue will show all the jobs the scheduler is managing at the moment. It will run much faster if you ask only about your own jobs with:

$ squeue -u $USER

You can show only running jobs, or only pending jobs:

$ squeue -u <username> -t RUNNING
$ squeue -u <username> -t PENDING

You can show detailed information for a specific job with scontrol:

$ scontrol show job -dd <jobid>

Do not run squeue from a script or program at high frequency, e.g., every few seconds. Responding to squeue adds load to Slurm, and may interfere with its performance or correct operation.
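If you want a periodically refreshing view, squeue's built-in --iterate option at a modest interval is a gentler alternative to polling in a tight loop:

$ squeue -u $USER --iterate=60   # refresh once a minute; stop with Ctrl-C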

Cancelling jobs

Use scancel with the job ID to cancel a job:

$ scancel <jobid>

You can also use it to cancel all your jobs, or all your pending jobs:

$ scancel -u $USER

$ scancel -t PENDING -u $USER

Accessing Jupyter notebooks in the cluster

From a (Linux/macOS) terminal, ssh to snorlax-login.cs:

ssh user@snorlax-login.cs.uwaterloo.ca

Preliminaries: Create a conda environment that includes the jupyter server

Also add the conda packages required for your working environment, e.g., PyTorch:

conda init
source ~/.bashrc
conda create -y -n jupyter-server
conda activate jupyter-server
conda install -c conda-forge pytorch-gpu
pip install jupyter
conda deactivate
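Later, once you have an interactive allocation on a compute node (see the next section), you can sanity-check that the environment actually sees a GPU; this quick check assumes the pytorch-gpu package installed above:

conda activate jupyter-server
python -c "import torch; print(torch.cuda.is_available())"   # prints True when a GPU is visible
conda deactivate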

Using jupyter notebooks with your environment

Make an interactive reservation with the SLURM scheduler:

salloc --gres=gpu:1 --cpus-per-task=4 --mem=32G --time=1:00:00

Once the reservation starts, ssh to the allocated compute system e.g. snorlax-2:

ssh snorlax-2

Activate your jupyter-server environment and start a jupyter notebook:

conda activate jupyter-server
jupyter notebook --ip=$PUBLIC_IP --no-browser
...
[I 19:07:49.062 NotebookApp] Serving notebooks from local directory: /u1/tcauduro
[I 19:07:49.063 NotebookApp] Jupyter Notebook 6.5.4 is running at:
[I 19:07:49.063 NotebookApp] http://129.97.26.146:8888/?token=ffd03731fff0a09fe01eb3b536a3076127439be446095a62
[I 19:07:49.063 NotebookApp]  or http://127.0.0.1:8888/?token=ffd03731fff0a09fe01eb3b536a3076127439be446095a62
[I 19:07:49.063 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 19:07:49.072 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///u1/tcauduro/.local/share/jupyter/runtime/nbserver-105349-open.html
    Or copy and paste one of these URLs:
        http://129.97.26.146:8888/?token=ffd03731fff0a09fe01eb3b536a3076127439be446095a62
     or http://127.0.0.1:8888/?token=ffd03731fff0a09fe01eb3b536a3076127439be446095a62

Copy the link with the 8888 port into your browser; your jupyter notebook will be available there.

Be sure to shut down the server when done with Control-c.
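If the node's IP address is not reachable directly from your machine (for example, from off campus without a VPN), you can forward the notebook port through the login node instead; the port and node name below are illustrative:

ssh -L 8888:snorlax-2:8888 username@snorlax-login.cs.uwaterloo.ca
# then open the http://127.0.0.1:8888/?token=... link in your local browser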

Custom kernels

There is a single default kernel at the moment: "Python 3". You can also create your own kernels by opening a Terminal inside the notebook:

Once you've opened the terminal you can create your own kernel. Below is an example:

conda create --name myenv # create a custom conda environment which we will install packages to and add to the notebook as a kernel

conda activate myenv # activate the environment so the packages below install into it

conda install --yes numpy # install a package you want

conda install -c anaconda ipykernel # install ipykernel, which we will use to add the kernel to the notebook

python -m ipykernel install --user --name=myenv # add the conda environment as a kernel
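To confirm that the new kernel was registered, you can list the installed kernel specs:

jupyter kernelspec list   # "myenv" should appear alongside the default python3 kernel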

VSCode Tutorial: Working with Snorlax

In this tutorial, we'll walk you through the process of setting up and using Visual Studio Code (VSCode) to work with Snorlax. This guide covers connecting to the login gateway and accessing specific compute nodes (e.g., snorlax-1, snorlax-2, ...) after allocating resources.

Note that this method of connecting to Snorlax should be used for debugging and for understanding your code via testing and notebooks. If you wish to run long experiments, please use the sbatch command instead.

Prerequisites

Before you begin, make sure you have Visual Studio Code and the Remote - SSH extension installed.

Quick steps to install the Remote - SSH extension:

  1. Open Visual Studio Code
  2. Click on the Extensions icon in the Activity Bar on the side of the window.
  3. Search for "Remote - SSH" in the Extensions view search box.
  4. Install the "Remote - SSH" extension.

Step 1: Configuring VSCode

Once the extension is installed, follow these steps to connect to the login gateway (replace your_username with your actual username):

  1. Press Ctrl + Shift + P (Windows/Linux) or Cmd + Shift + P (Mac) to open the command palette.

  2. Type "Remote-SSH: Open SSH Configuration File ...", select it, and choose the desired file (recommended file: ~/.ssh/config).

  3. Insert the following part and replace <username> and <path/to/private/key> with your username and the path to your private key registered in Snorlax:

    # Snorlax
    Host snorlax-login
        User <username>
        IdentityFile <path/to/private/key>
        HostName snorlax-login.cs.uwaterloo.ca

    Host snorlax-1
        User <username>
        IdentityFile <path/to/private/key>
        HostName snorlax-1
        ProxyJump snorlax-login

    Host snorlax-2
        User <username>
        IdentityFile <path/to/private/key>
        HostName snorlax-2
        ProxyJump snorlax-login

    Host snorlax-3
        User <username>
        IdentityFile <path/to/private/key>
        HostName snorlax-3
        ProxyJump snorlax-login

    Host snorlax-4
        User <username>
        IdentityFile <path/to/private/key>
        HostName snorlax-4
        ProxyJump snorlax-login

    Host snorlax-5
        User <username>
        IdentityFile <path/to/private/key>
        HostName snorlax-5
        ProxyJump snorlax-login

    Host snorlax-6
        User <username>
        IdentityFile <path/to/private/key>
        HostName snorlax-6
        ProxyJump snorlax-login
  4. Save and refresh the Remote - SSH extension menu.

Step 2: Allocate Resources and Connect to Cluster

After successfully connecting via ssh to snorlax-login.cs.uwaterloo.ca, you'll need to allocate resources using salloc and then connect to a specific cluster:

  1. Follow the Interactive mode section of the documentation to obtain a resource allocation.

  2. Once resources are allocated, connect to the allocated node (e.g., snorlax-1) in VSCode:

That's it! You are now set up to work with Snorlax using Visual Studio Code.