Container for Deep Learning Environment

containerizing and deploying deep learning environments (e.g., Docker, Apptainer)

Preface

Some time ago, I was tortured by the cumbersome process of configuring a deep learning environment on Compute Canada. It involved numerous compatibility issues and conflicting dependencies, which made it time-consuming and frustrating. Recently, I spent some time learning Apptainer, the container platform recommended on Compute Canada, and wrote this blog to note down the key points.

Introduction

Container technologies, including platforms like Docker and Apptainer, provide a way to package, distribute, and run applications in a standardized environment, ensuring consistency across different computing environments. (from ChatGPT)

  • Definition: Containers are lightweight, portable units that package an application and its dependencies, libraries, and configuration files, allowing the application to run reliably in different computing environments.
  • Isolation: They provide a level of isolation between applications, meaning multiple containers can run on the same host without interfering with each other.
  • Efficiency: Containers share the host operating system’s kernel, making them more efficient than traditional virtual machines, which require a full OS stack.

Concept Comparison between Docker and Apptainer

Concepts in Docker

Docker is one of the most popular containerization platforms that simplifies the process of creating, deploying, and managing containers.

Key Features:

  • Docker Images: Read-only templates used to create containers. An image includes the application code, libraries, and dependencies.
  • Docker Hub: A cloud-based registry where developers can share and distribute Docker images.
  • Docker Compose: A tool for defining and managing multi-container applications with a single YAML configuration file.
  • Docker Swarm: Native clustering and orchestration solution for managing a group of Docker engines as a single virtual host.

Differences between Docker Image and Docker Container

Key differences:

| Aspect | Docker Image | Docker Container |
| --- | --- | --- |
| Definition | A read-only template for creating containers | A running instance of a Docker image |
| State | Immutable | Mutable during runtime |
| Function | Blueprint or recipe | Execution of the image as a running environment |
| Storage | Stored in Docker registries | Lives in memory when running |
| Purpose | To provide the environment setup for containers | To run applications in an isolated environment |
| Lifecycle | Static and reusable | Dynamic, with a start/stop/remove lifecycle |
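
To make the distinction concrete, here is a minimal (hypothetical) shell session: the image is pulled once and stays immutable, while each docker run creates a separate, mutable container from it.

# pull the read-only image once
docker pull ubuntu:24.04

# create two independent containers from the same image
docker run -it --name demo-a ubuntu:24.04 bash
docker run -it --name demo-b ubuntu:24.04 bash

# list images (the templates) and containers (the running/stopped instances)
docker images
docker ps -a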

Concepts in Apptainer

Apptainer is specifically designed for high-performance computing (HPC) and scientific workloads, allowing users to create and run containers in environments where Docker might not be suitable.

Key Features:

  • User-Focused: Unlike Docker, which requires root privileges to manage containers, Apptainer allows users to create and run containers without needing elevated permissions, making it safer for multi-user environments.
  • Integration with HPC: Apptainer is optimized for HPC environments, enabling users to leverage existing tools and resources.
  • Image Formats: Supports different image formats, allowing for flexibility in how images are created and used.
  • Simplicity: Focuses on simplicity and ease of use, making it suitable for researchers and scientists who may not have extensive experience with containerization.

Comparisons between Docker and Apptainer

There are some differences between the concepts in Docker and Apptainer.

| Feature | Docker | Apptainer (Singularity) |
| --- | --- | --- |
| Image | Composed of multiple layers (each representing incremental changes during the build) managed with a union file system | A single, read-only *.sif file |
| Container | Mutable; changes can persist in the container | Immutable by default, which eases reproducibility |
| Target Audience | Developers, DevOps, microservices | Researchers, HPC environments |
| Security | Requires root privileges, runs as root | User-level execution, no root required |
| Image Format | Layered images (multi-step build) | Single-file .sif images for portability |
| Container Creation | Built with a Dockerfile; mutable containers | Can use Docker images; runs immutable containers |
| Ecosystem | Large ecosystem, Docker Hub, Kubernetes | Focused on HPC, integrates with cluster schedulers |
| Performance | Small overhead, optimized for microservices | Low overhead, optimized for HPC performance |
| Portability | Portable across environments with Docker installed | Highly portable, especially in HPC environments |
| Reproducibility | Good for general use, less suited for exact replication | Designed for exact scientific reproducibility |

Generally speaking, Apptainer is a secure alternative to Docker, and it is adopted on many scientific computing clusters, such as those of the Digital Research Alliance of Canada. That is because Docker images are not considered secure there: they provide a means to gain root access to the system they are running on.

Docker (with Nvidia-Container-Toolkit)

Remark: Since version 19.03, Docker has supported NVIDIA GPUs natively, so we no longer need to install nvidia-docker separately. We should use Docker + NVIDIA Container Toolkit (documentation).

Installation

Install NVIDIA GPU driver & CUDA

# Step 1: Prepare Your System
sudo apt update && sudo apt upgrade -y
sudo apt-get purge nvidia* -y

# Step 2: Install NVIDIA GPU Driver
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt update
sudo ubuntu-drivers autoinstall
sudo reboot

# Verify the installation
nvidia-smi

# Step 3: Install CUDA Toolkit
# Replace with the appropriate download link for your CUDA version
CUDA_VERSION=12.6.0
wget https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}/local_installers/cuda-repo-ubuntu2204-${CUDA_VERSION}-local_${CUDA_VERSION}-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-${CUDA_VERSION}-local_${CUDA_VERSION}-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2204-${CUDA_VERSION}-local/cuda-archive-keyring.gpg
sudo apt update
sudo apt install cuda -y
sudo reboot

# Step 4: Set Environment Variables
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Step 5: Verify CUDA Installation
nvcc --version

Install Docker

# Step 1: Update Your System
sudo apt update && sudo apt upgrade -y

# Step 2: Install Required Packages
sudo apt install apt-transport-https ca-certificates curl software-properties-common -y

# Step 3: Add Docker’s Official GPG Key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Step 4: Add Docker’s Official Repository
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# Step 5: Update Package Index Again
sudo apt update

# Step 6: Install Docker
sudo apt install docker-ce -y

# Step 7: Start and Enable Docker
sudo systemctl start docker
sudo systemctl enable docker

# Step 8: Verify Docker Installation
docker --version

# Step 9: (Optional) Run Docker as Non-Root User
sudo usermod -aG docker $USER
newgrp docker

# Verify Docker without sudo
docker run hello-world

Install Nvidia-Container-Toolkit

References: install guide for Nvidia-Container-Toolkit

Configure the production repository:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# [Optional] configure the repository to use experimental packages:
sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Update the packages list from the repository:
sudo apt-get update

# Install the NVIDIA Container Toolkit packages:
sudo apt-get install -y nvidia-container-toolkit

Configure Docker as the container runtime

# Configure the container runtime by using the `nvidia-ctk` command:
sudo nvidia-ctk runtime configure --runtime=docker

The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.

Restart the Docker daemon:

sudo systemctl restart docker
[Rootless mode]

To configure the container runtime for Docker running in Rootless mode, follow these steps:

Configure the container runtime by using the nvidia-ctk command:

nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json

# Restart the Rootless Docker daemon:
systemctl --user restart docker

# Configure /etc/nvidia-container-runtime/config.toml by using the `sudo nvidia-ctk` command:
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
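
After configuring the runtime, a quick sanity check is to run nvidia-smi inside a throwaway container (using the same CUDA image that is pulled later in this post); if the GPU table prints, the toolkit is wired up correctly.

# verify that containers can see the GPU
sudo docker run --rm --gpus all nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04 nvidia-smi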

Create Images and Containers

Pull Image from remote

  • Pull an Image from remote
sudo docker pull nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04

Build with Dockerfile

A template of Dockerfile:

# Dockerfile

# Use the NVIDIA CUDA runtime image
FROM nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04

# Set the working directory inside the container
WORKDIR /workspace
# remember to `docker run -v <host_path>:<container_path>` to mount your project here
ENV PROJECT_DIR=/workspace/project
ENV VENV_DIR=/workspace/venv

# update and upgrade
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    wget \
    curl \
    git \
    vim \
    ca-certificates \
    python3 \
    python3-pip \
    python3-dev \
    python3-virtualenv \
    zsh \
    && rm -rf /var/lib/apt/lists/*

# Configure zsh
RUN sh -c "$(wget https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh -O -)" --unattended
RUN chsh -s $(which zsh)
RUN git clone https://github.com/caiogondim/bullet-train.zsh.git ~/.oh-my-zsh/themes/bullet-train.zsh
RUN cp ~/.oh-my-zsh/themes/bullet-train.zsh/bullet-train.zsh-theme ~/.oh-my-zsh/themes
RUN sed -i 's/robbyrussell/bullet-train/g' ~/.zshrc
RUN sed -i '$a\# use command-not-found package\n[[ -a "/etc/zsh_command_not_found" ]] && \. /etc/zsh_command_not_found\n' ~/.zshrc
RUN zsh

# create python virtual environment
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN python -m virtualenv $VENV_DIR
## rather than "source xxx/activate", activate venv by setting $PATH
ENV PATH="$VENV_DIR/bin:$PATH"
RUN pip install --upgrade pip
# copy requirements.txt into the image so it is available at build time
COPY requirements.txt ${PROJECT_DIR}/requirements.txt
RUN pip install -r ${PROJECT_DIR}/requirements.txt
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126

# Set environment variables for CUDA
ENV CUDA_HOME=/usr/local/cuda
ENV PATH=$CUDA_HOME/bin:$PATH
ENV LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH


# default parameters that can be overridden from the Command Line Interface (CLI);
# note: only the last CMD in a Dockerfile takes effect, so the two forms below are alternatives
# exec form:
CMD ["python", "--version"]
# shell form:
# CMD python --version
# default executable that CANNOT be overridden by CLI arguments (only via --entrypoint)
ENTRYPOINT ["python"]
CMD ["main.py"]
### Will always run `python`
### `docker run <Image_name>` will execute `python main.py`
### `docker run <Image_name> other.py` will execute `python other.py`

Build the image from Dockerfile:

sudo docker build -t [Img_Name][:[Tag]] -f ./Dockerfile .

How should we set up the Dockerfile so that it can later run your own program?

We should configure CMD and ENTRYPOINT. The following table summarizes their differences. In general, the arguments in CMD can be overridden by the <args> provided when running docker run <Image_name> <args>, while those in ENTRYPOINT CANNOT be overridden by <args>.

| Feature | CMD | ENTRYPOINT |
| --- | --- | --- |
| Purpose | Sets default commands and parameters for running the container. | Configures a container to run as an executable with a specified command. |
| Overriding | The command specified in CMD can be easily overridden by arguments provided to docker run. | The command defined in ENTRYPOINT is not easily overridden, but it can be extended with additional arguments. |
| Forms | Can be specified in shell form or exec form. | Typically specified in exec form (recommended), but can also be in shell form. |
| Default Behavior | If you specify both CMD and ENTRYPOINT, CMD provides default arguments to ENTRYPOINT. | The command defined will always run when the container starts. |
| Use Case | Useful for providing default commands or options where you want to allow flexibility in overriding them. | Useful for applications that need to run a specific command or script, ensuring that the container behaves consistently. |
| Example | CMD ["python", "app.py"] sets default arguments. | ENTRYPOINT ["python"] ensures that the container always runs Python, even if a different command is specified. |

Delete Image or Container

# for Containers
sudo docker rm <container_id>

# for Images
sudo docker rmi [Img_Name][:[Tag]]

Run a Container based on the Image

  • Directly create a Container from Image (pull if not exist & run):
# (first search `nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04` at local; if not exists, pull from remote)
sudo docker run -it --gpus=all  --name=ubuntu2404-dl --env NVIDIA_DISABLE_REQUIRE=1  nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04  bash

A detailed description for options in docker run

| Option | Description | Example |
| --- | --- | --- |
| -it | Runs the container interactively, attaching a terminal (-i for interactive, -t for TTY). | docker run -it ubuntu /bin/bash — runs an interactive Ubuntu container with a bash shell. |
| -d | Runs the container in detached mode (in the background). | docker run -d nginx — runs the NGINX web server in the background. |
| --name | Assigns a name to the container. | docker run --name my_container ubuntu — names the container "my_container". |
| -p | Publishes ports from the container to the host (host_port:container_port). | docker run -p 8080:80 nginx — maps port 80 inside the container to port 8080 on the host. |
| -v | Mounts a volume or directory from the host to the container (<host_path>:<container_path>). | docker run -v $(pwd):/app ubuntu — mounts the current directory to /app in the container. |
| -e | Sets environment variables inside the container. | docker run -e MYSQL_ROOT_PASSWORD=my_password mysql — sets the MySQL root password as an environment variable. |
| --rm | Automatically removes the container when it exits. | docker run --rm ubuntu — removes the container after it stops. |
| --network | Specifies the network mode for the container (e.g., bridge, host, none). | docker run --network host nginx — uses the host's network stack. |
| --restart | Configures the restart policy (e.g., no, on-failure, always). | docker run --restart always nginx — restarts the container automatically if it stops. |
| -w | Sets the working directory inside the container. | docker run -w /app node — sets the working directory inside the container to /app. |
| --link | Links two containers so that they can communicate. | docker run --link db_container:db app_container — links db_container to app_container with the alias db. |
| --cpus | Limits the number of CPUs available to the container. | docker run --cpus=2 ubuntu — limits the container to 2 CPUs. |
| --memory | Limits the memory available to the container (e.g., 512m, 1g). | docker run --memory=512m ubuntu — limits the container to 512 MB of memory. |
| --privileged | Grants extended privileges to the container (useful for hardware access or Docker-in-Docker). | docker run --privileged ubuntu — runs the container with full privileges. |
| -u | Runs the container as a specific user. | docker run -u 1001 ubuntu — runs the container with user ID 1001. |
| --env-file | Loads environment variables from a file. | docker run --env-file ./env.list ubuntu — loads environment variables from the env.list file. |
| --device | Adds a host device to the container (e.g., hardware devices). | docker run --device /dev/sda:/dev/xvda ubuntu — adds a device from the host to the container. |
| --entrypoint | Overrides the default entrypoint of the image with a custom command. | docker run --entrypoint /bin/bash ubuntu — runs /bin/bash as the entrypoint instead of the default. |
| --log-driver | Specifies the log driver for the container (e.g., json-file, syslog, none). | docker run --log-driver syslog ubuntu — uses syslog as the log driver. |
| --cap-add | Adds Linux capabilities to the container (e.g., NET_ADMIN, SYS_TIME). | docker run --cap-add=NET_ADMIN ubuntu — grants the container the NET_ADMIN capability for network administration tasks. |
| --gpus | Allocates GPUs to the container (useful for machine learning or GPU-intensive tasks). | docker run --gpus all nvidia/cuda:latest — allocates all available GPUs to the container. |
| --detach-keys | Specifies a key sequence to detach from the container. | docker run --detach-keys="ctrl-x" ubuntu — uses Ctrl+X to detach from the container. |
  • Or start an existing container, and enter this container with bash:
# start in the background
sudo docker start <container_id>

# start with interactive mode
sudo docker exec -it <container_id> bash
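
Putting several of the options above together, a typical command for an interactive deep learning session might look like the following sketch; the paths, container name, and image name are placeholders.

sudo docker run -it --rm \
     --gpus=all \
     --name=dl-session \
     -v /path/to/my_project:/workspace/project \
     -w /workspace/project \
     --env NVIDIA_DISABLE_REQUIRE=1 \
     [Img_Name][:[Tag]] \
     bash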

Modify, Commit and Push your own Image

# run a Container with interactive mode
sudo docker run -it --gpus=all  --name=<container_name> --env NVIDIA_DISABLE_REQUIRE=1  [Img_Name][:[Tag]]  bash

##### inside the container from here: #####
# install something for your environment
pip install xxx

sudo docker commit  -m="[Msg]" -a="[Author_Info]" <container_id> lijiaqiisai/ubuntu24.04-cuda-python3:v0.1

This will create a new Image named lijiaqiisai/ubuntu24.04-cuda-python3:v0.1.

You can create a tag of this Image with

sudo docker tag lijiaqiisai/ubuntu24.04-cuda-python3:v0.1 lijiaqiisai/ubuntu24.04-cuda-python3:latest

You can push it to your Docker Hub with

sudo docker push lijiaqiisai/ubuntu24.04-cuda-python3:latest
sudo docker push lijiaqiisai/ubuntu24.04-cuda-python3:v0.1

Remark: differences between docker commit and docker tag

| Feature | docker commit | docker tag |
| --- | --- | --- |
| Purpose | Create a new image from a container | Add a new tag to an existing image |
| Operation | Saves changes made in a container | Renames or versions an existing image |
| Creates a layer | Yes | No |
| Syntax | docker commit <CONTAINER> <IMAGE> | docker tag <SOURCE_IMAGE> <TARGET_IMAGE> |

Execute your program with Docker Container

Create a new Container for your program:

  • Manually
# adding '--rm' will remove it after closing the container
sudo docker run -it --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -v /[Local_Project_Path]:/workspace/project  [Img_Name][:[Tag]]  zsh

cd ${PROJECT_DIR}
python main.py
  • Automatically on Container Start

(Option 1): Configure CMD in the Dockerfile

# add this line to the Dockerfile
CMD ["python", "/workspace/project/train.py"]

then run

sudo docker run -it --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -v /[Local_Project_Path]:/workspace/project  [Img_Name][:[Tag]]

(Option 2): Pass Script as a Command

# wrap the compound command in `sh -c '...'` so that it runs inside the container
sudo docker run -it --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -v /[Local_Project_Path]:/workspace/project  [Img_Name][:[Tag]] sh -c 'cd ${PROJECT_DIR} && python main.py'

Launch an existing Container for your program:

docker exec runs a command in an existing, already-running Container rather than creating a new one. Note that GPU access and volume mounts are inherited from the docker run command that created the container; they cannot be added via docker exec.

sudo docker exec -it --env NVIDIA_DISABLE_REQUIRE=1 <Container_id_or_name> <your_program>

Running Code via JupyterLab

sudo docker run -it -p 8888:8888 --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -v /[Local_Project_Path]:/workspace/project  [Img_Name][:[Tag]]

then you can access JupyterLab in your browser at http://localhost:8888 (this assumes the image's default CMD launches JupyterLab; if not, pass the launch command explicitly, as sketched below).
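
A sketch of launching JupyterLab explicitly, assuming the jupyterlab package is installed in the image:

sudo docker run -it -p 8888:8888 --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 \
     -v /[Local_Project_Path]:/workspace/project \
     [Img_Name][:[Tag]] \
     jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root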

Run a container on clusters with Slurm:

# Load any required modules (e.g., CUDA, Docker)
module load docker  # if needed to load Docker via module system

PROJECT_DIR="[Local_Project_Path]"
DOCKER_IMAGE="[Img_Name][:[Tag]]"

# Run Docker with GPU support using --gpus flag
srun docker run --gpus all \
     -v ${PROJECT_DIR}:/workspace/project \
     ${DOCKER_IMAGE} \
     python /workspace/project/train.py

Apptainer

The conceptual model of Apptainer (previously known as Singularity) is simpler. It essentially only has the concept of a Container, which is what we call each virtual instance that we run.

Installation

References: install apptainer

  • Install unprivileged from pre-built binaries
curl -s https://raw.githubusercontent.com/apptainer/apptainer/main/tools/install-unprivileged.sh | \
    bash -s - install-dir
  • Install Debian packages

Pre-built Debian packages are only available on GitHub and only for the amd64 architecture.

For the non-setuid installation use these commands:

sudo apt update
sudo apt install -y wget
cd /tmp
wget https://github.com/apptainer/apptainer/releases/download/v1.3.4/apptainer_1.3.4_amd64.deb
sudo apt install -y ./apptainer_1.3.4_amd64.deb

For the setuid installation, run the above commands first and then these:

wget https://github.com/apptainer/apptainer/releases/download/v1.3.4/apptainer-suid_1.3.4_amd64.deb
sudo dpkg -i ./apptainer-suid_1.3.4_amd64.deb
  • Install Ubuntu packages
# First, on Ubuntu based containers install software-properties-common package to obtain add-apt-repository command.
# On Ubuntu Desktop/Server derived systems skip this step.
sudo apt update
sudo apt install -y software-properties-common

# For the non-setuid installation use these commands:
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt update
sudo apt install -y apptainer

# For the setuid installation do above commands first and then these:
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt update
sudo apt install -y apptainer-suid

A local Apptainer Container Image can be built in two modes:

  • generate a read-only *.sif file
  • create a mutable (read & write) sandbox directory with --sandbox option that can be modified later

Create Apptainer Image into .sif (read-only)

Build a *.sif Image with apptainer build command:

Options for apptainer build:

  • --nv: inject host Nvidia libraries during build for post and test sections;
  • --nvccli: use nvidia-container-cli for GPU setup (experimental)
  • --bind src[:dest[:opts]] or -B src[:dest[:opts]]: a user-bind path specification, where src and dest are paths outside (on the host machine) and inside the container, respectively. If dest is not given, it is set equal to src. Mount options (opts) may be specified as ro (read-only) or rw (read/write, the default). Multiple bind paths can be given as a comma-separated list.
  • -f, --fakeroot: build with the appearance of running as root (default when building from a definition file unprivileged)
  • --sandbox: will be introduced in the next section

The Apptainer Image file *.sif is read-only and portable.
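
For instance, a hypothetical build that injects the host NVIDIA libraries and binds a host data directory for the %post/%test stages might look like this (the .def name and paths are placeholders):

sudo apptainer build --nv -B /path/on/host/data:/data dl-env.sif apptainer.def
# or, without root privileges
apptainer build --fakeroot --nv dl-env.sif apptainer.def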

Download from Docker Hub

apptainer build dl-env.sif docker://lijiaqiisai/ubuntu24.04-cuda-python3:cuda12.6.0-python3.12.3

Download from other Library API Registries

apptainer build/pull dl-env.sif [Library]://[Image_name]

Build from Apptainer definition file (analogue to Dockerfile)

References: https://apptainer.org/docs/user/1.0/definition_files.html

Headers in Apptainer definition file

From a remote Registry (e.g., Docker Hub)

# from Docker Hub
Bootstrap: docker
From: lijiaqiisai/ubuntu24.04-cuda-python3:cuda12.6.0-py3.12.3

or from a local Image

Bootstrap: localimage
From: <Old_Image>.sif
Fingerprints: 12045C8C0B1004D058DE4BEDA20C27EE7FF7BA84,22045C8C0B1004D058DE4BEDA20C27EE7FF7BA84

Sections in Apptainer definition file

We explain the sections by the order of execution:

flowchart LR
  A[header] --> B[%arguments];
  B --> C[%setup];
  C --> D[%files];
  D --> E[%post];
  E --> F[%test];
  F --> G[%environment]
  G --> H[%startscript]
  G --> I[%runscript]

  G --> J[%labels]
  G --> K[%help]

\(\downarrow\) %arguments: define custom arguments or flags that can be passed when building. The variables defined in %arguments can be accessed in %setup, %post
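
For example, with the header used later in this post (From: nvidia/cuda:{{ CUDA_VERSION }}-cudnn-runtime-ubuntu{{ OS_VERSION }}), the defaults declared in %arguments can be overridden at build time via --build-arg; a sketch, assuming Apptainer 1.2+ templating:

# use the defaults declared in %arguments
sudo apptainer build apptainer-base.sif apptainer-base.def

# override arguments declared in %arguments
sudo apptainer build --build-arg CUDA_VERSION=12.6.0 --build-arg OS_VERSION=24.04 \
     apptainer-base.sif apptainer-base.def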

\(\downarrow\) %setup: commands that are executed first, on the host system outside of the container, after the base OS has been installed. Within this section, the container file system can be referenced via the environment variable $APPTAINER_ROOTFS.

Warning:

Be careful with the commands in the %setup section, since they operate directly on the host system.

\(\downarrow\) %files: allows you to copy files into the container with greater safety than using the %setup section

%files [from <stage>]
    <source> [<destination>]
    ...

\(\downarrow\) %post: download packages, install software and libraries, write configuration files, create new directories, etc.;

    apt-get update && apt-get install -y netcat
    NOW=`date`
    echo "export NOW=\"${NOW}\"" >> $APPTAINER_ENVIRONMENT
  • Please note that the above commands also set an environmental variable NOW at build time. The value of this variable cannot be anticipated, and therefore cannot be set during the %environment section. For situations like this, the $APPTAINER_ENVIRONMENT variable is provided. Redirecting text to this variable will cause it to be written to a file called /.singularity.d/env/91-environment.sh that will be sourced at runtime.
  • Priority: environment variables set in the %post section through $APPTAINER_ENVIRONMENT take precedence over those added via %environment.

\(\downarrow\) %test: runs at the very end of the build process to validate the container using a method of your choice.

  • can be executed with apptainer test *.sif;
  • build with --notest option to build a container without running the %test section, like sudo apptainer build --notest my_container.sif my_container.def;

\(\downarrow\) %environment: allows you to define environment variables that will be set at runtime.

  • Note: variables in the %environment section are not made available at build time. This means that if you need the same variables during the build process, you should also define them in your %post section.
  • during build: The %environment section is written to a file in the container metadata directory. This file is not sourced.
  • during runtime: The file in the container metadata directory is sourced.

\(\downarrow\) %startscript: the contents of %startscript are executed when running apptainer instance start xxx; they run only once, when the instance starts.

\(\downarrow\) %runscript: the contents of %runscript are executed when running apptainer run xxx, i.e., every time the container is run. Two useful shell variables (see the snippet after this list):

  • $*: the options passed to the container at runtime, joined into a single string
  • $@: the same options as a quoted array, preserving each argument separately
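
A small sketch of how the two differ inside a hypothetical %runscript ("$@" preserves argument boundaries, while $* joins everything into one string):

%runscript
    echo "as one string : $*"
    echo "as an array   :" "$@"
    # prefer "$@" when forwarding the arguments to another program
    exec python "$@"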

\(\downarrow\) %labels: is used to add metadata to the file /.singularity.d/labels.json within your container.

  • To inspect the labels, using apptainer inspect <Image>.sif

\(\downarrow\) %help: any text in this section is transcribed into a metadata file in the container during the build.

TIP

The lifecycle of environment variables defined in different sections:

| Where the variable is defined | Available at build time | Available at runtime | Notes |
| --- | --- | --- | --- |
| In %arguments | ✓ (only during the build, in %setup or %post) | ✗ | |
| In %post, set by export xxx | ✓ | ✗ | |
| In %post, set via $APPTAINER_ENVIRONMENT | ✓ | ✓ | Saved to /.singularity.d/env/91-environment.sh, which is sourced at runtime |
| In %environment | ✗ | ✓ | Lower priority than variables set through $APPTAINER_ENVIRONMENT in %post |

Two examples

Base example:

The following *.def file will pull from nvidia/cuda, and just install latest Python.

# apptainer-base.def

# Header: bootstrap from Docker Hub
Bootstrap: docker
From: nvidia/cuda:{{ CUDA_VERSION }}-cudnn-runtime-ubuntu{{ OS_VERSION }}
Stage: build

# sections
%arguments
  CUDA_VERSION=12.6.0
  OS_VERSION=24.04

%setup

%files

%post
    # update, upgrade and set timezone
    export TZ=America/Toronto
    export DEBIAN_FRONTEND=noninteractive
    apt-get update && apt-get upgrade -y --no-install-recommends
    apt-get -y install tzdata
    ln -fs /usr/share/zoneinfo/${TZ} /etc/localtime
    dpkg-reconfigure -f noninteractive tzdata
    export DEBIAN_FRONTEND=dialog
    ### Fix the warning "Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details."
    # cd /etc/apt && cp trusted.gpg trusted.gpg.d && cd -
    # Install software
    apt-get install -y --no-install-recommends python3 python3-pip python3-virtualenv
    ln -s /usr/bin/python3 /usr/bin/python
    export VENV_DIR=/workspace/venv
    python -m virtualenv $VENV_DIR

    ### hold these variables after building process
    NOW=`date`
    echo "export NOW=\"${NOW}\"" >> $APPTAINER_ENVIRONMENT
    echo "export VENV_DIR=\"${VENV_DIR}\"" >> $APPTAINER_ENVIRONMENT
    echo "export CUDA_VERSION=\"${CUDA_VERSION}\"" >> $APPTAINER_ENVIRONMENT
    echo "export OS_VERSION=\"${OS_VERSION}\"" >> $APPTAINER_ENVIRONMENT

%test
    grep -q NAME=\"Ubuntu\" /etc/os-release
    if [ $? -eq 0 ]; then
        echo "Container base is Ubuntu as expected."
    else
        echo "Container base is not Ubuntu."
        exit 1
    fi

%environment
    export LISTEN_PORT=12345
    export LC_ALL=C
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    export PATH=${VENV_DIR}/bin:$PATH

# will be executed when executing `apptainer instance start xxx.sif`
%startscript
    echo "Container was created $NOW"
    echo "Ubuntu version: $OS_VERSION"
    echo "CUDA version: $CUDA_VERSION"
    echo "Virtual Env path: $VENV_DIR"

# will be executed when executing `apptainer run xxx.sif`
%runscript
    echo "Container was created $NOW"
    echo "Ubuntu version: $OS_VERSION"
    echo "CUDA version: $CUDA_VERSION"
    echo "Virtual Env path: $VENV_DIR"
    echo "Arguments received: $*"

    ### Execute the following args with the default program "python"
    ### --> `apptainer run <Image>.sif main.py`
    echo "Executing '$VENV_DIR/bin/python $*'"
    exec $VENV_DIR/bin/python $@

%labels
    Author lijiaqi

%help
    This is a .def file to create a python environment based on `nvidia/cuda` docker image.

then build with

sudo apptainer build --nv --build-arg CUDA_VERSION=12.6.0  apptainer-base.sif apptainer-base.def
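
Once built, the %runscript above forwards any arguments to the virtual environment's Python, so a quick smoke test (a sketch, assuming the build succeeded) is:

# should print the Python version from $VENV_DIR inside the container
apptainer run --nv apptainer-base.sif --version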

Detailed example:

A .def file for building on top of my customized Image ubuntu24.04-cuda-python3 from Docker Hub; the modification installs the packages listed in requirements.txt (e.g., torch) into it.

# apptainer-detailed.def

# Header: bootstrap from Docker Hub
Bootstrap: docker
From: lijiaqiisai/ubuntu24.04-cuda-python3:cuda{{ CUDA_VERSION }}-py3.12.3
Stage: build

%arguments
    CUDA_VERSION=12.6.0

%setup

# copy from Host path to Container path
%files
    requirements.txt

%post
    NOW=`date`
    echo "export NOW=\"${NOW}\"" >> $APPTAINER_ENVIRONMENT
    $VENV_DIR/bin/pip install -r requirements.txt

%test
    grep -q NAME=\"Ubuntu\" /etc/os-release
    if [ $? -eq 0 ]; then
        echo "Container base is Ubuntu as expected."
    else
        echo "Container base is not Ubuntu."
        exit 1
    fi

%environment
    export LISTEN_PORT=12345
    export LC_ALL=C
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    export PATH=${VENV_DIR}/bin:$PATH

%startscript
    echo "Container was created $NOW"
    echo "Ubuntu version: $OS_VERSION"
    echo "CUDA version: $CUDA_VERSION"
    echo "Virtual Env path: $VENV_DIR"

%runscript
    echo "Container was created $NOW"
    echo "Ubuntu version: $OS_VERSION"
    echo "CUDA version: $CUDA_VERSION"
    echo "Virtual Env path: $VENV_DIR"
    echo "Arguments received: $*"

    ### Execute the following args with the default program "python"
    ### --> `apptainer run <Image>.sif main.py`
    echo "Executing '$VENV_DIR/bin/python $*'"
    exec $VENV_DIR/bin/python $@

%labels
    Author lijiaqi

%help
    This is a .def file to create a python environment based on `lijiaqiisai/ubuntu24.04-cuda-python3` docker image.

then build with

sudo apptainer build --nv --build-arg CUDA_VERSION=12.6.0  apptainer-detailed.sif apptainer-detailed.def

It will follow requirements.txt when building the Image file.

Build/Pack from an existing sandbox directory

sudo apptainer build <Image_name>.sif <SANDBOX_FOLDER>

What is the sandbox directory <SANDBOX_FOLDER> in the above example? Let’s move to next step.

Create Apptainer Image sandbox directory (then *.sif) with --sandbox option

By using the --sandbox option, i.e., apptainer build --sandbox <SANDBOX_FOLDER> <Image_name>.sif (or a URL such as docker://...), we can create a mutable sandbox directory that we can customize to build our own virtual environment:

WARNING

It’s possible to create a sandbox without root privileges, but to ensure proper file permissions it is recommended to do so as root.

# create a sandbox directory
## from an existing *.sif
sudo apptainer build --nv --sandbox <SANDBOX_FOLDER>/ <Local>.sif
## from a remote docker hub
sudo apptainer build --nv --sandbox <SANDBOX_FOLDER>/ docker://<Image>:[Tag]
### e.g.
sudo apptainer build --nv --sandbox <SANDBOX_FOLDER>/ docker://lijiaqiisai/ubuntu24.04-cuda-python3:latest

Enter the sandbox container:

sudo apptainer shell --writable <SANDBOX_FOLDER>/

WARNING

Please note that in --writable mode, the NVIDIA (--nv) files may not be bound.

Install software:

# (optional) Install Python
apt-get update && apt-get -y upgrade
apt-get -y install python3 python3-pip python3-virtualenv

# (optional) fix the issue from "nvidia/cuda" docker hub
# cd /etc/apt && cp trusted.gpg trusted.gpg.d && cd -

# (optional) Install torch==2.4.1
pip install torch==2.4.1 torchvision

Tip

Or, if you want to install Python packages from a requirements.txt, remember to bind its path into the Container:

sudo apptainer shell --writable -B <<host_path>>/requirements.txt:<<container_path>>/requirements.txt <SANDBOX_FOLDER>/
pip install -r <<container_path>>/requirements.txt

After installing packages, you may want to pack the sandbox directory into a *.sif Image file:

# exit the container
exit

# package into .sif file
sudo apptainer build --nv  <SANDBOX_ENV>.sif <SANDBOX_FOLDER>/

# (optional) clean the directories
rm -rf <SANDBOX_FOLDER>/

Discovery

I found that an Apptainer *.sif Image file seems smaller than a Docker Image containing the same packages. I am not sure; this still needs to be verified.

Run a Container

After building a portable *.sif Image file, you can use it for production elsewhere. There are several ways to run an instance based on your Apptainer Image file:

Command: apptainer run

executes the default runscripts defined in the container (e.g., see %runscript in apptainer.def)

# `apptainer run`if `%runscript` has been defined for this `*.sif` Image file (e.g., in the `apptainer.def` file)
apptainer run --nv dl-env.sif

Command: apptainer exec

followed by a specified command or program to execute within the container

# `apptainer exec` followed by your program
apptainer exec --nv dl-env.sif <your_program>
## e.g., 
apptainer exec --nv dl-env.sif python main.py

# can also run a specific shell; in this case, equivalent to `apptainer shell`
apptainer exec --nv dl-env.sif /bin/bash

Tip

Here is a comparison between apptainer run and apptainer exec:

| Feature | apptainer run | apptainer exec |
| --- | --- | --- |
| Purpose | Executes the default runscript defined in the container | Executes a specified command or program within the container |
| Runscript | Requires that the container has a defined runscript; will fail if none exists | Does not rely on a runscript; you specify the command to run |
| Usage Scenario | Ideal for running pre-configured applications or workflows | Useful for debugging, testing, or running specific commands |
| Command Syntax | apptainer run <container.sif> | apptainer exec <container.sif> <command> |
| Additional Options | --nv for GPU support | --nv for GPU support, --bind/-B for binding host directories |
| Example | apptainer run example.sif | apptainer exec example.sif python script.py |
| Interactivity | Does not support an interactive shell unless the runscript allows it | Can launch an interactive shell (e.g., apptainer exec example.sif /bin/bash) |
| Flexibility | Less flexible; limited to the runscript defined in the image | More flexible; allows execution of any command or script |
| Error Handling | Fails if no runscript is defined | Fails if the specified command is not found or fails to execute |

Command: apptainer shell

runs the container in interactive mode, dropping you into a shell inside the container.

Command: apptainer instance

apptainer instance [options] <command> allows users to create, start, stop, and remove instances of containers, providing a way to run multiple instances of the same container image concurrently.

Some options:

  • start: Start a new container instance (running in the background)
apptainer instance start <Image_name>.sif <instance_name>
  • stop: Stop a running container instance.
# stop a specific instance
apptainer instance stop <instance_name>
  • list: List all currently running instances.
apptainer instance list

(Personal) Some notes for running Apptainer on Compute Canada

References:

This part provides some best practices for using Apptainer on Compute Canada. A minimal invocation looks like:

apptainer exec --nv dl-env.sif my_script.sh

When using the run, shell, instance, and exec commands on Compute Canada:

  • Always use one of the -C, -c or -e options:
    • -C: hides filesystems, PID, IPC, and environment;
    • -c: use a minimal /dev; shared-with-host directories (e.g., /tmp) will appear empty unless explicitly bind mounted;
    • -e: clean the environment before running the container;
  • Always use the -W dir option, with dir being a path to a real directory that you have write access to
    • In sbatch scripts, set -W $SLURM_TMPDIR
  • When using NVIDIA GPUs, use --nv to expose the NVIDIA hardware to the container.

  • When access to host directories is needed, bind mount the top-level directories of those filesystems, or the desired directories themselves.
    • useful bind mounts: -B /home -B /project -B /scratch
# a general example (note that options such as -W must come before the image file)
apptainer exec -C --nv -W $SLURM_TMPDIR -B /home -B /project -B /scratch dl-env.sif my_program

# for commands with `srun`, e.g., an MPI program
srun apptainer run dl-env.sif /path/to/your/mpi-program
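
Putting these recommendations together, a minimal sbatch script might look like the following sketch; the account, resource values, and script path are placeholders to adapt to your cluster.

#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gpus-per-node=1
#SBATCH --mem=16G
#SBATCH --time=03:00:00

module load apptainer

# run the training script inside the container, with a writable working dir on local scratch
apptainer exec -C --nv -W $SLURM_TMPDIR \
    -B /home -B /project -B /scratch \
    dl-env.sif python /path/to/your/train.py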

Summary

A quick overview and comparison between Docker and Apptainer

Here is a take-away summary of two platforms:

| Feature | Docker | Apptainer (previously known as Singularity) |
| --- | --- | --- |
| Image | ✓ (layered image) | ✓ (a single *.sif file, or a sandbox directory) |
| Container | ✓ (what we will run is a Container) | ✓ (what we will run is a Container) |
| Build Image by pulling | docker pull <Image>:[Tag] | SIF Image: apptainer pull/build <Image>.sif docker://<Image>:[Tag]; or Sandbox: apptainer build --sandbox <SANDBOX_DIR> docker://<Image>:[Tag] |
| Build Image from an existing Image | N/A (needs several steps: docker run to create a Container from an Image, make changes, then commit to a new Image with docker commit <Container_id> <New_Image>:<Tag>) | SIF Image: apptainer build <New>.sif <Old>.sif (meaningless, essentially a copy); or Sandbox: apptainer build --sandbox <SANDBOX_DIR> <Old>.sif |
| Build Image from a definition file | docker build -t/--tag [Img_Name][:[Tag]] . or docker build -t/--tag [Img_Name][:[Tag]] -f/--file ./Dockerfile . | SIF Image: apptainer build *.sif apptainer.def; or Sandbox: apptainer build --sandbox <SANDBOX_DIR> apptainer.def |
| Start a new Container (instance) | docker run --name=<Container_name> <Image_name> | apptainer instance start *.sif <instance_name> |
| Enter the shell of a running Container | docker exec -it <container_id_or_name> /bin/bash | N/A (once the Container instance is created in the background, we cannot enter its interactive mode) |
| Start a new Container in interactive mode (the above two steps in one) | docker run -it --name=<container_name> <Image> | apptainer shell <Image>.sif |
| Start an existing instance | docker start <instance_id_or_name> (must already exist) | N/A (Apptainer instances are NOT saved) |
| Start an existing instance in interactive mode (one step) | docker start -ai <instance_id_or_name> (must already exist) | N/A (Apptainer instances are NOT saved) |
| Stop a running instance | docker stop <instance_id_or_name> | apptainer instance stop <instance> |
| Pack into a new Image after modification | docker commit -m="xxx" -a="xx" <Container_id> <Image>:<Tag> | apptainer build --nv <New>.sif <SANDBOX_DIR> |
| Check running Container instances | docker ps (or docker ps -a for all, including stopped ones) | apptainer instance list |
| Check existing Images | docker images | N/A (just check the *.sif files) |
| Run an application program | default CMD command with docker run --rm <Image>, or a specific program with docker run --rm <Image> <your_program> | default %runscript with apptainer run *.sif, or a specific program with apptainer exec *.sif <your_program> |
| Mount/bind a directory (useful for running programs) | -v <host_path>:<container_path> | --bind/-B host_path[:container_path[:options]] |
| Use NVIDIA GPUs in the Container (useful for running programs) | --gpus=xxx | --nv |
| Set the working directory (useful for running programs) | -w=<dir_on_container> | N/A (no such concept) |

Core Commands

It is noteworthy that the names of the commands between Docker and Apptainer have some overlap but some of them have different functions. Here is a summary of some common commands within each platform:

Docker

  • Build
    • docker pull: pull an existing Image from remote registry (e.g., Docker Hub)
    • docker build: build an Image from a Dockerfile
  • Run
    • docker run: create a Container from an Image. Will execute the following command according to CMD and ENTRYPOINT in the Dockerfile;
      • docker run -it: interactive mode. Analogue to apptainer shell <Image>.sif
      • docker run -d: detached mode (running in the background). Analogue to apptainer instance start <Image>.sif <instance_name>
    • docker exec: Executes a command in an existing running container. Will not create a new container
      • docker exec <container_id_or_name> <your_program>
    • docker start: start an existing Container (it may be closed on the background)
    • docker stop: stop a running Container (useful when it is not running in interactive mode and cannot be closed with exit)

Apptainer

  • Build
    • apptainer pull: pull an existing Image from remote registry like docker://xxx (Docker Hub)
    • apptainer build: build (1) a .sif Image file, or (2) a Sandbox directory (with --sandbox), from either (a) a remote registry (e.g., Docker Hub) or (b) an existing local *.sif file; case (1)(a) matches the behaviour of apptainer pull [Local].sif docker://xxx
  • Run
    • apptainer run: only for executing the default command defined in %runscript in apptainer.def
    • apptainer exec: launch a Container instance to execute a program
    • apptainer shell: launch a Container instance in interactive mode
    • apptainer instance: operations w.r.t. Container instances
      • apptainer instance start: start a Container instance
      • apptainer instance stop: stop a Container instance
      • apptainer instance list: list all running Container instances


