Container for Deep Learning Environment

containerizing and deploying deep learning environments (e.g., Docker, Apptainer)

Preface

Some time ago, I was tortured by the cumbersome process of configuring a deep learning environment on Compute Canada. It involved numerous compatibility issues and conflicting dependencies, which made it time-consuming and frustrating. Recently, I spent some time learning Apptainer, the container platform recommended on Compute Canada, and wrote this blog to note down the key points.

Introduction

Container technologies, including platforms like Docker and Apptainer, provide a way to package, distribute, and run applications in a standardized environment, ensuring consistency across different computing environments. (from ChatGPT)

  • Definition: Containers are lightweight, portable units that package an application and its dependencies, libraries, and configuration files, allowing the application to run reliably in different computing environments.
  • Isolation: They provide a level of isolation between applications, meaning multiple containers can run on the same host without interfering with each other.
  • Efficiency: Containers share the host operating system’s kernel, making them more efficient than traditional virtual machines, which require a full OS stack.

Concept Comparison between Docker and Apptainer

Concepts in Docker

Docker is one of the most popular containerization platforms that simplifies the process of creating, deploying, and managing containers.

Key Features:

  • Docker Images: Read-only templates used to create containers. An image includes the application code, libraries, and dependencies.
  • Docker Hub: A cloud-based registry where developers can share and distribute Docker images.
  • Docker Compose: A tool for defining and managing multi-container applications with a single YAML configuration file.
  • Docker Swarm: Native clustering and orchestration solution for managing a group of Docker engines as a single virtual host.

Differences between Docker Image and Docker Container

Key differences:

| Aspect | Docker Image | Docker Container |
| --- | --- | --- |
| Definition | A read-only template for creating containers | A running instance of a Docker image |
| State | Immutable | Mutable during runtime |
| Function | Blueprint or recipe | Execution of the image as a running environment |
| Storage | Stored in Docker registries | Lives in memory when running |
| Purpose | To provide the environment setup for containers | To run applications in an isolated environment |
| Lifecycle | Static and reusable | Dynamic, with a start/stop/remove lifecycle |
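
To make the distinction concrete, here is a minimal (hypothetical) shell session: the image is pulled once and stays immutable, while each docker run creates a separate, mutable container from it.

# pull the read-only image once
docker pull ubuntu:24.04

# create two independent containers from the same image
docker run -it --name demo-a ubuntu:24.04 bash
docker run -it --name demo-b ubuntu:24.04 bash

# list images (the templates) and containers (the running/stopped instances)
docker images
docker ps -a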

Concepts in Apptainer

Apptainer is specifically designed for high-performance computing (HPC) and scientific workloads, allowing users to create and run containers in environments where Docker might not be suitable.

Key Features:

  • User-Focused: Unlike Docker, which requires root privileges to manage containers, Apptainer allows users to create and run containers without needing elevated permissions, making it safer for multi-user environments.
  • Integration with HPC: Apptainer is optimized for HPC environments, enabling users to leverage existing tools and resources.
  • Image Formats: Supports different image formats, allowing for flexibility in how images are created and used.
  • Simplicity: Focuses on simplicity and ease of use, making it suitable for researchers and scientists who may not have extensive experience with containerization.

Comparisons between Docker and Apptainer

There are some differences between the concepts in Docker and Apptainer.

| Feature | Docker | Apptainer (Singularity) |
| --- | --- | --- |
| Image | Composed of multiple layers (each representing incremental changes during the build) managed with a union file system | A single, read-only *.sif file |
| Container | Mutable; changes can persist in the container | Immutable by default, which eases reproducibility |
| Target Audience | Developers, DevOps, microservices | Researchers, HPC environments |
| Security | Requires root privileges, runs as root | User-level execution, no root required |
| Image Format | Layered images (multi-step build) | Single-file .sif images for portability |
| Container Creation | Built with a Dockerfile; mutable containers | Can use Docker images; runs immutable containers |
| Ecosystem | Large ecosystem, Docker Hub, Kubernetes | Focused on HPC, integrates with cluster schedulers |
| Performance | Small overhead, optimized for microservices | Low overhead, optimized for HPC performance |
| Portability | Portable across environments with Docker installed | Highly portable, especially in HPC environments |
| Reproducibility | Good for general use, less suited for exact replication | Designed for exact scientific reproducibility |

Generally speaking, Apptainer is a secure alternative to Docker, and it is adopted on many scientific computing clusters, such as those of the Digital Research Alliance of Canada. That is because Docker images are not considered secure there: they provide a means to gain root access to the system they are running on.

Docker (with Nvidia-Container-Toolkit)

Remark: Since version 19.03, Docker has supported NVIDIA GPUs natively, so we no longer need to install nvidia-docker separately. We should use Docker + NVIDIA Container Toolkit (documentation).

Installation

Install NVIDIA GPU driver & CUDA

# Step 1: Prepare Your System
sudo apt update && sudo apt upgrade -y
sudo apt-get purge nvidia* -y

# Step 2: Install NVIDIA GPU Driver
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt update
sudo ubuntu-drivers autoinstall
sudo reboot

# Verify the installation
nvidia-smi

# Step 3: Install CUDA Toolkit
# Replace with the appropriate download link for your CUDA version
CUDA_VERSION=12.6.0
wget https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}/local_installers/cuda-repo-ubuntu2204-${CUDA_VERSION}-local_${CUDA_VERSION}-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-${CUDA_VERSION}-local_${CUDA_VERSION}-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2204-${CUDA_VERSION}-local/cuda-archive-keyring.gpg
sudo apt update
sudo apt install cuda -y
sudo reboot

# Step 4: Set Environment Variables
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Step 5: Verify CUDA Installation
nvcc --version

Install Docker

# Step 1: Update Your System
sudo apt update && sudo apt upgrade -y

# Step 2: Install Required Packages
sudo apt install apt-transport-https ca-certificates curl software-properties-common -y

# Step 3: Add Docker’s Official GPG Key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Step 4: Add Docker’s Official Repository
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# Step 5: Update Package Index Again
sudo apt update

# Step 6: Install Docker
sudo apt install docker-ce -y

# Step 7: Start and Enable Docker
sudo systemctl start docker
sudo systemctl enable docker

# Step 8: Verify Docker Installation
docker --version

# Step 9: (Optional) Run Docker as Non-Root User
sudo usermod -aG docker $USER
newgrp docker

# Verify Docker without sudo
docker run hello-world

Install Nvidia-Container-Toolkit

References: install guide for Nvidia-Container-Toolkit

Configure the production repository:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# [Optional] configure the repository to use experimental packages:
sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Update the packages list from the repository:
sudo apt-get update

# Install the NVIDIA Container Toolkit packages:
sudo apt-get install -y nvidia-container-toolkit

Configure Docker as the container runtime

# Configure the container runtime by using the `nvidia-ctk` command:
sudo nvidia-ctk runtime configure --runtime=docker

The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.

Restart the Docker daemon:

sudo systemctl restart docker
[Rootless mode]

To configure the container runtime for Docker running in Rootless mode, follow these steps:

Configure the container runtime by using the nvidia-ctk command:

nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json

# Restart the Rootless Docker daemon:
systemctl --user restart docker

# Configure /etc/nvidia-container-runtime/config.toml by using the `sudo nvidia-ctk` command:
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
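
After configuring the runtime, a quick sanity check is to run nvidia-smi inside a throwaway container (using the same CUDA image that is pulled later in this post); if the GPU table prints, the toolkit is wired up correctly.

# verify that containers can see the GPU
sudo docker run --rm --gpus all nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04 nvidia-smi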

Create Images and Containers

Pull Image from remote

  • Pull an Image from remote
sudo docker pull nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04

Build with Dockerfile

A template of Dockerfile:

# Dockerfile

# Use the NVIDIA CUDA runtime image
FROM nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04

# Set the working directory inside the container
WORKDIR /workspace
# remember to `docker run -v <host_path>:<container_path>` to mount your project here
ENV PROJECT_DIR=/workspace/project
ENV VENV_DIR=/workspace/venv

# update and upgrade
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    wget \
    curl \
    git \
    vim \
    ca-certificates \
    python3 \
    python3-pip \
    python3-dev \
    python3-virtualenv \
    zsh \
    && rm -rf /var/lib/apt/lists/*

# Configure zsh
RUN sh -c "$(wget https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh -O -)" --unattended
RUN chsh -s $(which zsh)
RUN git clone https://github.com/caiogondim/bullet-train.zsh.git ~/.oh-my-zsh/themes/bullet-train.zsh
RUN cp ~/.oh-my-zsh/themes/bullet-train.zsh/bullet-train.zsh-theme ~/.oh-my-zsh/themes
RUN sed -i 's/robbyrussell/bullet-train/g' ~/.zshrc
RUN sed -i '$a\# use command-not-found package\n[[ -a "/etc/zsh_command_not_found" ]] && \. /etc/zsh_command_not_found\n' ~/.zshrc
RUN zsh

# create python virtual environment
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN python -m virtualenv $VENV_DIR
## rather than "source xxx/activate", activate venv by setting $PATH
ENV PATH="$VENV_DIR/bin:$PATH"
RUN pip install --upgrade pip
# copy requirements.txt into the image so it is available at build time
COPY requirements.txt ${PROJECT_DIR}/requirements.txt
RUN pip install -r ${PROJECT_DIR}/requirements.txt
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126

# Set environment variables for CUDA
ENV CUDA_HOME=/usr/local/cuda
ENV PATH=$CUDA_HOME/bin:$PATH
ENV LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH


# default parameters that can be overridden from the Command Line Interface (CLI);
# note: only the last CMD in a Dockerfile takes effect, so the two forms below are alternatives
# exec form:
CMD ["python", "--version"]
# shell form:
# CMD python --version
# default executable that CANNOT be overridden by CLI arguments (only via --entrypoint)
ENTRYPOINT ["python"]
CMD ["main.py"]
### Will always run `python`
### `docker run <Image_name>` will execute `python main.py`
### `docker run <Image_name> other.py` will execute `python other.py`

Build the image from Dockerfile:

sudo docker build -t [Img_Name][:[Tag]] -f ./Dockerfile .

How should we set up the Dockerfile so that it can later run your own program?

We should configure CMD and ENTRYPOINT. The following table summarizes their differences. In general, the arguments in CMD can be overridden by the <args> provided when running docker run <Image_name> <args>, while those in ENTRYPOINT CANNOT be overridden by <args>.

| Feature | CMD | ENTRYPOINT |
| --- | --- | --- |
| Purpose | Sets default commands and parameters for running the container. | Configures a container to run as an executable with a specified command. |
| Overriding | The command specified in CMD can be easily overridden by arguments provided to docker run. | The command defined in ENTRYPOINT is not easily overridden, but it can be extended with additional arguments. |
| Forms | Can be specified in shell form or exec form. | Typically specified in exec form (recommended), but can also be in shell form. |
| Default Behavior | If you specify both CMD and ENTRYPOINT, CMD provides default arguments to ENTRYPOINT. | The command defined will always run when the container starts. |
| Use Case | Useful for providing default commands or options where you want to allow flexibility in overriding them. | Useful for applications that need to run a specific command or script, ensuring that the container behaves consistently. |
| Example | CMD ["python", "app.py"] sets default arguments. | ENTRYPOINT ["python"] ensures that the container always runs Python, even if a different command is specified. |

Delete Image or Container

# for Containers
sudo docker rm <container_id>

# for Images
sudo docker rmi [Img_Name][:[Tag]]

Run a Container based on the Image

  • Directly create a Container from Image (pull if not exist & run):
# (first search `nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04` at local; if not exists, pull from remote)
sudo docker run -it --gpus=all  --name=ubuntu2404-dl --env NVIDIA_DISABLE_REQUIRE=1  nvidia/cuda:12.6.0-cudnn-runtime-ubuntu24.04  bash

A detailed description for options in docker run

| Option | Description | Example |
| --- | --- | --- |
| -it | Runs the container interactively, attaching a terminal (-i for interactive, -t for TTY). | docker run -it ubuntu /bin/bash — runs an interactive Ubuntu container with a bash shell. |
| -d | Runs the container in detached mode (in the background). | docker run -d nginx — runs the NGINX web server in the background. |
| --name | Assigns a name to the container. | docker run --name my_container ubuntu — names the container "my_container". |
| -p | Publishes ports from the container to the host (host_port:container_port). | docker run -p 8080:80 nginx — maps port 80 inside the container to port 8080 on the host. |
| -v | Mounts a volume or directory from the host to the container (<host_path>:<container_path>). | docker run -v $(pwd):/app ubuntu — mounts the current directory to /app in the container. |
| -e | Sets environment variables inside the container. | docker run -e MYSQL_ROOT_PASSWORD=my_password mysql — sets the MySQL root password as an environment variable. |
| --rm | Automatically removes the container when it exits. | docker run --rm ubuntu — removes the container after it stops. |
| --network | Specifies the network mode for the container (e.g., bridge, host, none). | docker run --network host nginx — uses the host's network stack. |
| --restart | Configures the restart policy (e.g., no, on-failure, always). | docker run --restart always nginx — restarts the container automatically if it stops. |
| -w | Sets the working directory inside the container. | docker run -w /app node — sets the working directory inside the container to /app. |
| --link | Links two containers so that they can communicate. | docker run --link db_container:db app_container — links db_container to app_container with the alias db. |
| --cpus | Limits the number of CPUs available to the container. | docker run --cpus=2 ubuntu — limits the container to 2 CPUs. |
| --memory | Limits the memory available to the container (e.g., 512m, 1g). | docker run --memory=512m ubuntu — limits the container to 512 MB of memory. |
| --privileged | Grants extended privileges to the container (useful for hardware access or Docker-in-Docker). | docker run --privileged ubuntu — runs the container with full privileges. |
| -u | Runs the container as a specific user. | docker run -u 1001 ubuntu — runs the container with user ID 1001. |
| --env-file | Loads environment variables from a file. | docker run --env-file ./env.list ubuntu — loads environment variables from the env.list file. |
| --device | Adds a host device to the container (e.g., hardware devices). | docker run --device /dev/sda:/dev/xvda ubuntu — adds a device from the host to the container. |
| --entrypoint | Overrides the default entrypoint of the image with a custom command. | docker run --entrypoint /bin/bash ubuntu — runs /bin/bash as the entrypoint instead of the default. |
| --log-driver | Specifies the log driver for the container (e.g., json-file, syslog, none). | docker run --log-driver syslog ubuntu — uses syslog as the log driver. |
| --cap-add | Adds Linux capabilities to the container (e.g., NET_ADMIN, SYS_TIME). | docker run --cap-add=NET_ADMIN ubuntu — grants the container the NET_ADMIN capability for network administration tasks. |
| --gpus | Allocates GPUs to the container (useful for machine learning or GPU-intensive tasks). | docker run --gpus all nvidia/cuda:latest — allocates all available GPUs to the container. |
| --detach-keys | Specifies a key sequence to detach from the container. | docker run --detach-keys="ctrl-x" ubuntu — uses Ctrl+X to detach from the container. |
  • Or start an existing container, and enter this container with bash:
# start in the background
sudo docker start <container_id>

# start with interactive mode
sudo docker exec -it <container_id> bash
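
Putting several of the options above together, a typical command for an interactive deep learning session might look like the following sketch; the paths, container name, and image name are placeholders.

sudo docker run -it --rm \
     --gpus=all \
     --name=dl-session \
     -v /path/to/my_project:/workspace/project \
     -w /workspace/project \
     --env NVIDIA_DISABLE_REQUIRE=1 \
     [Img_Name][:[Tag]] \
     bash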

Modify, Commit and Push your own Image

# run a Container with interactive mode
sudo docker run -it --gpus=all  --name=<container_name> --env NVIDIA_DISABLE_REQUIRE=1  [Img_Name][:[Tag]]  bash

##### inside the container from here: #####
# install something for your environment
pip install xxx

sudo docker commit  -m="[Msg]" -a="[Author_Info]" <container_id> lijiaqiisai/ubuntu24.04-cuda-python3:v0.1

This will create a new Image named lijiaqiisai/ubuntu24.04-cuda-python3:v0.1.

You can create a tag of this Image with

sudo docker tag lijiaqiisai/ubuntu24.04-cuda-python3:v0.1 lijiaqiisai/ubuntu24.04-cuda-python3:latest

You can push it to your Docker Hub with

sudo docker push lijiaqiisai/ubuntu24.04-cuda-python3:latest
sudo docker push lijiaqiisai/ubuntu24.04-cuda-python3:v0.1

Remark: differences between docker commit and docker tag

| Feature | docker commit | docker tag |
| --- | --- | --- |
| Purpose | Create a new image from a container | Add a new tag to an existing image |
| Operation | Saves changes made in a container | Renames or versions an existing image |
| Creates a layer | Yes | No |
| Syntax | docker commit <CONTAINER> <IMAGE> | docker tag <SOURCE_IMAGE> <TARGET_IMAGE> |

Execute your program with Docker Container

Create a new Container for your program:

  • Manually
# adding '--rm' will remove it after closing the container
sudo docker run -it --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -v /[Local_Project_Path]:/workspace/project  [Img_Name][:[Tag]]  zsh

cd ${PROJECT_DIR}
python main.py
  • Automatically on Container Start

(Option 1): Configure CMD in the Dockerfile

# add this line to the Dockerfile
CMD ["python", "/workspace/project/train.py"]

then run

sudo docker run -it --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -v /[Local_Project_Path]:/workspace/project  [Img_Name][:[Tag]]

(Option 2): Pass Script as a Command

# wrap the compound command in `sh -c '...'` so that it runs inside the container
sudo docker run -it --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -v /[Local_Project_Path]:/workspace/project  [Img_Name][:[Tag]] sh -c 'cd ${PROJECT_DIR} && python main.py'

Launch an existing Container for your program:

docker exec runs a command in an existing, already-running Container rather than creating a new one. Note that GPU access and volume mounts are inherited from the docker run command that created the container; they cannot be added via docker exec.

sudo docker exec -it --env NVIDIA_DISABLE_REQUIRE=1 <Container_id_or_name> <your_program>

Running Code via JupyterLab

sudo docker run -it -p 8888:8888 --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -v /[Local_Project_Path]:/workspace/project  [Img_Name][:[Tag]]

then you can access JupyterLab in your browser at http://localhost:8888 (this assumes the image's default CMD launches JupyterLab; if not, pass the launch command explicitly, as sketched below).
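
A sketch of launching JupyterLab explicitly, assuming the jupyterlab package is installed in the image:

sudo docker run -it -p 8888:8888 --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 \
     -v /[Local_Project_Path]:/workspace/project \
     [Img_Name][:[Tag]] \
     jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root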

Run a container on clusters with Slurm:

# Load any required modules (e.g., CUDA, Docker)
module load docker  # if needed to load Docker via module system

PROJECT_DIR="[Local_Project_Path]"
DOCKER_IMAGE="[Img_Name][:[Tag]]"

# Run Docker with GPU support using --gpus flag
srun docker run --gpus all \
     -v ${PROJECT_DIR}:/workspace/project \
     ${DOCKER_IMAGE} \
     python /workspace/project/train.py

Apptainer

The conceptual model of Apptainer (previously known as Singularity) is simpler. It essentially only has the concept of a Container, which is what we call each virtual instance that we run.

Installation

References: install apptainer

  • Install unprivileged from pre-built binaries
curl -s https://raw.githubusercontent.com/apptainer/apptainer/main/tools/install-unprivileged.sh | \
    bash -s - install-dir
  • Install Debian packages

Pre-built Debian packages are only available on GitHub and only for the amd64 architecture.

For the non-setuid installation use these commands:

sudo apt update
sudo apt install -y wget
cd /tmp
wget https://github.com/apptainer/apptainer/releases/download/v1.3.4/apptainer_1.3.4_amd64.deb
sudo apt install -y ./apptainer_1.3.4_amd64.deb

For the setuid installation, run the above commands first and then these:

wget https://github.com/apptainer/apptainer/releases/download/v1.3.4/apptainer-suid_1.3.4_amd64.deb
sudo dpkg -i ./apptainer-suid_1.3.4_amd64.deb
  • Install Ubuntu packages
# First, on Ubuntu based containers install software-properties-common package to obtain add-apt-repository command.
# On Ubuntu Desktop/Server derived systems skip this step.
sudo apt update
sudo apt install -y software-properties-common

# For the non-setuid installation use these commands:
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt update
sudo apt install -y apptainer

# For the setuid installation do above commands first and then these:
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt update
sudo apt install -y apptainer-suid

A local Apptainer Container Image can be built in two modes:

  • generate a read-only *.sif file
  • create a mutable (read & write) sandbox directory with --sandbox option that can be modified later

Create Apptainer Image into .sif (read-only)

Build a *.sif Image with apptainer build command:

Options for apptainer build:

  • --nv: inject host Nvidia libraries during build for post and test sections;
  • --nvccli: use nvidia-container-cli for GPU setup (experimental)
  • --bind src[:dest[:opts]] or -B src[:dest[:opts]]: a user-bind path specification, where src and dest are paths outside (on the host machine) and inside the container, respectively. If dest is not given, it is set equal to src. Mount options (opts) may be specified as ro (read-only) or rw (read/write, the default). Multiple bind paths can be given as a comma-separated list.
  • -f, --fakeroot: build with the appearance of running as root (default when building from a definition file unprivileged)
  • --sandbox: will be introduced in the next section

The Apptainer Image file *.sif is read-only and portable.
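
For instance, a hypothetical build that injects the host NVIDIA libraries and binds a host data directory for the %post/%test stages might look like this (the .def name and paths are placeholders):

sudo apptainer build --nv -B /path/on/host/data:/data dl-env.sif apptainer.def
# or, without root privileges
apptainer build --fakeroot --nv dl-env.sif apptainer.def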

Download from Docker Hub

apptainer build dl-env.sif docker://lijiaqiisai/ubuntu24.04-cuda-python3:cuda12.6.0-python3.12.3

Download from other Library API Registries

apptainer build/pull dl-env.sif [Library]://[Image_name]

Build from Apptainer definition file (analogue to Dockerfile)

References: https://apptainer.org/docs/user/1.0/definition_files.html

Headers in Apptainer definition file

From a remote Registry (e.g., Docker Hub)

# from Docker Hub
Bootstrap: docker
From: lijiaqiisai/ubuntu24.04-cuda-python3:cuda12.6.0-py3.12.3

or from a local Image

Bootstrap: localimage
From: <Old_Image>.sif
Fingerprints: 12045C8C0B1004D058DE4BEDA20C27EE7FF7BA84,22045C8C0B1004D058DE4BEDA20C27EE7FF7BA84

Sections in Apptainer definition file

We explain the sections by the order of execution:

flowchart LR
  A[header] --> B[%arguments];
  B --> C[%setup];
  C --> D[%files];
  D --> E[%post];
  E --> F[%test];
  F --> G[%environment]
  G --> H[%startscript]
  G --> I[%runscript]

  G --> J[%labels]
  G --> K[%help]

\(\downarrow\) %arguments: define custom arguments or flags that can be passed when building. The variables defined in %arguments can be accessed in %setup, %post
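
For example, with the header used later in this post (From: nvidia/cuda:{{ CUDA_VERSION }}-cudnn-runtime-ubuntu{{ OS_VERSION }}), the defaults declared in %arguments can be overridden at build time via --build-arg; a sketch, assuming Apptainer 1.2+ templating:

# use the defaults declared in %arguments
sudo apptainer build apptainer-base.sif apptainer-base.def

# override arguments declared in %arguments
sudo apptainer build --build-arg CUDA_VERSION=12.6.0 --build-arg OS_VERSION=24.04 \
     apptainer-base.sif apptainer-base.def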

\(\downarrow\) %setup: commands that are executed first, on the host system outside of the container, after the base OS has been installed. Within this section, the container file system can be referenced via the environment variable $APPTAINER_ROOTFS.

Warning:

Be careful with the commands in the %setup section, since they operate directly on the host system.

\(\downarrow\) %files: allows you to copy files into the container with greater safety than using the %setup section

%files [from <stage>]
    <source> [<destination>]
    ...

\(\downarrow\) %post: download packages, install software and libraries, write configuration files, create new directories, etc.;

    apt-get update && apt-get install -y netcat
    NOW=`date`
    echo "export NOW=\"${NOW}\"" >> $APPTAINER_ENVIRONMENT
  • Please note that the above commands also set an environmental variable NOW at build time. The value of this variable cannot be anticipated, and therefore cannot be set during the %environment section. For situations like this, the $APPTAINER_ENVIRONMENT variable is provided. Redirecting text to this variable will cause it to be written to a file called /.singularity.d/env/91-environment.sh that will be sourced at runtime.
  • Priority: environment variables set in the %post section through $APPTAINER_ENVIRONMENT take precedence over those added via %environment.

\(\downarrow\) %test: runs at the very end of the build process to validate the container using a method of your choice.

  • can be executed with apptainer test *.sif;
  • build with --notest option to build a container without running the %test section, like sudo apptainer build --notest my_container.sif my_container.def;

\(\downarrow\) %environment: allows you to define environment variables that will be set at runtime.

  • Note: variables in the %environment section are not made available at build time. This means that if you need the same variables during the build process, you should also define them in your %post section.
  • during build: The %environment section is written to a file in the container metadata directory. This file is not sourced.
  • during runtime: The file in the container metadata directory is sourced.

\(\downarrow\) %startscript: the contents of %startscript are executed when running apptainer instance start xxx; they run only once, when the instance starts.

\(\downarrow\) %runscript: the contents of %runscript are executed when running apptainer run xxx, i.e., every time the container is run. Two useful shell variables (see the snippet after this list):

  • $*: the options passed to the container at runtime, joined into a single string
  • $@: the same options as a quoted array, preserving each argument separately
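
A small sketch of how the two differ inside a hypothetical %runscript ("$@" preserves argument boundaries, while $* joins everything into one string):

%runscript
    echo "as one string : $*"
    echo "as an array   :" "$@"
    # prefer "$@" when forwarding the arguments to another program
    exec python "$@"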

\(\downarrow\) %labels: is used to add metadata to the file /.singularity.d/labels.json within your container.

  • To inspect the labels, using apptainer inspect <Image>.sif

\(\downarrow\) %help: any text in this section is transcribed into a metadata file in the container during the build.

TIP

The lifecycle of environment variables defined in different sections:

| Where the variable is defined | Available at build time | Available at runtime | Notes |
| --- | --- | --- | --- |
| In %arguments | ✓ (only during the build, in %setup or %post) | ✗ | |
| In %post, set by export xxx | ✓ | ✗ | |
| In %post, set via $APPTAINER_ENVIRONMENT | ✓ | ✓ | Saved to /.singularity.d/env/91-environment.sh, which is sourced at runtime |
| In %environment | ✗ | ✓ | Lower priority than variables set through $APPTAINER_ENVIRONMENT in %post |

Two examples

Base example:

The following *.def file will pull from nvidia/cuda, and just install latest Python.

# apptainer-base.def

# Header: bootstrap from Docker Hub
Bootstrap: docker
From: nvidia/cuda:{{ CUDA_VERSION }}-cudnn-runtime-ubuntu{{ OS_VERSION }}
Stage: build

# sections
%arguments
  CUDA_VERSION=12.6.0
  OS_VERSION=24.04

%setup

%files

%post
    # update, upgrade and set timezone
    export TZ=America/Toronto
    export DEBIAN_FRONTEND=noninteractive
    apt-get update && apt-get upgrade -y --no-install-recommends
    apt-get -y install tzdata
    ln -fs /usr/share/zoneinfo/${TZ} /etc/localtime
    dpkg-reconfigure -f noninteractive tzdata
    export DEBIAN_FRONTEND=dialog
    ### Fix the warning "Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details."
    # cd /etc/apt && cp trusted.gpg trusted.gpg.d && cd -
    # Install software
    apt-get install -y --no-install-recommends python3 python3-pip python3-virtualenv
    ln -s /usr/bin/python3 /usr/bin/python
    export VENV_DIR=/workspace/venv
    python -m virtualenv $VENV_DIR

    ### hold these variables after building process
    NOW=`date`
    echo "export NOW=\"${NOW}\"" >> $APPTAINER_ENVIRONMENT
    echo "export VENV_DIR=\"${VENV_DIR}\"" >> $APPTAINER_ENVIRONMENT
    echo "export CUDA_VERSION=\"${CUDA_VERSION}\"" >> $APPTAINER_ENVIRONMENT
    echo "export OS_VERSION=\"${OS_VERSION}\"" >> $APPTAINER_ENVIRONMENT

%test
    grep -q NAME=\"Ubuntu\" /etc/os-release
    if [ $? -eq 0 ]; then
        echo "Container base is Ubuntu as expected."
    else
        echo "Container base is not Ubuntu."
        exit 1
    fi

%environment
    export LISTEN_PORT=12345
    export LC_ALL=C
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    export PATH=${VENV_DIR}/bin:$PATH

# will be executed when executing `apptainer instance start xxx.sif`
%startscript
    echo "Container was created $NOW"
    echo "Ubuntu version: $OS_VERSION"
    echo "CUDA version: $CUDA_VERSION"
    echo "Virtual Env path: $VENV_DIR"

# will be executed when executing `apptainer run xxx.sif`
%runscript
    echo "Container was created $NOW"
    echo "Ubuntu version: $OS_VERSION"
    echo "CUDA version: $CUDA_VERSION"
    echo "Virtual Env path: $VENV_DIR"
    echo "Arguments received: $*"

    ### Execute the following args with the default program "python"
    ### --> `apptainer run <Image>.sif main.py`
    echo "Executing '$VENV_DIR/bin/python $*'"
    exec $VENV_DIR/bin/python $@

%labels
    Author lijiaqi

%help
    This is a .def file to create a python environment based on `nvidia/cuda` docker image.

then build with

sudo apptainer build --nv --build-arg CUDA_VERSION=12.6.0  apptainer-base.sif apptainer-base.def
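
Once built, the %runscript above forwards any arguments to the virtual environment's Python, so a quick smoke test (a sketch, assuming the build succeeded) is:

# should print the Python version from $VENV_DIR inside the container
apptainer run --nv apptainer-base.sif --version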

Detailed example:

A .def file for building on top of my customized Image ubuntu24.04-cuda-python3 from Docker Hub; the modification installs the packages listed in requirements.txt (e.g., torch) into it.

# apptainer-detailed.def

# Header: bootstrap from Docker Hub
Bootstrap: docker
From: lijiaqiisai/ubuntu24.04-cuda-python3:cuda{{ CUDA_VERSION }}-py3.12.3
Stage: build

%arguments
    CUDA_VERSION=12.6.0

%setup

# copy from Host path to Container path
%files
    requirements.txt

%post
    NOW=`date`
    echo "export NOW=\"${NOW}\"" >> $APPTAINER_ENVIRONMENT
    $VENV_DIR/bin/pip install -r requirements.txt

%test
    grep -q NAME=\"Ubuntu\" /etc/os-release
    if [ $? -eq 0 ]; then
        echo "Container base is Ubuntu as expected."
    else
        echo "Container base is not Ubuntu."
        exit 1
    fi

%environment
    export LISTEN_PORT=12345
    export LC_ALL=C
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    export PATH=${VENV_DIR}/bin:$PATH

%startscript
    echo "Container was created $NOW"
    echo "Ubuntu version: $OS_VERSION"
    echo "CUDA version: $CUDA_VERSION"
    echo "Virtual Env path: $VENV_DIR"

%runscript
    echo "Container was created $NOW"
    echo "Ubuntu version: $OS_VERSION"
    echo "CUDA version: $CUDA_VERSION"
    echo "Virtual Env path: $VENV_DIR"
    echo "Arguments received: $*"

    ### Execute the following args with the default program "python"
    ### --> `apptainer run <Image>.sif main.py`
    echo "Executing '$VENV_DIR/bin/python $*'"
    exec $VENV_DIR/bin/python $@

%labels
    Author lijiaqi

%help
    This is a .def file to create a python environment based on `lijiaqiisai/ubuntu24.04-cuda-python3` docker image.

then build with

sudo apptainer build --nv --build-arg CUDA_VERSION=12.6.0  apptainer-detailed.sif apptainer-detailed.def

It will follow requirements.txt when building the Image file.

Build/Pack from an existing sandbox directory

sudo apptainer build <Image_name>.sif <SANDBOX_FOLDER>

What is the sandbox directory <SANDBOX_FOLDER> in the above example? Let’s move to next step.

Create Apptainer Image sandbox directory (then *.sif) with --sandbox option

By using the --sandbox option, i.e., apptainer build --sandbox <SANDBOX_FOLDER> <Image_name>.sif (or a URL such as docker://...), we can create a mutable sandbox directory that we can customize to build our own virtual environment:

WARNING

It’s possible to create a sandbox without root privileges, but to ensure proper file permissions it is recommended to do so as root.

# create a sandbox directory
## from an existing *.sif
sudo apptainer build --nv --sandbox <SANDBOX_FOLDER>/ <Local>.sif
## from a remote docker hub
sudo apptainer build --nv --sandbox <SANDBOX_FOLDER>/ docker://<Image>:[Tag]
### e.g.
sudo apptainer build --nv --sandbox <SANDBOX_FOLDER>/ docker://lijiaqiisai/ubuntu24.04-cuda-python3:latest

Enter the sandbox container:

sudo apptainer shell --writable <SANDBOX_FOLDER>/

WARNING

Please note that in --writable mode, the NVIDIA (--nv) files may not be bound.

Install software:

# (optional) Install Python
apt-get update && apt-get -y upgrade
apt-get -y install python3 python3-pip python3-virtualenv

# (optional) fix the issue from "nvidia/cuda" docker hub
# cd /etc/apt && cp trusted.gpg trusted.gpg.d && cd -

# (optional) Install torch==2.4.1
pip install torch==2.4.1 torchvision

Tip

Or, if you want to install Python packages from a requirements.txt, remember to bind its path into the Container:

sudo apptainer shell --writable -B <<host_path>>/requirements.txt:<<container_path>>/requirements.txt <SANDBOX_FOLDER>/
pip install -r <<container_path>>/requirements.txt

After installing packages, you may want to pack the sandbox directory into a *.sif Image file:

# exit the container
exit

# package into .sif file
sudo apptainer build --nv  <SANDBOX_ENV>.sif <SANDBOX_FOLDER>/

# (optional) clean the directories
rm -rf <SANDBOX_FOLDER>/

Discovery

I found that an Apptainer *.sif Image file seems smaller than a Docker Image containing the same packages. I am not sure; this still needs to be verified.

Run a Container

After building a portable *.sif Image file, you can use it for production elsewhere. There are several ways to run an instance based on your Apptainer Image file:

Command: apptainer run

executes the default runscripts defined in the container (e.g., see %runscript in apptainer.def)

# `apptainer run`if `%runscript` has been defined for this `*.sif` Image file (e.g., in the `apptainer.def` file)
apptainer run --nv dl-env.sif

Command: apptainer exec

followed by a specified command or program to execute within the container

# `apptainer exec` followed by your program
apptainer exec --nv dl-env.sif <your_program>
## e.g., 
apptainer exec --nv dl-env.sif python main.py

# can also run a specific shell; in this case, equivalent to `apptainer shell`
apptainer exec --nv dl-env.sif /bin/bash

Tip

Here is a comparison between apptainer run and apptainer exec:

| Feature | apptainer run | apptainer exec |
| --- | --- | --- |
| Purpose | Executes the default runscript defined in the container | Executes a specified command or program within the container |
| Runscript | Requires that the container has a defined runscript; will fail if none exists | Does not rely on a runscript; you specify the command to run |
| Usage Scenario | Ideal for running pre-configured applications or workflows | Useful for debugging, testing, or running specific commands |
| Command Syntax | apptainer run <container.sif> | apptainer exec <container.sif> <command> |
| Additional Options | --nv for GPU support | --nv for GPU support, --bind/-B for binding host directories |
| Example | apptainer run example.sif | apptainer exec example.sif python script.py |
| Interactivity | Does not support an interactive shell unless the runscript allows it | Can launch an interactive shell (e.g., apptainer exec example.sif /bin/bash) |
| Flexibility | Less flexible; limited to the runscript defined in the image | More flexible; allows execution of any command or script |
| Error Handling | Fails if no runscript is defined | Fails if the specified command is not found or fails to execute |

Command: apptainer shell

runs the container in interactive mode, dropping you into a shell inside the container.

Command: apptainer instance

apptainer instance [options] <command> allows users to create, start, stop, and remove instances of containers, providing a way to run multiple instances of the same container image concurrently.

Some options:

  • start: Start a new container instance (running in the background)
apptainer instance start <Image_name>.sif <instance_name>
  • stop: Stop a running container instance.
# stop a specific instance
apptainer instance stop <instance_name>
  • list: List all currently running instances.
apptainer instance list

(Personal) Some notes for running Apptainer on Compute Canada

References:

This part provides some best practices for using Apptainer on Compute Canada. A minimal invocation looks like:

apptainer exec --nv dl-env.sif my_script.sh

When using the run, shell, instance, and exec commands on Compute Canada:

  • Always use one of the -C, -c or -e options:
    • -C: hides filesystems, PID, IPC, and environment;
    • -c: use a minimal /dev; shared-with-host directories (e.g., /tmp) will appear empty unless explicitly bind mounted;
    • -e: clean the environment before running the container;
  • Always use the -W dir option, with dir being a path to a real directory that you have write access to
    • In sbatch scripts, set -W $SLURM_TMPDIR
  • When using NVIDIA GPUs, use --nv to expose the NVIDIA hardware to the container.

  • When access to host directories is needed, bind mount the top-level directories of those filesystems, or the desired directories themselves.
    • useful bind mounts: -B /home -B /project -B /scratch
# a general example (note that options such as -W must come before the image file)
apptainer exec -C --nv -W $SLURM_TMPDIR -B /home -B /project -B /scratch dl-env.sif my_program

# for commands with `srun`, e.g., an MPI program
srun apptainer run dl-env.sif /path/to/your/mpi-program
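
Putting these recommendations together, a minimal sbatch script might look like the following sketch; the account, resource values, and script path are placeholders to adapt to your cluster.

#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gpus-per-node=1
#SBATCH --mem=16G
#SBATCH --time=03:00:00

module load apptainer

# run the training script inside the container, with a writable working dir on local scratch
apptainer exec -C --nv -W $SLURM_TMPDIR \
    -B /home -B /project -B /scratch \
    dl-env.sif python /path/to/your/train.py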

Summary

A quick overview and comparison between Docker and Apptainer

Here is a take-away summary of two platforms:

| Feature | Docker | Apptainer (previously known as Singularity) |
| --- | --- | --- |
| Image | ✓ (layered image) | ✓ (a single *.sif file, or a sandbox directory) |
| Container | ✓ (what we will run is a Container) | ✓ (what we will run is a Container) |
| Build Image by pulling | docker pull <Image>:[Tag] | SIF Image: apptainer pull/build <Image>.sif docker://<Image>:[Tag]; or Sandbox: apptainer build --sandbox <SANDBOX_DIR> docker://<Image>:[Tag] |
| Build Image from an existing Image | N/A (needs several steps: docker run to create a Container from an Image, make changes, then commit to a new Image with docker commit <Container_id> <New_Image>:<Tag>) | SIF Image: apptainer build <New>.sif <Old>.sif (meaningless, essentially a copy); or Sandbox: apptainer build --sandbox <SANDBOX_DIR> <Old>.sif |
| Build Image from a definition file | docker build -t/--tag [Img_Name][:[Tag]] . or docker build -t/--tag [Img_Name][:[Tag]] -f/--file ./Dockerfile . | SIF Image: apptainer build *.sif apptainer.def; or Sandbox: apptainer build --sandbox <SANDBOX_DIR> apptainer.def |
| Start a new Container (instance) | docker run --name=<Container_name> <Image_name> | apptainer instance start *.sif <instance_name> |
| Enter the shell of a running Container | docker exec -it <container_id_or_name> /bin/bash | N/A (once the Container instance is created in the background, we cannot enter its interactive mode) |
| Start a new Container in interactive mode (the above two steps in one) | docker run -it --name=<container_name> <Image> | apptainer shell <Image>.sif |
| Start an existing instance | docker start <instance_id_or_name> (must already exist) | N/A (Apptainer instances are NOT saved) |
| Start an existing instance in interactive mode (one step) | docker start -ai <instance_id_or_name> (must already exist) | N/A (Apptainer instances are NOT saved) |
| Stop a running instance | docker stop <instance_id_or_name> | apptainer instance stop <instance> |
| Pack into a new Image after modification | docker commit -m="xxx" -a="xx" <Container_id> <Image>:<Tag> | apptainer build --nv <New>.sif <SANDBOX_DIR> |
| Check running Container instances | docker ps (or docker ps -a for all, including stopped ones) | apptainer instance list |
| Check existing Images | docker images | N/A (just check the *.sif files) |
| Run an application program | default CMD command with docker run --rm <Image>, or a specific program with docker run --rm <Image> <your_program> | default %runscript with apptainer run *.sif, or a specific program with apptainer exec *.sif <your_program> |
| Mount/bind a directory (useful for running programs) | -v <host_path>:<container_path> | --bind/-B host_path[:container_path[:options]] |
| Use NVIDIA GPUs in the Container (useful for running programs) | --gpus=xxx | --nv |
| Set the working directory (useful for running programs) | -w=<dir_on_container> | N/A (no such concept) |

Core Commands

It is noteworthy that the names of the commands between Docker and Apptainer have some overlap but some of them have different functions. Here is a summary of some common commands within each platform:

Docker

  • Build
    • docker pull: pull an existing Image from remote registry (e.g., Docker Hub)
    • docker build: build an Image from a Dockerfile
  • Run
    • docker run: create a Container from an Image. Will execute the following command according to CMD and ENTRYPOINT in the Dockerfile;
      • docker run -it: interactive mode. Analogue to apptainer shell <Image>.sif
      • docker run -d: detached mode (running in the background). Analogue to apptainer instance start <Image>.sif <instance_name>
    • docker exec: Executes a command in an existing running container. Will not create a new container
      • docker exec <container_id_or_name> <your_program>
    • docker start: start an existing Container (it may be closed on the background)
    • docker stop: stop a running Container (useful when it is not running in interactive mode and cannot be closed with exit)

Apptainer

  • Build
    • apptainer pull: pull an existing Image from remote registry like docker://xxx (Docker Hub)
    • apptainer build: build (1) a .sif Image file, or (2) a Sandbox directory (with --sandbox), from either (a) a remote registry (e.g., Docker Hub) or (b) an existing local *.sif file; case (1)(a) matches the behaviour of apptainer pull [Local].sif docker://xxx
  • Run
    • apptainer run: only for executing the default command defined in %runscript in apptainer.def
    • apptainer exec: launch a Container instance to execute a program
    • apptainer shell: launch a Container instance in interactive mode
    • apptainer instance: operations w.r.t. Container instances
      • apptainer instance start: start a Container instance
      • apptainer instance stop: stop a Container instance
      • apptainer instance list: list all running Container instances


