The modern machine learning toolbox is delicate and complicated, yet for our work to be reproducible we would like an environment that can be recreated across multiple platforms. In this devlog I’ll demonstrate a minimal working example of running PyTorch with CUDA.

The code is in this repository.

The objective of this tutorial is to produce a repository which, in order from most to least important,

  1. Can run PyTorch and PyTorch Geometric on CUDA/ROCm
  2. Can produce a Singularity image that runs on a cluster (specifically Sherlock)
  3. Can be built with or without Nix

Project configuration

First of all, select compatible versions of PyTorch and PyTorch Geometric, as well as a CUDA version; in this example it will be CUDA 11.8. We then write these into the pyproject.toml file:

[tool.poetry]
name = "reproducible"
version = "0.0.1"
description = "Reproducible Machine Learning Environment"
authors = ["Leni Aniva <v@leni.sh>"]
readme = "README.md"
include = []
packages = [{ include = "reproducible" }]

[tool.poetry.scripts]
reproducible = "reproducible.util:main"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.poetry.dependencies]
python = "^3.11"
numpy = "^1.26"
#torch = { url = "https://download.pytorch.org/whl/cu118/torch-2.2.1+cu118-cp311-cp311-linux_x86_64.whl" }
torch = { version = "^2.2.0", source = "pytorch-cu118" }
torch-geometric = "^2.5.2"
pyg-lib = { version = "0.4.0", source = "pyg-torch220-cu118" }
torch-cluster = { version = "1.6.3", source = "pyg-torch220-cu118" }
torch-scatter = { version = "2.1.2", source = "pyg-torch220-cu118" }
torch-sparse = { version = "0.6.18", source = "pyg-torch220-cu118" }
torch-spline-conv = { version = "1.2.2", source = "pyg-torch220-cu118" }

[[tool.poetry.source]]
name = "pytorch-cu118"
url = "https://download.pytorch.org/whl/cu118"
priority = "explicit"

[[tool.poetry.source]]
name = "pyg-torch220-cu118"
url = "https://data.pyg.org/whl/torch-2.2.0+cu118.html"
priority = "explicit"

Note that although fetching PyTorch directly from a wheel URL is possible (the commented-out line above), a current bug in poetry2nix makes it impossible to build derivations containing PyTorch that way, so we pin PyTorch through an explicit package source instead.
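With the sources declared, the environment can be resolved and installed with Poetry. As a sketch (assuming a recent Poetry release that supports explicit package sources):

```shell
# Resolve the dependency graph against the explicit sources above,
# producing poetry.lock
poetry lock

# Install the locked dependencies into a Poetry-managed virtual environment
poetry install
```

The lock file is what makes the environment reproducible: committing it pins every transitive dependency to an exact version and hash.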

Then we add a few Python files for diagnostics:

# reproducible/__init__.py

# reproducible/__main__.py
import reproducible.util

reproducible.util.main()

# reproducible/util.py
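The contents of util.py are elided here, but as a hedged sketch, a minimal diagnostic main() (hypothetical, not necessarily the repository's actual code; it assumes only the public PyTorch API) might report the resolved torch version and CUDA status:

```python
# Hypothetical diagnostic entry point; the actual util.py may differ.
# Degrades gracefully when torch is absent so it runs in any environment.
def main():
    try:
        import torch
        print(f"torch {torch.__version__}")
        print(f"CUDA available: {torch.cuda.is_available()}")
    except ImportError:
        print("torch is not installed")


if __name__ == "__main__":
    main()
```

Running `python -m reproducible` (or `poetry run reproducible`, via the `[tool.poetry.scripts]` entry) then gives a quick check that the CUDA-enabled wheels were actually installed.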