Back to the future: modules for Guix packages
Some things in our software world are timeless. The venerable
Environment Modules are one of these.
If you’ve ever used a high-performance cluster in the last three
decades, chances are you’re already familiar with it. Modules is about
managing software environments, just like Guix is (or, perhaps more
accurately, just like guix shell).
You will be delighted, or surprised, to learn that Guix now has a compatibility layer with Modules.
The legacy of Modules
As Furlani’s 1991 introductory paper explains, Modules were, and still
are, a key enabler for Unix users, especially in high-performance
computing (HPC). The module command lets users manipulate their
software environment in terms of packages, without having to be Unix or
shell experts; it lets them compose packages and build the software
environment of their choice, without interfering with other users; it
gives a level of flexibility that Unix alone wouldn’t provide. The
command-line interface is easily understood:
module load gcc/11.2
“loads” GCC 11.2 in your shell. You can “load” and “unload” software components at will:
module load python/3.8
module unload gcc
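To make the mechanics concrete: loading a module essentially adjusts environment variables such as PATH, and unloading undoes the change. The sketch below imitates that effect in plain shell. The sketch_load and sketch_unload functions and the /opt/sketch directory are hypothetical names made up for this illustration; they are not part of Modules, and real module files do considerably more.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Modules): they mimic the effect of a
# module file that contributes a single directory to PATH.

sketch_load() {
    # Prepend directory $1 to PATH.
    PATH="$1:$PATH"
}

sketch_unload() {
    # Rebuild PATH without directory $1.  The loop runs in a subshell so
    # that the temporary IFS change does not leak into the caller.
    PATH=$(
        IFS=:
        new_path=""
        for dir in $PATH; do
            [ "$dir" = "$1" ] || new_path="${new_path:+$new_path:}$dir"
        done
        printf '%s' "$new_path"
    )
}

sketch_load /opt/sketch/gcc-11.2/bin    # roughly "module load gcc/11.2"
sketch_unload /opt/sketch/gcc-11.2/bin  # roughly "module unload gcc"
```

The point of the sketch is only that the user-facing operation is simple; what the directory actually contains is decided elsewhere, which is where the trouble described next comes from.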
As an interface, Modules are easy to use and understand.
However, they leave it up to sysadmins (sometimes users) to
actually deploy the software. The common approach has been for
sysadmins to build and install, by themselves, the software that
Modules refer to. The end result is that modules vary from machine to
machine. For example, the gcc module shown above might refer to
GCC 11.2 on one cluster and GCC 8 on another; it might have an entirely
different name on a third cluster. Likewise, the python/3.8 module
above might refer to different patch-level versions of Python 3.8, or
it might refer to a variant of Python built with different dependencies
or different build flags.
These issues have been largely mitigated by package managers such as EasyBuild and Spack: both automate package builds, and both can generate module files—Tcl snippets that define environment variables to set when “loading” a module. With EasyBuild and Spack, it becomes possible to not only automate deployment and module file generation, but also to deploy similar software on different machines.
“Similar”, though, does not mean “the same”. Software built with Spack or EasyBuild depends on software already available on the host system: it is built on top of a GNU/Linux distribution, which could be CentOS 7.4 (released in 2017), or Ubuntu 22.04, or really anything else. Thus, software installed with these tools depends on software provided by the underlying distribution, at build time and at run time.
This “hidden dependency” makes it hard to redeploy the exact same environment on a different machine or at a different point in time: the same build process might fail, or it might succeed but the resulting software might behave differently. Our approach in Guix is to not have that “hidden dependency”. Instead, the package dependency graph that Guix manipulates is self-contained: it includes package definitions for all the user-land software one may use.
From Guix to Modules
The news today is the release of Guix-Modules, a new tool to generate
module files from Guix packages. The primary goal, as with the module
file generation tools in EasyBuild and Spack, is to make it easy for HPC
cluster sysadmins to provide a set of modules for their users (more on
that below). Guix-Modules is an extension of Guix. To use it, you need
to install it and to set the GUIX_EXTENSIONS_PATH environment variable,
like so:
guix install guix-modules
export GUIX_EXTENSIONS_PATH="$HOME/.guix-profile/share/guix/extensions"
That gives you a new guix module sub-command.
Let’s say you want to generate modules in /opt/modules for selected
packages; you can do so by running:
guix module create -o /opt/modules \
coreutils gcc-toolchain python python-numpy
As with all Guix commands, it will build or download the packages if
they’re not around already, and then populate /opt/modules with a bunch
of module files. If /opt/modules already existed, its previous contents
are backed up under /var/guix/profiles, which lets you roll back to the
previous modules should you regret your changes.
As an admin, you can periodically update the set of modules by running:
guix pull
guix module create -o /opt/modules …
The good thing is that users can still access the previous module set
under /var/guix/profiles until you explicitly remove it.
Instead of having those long guix module create command lines, you can
opt for listing the packages of interest in a manifest file, which you
can keep under version control. As with most other guix commands, you
can pass the manifest with the -m option:
guix module create -m my-modules.scm -o /opt/modules
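For illustration, here is one way such a manifest could be written for the four packages used earlier. This is a minimal sketch: the file name my-modules.scm matches the command above, but the contents are an assumption about what you might want to list, not a file shipped with Guix-Modules. It relies on specifications->manifest, the usual procedure for turning a list of package names into a manifest.

```shell
#!/bin/sh
# Write a manifest naming the packages for which module files should be
# generated.  The package list is illustrative; adjust it to your needs.
cat > my-modules.scm <<'EOF'
;; Packages to expose as modules.
(specifications->manifest
 '("coreutils" "gcc-toolchain" "python" "python-numpy"))
EOF

# The file can now be committed to version control and passed to:
#   guix module create -m my-modules.scm -o /opt/modules
```

Keeping the list in a file means the history of the module set is visible in the repository log rather than in someone’s shell history.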
Once the modules have been generated, you can happily load and unload
them using the familiar module sub-commands:
unset MODULEPATH
module use /opt/modules
module load gcc-toolchain/11.2.0
module load python/3.9.9
Voilà! If you’re a sysadmin, here’s a new way to offer scientific software to your users without asking them to change their habits. The generated module files work equally well with the “original” Module implementation and with Lmod.
Provenance tracking
Since we, Guix developers, pride ourselves on providing a deployment
tool with good support for provenance tracking, we couldn’t just let
the guix module command generate module files of unclear provenance.
Users, we think, ought to be able to determine the provenance of the
modules they use. We want to avoid the scenario many HPC practitioners
are familiar with whereby, six months after publishing an article, you
can no longer reproduce the computational results it contains because
the relevant modules have been upgraded or removed from under your feet
and you just don’t know how to reproduce them.
Thus, guix module create records provenance data in the module files
it generates. You can view that info by running module help:
$ module help openblas
----------- Module Specific Help for 'openblas/0.3.18' ------------
This module was generated from a GNU Guix package.
Provenance data (channels):
(list (channel
        (url "https://git.savannah.gnu.org/git/guix.git")
        (branch "master")
        (commit
          "4ba35ccd18f90314caa76ea1833ffc383559401c")
        (name 'guix)
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
What module help shows is the list of channels from which this
particular package was built. The information is in a format that
guix time-machine can readily consume. Assuming you store the
(list (channel …)) snippet in file channels.scm, you can go to another
machine, at a later point in time, and deploy the exact same software
with this command:
guix time-machine -C channels.scm -- \
shell gcc-toolchain openblas
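If you would rather not copy the channel snippet by hand, a small filter can pull it out of the help text. The following is only a sketch: extract_channels is a hypothetical helper, not something provided by Guix-Modules, and it assumes the help output has the layout shown above, with the channel list following the “Provenance data (channels):” line and an optional “Package transformations:” section after it.

```shell
#!/bin/sh
# Hypothetical helper: read "module help" output on standard input and
# print only the channel list found in the provenance section.
extract_channels() {
    awk '
        # Stop printing when the transformations section begins.
        /Package transformations:/        { printing = 0 }
        # Print every line while inside the channel list.
        printing                          { print }
        # Start printing on the line after the provenance header.
        /Provenance data \(channels\):/   { printing = 1 }
    '
}

# Possible usage on a cluster (an assumption, not run here; "2>&1" covers
# Modules implementations that write help text to standard error):
#   module help openblas 2>&1 | extract_channels > channels.scm
```

Check the resulting channels.scm by eye before relying on it, since the exact help layout may differ between Modules implementations.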
For users, it makes a big difference: modules are no longer ephemeral—they’re now a reproducible artifact that you can redeploy with Guix anywhere, anytime.
Customization
HPC users are often demanding when it comes to customizing
software build processes. Guix supports this need with a gamut of
package transformation options available from the command line as well
as through programming interfaces.
Good news: guix module create honors package transformation options.
Among those, the --tune option, which instructs Guix to optimize
relevant packages for the host micro-architecture, may come in handy.
If you know your cluster contains only Skylake CPUs, you’d rather make
sure relevant packages are optimized for Skylake. To do that, you would
run, say:
guix module create --tune=skylake \
gcc-toolchain openblas gsl
In this particular case, GSL gets built for Skylake, using GCC’s
-march=skylake option (OpenBLAS itself chooses optimized routines at
run time, so it is unaffected).
“But what about reproducibility?”, you ask. The chosen package
transformation options (--tune in this case) are also recorded as
part of the provenance data. This is what module help reports:
$ module help gsl
----------- Module Specific Help for 'gsl/2.7' --------------------
This module was generated from a GNU Guix package.
Provenance data (channels):
(list (channel
        (url "https://git.savannah.gnu.org/git/guix.git")
        (branch "master")
        (commit
          "4ba35ccd18f90314caa76ea1833ffc383559401c")
        (name 'guix)
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
Package transformations:
((tune . "skylake"))
The “Package transformations” bit is self-explanatory; it can be
passed as-is to options->transformation in a manifest.
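As a sketch of that last step, the script below writes a manifest that applies the recorded transformation to GSL. The file name tuned-modules.scm is made up for this example; options->transformation comes from the (guix transformations) module, and the overall shape follows the manifest examples in the Guix manual rather than anything specific to Guix-Modules.

```shell
#!/bin/sh
# Write a manifest that rebuilds GSL with the transformation reported by
# "module help gsl".  The file name is hypothetical.
cat > tuned-modules.scm <<'EOF'
(use-modules (guix transformations))

;; The alist below is the "Package transformations" data, passed as-is.
(define transform
  (options->transformation '((tune . "skylake"))))

(packages->manifest
 (list (transform (specification->package "gsl"))))
EOF

# Combined with the recorded channels, this could then be used as:
#   guix time-machine -C channels.scm -- shell -m tuned-modules.scm
```

Pinning both the channels and the transformation is what makes the tuned build repeatable on another Skylake machine.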
We strongly believe one shouldn’t have to choose between performance and reproducibility and this is what this feature set supports.
Why all the fuss?
Guix is ten years old and Guix-HPC itself is turning five this year, so
you might wonder why, after all these years, we’re adding a Modules
compatibility layer. After all, guix shell can set up software
environments on the fly in a way that is comparable to module load. For
instance, to start a shell to use GCC and Python as in the example
above, you would type:
guix shell gcc-toolchain@11 python@3.8
More generally, Guix puts users in control: it lets them upgrade when they want to and allows them to travel in time; it lets them customize packages, and it lets them replicate the same environment elsewhere or at a different point in time.
Using Guix directly remains the most empowering approach for users, but module files created from Guix packages can satisfy a number of user needs:
- Matching user habits. For some communities, not having to learn a new command—even if it’s not all that different, even if it has more to offer—is a big plus. It’s not uncommon for cluster admins to offer Modules in addition to Guix or other tools for that reason.
- Supporting incremental software environment construction. With module, you can “load” and “unload” modules until you obtain the desired environment, whereas guix shell currently expects a list of packages upfront. While exploring a problem space, the incremental mode might be more convenient; indeed, patches have recently been discussed to support an incremental mode in guix shell.
- Supporting simple Guixy cluster setups. The typical Guix cluster setup requires running the build daemon, ensuring it can access the network to download source or binaries, making it accessible to front nodes and (optionally) build nodes, and setting up a couple of NFS exports. Sysadmins who’d rather not do that can instead use guix module create and offer those modules to users. The /gnu/store directory still needs to be exported over NFS, but that’s a read-only export, and it’s all that’s needed, which makes for a simpler setup.
If you’re an HPC cluster user or system administrator, we’d love to hear
your thoughts on the guix-science mailing list or the #guix-hpc channel
on Libera.chat!