Faster relocatable packs with Fakechroot
The guix pack command creates “application bundles” that can be used to deploy software on machines that do not run Guix (yet!), such as HPC clusters. Since its inception in 2017, it has seen a number of improvements, such as the ability to create Docker and Singularity container images. Some clusters lack these tools, though, and the addition of relocatable packs was a way to address that. This post looks at a new execution engine for relocatable packs that has just landed with the goal of improving performance.
Before we get into that, let’s recap how relocatable packs work.
Relocatable packs
Essentially, a relocatable pack is a plain old tarball that contains the applications of your choosing along with all their dependencies, such that you can run them on any GNU/Linux machine. To create a pack containing Python and NumPy, run:
guix pack -RR python python-numpy -S /bin=bin
The -RR flag asks for the creation of what we jokingly refer to as a reliably relocatable pack (more on that below), while the -S flag asks for the creation of a /bin symbolic link in the tarball.
The result of that command is a tarball that you can send to another machine, unpack, and then run Python directly from there without any special privileges:
tar xf pack.tar.gz
./bin/python
That’s it! All you need on the target machine is tar, and the rest just works.
Relocation with PRoot
guix pack -R (with a single -R) creates relocatable packs that require kernel support for unprivileged user namespaces. However, some systems have them disabled, and older systems do not support them at all; on such systems, the ./bin/python command above would not work.
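An ordinary user account can probe whether a given machine falls into that category. The sketch below is only an illustration; it is not the logic used by the Guix wrappers, and the function name unprivileged_userns_available is made up for this example. It forks a child that calls unshare(CLONE_NEWUSER) and reports whether the call succeeded.

```c
#define _GNU_SOURCE      /* for unshare() and CLONE_NEWUSER */
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if this process may create a user
   namespace, 0 if the kernel refuses, and -1 if the probe itself
   failed.  The probe runs in a forked child so the caller's own
   namespaces are left untouched.  */
int unprivileged_userns_available(void)
{
  pid_t child = fork();
  if (child < 0)
    return -1;

  if (child == 0)
    /* Child: report through the exit status whether unshare worked.  */
    _exit(unshare(CLONE_NEWUSER) == 0 ? 0 : 1);

  int status;
  if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
    return -1;

  return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Run it from an unprivileged account, because a result obtained as root says little about ordinary users. The probe runs in a fresh child for two reasons. It leaves the caller's namespaces intact. It also sidesteps a kernel restriction: unshare(CLONE_NEWUSER) fails in a multithreaded process.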
The -RR option we saw above adds a universal fallback: on a system where unprivileged user namespaces are not available, the ./bin/python command above automatically falls back to using PRoot. PRoot achieves file system virtualization by intercepting the process’s system calls with ptrace.
The advantage is that it always works: it doesn’t rely on any special kernel feature, since ptrace has “always been there”, so to speak. The drawback is that it incurs significant overhead at every system call. This is acceptable for an interactive program or, say, for a single-threaded number-crunching application. But the performance hit is prohibitive for an MPI or multi-threaded application, for example, because input/output and synchronization happen via system calls.
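The following sketch shows where that overhead comes from. It is a hypothetical, heavily simplified tracer, not PRoot code, and count_syscall_stops is a name chosen for this example. It runs a program under ptrace with PTRACE_SYSCALL and counts the stops at system-call entry and exit. A PRoot-like engine would inspect and rewrite system-call arguments at each of these stops. Every stop means switching to the tracer process and back.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run PROGRAM with ARGV under ptrace and return the
   number of system-call stops (entry and exit each count), or -1 on
   error.  */
long count_syscall_stops(const char *program, char *const argv[])
{
  long stops = 0;
  int status, signal_to_deliver = 0;
  pid_t child = fork();
  if (child < 0)
    return -1;

  if (child == 0)
    {
      /* Child: ask to be traced, then exec; the exec makes the kernel
         stop the child with SIGTRAP before the new program runs.  */
      if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == 0)
        execv(program, argv);
      _exit(127);
    }

  if (waitpid(child, &status, 0) < 0)
    return -1;
  if (!WIFSTOPPED(status))
    return -1;                 /* the child exited: exec failed */

  /* Make system-call stops report SIGTRAP | 0x80 so they can be told
     apart from genuine SIGTRAP signals.  */
  if (ptrace(PTRACE_SETOPTIONS, child, NULL,
             (void *) (long) PTRACE_O_TRACESYSGOOD) < 0)
    goto fail;

  for (;;)
    {
      /* Resume the child until its next system-call entry or exit.  */
      if (ptrace(PTRACE_SYSCALL, child, NULL,
                 (void *) (long) signal_to_deliver) < 0
          || waitpid(child, &status, 0) < 0)
        goto fail;
      signal_to_deliver = 0;
      if (WIFEXITED(status) || WIFSIGNALED(status))
        return stops;
      if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        stops++;                                /* entry or exit stop */
      else
        signal_to_deliver = WSTOPSIG(status);   /* pass real signals on */
    }

 fail:
  kill(child, SIGKILL);
  waitpid(child, &status, 0);
  return -1;
}
```

With argv set to { "/bin/true", NULL }, count_syscall_stops("/bin/true", argv) returns a positive count even though the program does nothing useful. The exact number depends on the system. Each of those stops is a round trip through the tracer.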
Enter Fakechroot
To address this performance issue, we have just added a third execution engine to relocatable packs, one that relies on ELF trickery. Users of relocatable packs can now choose an execution engine at run time by setting the GUIX_EXECUTION_ENGINE environment variable. If you choose the performance engine, the application will use user namespaces or, if they are not supported, fall back to the new fakechroot engine:
export GUIX_EXECUTION_ENGINE=performance
./bin/python
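One way to picture the selection is as a mapping from the variable's value to an ordered list of engines to try. This is a sketch under assumptions. The post itself names only the default behavior, the performance value and the fakechroot engine. The userns, proot and fakechroot values below are guesses modeled on the engines described here and should be checked against the manual. The enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the three engines described in this post.  */
enum engine { ENGINE_USERNS, ENGINE_PROOT, ENGINE_FAKECHROOT };

#define MAX_ENGINES 2

/* Fill ENGINES with the engines to try, in order, for VALUE, the
   content of GUIX_EXECUTION_ENGINE (NULL when unset).  Return how many
   entries were stored, or 0 for an unrecognized value.  */
size_t engines_to_try(const char *value, enum engine engines[MAX_ENGINES])
{
  if (value == NULL || strcmp(value, "default") == 0)
    {
      /* Default: user namespaces, falling back to PRoot.  */
      engines[0] = ENGINE_USERNS;
      engines[1] = ENGINE_PROOT;
      return 2;
    }
  if (strcmp(value, "performance") == 0)
    {
      /* User namespaces, falling back to Fakechroot.  */
      engines[0] = ENGINE_USERNS;
      engines[1] = ENGINE_FAKECHROOT;
      return 2;
    }
  /* Assumed single-engine values.  */
  if (strcmp(value, "userns") == 0)
    { engines[0] = ENGINE_USERNS; return 1; }
  if (strcmp(value, "proot") == 0)
    { engines[0] = ENGINE_PROOT; return 1; }
  if (strcmp(value, "fakechroot") == 0)
    { engines[0] = ENGINE_FAKECHROOT; return 1; }
  return 0;
}
```

A wrapper would call engines_to_try(getenv("GUIX_EXECUTION_ENGINE"), list) and attempt each engine in turn until one of them works.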
guix pack -RR wraps the application executables, in this case python. Those wrappers are small statically linked programs that implement the execution engines.
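The PT_INTERP segment mentioned below is easy to inspect. This minimal sketch is written for illustration only. It assumes a 64-bit ELF file in the host's byte order, and the function name read_pt_interp is hypothetical. It returns the dynamic linker file name recorded in an executable. For a Guix-built program, that file name points under /gnu/store.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the PT_INTERP string
   of FILE_NAME, or NULL if the file cannot be read, is not a 64-bit ELF
   file, or has no PT_INTERP segment (as with static executables).  */
char *read_pt_interp(const char *file_name)
{
  FILE *f = fopen(file_name, "rb");
  if (f == NULL)
    return NULL;

  char *interp = NULL;
  Elf64_Ehdr ehdr;
  if (fread(&ehdr, sizeof ehdr, 1, f) != 1
      || memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0
      || ehdr.e_ident[EI_CLASS] != ELFCLASS64)
    goto done;

  /* Walk the program headers looking for PT_INTERP.  */
  for (unsigned i = 0; i < ehdr.e_phnum; i++)
    {
      Elf64_Phdr phdr;
      if (fseek(f, (long) (ehdr.e_phoff + (Elf64_Off) i * ehdr.e_phentsize),
                SEEK_SET) != 0
          || fread(&phdr, sizeof phdr, 1, f) != 1)
        goto done;
      if (phdr.p_type != PT_INTERP)
        continue;

      /* The segment holds the NUL-terminated loader file name; calloc
         adds one extra zero byte in case it is missing.  */
      interp = calloc(1, phdr.p_filesz + 1);
      if (interp == NULL
          || fseek(f, (long) phdr.p_offset, SEEK_SET) != 0
          || fread(interp, 1, phdr.p_filesz, f) != phdr.p_filesz)
        {
          free(interp);
          interp = NULL;
        }
      goto done;
    }

 done:
  fclose(f);
  return interp;
}
```

For instance, read_pt_interp("/proc/self/exe") yields the loader of the running program, such as /lib64/ld-linux-x86-64.so.2 on a conventional x86_64 distribution. For a statically linked program, such as the wrappers themselves, it yields NULL.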
The new fakechroot engine works as follows:
1. The PT_INTERP segment of the wrapped executable contains the file name of the dynamic linker, ld.so, under /gnu/store. Since /gnu/store doesn’t exist on the host machine, the dynamic linker is invoked directly, with its file name computed relative to the wrapper’s file name.
2. The loader is told to preload the Fakechroot shared library, which interposes on the file system functions of the C library (open, stat, etc.) and “translates” /gnu/store absolute file names to their actual location.
3. The RUNPATH of Guix executables and shared libraries lists the /gnu/store directories that contain the libraries they depend on. The open calls that ld.so itself makes are not interposable, so Fakechroot doesn’t help here. Fortunately, the little-known audit interface of the GNU dynamic linker comes in handy: its la_objsearch hook allows you to alter the way ld.so looks for shared libraries. Thus, a few lines of C are all it takes to get ld.so to translate /gnu/store file names. Neat!
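Here is a sketch of the audit trick. It is an illustration, not the module Guix ships. It assumes the pack's root directory is passed in a hypothetical PACK_ROOT_DIRECTORY environment variable, and it prefixes every /gnu/store file name that ld.so is about to try with that directory. la_version and la_objsearch are the entry points defined by the glibc rtld-audit interface.

```c
#define _GNU_SOURCE      /* exposes LAV_CURRENT and the la_* prototypes */
#include <limits.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STORE_PREFIX "/gnu/store/"
#define ROOT_VARIABLE "PACK_ROOT_DIRECTORY"   /* hypothetical name */

/* Called first by ld.so; returning LAV_CURRENT accepts the interface
   version and thereby activates this module.  */
unsigned int la_version(unsigned int version)
{
  (void) version;
  return LAV_CURRENT;
}

/* Called by ld.so before it tries to open a library; NAME is the
   candidate file name (for example a RUNPATH directory plus the library
   name).  Return the file name ld.so should use instead.  */
char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag)
{
  /* Overwritten on each call; this sketch assumes ld.so is done with
     the previous result by then.  */
  static char translated[PATH_MAX];
  const char *root = getenv(ROOT_VARIABLE);
  (void) cookie;
  (void) flag;

  if (root == NULL
      || strncmp(name, STORE_PREFIX, sizeof STORE_PREFIX - 1) != 0)
    return (char *) name;       /* leave other file names alone */

  int n = snprintf(translated, sizeof translated, "%s%s", root, name);
  if (n < 0 || (size_t) n >= sizeof translated)
    return (char *) name;       /* too long: do not translate */

  return translated;
}
```

Such a module would be built as a shared object, for instance with gcc -shared -fPIC -o store-audit.so store-audit.c (file names hypothetical). It is then activated through the LD_AUDIT environment variable, or through ld.so's --audit option when the loader is invoked directly.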
The fakechroot engine incurs very little overhead, and only on file system function calls, making it a great option for HPC workloads. The default engine remains user namespaces with a fallback to PRoot, so be sure to set GUIX_EXECUTION_ENGINE=performance. See the manual for more info.
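The overhead stays limited to file system function calls because of how interposition works. A preloaded library defines functions with the same names as those of the C library, and only calls to those functions take a detour. The sketch below shows the idea for open alone. It is not Fakechroot code: Fakechroot wraps many more entry points, such as stat and its variants. The PACK_ROOT_DIRECTORY variable and the translate_file_name helper are hypothetical stand-ins for however the real engine learns and applies the pack's location.

```c
#undef _FORTIFY_SOURCE   /* fortified inline wrappers could clash with our open */
#define _GNU_SOURCE      /* for RTLD_NEXT and O_TMPFILE */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define STORE_PREFIX "/gnu/store/"
#define ROOT_VARIABLE "PACK_ROOT_DIRECTORY"   /* hypothetical name */

/* Return FILE prefixed with the pack root (written into BUF) if FILE is
   under /gnu/store, and FILE itself otherwise.  */
const char *translate_file_name(const char *file, char *buf, size_t size)
{
  const char *root = getenv(ROOT_VARIABLE);
  if (root == NULL || strncmp(file, STORE_PREFIX, sizeof STORE_PREFIX - 1) != 0)
    return file;
  int n = snprintf(buf, size, "%s%s", root, file);
  return (n >= 0 && (size_t) n < size) ? buf : file;
}

typedef int (*open_function)(const char *, int, ...);

/* Interposed open: translate the file name, then call the next "open"
   in symbol lookup order, normally the C library's.  */
int open(const char *file, int flags, ...)
{
  static open_function real_open;   /* lazily looked up; benign race */
  char buf[PATH_MAX];
  mode_t mode = 0;

  if (real_open == NULL)
    real_open = (open_function) dlsym(RTLD_NEXT, "open");
  if (real_open == NULL)
    {
      errno = ENOSYS;
      return -1;
    }

  /* The mode argument is only passed when a file may be created.  */
  if ((flags & O_CREAT) || (flags & O_TMPFILE) == O_TMPFILE)
    {
      va_list ap;
      va_start(ap, flags);
      mode = (mode_t) va_arg(ap, int);
      va_end(ap);
    }

  return real_open(translate_file_name(file, buf, sizeof buf), flags, mode);
}
```

The file could be compiled with gcc -shared -fPIC -o libstore-map.so store-map.c -ldl (names hypothetical) and loaded with LD_PRELOAD. Calls to open on /gnu/store file names are then redirected. Everything else, such as read and write on already-open descriptors, runs at native speed.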
A call to HPC system administrators
guix pack -RR allows you to deploy software stacks on a Guix-less cluster that lacks both support for unprivileged user namespaces and a container facility such as Singularity, with little to no loss of performance. A similar combination of execution engines for unprivileged users can be found in udocker, though that tool has different goals. Now that we have covered these techniques, it is worth taking a step back and looking at the bigger picture.
All these shenanigans would be unnecessary if unprivileged user namespaces were universally available. In fact, when we released guix pack -R two years ago, we thought (hoped?) that widespread availability of unprivileged user namespaces was imminent. After all, the feature had already been available in the Linux kernel since version 3.8, released in 2013.
Unfortunately, major academic HPC clusters today still run Red Hat Enterprise Linux (RHEL) 7 or a derivative such as CentOS 7, released in 2014 with Linux 3.10, where the decision was made to disable user namespaces. For RHEL 8 and its derivatives, the documentation describes an easy way to enable user namespaces.
We encourage HPC system administrators to consider enabling unprivileged user namespaces. They allow unprivileged users to deploy pre-built software with virtually no overhead, be it through a relocatable Guix pack or via container runtimes such as runC. More generally, user namespaces enable reproducible software environments, a prerequisite for reproducible scientific experiments!
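A quick first check for administrators is the user.max_user_namespaces sysctl, exposed as /proc/sys/user/max_user_namespaces on Linux 4.9 and later. A value of 0 means that no user namespace can be created at all. The sketch below reads only that one setting, and its function name is our own. Some distributions have additional switches, so treat the result as a first indication rather than a definitive answer.

```c
#include <stdio.h>

/* Hypothetical helper: return the value of user.max_user_namespaces,
   or -1 if it cannot be read (for instance on kernels that do not
   expose it).  */
long max_user_namespaces(void)
{
  long value = -1;
  FILE *f = fopen("/proc/sys/user/max_user_namespaces", "r");
  if (f == NULL)
    return -1;
  if (fscanf(f, "%ld", &value) != 1)
    value = -1;
  fclose(f);
  return value;
}
```

If it reports 0, raising the value with sysctl, for example in a file under /etc/sysctl.d, is the usual way to enable user namespaces. Check your distribution's documentation for the recommended value.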
Acknowledgments
Many thanks to Carlos O’Donell, steward for the GNU C Library, for reviewing initial revisions of the fakechroot execution engine and for suggesting the use of the ld.so audit interface.
Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).