Reproducible research hackathon: experience report
Two weeks ago, on June 27th, we held a second on-line hackathon on reproducible research issues. This hackathon was a collaborative effort to bring GNU Guix to concrete examples inspired by contributions to the online journal ReScience C.
A small but enthusiastic group of about 5 people connected to the #guix-hpc IRC channel on Libera.chat and hacked the good reproducibility hack.
The day was punctuated by three video chats: the first to share interests, backgrounds and working plans, the second to report on work in progress, and the last to review achievements and list future ideas.
As we are advocating, this command line:
guix time-machine -C channels.scm -- shell -m manifest.scm
… captures all the requirements for redeploying the same computational environment. Specifically:
- channels.scm pins a specific revision of Guix and potentially other channels;
- manifest.scm specifies the packages required by the computational environment.
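As a rough illustration of what creating the two files may look like, here is a minimal sketch. It assumes that recording the revision of Guix currently in use (with guix describe) is an acceptable way to obtain channels.scm; the package list in manifest.scm is hypothetical and chosen only for illustration.

```shell
# Pin the Guix revision (and extra channels) currently in use.
# Skipped when guix is not installed on this machine.
if command -v guix >/dev/null 2>&1; then
  guix describe --format=channels > channels.scm
fi

# Hypothetical package list, for illustration only: replace it with
# the packages the computational experiment actually needs.
cat > manifest.scm <<'EOF'
(specifications->manifest
 (list "python"
       "python-numpy"
       "python-matplotlib"))
EOF
```

Both files are plain text and can be committed next to the paper's source code.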
The three goals of the hackathon were:
- Pick a ReScience C submission and add these two files: channels.scm and manifest.scm.
- If needed, define packages. These could then go to Guix itself or one of the relevant dedicated channels: Guix-Science, Guix-Past, etc.
- Identify open issues that hinder the reproducibility of software environments.
Here’s a recap. TLDR, it was a success!
Complete “Guixification”
These two papers based on Python software were considered:
- [Re] Neural Network Model of Memory Retrieval, ReScience C 6, 3, #8, 2021.
- [Re] A general model of hippocampal and dorsal striatal learning and decision making, ReScience C 8, 1, #4, 2022.
Writing the two files, channels.scm and manifest.scm, was rather straightforward. This led to two pull requests against the original papers: here and there.
Nothing fancy: most of the work consisted in “translating” the requirements.txt file used by pip to manifest.scm.
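The “translation” can be approximated mechanically. The sketch below is a heuristic and not the procedure used during the hackathon. It assumes that each requirement maps to a Guix package named after the lowercase PyPI name prefixed with python-, which is the usual convention but does not always hold. The sample requirements.txt is hypothetical.

```shell
# Hypothetical requirements.txt, for illustration only.
cat > requirements.txt <<'EOF'
numpy==1.19.5
Matplotlib>=3.3
# comments and blank lines are ignored

scipy
EOF

# Heuristic translation: drop comments and version constraints,
# lowercase the name, turn "_" into "-", and prepend "python-".
# The result must be reviewed by hand, e.g. with 'guix search'.
{
  printf '(specifications->manifest\n (list "python"'
  sed -e 's/#.*//' -e 's/[][<>=!~; ].*//' requirements.txt \
    | tr 'A-Z_' 'a-z-' \
    | grep -v '^$' \
    | while read -r name; do
        case $name in
          python-*) printf '\n       "%s"' "$name" ;;
          *)        printf '\n       "python-%s"' "$name" ;;
        esac
      done
  printf '))\n'
} > manifest.scm

cat manifest.scm
```

Version constraints are deliberately dropped: with Guix, the versions are determined by the revision pinned in channels.scm rather than by the manifest.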
On a side note, would it be possible to take advantage of GitHub’s continuous integration service, GitHub Actions, to guide the review process? The first idea would be to let GitHub Actions run some part of the numerical processing. However, the resources offered by GitHub are limited or not suitable for numerical experiments. Instead, GitHub Actions can be used to pack the software environment and publish the resulting artifact. For instance, Docker images are popular and Guix can produce them; for details about producing Docker images using Guix on top of GitHub Actions, see this example based on the ReScience article above (8, 1, #4, 2022). In a nutshell, GitHub Actions runs the following command:
guix time-machine -C channels.scm \
-- pack -f docker --save-provenance -m manifest.scm
A reviewer could then load this Docker image artifact produced by Guix. Or they could directly generate the software environment from the files channels.scm and manifest.scm. Either way, a reviewer is thus able to inspect the software environment of the submission. Last, because of the --save-provenance option, the Docker image carries the Guix information needed to reproduce itself.
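For a reviewer who prefers to build the image locally rather than download the artifact, the two steps could be sketched as follows. The function name load_image is hypothetical; the sketch assumes that both guix and docker are installed and that channels.scm and manifest.scm are in the current directory.

```shell
# Build the Docker image with Guix, then load it into Docker.
# 'guix pack' prints the file name of the image it produced.
load_image() {
  image=$(guix time-machine -C channels.scm \
            -- pack -f docker --save-provenance -m manifest.scm) || return 1
  docker load < "$image"
}
```

Calling load_image makes the image available to docker run, where the environment can be inspected.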
Partial port to Guix
Other papers tracked by ReScience were also considered:
- [Re] Groups of diverse problem-solvers outperform groups of highest-ability problem-solvers - most of the time, 8, 1, #6, 2022.
- [Re] Modeling Insect Phenology Using Ordinal Regression and Continuation Ratio, 7, 1, #5, 2021.
- [Re] A circuit model of auditory cortex, review still pending.
- [Re] Particle Image Velocimetry with Optical Flow; the original paper dates from 1998 and the reproduction was submitted to the Ten Years Reproducibility Challenge.
We did not complete the reproduction of all of these papers using Guix
due to lack of time or computational resources. Progress on the first
paper is visible in this Git
repository.
The main pitfall illustrated by this paper is that not all of the experiment’s source code was available in the repository; some of it was stored elsewhere on-line and transparently downloaded and run via Python’s httpimport.
This is problematic for several reasons: that code might simply vanish, it could be modified between the time the authors submitted the paper and the time someone else attempts to reproduce it, or it could be maliciously modified. The solution was to get the current copy of the relevant code inside the repository and to remove uses of httpimport.
This experiment is computationally very expensive though, and we could not run it in time on our local cluster.
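Once the remote code has been copied into the repository, a simple check can confirm that no use of httpimport remains. The following is a minimal sketch; the function name check_no_httpimport is hypothetical, and only .py files are searched, so notebooks would need a separate check.

```shell
# Report every Python file under a directory that still mentions
# httpimport. Returns non-zero if any use remains, so the check can
# gate a review or a continuous integration job.
check_no_httpimport() {
  dir=${1:-.}
  if grep -rn --include='*.py' 'httpimport' "$dir"; then
    echo "error: remote imports remain; vendor the code instead" >&2
    return 1
  fi
  echo "ok: no use of httpimport under $dir"
}
```

Running check_no_httpimport . at the root of the repository lists the offending lines, if any.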
For the second paper, the main difficulty was related to time zones: the variable TZDIR required an adjustment. Fortunately, thanks to Guix’s inferiors feature, a custom manifest combining two different Guix revisions makes it possible to generate the software environment, based on the R ecosystem, in which the paper’s numerical experiment can be run.
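A manifest that combines two Guix revisions could look like the sketch below, modeled on the inferiors example in the Guix manual. It is not the manifest written during the hackathon: the commit is a placeholder to fill in, and the choice of packages (r from the older revision, tzdata from the current one) is hypothetical.

```shell
# Write a manifest that mixes packages from two Guix revisions.
# The commit placeholder and the package names are hypothetical.
cat > manifest.scm <<'EOF'
(use-modules (guix inferior) (guix channels)
             (srfi srfi-1))            ;for 'first'

(define older-guix
  ;; An inferior for an older Guix revision; replace the placeholder
  ;; with the commit providing the package versions the paper needs.
  (inferior-for-channels
   (list (channel
          (name 'guix)
          (url "https://git.savannah.gnu.org/git/guix.git")
          (commit "<older-guix-commit>")))))

(packages->manifest
 (list
  ;; Taken from the older revision.
  (first (lookup-inferior-packages older-guix "r"))
  ;; Taken from the revision pinned in channels.scm.
  (specification->package "tzdata")))
EOF
```

If TZDIR has to point to the time zone database, one option (an assumption, not necessarily what was done here) is to set it inside the environment to $GUIX_ENVIRONMENT/share/zoneinfo, where GUIX_ENVIRONMENT is the variable that guix shell sets to the environment's profile.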
The ReScience reviewer of the third paper took advantage of the hackathon to resume the review and try Guix for the software environment. The files channels.scm and manifest.scm were created without any big issue. The paper’s computational experiment runs in a Jupyter Notebook, and it runs out-of-the-box with the --pure option of guix shell; running it with --container, for improved isolation, is left as an exercise for the reader. One drawback was that the paper’s author invokes apt install in the middle of the notebook. On the Guix side, one difficulty was finding the right TeX Live packages; another one was the interaction with the Python library matplotlib, which can be troublesome. The session was a double opportunity: to dive into Guix-specific details (this hackathon was the right place to share knowledge!) and to bring this specific review, which started in March, close to completion. Win-win!
The fourth and last paper was a challenge: produce a software environment where C code from 1998 can run. And that’s a positive result! The two tables agree with those in the paper. The C code compiles and runs, although it raises some warnings, which can be turned off via specific compiler flags, and the Bash shell scripts are not fully portable and required minor tweaks. The C code has no dependencies, which significantly simplifies portability and eases reproducibility.
Towards long-term and archivable reproducibility
Over years of running Guix daily in a scientific context, we have already identified many potential roadblocks to achieving long-term reproducible software environments, from unfixed bugs to unimplemented features. Verifiable environment deployment can only be achieved when all the following conditions are met:
- availability of all the source code;
- backward-compatibility of the Linux kernel system call interface;
- some compatibility of the hardware (CPU, etc.);
- no “time bomb”—software whose behavior is a function of the current time.
This hackathon was a nice opportunity to check their status and list what already works and what still remains, all based on a concrete example:
- [Re] Storage Tradeoffs in a Collaborative Backup Service for Mobile Devices, ReScience C 6, 1, #6, 2020.
This paper relies on Guix end-to-end: it uses Guix to compile all the requirements, run all the experiments and finally generate the final report. Let us check whether two independent observers are able to verify the same result with three years between the two observations (2020–2023).
We know that this paper’s computational experiment is reproducible with Guix today under “normal circumstances” (try it!), so we set out to experiment with an extreme worst-case scenario: no pre-built binaries are available—everything needs to be rebuilt from source—and none of the source code hosting sites is reachable, with the exception of the Software Heritage archive. The ambition of Software Heritage is to collect, preserve, and share all software that is publicly available in source code form. Guix fetches code from Software Heritage as a fallback when source code hosting sites disappear. To our knowledge, redeploying software under such extreme conditions is practically impossible, unless of course one is using Guix—or at least that’s what we wanted to verify.
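The “no pre-built binaries” half of this scenario can be expressed on the command line, as in the following sketch. The function name rebuild_from_source is hypothetical, and the sketch does not cover the other half of the scenario (making source code hosting sites unreachable). Expect a very long build.

```shell
# Redeploy the environment without downloading any pre-built binary.
# --no-substitutes is passed twice: once for building the pinned Guix
# itself, once for building the packages listed in manifest.scm.
rebuild_from_source() {
  guix time-machine -C channels.scm --no-substitutes \
    -- shell --no-substitutes -m manifest.scm "$@"
}
```

Extra arguments are forwarded to guix shell, so rebuild_from_source -- make would run make inside the rebuilt environment.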
In summary, the outcome of this experiment is impressive. Considering this extreme worst-case setup, it's awesome that it almost works out-of-the-box. The remaining open issues we identified are:
- Guix user interface annoyances: manual --fallback or --no-substitutes options and inconsistent error messages.
- Holes in Software Heritage and Disarchive coverage of the source code we needed.
- Source origin hash mismatches between Guix normalization and Software Heritage normalization.
- “Time bomb”: the test suite of some packages is failing because it is time-dependent (example).
- Weaknesses in the full-source bootstrap.
- The archive of all the binary seeds of this bootstrap.
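The “time bomb” item in the list above can be illustrated with a toy example. It is not taken from any of the failing packages: EXPIRY is a hypothetical hard-coded date, standing for something like the validity period of a test certificate.

```shell
# Toy "time bomb": a check whose outcome depends on the date it runs.
# EXPIRY is a hypothetical hard-coded date, written as YYYYMMDD.
EXPIRY=20230101

# Time-dependent variant: reads the system clock, so it passes when
# run in 2020 and fails once the expiry date has passed.
check_with_clock() {
  [ "$(date +%Y%m%d)" -lt "$EXPIRY" ]
}

# Time-independent variant: the reference date is an explicit input,
# so the outcome is the same whenever the check is run.
check_with_fixed_date() {
  [ "$1" -lt "$EXPIRY" ]
}

check_with_fixed_date 20200601 && echo "pass with the date fixed in 2020"
check_with_clock || echo "fail with today's clock"
```

A test suite written like the first variant builds fine when the package is released and breaks a later rebuild from source, even though not a single byte of the source code has changed.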
For the interested reader, take a look at the complete details. Does it mean we have a roadmap for the next hackathon? If you are interested, we’d love to hear your ideas!
Last but not least, a one-day on-line get-together is a great opportunity to tackle longstanding topics while helping each other and welcoming newcomers on board. Thanks to everyone for joining! It’s been a pleasant and productive experience, so stay tuned for other rounds!
Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).