Two weeks ago, on June 27th, we held a second on-line hackathon on reproducible research issues. This hackathon was a collaborative effort to bring GNU Guix to concrete examples inspired by contributions to the online journal ReScience C.

A small but enthusiastic group of about 5 people connected to the #guix-hpc IRC channel on Libera.chat and hacked the good reproducibility hack. The day was interspersed with three video chats: the first to share interests, backgrounds and the working plan, the second to report on work in progress, and the last to review the achievements and list future ideas.

As we are advocating, this command line:

guix time-machine -C channels.scm -- shell -m manifest.scm

… captures all the requirements for redeploying the same computational environment. Specifically:

channels.scm pins a specific revision of Guix and potentially other channels;
manifest.scm specifies the packages required by the computational environment.
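
To make this concrete, here is a minimal sketch of how the two files could be created. The package list is a made-up example, not taken from any of the papers; `guix describe --format=channels` prints the channels currently in use, so redirecting it to a file is one way to pin them.

```shell
#!/bin/sh
# Sketch: create channels.scm and manifest.scm in the current directory.
set -eu

# manifest.scm: the packages below are a hypothetical example.
cat > manifest.scm <<'EOF'
;; Packages that make up the computational environment.
(specifications->manifest
 (list "python"
       "python-numpy"
       "python-matplotlib"))
EOF

# channels.scm: pin the Guix revision (and any extra channels) in use now.
if command -v guix >/dev/null 2>&1; then
    guix describe --format=channels > channels.scm
else
    echo "guix not found; channels.scm was not generated" >&2
fi
```

Both files are then meant to be committed next to the paper's code, so that the command line above can be run from a fresh checkout.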

The three goals of the hackathon were:

Pick a ReScience C submission and add these two files: channels.scm and manifest.scm.
If needed, define packages. These could then go to Guix itself or one of the relevant dedicated channels: Guix-Science, Guix-Past, etc.
Identify open issues that hinder the reproducibility of software environments.

Here’s a recap. TL;DR: it was a success!

Complete “Guixification”

These two papers based on Python software were considered:

[Re] Neural Network Model of Memory Retrieval, ReScience C 6, 3, #8, 2021.
[Re] A general model of hippocampal and dorsal striatal learning and decision making, ReScience C 8, 1, #4, 2022.

Writing the two files, channels.scm and manifest.scm, was rather straightforward. This led to two pull requests against the original papers: here and there. Nothing fancy: most of the work consisted of “translating” the requirements.txt file used by pip to manifest.scm.
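
As an illustration of that “translation”, here is a rough sketch that turns a requirements.txt into a first draft of manifest.scm. The requirements.txt content is invented for the example, and the naming rule (lowercase, `python-` prefix) is only a heuristic: some Guix package names differ, and the versions come from the pinned Guix revision rather than from the pins in requirements.txt, so the result needs a manual review.

```shell
#!/bin/sh
# Sketch: draft a manifest.scm from a pip requirements.txt.
set -eu

# Hypothetical requirements.txt, for illustration only.
cat > requirements.txt <<'EOF'
numpy==1.21.0
matplotlib>=3.4
# a comment
scipy
EOF

{
    echo '(specifications->manifest'
    echo ' (list "python"'
    # Drop comments and blank lines, strip version constraints,
    # lowercase the names, then add the python- prefix Guix usually uses.
    sed -e 's/#.*//' -e 's/[<>=!~; ].*//' -e '/^$/d' requirements.txt \
        | tr 'A-Z_' 'a-z-' \
        | sed -e 's/^python-//' -e 's/.*/       "python-&"/'
    echo '       ))'
} > manifest.scm

cat manifest.scm
```

When a dependency is not packaged yet, `guix import pypi NAME` can produce a package definition to start from.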

On a side note, would it be possible to take advantage of GitHub’s continuous integration, GitHub Actions, to guide the review process? The first idea would be to let GitHub Actions run part of the numerical processing. However, the resources offered by GitHub are limited and not well suited to numerical experiments. Instead, GitHub Actions can be used to pack the software environment and publish the resulting artifact. For instance, Docker images are popular and Guix can produce them; for details about producing Docker images using Guix on top of GitHub Actions, see this example based on the ReScience article above (8, 1, #4, 2022). In a nutshell, GitHub Actions runs the following command:

guix time-machine -C channels.scm \
     -- pack -f docker --save-provenance -m manifest.scm

A reviewer could then load this Docker image artifact produced by Guix. Or they could directly generate the software environment from the files channels.scm and manifest.scm. Either way, a reviewer is thus able to inspect the software environment of the submission. Last, because of the --save-provenance option, the Docker image carries the Guix provenance information needed to rebuild it.
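
For a reviewer who prefers to rebuild the image locally rather than download the artifact, the steps could look like the sketch below. It assumes that channels.scm and manifest.scm are in the current directory and that Docker is installed; the function name is ours, for illustration only.

```shell
#!/bin/sh
# Sketch: build the Docker image with Guix, then load it into Docker.
set -eu

build_and_load_image() {
    # 'guix pack' prints the file name of the image tarball it built.
    image=$(guix time-machine -C channels.scm \
                 -- pack -f docker --save-provenance -m manifest.scm)
    docker load < "$image"
}

if command -v guix >/dev/null 2>&1 && command -v docker >/dev/null 2>&1; then
    build_and_load_image
else
    echo "both guix and docker are needed; nothing was done" >&2
fi
```

Once loaded, the image shows up in the output of `docker images` and can be started with `docker run`.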

Partial port to Guix

Other papers tracked by ReScience were considered:

[Re] Groups of diverse problem-solvers outperform groups of highest-ability problem-solvers - most of the time, 8, 1, #6, 2022.
[Re] Modeling Insect Phenology Using Ordinal Regression and Continuation Ratio, 7, 1, #5, 2021.
[Re] A circuit model of auditory cortex, review still pending.
[Re] Particle Image Velocimetry with Optical Flow; the initial paper dates from 1998 and the reproduction was submitted to the Ten Years Reproducibility Challenge.

We did not complete the reproduction of all of these papers using Guix due to lack of time or computational resources. Progress on the first paper is visible in this Git repository. The main pitfall illustrated by this paper is that not all of the experiment’s source code was available in the repository; some of it was stored elsewhere on-line and transparently downloaded and run via Python’s httpimport. This is problematic for several reasons: that code might simply vanish, it could be modified between the time the authors submitted the paper and the time someone else attempts to reproduce it, or it could be maliciously modified. The solution was to copy the current version of the relevant code into the repository and to remove uses of httpimport. This experiment is computationally very expensive though, and we could not run it in time on our local cluster.

For the second paper, the main difficulty was related to time zones: the TZDIR variable required an adjustment. Fortunately, thanks to Guix’s inferiors feature, a custom manifest combining two different Guix revisions makes it possible to generate a software environment based on the R ecosystem in which the paper’s numerical experiment can be run.
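
For readers unfamiliar with inferiors, here is a sketch of what such a manifest might look like. It is not the manifest written during the hackathon: the commit is a placeholder to fill in, and the choice of packages (`r-minimal` from the older revision, `tzdata` from the current one) is hypothetical.

```shell
#!/bin/sh
# Sketch: write a manifest that mixes packages from two Guix revisions.
set -eu

cat > manifest.scm <<'EOF'
(use-modules (guix inferior) (guix channels)
             (srfi srfi-1))            ;for 'first'

;; An older Guix revision, accessed as an "inferior".
(define older-guix
  (inferior-for-channels
   (list (channel
          (name 'guix)
          (url "https://git.savannah.gnu.org/git/guix.git")
          ;; Placeholder: put the commit of the older revision here.
          (commit "COMMIT-OF-THE-OLDER-REVISION")))))

;; Hypothetical mix: R from the older revision, tzdata from the current one.
(packages->manifest
 (list (first (lookup-inferior-packages older-guix "r-minimal"))
       (specification->package "tzdata")))
EOF
```

Inside the resulting environment, one possible adjustment is `export TZDIR="$GUIX_ENVIRONMENT/share/zoneinfo"`, assuming tzdata is part of the manifest as above.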

The ReScience reviewer of the third paper took advantage of the hackathon to resume the review and try Guix for the software environment. The files channels.scm and manifest.scm were created without any big issue. The paper’s computational experiment runs in a Jupyter Notebook, and it runs out-of-the-box with the --pure option of guix shell; running it with --container, for improved isolation, is left as an exercise for the reader. One drawback was that the paper’s author invokes apt install in the middle of the notebook. On the Guix side, one difficulty was finding the right TeX Live packages; another one was the interaction with the Python library matplotlib, which can be troublesome. The session was a double opportunity: we could dive into Guix-specific details (this hackathon was the right place to share knowledge!) and this specific review, which started in March, is now almost finished. Win-win!
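
As a hint for that exercise, the two invocations might look like the sketch below. It assumes that manifest.scm provides the `jupyter` command; the function names are ours. With --container, the --network option shares the host's network so that a browser on the host can reach the notebook server.

```shell
#!/bin/sh
# Sketch: two ways to start the notebook from the pinned environment.
set -eu

# --pure clears the environment variables of the calling shell,
# but the file system and network remain those of the host.
run_pure() {
    guix time-machine -C channels.scm \
         -- shell --pure -m manifest.scm -- jupyter notebook
}

# --container additionally isolates the file system; the current
# directory stays accessible, and --network keeps the host network.
run_container() {
    guix time-machine -C channels.scm \
         -- shell --container --network -m manifest.scm -- jupyter notebook
}
```

Calling `run_pure` or `run_container` from the directory that holds the two files starts the notebook server.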

The fourth and last paper was a challenge: produce a software environment where C code from 1998 can run. And the result is positive! The two tables agree with those in the paper. The C code compiles and runs, although it raises some warnings, which can be turned off via specific compiler flags, and the Bash shell scripts are not fully portable and required minor tweaks. The C code has no dependencies, which significantly simplifies portability and eases reproducibility.

Towards long-term and archivable reproducibility

Over years of running Guix daily in scientific contexts, we have identified many potential roadblocks to achieving long-term reproducible software environments, from unfixed bugs to unimplemented features. Verifiable environment deployment can only be achieved when all the following conditions are met:

availability of all the source code;
backward-compatibility of the Linux kernel system call interface;
some compatibility of the hardware (CPU, etc.);
no “time bomb”—software whose behavior is a function of the current time.
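
The last condition is easy to overlook, so here is a toy illustration of a “time bomb”. The expiry year and the function name are invented for the example; real cases are typically test suites that compare a hard-coded date, such as a certificate expiry, with the clock.

```shell
#!/bin/sh
# Toy "time bomb": the outcome of the check depends on when it runs.
expiry_year=2021    # hypothetical expiry date baked into a test fixture

check_fixture() {
    # $1: the year considered to be "now"
    if [ "$1" -le "$expiry_year" ]; then
        echo pass
    else
        echo fail
    fi
}

check_fixture 2020            # built in 2020: prints "pass"
check_fixture 2023            # same code rebuilt in 2023: prints "fail"
check_fixture "$(date +%Y)"   # what a real test suite does: asks the clock
```

The source code is identical in all three calls; only the date changes, which is enough to break a rebuild years later.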

This hackathon was a nice opportunity to check their status and list what already works and what still remains, all based on a concrete example:

[Re] Storage Tradeoffs in a Collaborative Backup Service for Mobile Devices, ReScience C 6, 1, #6, 2020.

This paper runs Guix end-to-end: it uses Guix to compile all the requirements, run all the experiments, and finally generate the final report. Let us check whether two independent observers are able to verify the same result with three years between the two observations (2020–2023).

We know that this paper’s computational experiment is reproducible with Guix today under “normal circumstances” (try it!), so we set out to experiment with an extreme worst-case scenario: no pre-built binaries are available—everything needs to be rebuilt from source—and none of the source code hosting sites is reachable, with the exception of the Software Heritage archive. The ambition of Software Heritage is to collect, preserve, and share all software that is publicly available in source code form. Guix fetches code from Software Heritage as a fallback when source code hosting sites disappear. To our knowledge, redeploying software under such extreme conditions is practically impossible, unless of course one is using Guix—or at least that’s what we wanted to verify.

In summary, the outcome of this experiment is impressive. Considering this extreme worst-case setup, it's awesome that it almost works out-of-the-box. The remaining open issues we identified are:

Guix user interface annoyances: manual --fallback or --no-substitutes options and inconsistent error messages.
Holes in Software Heritage and Disarchive coverage of the source code we needed.
Source origin hash mismatches between Guix normalization and Software Heritage normalization.
“Time bomb”: the test suite of some packages is failing because it is time-dependent (example).
Weaknesses in the full-source bootstrap.
The archive of all the binary seeds of this bootstrap.

For the interested reader, take a look at the complete details. Does it mean we have a roadmap for the next hackathon? If you are interested, we’d love to hear your ideas!

Last but not least, a one-day on-line get-together is a great opportunity to tackle longstanding topics while helping each other and welcoming newcomers on board. Thanks to everyone for joining! It’s been a pleasant and productive experience, so stay tuned for other rounds!

Reproducible research hackathon: experience report
