DESOSA 2022

Snakemake

Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow. It is the first system to support the use of automatically inferred multiple named wildcards (or variables) in input and output filenames. The aim of Snakemake is to enable sustainable data analysis for scientific research. Snakemake is highly popular, with on average more than 6 new citations per week, and over 200k downloads (https://github.com/snakemake/snakemake). It is inspired by GNU Make (https://www.gnu.org/software/make/).
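To make the idea of inferred named wildcards concrete, here is a minimal Python sketch. It is not Snakemake's implementation: the helper `infer_wildcards` and the file names are hypothetical, and the matching is deliberately naive (each wildcard matches any non-empty text, and each name may appear only once in a pattern).

```python
import re
from typing import Dict, Optional


def infer_wildcards(pattern: str, filename: str) -> Optional[Dict[str, str]]:
    """Match a requested filename against an output pattern such as
    'results/{sample}.{genome}.bam' and return the wildcard values.

    Hypothetical helper for illustration only; each wildcard name is
    assumed to appear at most once in the pattern.
    """
    # re.split with a capturing group alternates literal text and names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)      # literal text, matched verbatim
        else:
            regex += f"(?P<{part}>.+)"    # named wildcard
    match = re.fullmatch(regex, filename)
    return match.groupdict() if match else None


# Values inferred from the requested output can be substituted into
# the input patterns of the same (hypothetical) rule.
wildcards = infer_wildcards("results/{sample}.{genome}.bam", "results/A.hg38.bam")
inputs = [
    template.format(**wildcards)
    for template in ("reads/{sample}.fastq", "genomes/{genome}.fa")
]
print(wildcards)  # {'sample': 'A', 'genome': 'hg38'}
print(inputs)     # ['reads/A.fastq', 'genomes/hg38.fa']
```

The point of the sketch is only that the requested output file name determines the wildcard values, which in turn determine the input file names.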

Authors

Hielke Walinga

Started with a Bachelor's in Nanobiology at TU Delft. Now a Computer Science student doing the bioinformatics specialization.

Maarten van Tartwijk

Academic background: BSc Nanobiology and BSc Computer Science at TU Delft. Currently doing the Computer Science master's at TU Delft (Artificial Intelligence track). Current main interests: computer vision, interpretability/explainability of AI models, energy transition, sustainability.

Jan-Willem van Rhenen

Finished my Bachelor's in Nanobiology at TU Delft. I am now in my last full quarter of courses in the AI track of Computer Science. My focus during the master's was on Machine Learning and Interactive Intelligence.

Lars van Koetsveld van Ankeren

First-year master's student at TU Delft. Also finished the Bachelor's in Computer Science at TU Delft; interests lie in algorithmics and responsible software engineering.

Faceting Snakemake: The Added Value of a Plugin System

Introduction: Snakemake is an open source workflow management system inspired by the GNU Make build automation tool. Snakemake aims to facilitate sustainable data analysis by supporting reproducible, adaptable, and transparent data research. In this essay we analyze scalability challenges for Snakemake and propose a solution for the identified issues. We opted for a different approach than the original assignment, analyzing the scalability of the userbase and development of Snakemake instead of technical scalability.

Snakemake: Keeping the polish

Introduction: Snakemake is an open source workflow management system inspired by the GNU Make build automation tool. Snakemake aims to facilitate sustainable data analysis by supporting reproducible, adaptable, and transparent data research. In this essay we analyze the code quality and how this is tested as well as the evolution of the codebase. This article is based on an interview we conducted with the code owner of Snakemake, Johannes Köster, as well as our own research into the repository of Snakemake.

Digging deeper into Snakemake: architectural gold and pyrite

Introduction: Snakemake is an open source workflow management system inspired by the GNU Make build automation tool. Snakemake aims to facilitate sustainable data analysis by supporting reproducible, adaptable, and transparent data research. The core of a Snakemake workflow consists of a Snakefile that defines all the steps of a workflow as rules. These rules determine how output files are created from input files, while Snakemake automatically resolves dependencies between the rules.

Snakemake: A Hidden Gem for Sustainable Data Science from the Field of Bioinformatics

Problem analysis: Snakemake is an open source workflow management system inspired by the GNU Make build automation tool. Snakemake aims to facilitate sustainable data analysis by supporting reproducible, adaptable, and transparent data research. This is done by using workflows, which are data analysis pipelines. The core of a Snakemake workflow consists of a Snakefile that defines all the steps of a workflow as rules. These rules determine how output files are created from input files, while Snakemake automatically resolves dependencies between the rules.
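The notion of rules whose dependencies are resolved automatically can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Snakemake's scheduler: the `Rule` type, the `plan` function, and all rule and file names are hypothetical, each rule has exactly one output, and a file counts as up to date simply because it exists.

```python
from typing import List, NamedTuple, Set


class Rule(NamedTuple):
    """Hypothetical stand-in for a workflow rule with a single output."""
    name: str
    inputs: List[str]
    output: str


def plan(target: str, rules: List[Rule], existing: Set[str]) -> List[str]:
    """Return rule names in an order that produces `target`.

    Works backwards from the target: every missing input is resolved
    to the rule that outputs it, depth first.
    """
    by_output = {rule.output: rule for rule in rules}
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(path: str) -> None:
        if path in existing:
            return                      # already present, nothing to run
        rule = by_output.get(path)
        if rule is None:
            raise FileNotFoundError(f"no rule produces {path!r}")
        if rule.name in order:
            return                      # already scheduled
        if path in visiting:
            raise ValueError(f"cyclic dependency involving {path!r}")
        visiting.add(path)
        for dependency in rule.inputs:
            visit(dependency)
        visiting.discard(path)
        order.append(rule.name)         # run only after all its inputs

    visit(target)
    return order


rules = [
    Rule("call", ["sorted.bam", "genome.fa"], "calls.vcf"),
    Rule("sort", ["mapped.bam"], "sorted.bam"),
    Rule("map", ["reads.fastq", "genome.fa"], "mapped.bam"),
]
print(plan("calls.vcf", rules, existing={"reads.fastq", "genome.fa"}))
# ['map', 'sort', 'call']
```

Note that the rules are listed in arbitrary order; the execution order follows from the input and output file names alone, which is the property the paragraph above describes.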

Contributions

Fix Snakemake Docs Build Environment Problem

snakemake/snakemake

Description

Bumped Python to 3.7

Removed docutils constraint on version 0.12 (resulted in unresolved dependencies)

Added missing dependency myst-parser

Reflection

When fixing my other small contribution I couldn’t create the conda environment to correctly build the docs.

I debugged this problem and fixed the environment file.

This was harder than it should have been. Conda is slow, and its errors are cryptic. In the process, I learned I should use the faster mamba implementation. It was faster, but the errors remained cryptic.

merged

Fix typo in Snakemake documentation

snakemake/snakemake

Description

Fixed a typo mentioned in https://github.com/snakemake/snakemake/issues/1113.

Reflection

The hardest part about all of this was getting Snakemake documentation to build locally using Sphinx.

But in the end I succeeded and fixed a simple typo.

merged