From c2927c16e5f8bf111fdd04e7b1f9ca5fa8936c91 Mon Sep 17 00:00:00 2001 From: "James A. Fellows Yates" Date: Sun, 5 Jan 2025 02:37:22 +0100 Subject: [PATCH] Add longer form tutorials on how to add, update, and build recipes (#31) * Start adding reference pages * Add skeleton and mostly port the adding software tutorial * Update other post links * Fix linting errors other pages * Add debugging and updating * Address first round of comments from @bgruening * Update source/tutorials/2024-updating-bioinformatic-software-to-bioconda.rst --- source/contributor/building-locally.rst | 5 + source/contributor/workflow.rst | 22 +- source/index.rst | 4 +- ...ing-bioinformatic-software-to-bioconda.rst | 461 ++++++++++++++++++ ...ing-bioinformatic-software-to-bioconda.rst | 186 +++++++ ...ing-bioinformatic-software-to-bioconda.rst | 139 ++++++ source/tutorials/index.rst | 6 +- 7 files changed, 816 insertions(+), 7 deletions(-) create mode 100644 source/tutorials/2024-adding-bioinformatic-software-to-bioconda.rst create mode 100644 source/tutorials/2024-debugging-bioinformatic-software-to-bioconda.rst create mode 100644 source/tutorials/2024-updating-bioinformatic-software-to-bioconda.rst diff --git a/source/contributor/building-locally.rst b/source/contributor/building-locally.rst index 0368c9d..c36b06b 100644 --- a/source/contributor/building-locally.rst +++ b/source/contributor/building-locally.rst @@ -11,6 +11,11 @@ do so, each with their own caveats. .. 
_bioconda_utils: +For gentler step-by-step guides to local building and debugging, see: + +* :doc:`/tutorials/2024-updating-bioinformatic-software-to-bioconda` +* :doc:`/tutorials/2024-debugging-bioinformatic-software-to-bioconda` + Using bioconda-utils ~~~~~~~~~~~~~~~~~~~~ diff --git a/source/contributor/workflow.rst b/source/contributor/workflow.rst index 15544f0..58b9aaa 100644 --- a/source/contributor/workflow.rst +++ b/source/contributor/workflow.rst @@ -32,13 +32,23 @@ them turns out to be more complicated than you thought. .. _make_edits: -2. Make Some Edits -~~~~~~~~~~~~~~~~~~ +2. Add a Recipe or Make Some Edits +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Now you need to make your edits. Naturally, this can become rather +Now you need to add your files or make your edits. Naturally, this can become rather complex. -If you have a PyPi recipe you want to package for Bioconda, you could +Typically this involves creating two files within a directory in ``recipes/``: + +* A ``meta.yaml`` recipe file +* An (optional) ``build.sh`` script. + +For more detailed guides, see the following tutorials: + +* :doc:`/tutorials/2024-adding-bioinformatic-software-to-bioconda` +* :doc:`/tutorials/2024-updating-bioinformatic-software-to-bioconda` + +However, for some fast tips: if you have a PyPI package you want to package for Bioconda, you could start with the ``conda skeleton`` command creating a template automatically:: @@ -52,6 +62,8 @@ it is a general purpose package. Those need to be added to Now edit the file(s) in the newly created folder (named according to the package). +If you can't use a skeleton, copy from other recipes written in the same language as your tool. + You can verify your edits by looking at the ``diff``:: git diff @@ -60,6 +72,8 @@ Or, you can have a look at the changed files and status of your repository using git status +At this point you can also do a local build to test that the recipe will work correctly. 
+See :doc:`building-locally` or if you have problems, see the following tutorial: :doc:`/tutorials/2024-debugging-bioinformatic-software-to-bioconda` .. _push_changes: diff --git a/source/index.rst b/source/index.rst index e432dd6..70e06fa 100644 --- a/source/index.rst +++ b/source/index.rst @@ -185,7 +185,7 @@ package and a Docker container Contributors work together to fix any issues (which are tested again) and the process repeats until all tests pass. - Our `build system`_, `bioconda-utils`, orchestrates the various building + Our `build system`_, ``bioconda-utils``, orchestrates the various building and testing steps on CI infrastructure like CircleCI, Azure Pipelines, and GitHub Actions. The output consists of both a `conda package`_ and a `Biocontainer`_ that can be inspected before merging the pull request. @@ -207,7 +207,7 @@ containing over 8000 bioinformatics packages `_. -:circlednumber:`⑤` Users can then use the package with `conda install` or `docker pull` +:circlednumber:`⑤` Users can then use the package with ``conda install`` or ``docker pull`` .. details:: Details diff --git a/source/tutorials/2024-adding-bioinformatic-software-to-bioconda.rst b/source/tutorials/2024-adding-bioinformatic-software-to-bioconda.rst new file mode 100644 index 0000000..7ffe61b --- /dev/null +++ b/source/tutorials/2024-adding-bioinformatic-software-to-bioconda.rst @@ -0,0 +1,461 @@ +Adding bioinformatic software to Bioconda - a reference guide +############################################################# + +*This guide was originally written by James A. Fellows Yates and reviewed by the µbinfie community for the* `µbinfie blog `_. +*You can see the original post* `here `_ + +This tutorial aims to give a gentle introduction to adding bioinformatic software to Bioconda. 
+ +The main sections of this tutorial are: + +- TL;DR +- Prerequisites +- Adding a new tool or package +- Debugging a recipe build +- Updating an existing tool or package recipe + +Please note that this tutorial comes with 'no warranty'(!), as the Bioconda build steps could change at any point. +However, the steps here should act as a good starting point. +Furthermore, if you are planning to add someone else's tool or package to Bioconda, it's always good etiquette to ask or just inform the original authors that it will happen. + +TL;DR Relevant Commands +*********************** + +.. code-block:: bash + + ## Create environment with conda building tools + conda create -n bioconda-build -c conda-forge -c bioconda conda-build bioconda-utils grayskull + conda activate bioconda-build + + ## Clone repo of your fork of https://github.com/bioconda/bioconda-recipes and make branch + git clone + git switch -c add- + + ## Make recipe meta.yaml + ## Option 1: If using Grayskull + cd recipes/ + grayskull + + ## Option 2: If not using Grayskull + mkdir recipes/ + touch recipes//meta.yaml + + ## Lint recipe meta.yaml + bioconda-utils lint recipes/ --packages pathphynder + + ## Perform a local build (two options) + ## Option 1: + conda build recipes/ + + ## Option 2: + bioconda-utils build --docker --mulled-test --packages + + ## Debugging + ## Option 1: conda-build + cd ////envs//conda-bld/linux-64 + conda create -n -c ./ + conda build recipes/ --keep-old-work + + ## Option 2: bioconda-utils + docker run -t --net host --rm -v /tmp/tmp/build_script.bash:/opt/build_script.bash -v ////envs//conda-bld/:/opt/host-conda-bld -v ////recipes/:/opt/recipe -e LC_ADDRESS=en_GB.UTF-8 -e LC_NAME=en_GB.UTF-8 -e LC_MONETARY=en_GB.UTF-8 -e LC_PAPER=en_GB.UTF-8 -e LANG=en_GB.UTF-8 -e LC_IDENTIFICATION=en_GB.UTF-8 -e LC_TELEPHONE=en_GB.UTF-8 -e LC_MEASUREMENT=en_GB.UTF-8 -e LC_TIME=en_GB.UTF-8 -e LC_NUMERIC=en_GB.UTF-8 -e HOST_USER_ID=1000 quay.io/bioconda/bioconda-utils-build-env-cos7:2.11.1 bash + + 
conda mambabuild -c file:///opt/host-conda-bld --override-channels --no-anaconda-upload -c conda-forge -c bioconda -c defaults -e /opt/host-conda-bld/conda_build_config_0_-e_conda_build_config.yaml -e /opt/host-conda-bld/conda_build_config_1_-e_bioconda_utils-conda_build_config.yaml /opt/recipe/meta.yaml 2>&1 + conda activate /opt/conda/conda-bld//_build_env + + ## Testing the Docker image artifact + docker run -it + + +Prerequisites +************* + +1. Make a fork of the `bioconda-recipes `_ GitHub repository, and clone this to our local machine [1]_. + +2. Install the following software on our local machine: + + - ``conda`` itself + - I used to use `miniconda `_, but am now switching to `miniforge `_ due to licensing issues [2]_ + - Bioconda configured as a source channel (see `bioconda documentation `_) + - The following conda packages: + + - ``conda-build`` + - ``bioconda-utils`` + - ``grayskull`` (optional: for Python software on PyPI or R packages on CRAN) + + I typically dump all of the above in a specific conda environment, generated with the following command: + + .. code-block:: bash + + conda create -n bioconda-build -c conda-forge -c bioconda conda-build bioconda-utils grayskull + conda activate bioconda-build + + - ``docker`` (optional: for local build testing) + +Preparation +*********** + +0. Ask: *Is my software already on Bioconda?* + + - Search the Bioconda website `https://bioconda.github.io/ `_ to make sure some kind soul hasn't already done this. + - Also double check the software doesn't already exist on another conda channel on `Anaconda `_. + +1. Ask: *Is the software right for Bioconda?* + + - Bioconda is for bioinformatics software. + - If the tool is a more generic tool or for a different domain, we may want to consider adding it to conda-forge [3]_. 
+ - One common caveat to this is R packages - if our biology-related package is on CRAN (`https://cran.r-project.org/ `_), it should go on conda-forge; if it's on Bioconductor (`https://www.bioconductor.org/ `_), it should go on Bioconda (if it's not already there). + +2. Check: *Does the software have a compatible license?* (i.e., allows redistribution) + +3. Check: *Does the software have a stable release?* + + - I.e., an unmodifiable file (tarball or zip) and a stable URL from which that specific version can always be downloaded. + - An example is a GitHub release (e.g. for a `Kraken2 release `_, we use the link of the 'Source code (tar.gz)', i.e.: `https://github.com/DerrickWood/kraken2/archive/refs/tags/v2.1.3.tar.gz `_). + - Using GitHub 'tags' is sort of OK. + - Using specific commits (i.e., no versioned release tarballs) is strongly frowned upon. + +If we are all good with the above, we can put our tool or package on Bioconda. + +Writing the recipe +****************** + +A Bioconda recipe at a minimum can consist of a single file called ``meta.yaml``. +This is often sufficient for Python packages on PyPI and many R packages. +For more information, the `conda-forge `_ project has a `very nice description `_ of what each section of a ``meta.yaml`` does. + +1. Create a new git branch for the tool we wish to add within the forked and cloned ``bioconda-recipes`` repository: + + .. code-block:: bash + + git switch -c add- + + +2. Make a ``meta.yaml`` file within the created directory, with one of two methods: + + 1. If the tool is a Python package on PyPI or an R package on CRAN, we can use ``grayskull`` to generate this for us. + + .. code-block:: bash + + cd recipes/ + grayskull + + + 2. In all other cases, make a new directory in the ``recipes/`` directory, named after the software we wish to add. + + .. code-block:: bash + + mkdir recipes/ + + + The name of the software must be formatted in all lower case, and with only letters, numbers, and hyphens. 
+ + If our package is an R package, we should prefix the name with ``r-``. + + ⚠ Make sure a tool with the same name doesn't exist! + If it does - consider adding a suffix. + For example, `'-mg' to indicate software for metagenomics `_, or `'-lite' for a version of a recipe that doesn't include preinstalled databases `_. + + Then, create an empty text file called ``meta.yaml`` in the new directory. + + .. code-block:: bash + + touch recipes//meta.yaml + + +3. Add the following sections in the ``meta.yaml`` file (or double check if already made with ``grayskull``). + When in doubt, copy from other similar existing recipes already on Bioconda: + + - ``package:`` + - Specify the name (same specifications as above) and version of the tool/package. + - ``source:`` + - Specify the URL to the source code tarball or zip file for conda to download. + - The hash string (e.g. ``sha256``) of the file, for download verification. + - ``build:`` + - Specify the build number (for new packages or new software version, always ``0``). + - Possibly the architecture (e.g. ``noarch`` for Python packages). + - A ``run_exports`` subpackage pinning. + - ``requirements:`` + - Specify a list of the various dependencies the software needs during the various stages of the build process, i.e., ``host``, ``build``, and ``run``. + - Dependencies should have minimum versions, ideally specified with `'>=' notation `_. + - ``test:`` + - One or more (e.g. if multiple CLI tools or scripts exist under the package) commands to test the software installed correctly. + - Typically simply running the tool with ``--help`` or ``--version`` is sufficient, but the command must have a ``0`` exit code to indicate success. + - If ``--help`` ends with a non-``0`` code, we can try using ``grep`` to search for a string in the help message. + - ``about:`` + - URL of, for example, the source code repository or documentation home page. + - License type [4]_. + - Corresponding license file name as in the tarball. 
+ - A short one-sentence summary and/or long-form description of the software. + - ``extra:`` + - Other metadata, such as the DOI of any associated publication the software may have. + - Other identifiers of the software. + + An example of a ``meta.yaml`` is as follows: + + .. code-block:: yaml + + {% set name = "centrifuge" %} + {% set version = "1.0.4.1" %} + + package: + name: {{ name|lower }} + version: {{ version }} + + build: + number: 2 + skip: true # [osx] + run_exports: + - {{ pin_subpackage("centrifuge", max_pin="x.x") }} + + source: + url: https://github.com/DaehwanKimLab/centrifuge/archive/refs/tags/v{{ version }}.tar.gz + sha256: 638cc6701688bfdf81173d65fa95332139e11b215b2d25c030f8ae873c34e5cc + patches: + - centrifuge-linux-aarch64.patch # [linux and aarch64] + + requirements: + build: + - make + - {{ compiler('cxx') }} + host: + - zlib + run: + - zlib + - perl + - wget + - tar + - python + + test: + commands: + - centrifuge --help + + about: + home: https://github.com/DaehwanKimLab/centrifuge + license: GPL-3.0-only + license_file: LICENSE + license_family: GPL3 + summary: 'Classifier for metagenomic sequences. Supports classifier scripts' + + extra: + additional-platforms: + - linux-aarch64 + identifiers: + - biotools:Centrifuge + - doi:10.1101/gr.210641.116 + +*A relatively simple example* `conda recipe example for Centrifuge `_, *based on the descriptions above.* + +4. Lint our ``meta.yaml`` for any errors pertaining to Bioconda `linting guidelines `_ (make sure we're in the root of the repository!). + + .. code-block:: bash + + bioconda-utils lint recipes/ --packages + + If there are any errors, I recommend fixing them before proceeding, as getting the same errors during the Bioconda GitHub CI takes a long time (as we'll see later). + In particular, ``missing_run_exports`` is a recently added linting check that many people are not aware of. 
+ To solve this one, look at recently merged recipes or at the PR template, which describes how to set this under 'Instructions for avoiding API, ABI, and CLI breakage issues', such as on this `pango-collapse PR `_. + +Writing a build script (optional) +********************************* + +For some tools, we may also need to create a ``build.sh`` script [5]_ in the same directory alongside the ``meta.yaml`` file. + +This is simply a shell script that is run during the build process after the source code has been fetched. +The commands executed in this script are run in a specific build environment. + +The purpose of this script varies, so I can't give a precise definition or explicit steps for writing one, but in my experience it is most often used in cases of: + +- Tools that need to be compiled from source code (e.g. C++ tools and ``make install``). +- Tools that are simply an executable binary that needs to be linked or copied to the ``bin/`` of the eventual conda environment (e.g. Java ``.jar`` files). +- Tools that have additional 'auxiliary' or 'helper' scripts outside of (and in addition to) the main tool that also need to be copied to the ``bin/`` of the eventual conda environment. +- Patching files to allow them to run (often for simple patching with e.g. ``sed``; more complex patching can use a git style ``patch`` file specified in the ``meta.yaml``). + + - Patching can be stuff like adding a ``shebang`` at the top of a file + - Replacing hardcoded paths or variables in ``make`` files etc. + +- Tools that may require other files to be copied to other directories in the conda environment (e.g. databases). + +You can see an example of a ``build.sh`` script below: + +.. 
code-block:: bash + + #!/bin/bash + + set -xe + + export LDFLAGS="-L$PREFIX/lib" + export CPATH=${PREFIX}/include + + mkdir -p $PREFIX/bin + + case $(uname -m) in + aarch64) + CXXFLAGS="${CXXFLAGS} -fsigned-char" + ARCH_OPTS="SSE_FLAG= POPCNT_CAPABILITY=0" + ;; + *) + ARCH_OPTS="" + ;; + esac + + make -j ${CPU_COUNT} CXX=$CXX RELEASE_FLAGS="$CXXFLAGS" ${ARCH_OPTS} + make install prefix=$PREFIX + + cp evaluation/{centrifuge_evaluate.py,centrifuge_simulate_reads.py} $PREFIX/bin + +*A relatively simple example* `build.sh script for Centrifuge `_, *based on the descriptions above. Here it includes both* ``make install`` *compilation examples with Bioconda C++ environment variables and copying of the additional auxiliary scripts to the* ``bin/`` *directory.* + +However, as always, check other tools/packages for examples. + +Examples of small ``build.sh`` scripts for the five use cases above: + +- `kallisto `_ (make install). +- `MALT `_ (java jar file). +- `metabinner `_ (auxiliary scripts). +- `phynder `_ (patching). +- `grid `_ (database files). + +To provide further guidance based on my experience: + +The ``$PREFIX`` variable corresponds to the root of the conda environment that eventually gets made on a user's system when they install the conda package. +We can explore our own conda environments to see what the ``$PREFIX`` looks like by running ``conda env list`` to see all of our own conda environments, and changing into one of the directories listed there. +They often will look very similar to Unix root directories, with folders such as ``etc/``, ``bin/``, ``lib/``, ``share/``, etc. +For example, if we have an executable or scripts that need to go into ``bin/``, we must copy them into ``$PREFIX/bin``. +For some tools we may have to copy other files into other directories, such as databases [6]_, but this is less common. + +Another tricky thing is compiling of C++ code, which can be a bit of a pain. 
+For reasons [7]_, we need to use specific variables that point to the non-standard (it seems) places that conda stores its libraries and headers. +These are described `here `_, and in particular for `zlib `_. +You often will need to patch the ``make`` files and other compilation-related scripts to use these variables, and also to use the ``--prefix=$PREFIX`` flag when running ``make install``. + +For all of the above, regardless of language, I recommend looking at the `contributor guidelines `_. + +Build testing +************* + +Once we think we've got our ``meta.yaml`` and ``build.sh`` (if needed) files ready, we can now try to see if this works. + +We have two options here, either: + +- Test it locally (less slow, but may not perfectly replicate the build). +- Open the pull request onto the main ``bioconda-recipes`` repository and see if it passes the tests there (slow). + +If we want to just let the Bioconda CI do the testing, skip to the next section, `Opening the Pull Request`_. + +Otherwise, in our ``bioconda-build`` conda environment, we can run one of two options (in both cases from the root directory of our ``bioconda-recipes`` fork): + +- The standard ``conda build`` command: + + .. code-block:: bash + + conda build recipes/ + + +- The ``bioconda-utils`` command, which should better replicate the CI environment and also gives us the Biocontainer Docker version of our conda environment (but requires Docker, and is slower): + + .. code-block:: bash + + bioconda-utils build --docker --mulled-test --packages + + +Hopefully, if everything worked correctly the first time, we should have a successful build and we can proceed with submitting to Bioconda. +If something goes wrong, see :doc:`/tutorials/2024-debugging-bioinformatic-software-to-bioconda` on debugging the Bioconda builds. + +Regardless, in both local build approaches, these commands will dump a huge amount of output to the terminal, and if the build fails, we'll have to trawl through it to debug it. 
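Since both build commands print a lot of output, it can help to save the log to a file and pull out the lines most likely to matter. The following is only a minimal sketch: the helper name ``summarise_build_log`` and the ``build.log`` file name are made up for illustration, and the search pattern is an assumption about what failing builds tend to print, not something defined by ``conda-build`` or ``bioconda-utils``.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of conda-build or bioconda-utils):
# print the last N lines of a saved build log that mention an error or failure,
# prefixed with their line numbers in the log.
summarise_build_log() {
    local log_file="$1"          # path to the saved build log
    local max_lines="${2:-20}"   # how many matching lines to show (default 20)
    grep -n -i -E 'error|failed' "$log_file" | tail -n "$max_lines"
}

# Example usage (assumption: the recipe directory is passed as the first argument):
#   conda build "recipes/$1" 2>&1 | tee build.log
#   summarise_build_log build.log
```

The line numbers in the output make it easier to jump to the surrounding context in the full log, which is usually where the real cause is.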
+ +I generally find the ``bioconda-utils`` method is slightly easier to debug because of the use of colours in the logging, with the added benefit of making it easier to check the Biocontainer Docker image that gets created, but which method to use is up to personal preference. + +Opening the Pull Request +************************ + +Once we're happy with our recipe, we can open a pull request on the main ``bioconda-recipes`` repository on GitHub. + +We can do this (if you're not too familiar with GitHub), by: + +1. On your local repo, ``git add`` the files you've added, then commit and push. +2. Go to the main ``bioconda-recipes`` repository on GitHub. +3. Switch to the Pull Requests tab. +4. Press the green 'New Pull Request' button. +5. In the top bar use the dropdowns to select our fork and branch (which should then be going *into* ``bioconda/bioconda-recipes`` and the ``master`` branch). +6. Make sure the title of the pull request follows the recommendations, typically just ``Add [tool/package]`` or ``Update [tool/package]``. +7. Once we open the pull request, the Bioconda CI will run. + +We can see the overall status of the checks near the bottom of the page below the 'Review required' message. +For most builds this currently happens away from GitHub on Microsoft Azure, and can take a while (sometimes up to 1 hour!) to complete (so be patient). + +To get more information on the status of the CI test, and also logs, press 'details' next to one of the checks (it generally doesn't matter which one), then press the 'View more details on Azure Pipelines' link on the resulting page. + +On the Azure website we should see a series of 'stages' that run in order. The tests that are run in these stages are: + +1. ``lint``: checks we've not missed anything (e.g. the LICENSE). +2. ``test_linux``: that the recipe builds on a Linux system (i.e., doesn't error and the test command completes). +3. 
``test_osx``: that the recipe builds on a macOS system (i.e., doesn't error and the test command completes). + +A given stage has a completed (green tick), running (blue spinny icon), or failed (red cross) status. +If we click on any of the stages, we should see log files that are similar or identical to what we would see if we were building locally (see the tutorial :doc:`/tutorials/2024-debugging-bioinformatic-software-to-bioconda` for debugging advice, if we skipped local building). + +If you get errors or something goes wrong, see :doc:`/tutorials/2024-debugging-bioinformatic-software-to-bioconda` on how to locally debug the Bioconda build. + +Test driving the Docker Biocontainer (optional) +*********************************************** + +If we used the ``bioconda-utils`` command to build our recipe, we can also optionally test the Biocontainer Docker image that was generated from the conda environment that was built. + +If we did a local build, the Docker image is already on our own machine. + +If we let the automated Bioconda CI do the testing on Azure, we can leave a comment with '@BiocondaBot please fetch artifacts' and this will generate a comment on the PR with two tables. +We can download the ``LinuxArtifacts.zip`` file from the top table (``Package(s) built are ready...``), unzip it and then run the command given in the ``Docker image(s) built`` table to load the container. + +Then for both local and GitHub build cases, we can access the created Docker container by finding it in the output of ``docker images``. +The image will be named something like ``quay.io/biocontainers/[toolname]``, and I typically run the following command to access the container and run additional test commands or experiments within it. + +.. code-block:: bash + + docker run -it [image_id_from_docker_images_command] + + +This should dump us within a shell in the container so we can test commands etc. as we would with any other Docker container. 
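When many images are present locally, finding the right one in the ``docker images`` output can be tedious. As a rough sketch only, the listing can be filtered by tool name. The helper name ``find_biocontainer`` and the tool name ``mytool`` are hypothetical, and the pattern assumes the ``quay.io/biocontainers/[toolname]`` naming described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "repository:tag" lines on stdin and keep only the
# Biocontainers images for the given tool name.
# Assumption: the tool name contains no regular-expression special characters.
find_biocontainer() {
    local tool_name="$1"
    grep -E "^quay\.io/biocontainers/${tool_name}:"
}

# Example usage (requires Docker; 'mytool' is a placeholder):
#   docker images --format '{{.Repository}}:{{.Tag}}' | find_biocontainer mytool
#   docker run -it "<image printed above>"
```

Reading from stdin keeps the helper independent of Docker itself, so the same filter also works on a saved listing.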
+ +If something goes wrong here and you encounter issues with the build within the container, you can see :doc:`/tutorials/2024-debugging-bioinformatic-software-to-bioconda` for tips and tricks on how to manually re-build the recipe step-by-step. +Otherwise, if you're happy you can continue to finalise the PR in the next section. + +Finalising the PR +***************** + +If the CI on Microsoft Azure passes, then back on GitHub we can leave a comment in our PR saying '@BiocondaBot please add label'. +This will add a label to our PR indicating a Bioconda team member can review our recipe to ensure it matches the guidelines. +If they give an approval, they or we can merge our PR into the main ``bioconda-recipes`` repository! +We're now officially a Bioconda recipe maintainer 🎉. + +Once the recipe is merged in, we can normally install the official version of our tool/package with conda within a few minutes. +At the same time, on merging, the auto-generated Docker Biocontainer gets uploaded to the Biocontainers ``quay.io`` repository. +For the Singularity version of the Docker container, this can take up to 24h before it's visible on the `Galaxy project's 'depot' `_. + +Conclusion +********** + +This guide hopefully has given you enough pointers on the steps required to *make* a recipe and submit your tool/package to Bioconda. + +- To go through how to update an existing recipe, see the tutorial :doc:`/tutorials/2024-updating-bioinformatic-software-to-bioconda`. +- To go through how to manually debug the build process if things go wrong, see the tutorial :doc:`/tutorials/2024-debugging-bioinformatic-software-to-bioconda`. + +As with all bioinformatics and software development in general, things rarely just 'work' straight out of the box. +My three biggest points of advice: + +- Always copy and paste from other similar tools or packages on the Bioconda recipes repository. 
+- Take the time to read through the whole log messages (sometimes you can find critical clues hidden amongst the verbose information). +- Take the time to go step by step when building locally, trying to follow exactly what Bioconda does during its own building on Azure. + +I found that by taking the time, I very quickly learnt common issues and how to solve them. +However, if you're really stuck (even after reading the third part of this guide), you can always ask the very friendly volunteer Bioconda team on the `Bioconda gitter/matrix channel `_. + +.. rubric:: Footnotes + +.. [1] You can do a shallow clone ``git clone --depth 1``, to make the size of the cloned repo smaller on your machine. Thanks to @Wytamma for the tip! +.. [2] Various Bioconda documentation pages say we should use ``mamba``, but recent versions of conda use the ``libmamba`` solver by default, so generally we can use standard ``conda``. But if you're having problems with things being very slow, try switching to ``mamba``. +.. [3] Note that conda-forge has a different system for adding packages! +.. [4] Possibly from a fixed list, and how to format these, I don't know... I just copy and paste from other recipes. +.. [5] I've noticed in a few more recent recipes that these commands can go within the ``meta.yaml`` itself `in an entry `_ called ``script:`` under ``build:``, but I guess this only works for very simple commands... +.. [6] Even though I absolutely HATE this, as often it leads to gigantic multi-gigabyte conda environments which we can't use on small CI runners. Give me the choice where to store my databases already! Don't force me to place them in a specific place /rant. +.. [7] That I've never found a good explanation or documentation for. 
diff --git a/source/tutorials/2024-debugging-bioinformatic-software-to-bioconda.rst b/source/tutorials/2024-debugging-bioinformatic-software-to-bioconda.rst new file mode 100644 index 0000000..3bbe056 --- /dev/null +++ b/source/tutorials/2024-debugging-bioinformatic-software-to-bioconda.rst @@ -0,0 +1,186 @@ +Debugging a Bioconda build - a reference guide +============================================== + +*This guide was originally written by James A. Fellows Yates and reviewed by the µbinfie community for the* `µbinfie blog `_. +*You can see the original post* `here `_ + +This tutorial aims to give a gentle introduction to debugging bioinformatic software when adding it to or updating it on Bioconda. + +During the recipe creation (see :doc:`/tutorials/2024-adding-bioinformatic-software-to-bioconda`) or updating process (see :doc:`/tutorials/2024-updating-bioinformatic-software-to-bioconda`), we may encounter problems +or issues. + +This guide provides steps for debugging both a standard ``conda-build`` +build and a ``bioconda-utils`` build that occurs within a Docker +container. + +Prerequisite +************ + +Make sure to familiarise yourself with :doc:`/tutorials/2024-adding-bioinformatic-software-to-bioconda` to +understand the basics of adding a new tool to Bioconda. + +1. Make a fork of the `bioconda-recipes `_ GitHub repository, and clone this to our local machine [1]_. + +2. Install the following software on our local machine: + + - ``conda`` itself + - I used to use `miniconda `_, but am now switching to `miniforge `_ due to licensing issues [2]_ + - Bioconda configured as a source channel (see `bioconda documentation `_) + - The following conda packages: + + - ``conda-build`` + - ``bioconda-utils`` + - ``grayskull`` (optional: for Python software on PyPI or R packages on CRAN) + + I typically dump all of the above in a specific conda environment, generated with the following command: + + .. 
code-block:: bash + + conda create -n bioconda-build -c conda-forge -c bioconda conda-build bioconda-utils grayskull + conda activate bioconda-build + + - ``docker`` (optional: for local build testing) + +With conda-build +**************** + +If we have issues with the build process when using ``conda-build``, we +can try to debug it in the following ways. + +1. Carefully read the very long log that gets generated, from bottom to + top. While tedious, often we can find the issue there, such as if the + ``test`` command didn’t work correctly. + +2. Inspect the resulting environment itself. + + We can do this by changing into the ``conda-bld/`` directory of our + Bioconda build conda environment (called here ``bioconda-build``). + + Then we can try installing the environment but specifying that the + conda *channel* to take the software from is the directory we’re in + with ``-c ./`` (if we miss this, we’ll install existing versions of + the tool if they exist, or have an error that conda can’t find the + tool): + + .. code:: bash + + cd ////envs//conda-bld/linux-64 + conda create -n -c ./ + +3. Run the build process again but keeping all work directories, and + investigate these (if the error message refers to one of those + directories): + + .. code:: bash + + conda build recipes/ --keep-old-work + + +With bioconda-utils +******************* + +If we build with the ``bioconda-utils`` command, and this fails (and we’ve +used the ``--docker`` flag), and the error isn’t obvious, we can deep +dive into the Docker container that was created by the build process +(i.e. recreating the ‘exact’ environment Bioconda itself will use), and +follow the *exact* steps the build process goes through: + +1. The error will produce a ``COMMAND FAILED`` message with a Docker + command. It will look something like: + + .. 
code:: bash + + docker run -t --net host --rm -v /tmp/tmp/build_script.bash:/opt/build_script.bash -v ////envs//conda-bld/:/opt/host-conda-bld -v ////recipes/:/opt/recipe -e LC_ADDRESS=en_GB.UTF-8 -e LC_NAME=en_GB.UTF-8 -e LC_MONETARY=en_GB.UTF-8 -e LC_PAPER=en_GB.UTF-8 -e LANG=en_GB.UTF-8 -e LC_IDENTIFICATION=en_GB.UTF-8 -e LC_TELEPHONE=en_GB.UTF-8 -e LC_MEASUREMENT=en_GB.UTF-8 -e LC_TIME=en_GB.UTF-8 -e LC_NUMERIC=en_GB.UTF-8 -e HOST_USER_ID=1000 quay.io/bioconda/bioconda-utils-build-env-cos7:2.11.1 bash + +2. Copy and paste that command, but replace ``docker run -t`` with + ``docker run -it``. This will open an ‘interactive’ session so we can + play around within the container. + + .. attention:: + + Basic tools such as ``vim`` are not in there! So depending on our preference, we will have to exit the Docker container to edit our ``meta.yaml`` or ``build.sh`` file each time, and re-run the command. + + Once in, there are two main locations of interest: + + - ``/opt/recipe``: contains our entire recipe directory (e.g. with + ``meta.yaml`` and ``build.sh``). + - ``/opt/build_script.bash``: the commands that Bioconda actually runs + during the build process. + +3. To carry out the manual debugging, ``cat /opt/build_script.bash`` and run + each command in that file one-by-one. Alternatively, copy and paste + the entire contents, but DO NOT run the ``set -eo pipefail`` command + at the top (this will exit the Docker container if something goes + wrong). + +4. The first command I found commonly resulted in errors is: + + .. code:: bash + + conda mambabuild -c file:///opt/host-conda-bld --override-channels --no-anaconda-upload -c conda-forge -c bioconda -c defaults -e /opt/host-conda-bld/conda_build_config_0_-e_conda_build_config.yaml -e /opt/host-conda-bld/conda_build_config_1_-e_bioconda_utils-conda_build_config.yaml /opt/recipe/meta.yaml 2>&1 + + This is the primary command that runs the entire building of the + recipe. + +5. 
If step 4 (the ``conda mambabuild`` command) fails during the ``build.sh`` steps (as indicated by the + console log), we will want to manually execute the ``build.sh`` + script. Before we do this, we must make sure to activate the build + environment (the one within which we would e.g. compile a ``c++`` + tool): + + .. code:: bash + + conda activate /opt/conda/conda-bld//_build_env + + When running the commands in the ``build.sh``, we may also need to + manually ``export`` the ``PREFIX`` bash environment variable. + To find its value, look for the long horrible + ``_test_env_placehold_placehold_placehold_placehold_p<...>`` + directory that gets reported in the log during our initial building + run. + +6. To check the actual build output files, look in the working directory + that ``build.sh`` is executed in: + + .. code:: bash + + /opt/conda/conda-bld/_/work + +It’s still not working! +*********************** + +If none of this solves our issue, we can ask for help from the Bioconda +community by opening a Pull Request and leaving a comment pinging +@bioconda/ (replacing ‘’ with the respective one from the +list that should come up). + +Once everything is solved, you can proceed with the last three sections +in the :doc:`/tutorials/2024-adding-bioinformatic-software-to-bioconda`, to +open the Pull Request and get a review. + +Conclusion +********** + +This guide hopefully has given you enough pointers on the steps required to *debug* a conda recipe build using the ``conda-build`` and +``bioconda-utils`` approaches. + +As with all bioinformatics and software development in general, things +rarely just ‘work’ straight out of the box. My three biggest points of +advice: + +- Always copy and paste from other similar tools or packages on the + Bioconda recipes repository. +- Take the time to read through the whole log (sometimes you + can find critical clues hidden amongst the verbose information).
+- Take the time to go step by step trying to follow exactly what + Bioconda is doing within ``bioconda-utils``. + +I found that by taking the time, I very quickly learnt common issues and how +to solve them. + +If worst comes to worst, you can always ask the very friendly Bioconda team +on the `Bioconda gitter/matrix +channel `__. + +.. [1] Note that conda-forge has a different system for adding packages! +.. [2] You can do a shallow clone ``git clone --depth 1``, to make the size of the cloned repo smaller on your machine. Thanks to @Wytamma for the tip! diff --git a/source/tutorials/2024-updating-bioinformatic-software-to-bioconda.rst b/source/tutorials/2024-updating-bioinformatic-software-to-bioconda.rst new file mode 100644 index 0000000..db1cc17 --- /dev/null +++ b/source/tutorials/2024-updating-bioinformatic-software-to-bioconda.rst @@ -0,0 +1,139 @@ +Updating bioinformatic software to Bioconda - a reference guide +=============================================================== + + +*This guide was originally written by James A. Fellows Yates and reviewed by the µbinfie community for the* `µbinfie blog `_. +*You can see the original post* `here `_ + +This tutorial aims to give a gentle introduction to updating bioinformatic software to Bioconda. + +Updating a Bioconda recipe is often a relatively easy process, as much of +the 'hard work' and problem solving has been done in the initial recipe +creation. Typically, updates to a Bioconda recipe consist of updating the +version number of the tool and the hash of the tool's or package's source +code tarball. In some cases you may need to add a few dependencies +(easy), and in rare cases change the build process (more complex). +However, in all cases you can refer to :doc:`/tutorials/2024-adding-bioinformatic-software-to-bioconda` to understand these more +complex scenarios.
+ +In general, however, the process for updating or fixing an existing recipe +is similar to the later steps in :doc:`/tutorials/2024-adding-bioinformatic-software-to-bioconda`. + +*Note that if we use GitHub releases for our tool/package, Bioconda +tries to automatically update Bioconda recipes for us, so we may not +need to do many of these steps manually.* +*Of course, this only works if +there are no changes to the dependencies or tests that can cause the +tests and thus the recipe building to fail.* + +Prerequisite +************ + +Make sure to familiarise yourself with :doc:`/tutorials/2024-adding-bioinformatic-software-to-bioconda` to +understand the basics of adding a new tool to Bioconda. + +1. Make a fork of the `bioconda-recipes `_ GitHub repository, and clone this to our local machine [1]_. + +2. Install on our local machine the following software: + + - ``conda`` itself + - I used to use `miniconda `_, but am now switching to `miniforge `_ due to licensing issues [2]_ + - Bioconda configured as a source channel (see `bioconda documentation `_) + - The following conda packages: + + - ``conda-build`` + - ``bioconda-utils`` + - ``grayskull`` (optional: for Python software on PyPI or R packages on CRAN) + + I typically dump all of the above in a specific conda environment, generated with the following command: + + .. code-block:: bash + + conda create -n bioconda-build -c conda-forge -c bioconda conda-build bioconda-utils grayskull + conda activate bioconda-build + + - ``docker`` (optional: for local build testing) + +Updating the Bioconda recipe +**************************** + +To manually update or fix a recipe: + +1. Make sure our ``bioconda-recipes`` fork is up to date with the main ``bioconda-recipes`` repository. + +2. Make a new branch for the update. + +3. Edit the ``meta.yaml`` and/or ``build.sh`` files of the recipe with our + changes. + +4.
Update the build number: + + - If it is simply *fixing* a recipe with no version change of the + tool, bump the build ``number`` by ``+1``. + - If this is a new version of the tool, set the build ``number`` to + ``0``. + + For most updates, the differences would simply look like this in the + ``meta.yaml`` file: + + .. code:: diff + + - {% set version = "2.0.6" %} + + {% set version = "2.0.7" %} + + package: + name: cami-amber + version: {{ version }} + + source: + url: https://pypi.io/packages/source/c/cami-amber/cami-amber-{{ version }}.tar.gz + - sha256: d2d3d13a135f7ce4dff6bc1aab014945b0e5249b02f9afff3e6df1d82ef45d5a + + sha256: 01f11fbab7cb0f24497932669b00981292b1dc0df2ce6cd4b707a7ddd675bf8d + + build: + noarch: python + +5. Add all files, commit and push to our fork. + +6. Open the PR on ``bioconda-recipes``, wait for the CI to complete + successfully, and tag for review with '@BiocondaBot please add label'. + For more details see :doc:`/tutorials/2024-adding-bioinformatic-software-to-bioconda`. + + - If a CI check does not complete + successfully, check that the hash and build number are correct. + - If linting goes wrong, this is typically related to a missing + ``run_exports`` section; see the opening instructions on the + `pango-collapse + PR `_. + +If something goes wrong during step 6 above, see :doc:`/tutorials/2024-debugging-bioinformatic-software-to-bioconda` on how to debug a +Bioconda build. + +If the tool needs a new build procedure, see :doc:`/tutorials/2024-adding-bioinformatic-software-to-bioconda` for more information on how to +write ``build.sh`` scripts. + +Conclusion +********** + +This guide hopefully has given you enough pointers on the steps required to *update* a recipe and submit your tool/package to Bioconda. + +As with all bioinformatics and software development in general, things +rarely just 'work' straight out of the box.
My three biggest points of +advice: + +- Always copy and paste from other similar tools or packages on the + Bioconda recipes repository. +- Take the time to read through the whole log (sometimes you + can find critical clues hidden amongst the verbose information). +- Take the time to go step by step trying to follow exactly what + Bioconda does during its own building on Azure, using local building. + +I found that by taking the time, I very quickly learnt common issues and how +to solve them. + +If worst comes to worst, you can always ask the very friendly Bioconda team +on the `Bioconda gitter/matrix +channel `__. + +.. [1] Note that conda-forge has a different system for adding packages! +.. [2] You can do a shallow clone ``git clone --depth 1``, to make the size of the cloned repo smaller on your machine. Thanks to @Wytamma for the tip! diff --git a/source/tutorials/index.rst b/source/tutorials/index.rst index 8b16296..a253d43 100644 --- a/source/tutorials/index.rst +++ b/source/tutorials/index.rst @@ -3,5 +3,9 @@ Tutorials .. toctree:: + :maxdepth: 2 - gcb2020 + 2024-adding-bioinformatic-software-to-bioconda + 2024-updating-bioinformatic-software-to-bioconda + 2024-debugging-bioinformatic-software-to-bioconda + gcb2020 \ No newline at end of file