Preparing for future bioconda submission #455

Open
jdblischak opened this issue Sep 19, 2022 · 10 comments

@jdblischak
Collaborator

This isn't urgent, but I recently investigated the potential submission to bioconda, so I wanted to create this reminder and share my notes:

  • Because TileDB already uses the conda-forge feedstock infrastructure to build conda binaries for TileDB-VCF, TileDB-Inc/tiledb-vcf-feedstock, it would be trivial to migrate to the conda-forge channel
  • However, TileDB-VCF cannot be submitted to conda-forge because it depends on htslib, which is only available from bioconda. conda-forge has a long-standing policy of requiring all of its dependencies to be available from conda-forge (recent example to confirm this policy still holds)
  • The bioconda channel depends on conda-forge packages, but it uses its own distinct mechanisms for building and maintaining recipes. The biggest difference is that they use a mono-repo, bioconda-recipes
  • The bioconda channel accepts GitHub Releases as stable URLs, so TileDB-VCF already meets its requirements (ie no need to first submit to an external repository such as PyPI)
  • The bioconda channel has its own auto-updater that can detect GitHub releases, and will open a new PR with the new version and checksum
  • Thus once on bioconda, it would be extra work to continue maintaining TileDB-Inc/tiledb-vcf-feedstock, so this would only be worth the effort if there was a separate use case (eg nightly conda builds)
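The version-and-checksum bump mentioned above can also be reproduced by hand. The following is a minimal sketch, assuming the release tarball has already been downloaded locally; the function name `sha256_of_file` and the file name in the usage comment are hypothetical, and this is not how the bioconda auto-updater itself is implemented.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage: the printed value is what would go in the recipe's
# `source: sha256:` field for the new release.
# print(sha256_of_file("tiledb-vcf-release.tar.gz"))
```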

xref: #47

@atrigila

atrigila commented Jan 5, 2023

Hi team, I would like to add TileDB-VCF as an nf-core module for Nextflow pipelines. I believe that making your work available on the nf-core repository will be valuable to the community.
I was wondering if you have any plans of submitting a Bioconda recipe for your package in the near future. Bioconda recipes would make it easier for Nextflow users to create modules and share them across several pipelines.
I don't have experience building bioconda recipes but I can attempt to if interested.
Thanks!
Anabella.

@awenocur
Contributor

awenocur commented Jan 5, 2023

Hi @atrigila,
Thank you for your interest in incorporating TileDB-VCF into a Nextflow pipeline!
The current barriers we're encountering are related to linking htslib and other dependencies in a manner that is compatible with other packages in the Bioconda channel. Additionally, it would be necessary to adapt our multi-package recipe to meet Bioconda's criteria, which are different from those of conda-forge. However, we have our own channel, tiledb, which does offer this package; would it be possible to integrate that into a pipeline as a stopgap measure?

Our existing package can be viewed on the Anaconda website, here.

If you would like to discuss some ideas of how this module would be used in existing NF-core pipelines such as Sarek and variant catalogue, we might be able to help. @leipzig can be reached either on the NF-core Slack channel or via email: jeremy.leipzig@tiledb.com.

We would also be happy to work with you if you wish to assist creating a Bioconda recipe.

Feel free to send an email to adam.wenocur@tiledb.com to discuss further; we can arrange a call as well.

@atrigila

atrigila commented Jan 6, 2023

Hi Adam, thanks for the reply! I will surely send an E-mail on Monday to discuss this further.

@jdblischak
Collaborator Author

Quick update. The main motivation for submitting this recipe to bioconda was the htslib dependency. However, in order to build conda binaries for the latest release (0.21.0, TileDB-Inc/tiledb-vcf-feedstock#62), we had to switch to vendoring htslib (ie building it from source as part of the build; TileDB-Inc/tiledb-vcf-feedstock#64). Thus from a maintenance perspective of the entire TileDB stack, it would be easier to submit TileDB-VCF to conda-forge as well. We could use the existing recipe in https://github.com/TileDB-Inc/tiledb-vcf-feedstock as is, and we could in the future build binaries for additional platforms (eg win, osx-arm64, linux-aarch64) that are supported by conda-forge but not bioconda.

@jdblischak
Collaborator Author

Another update. A big benefit of submitting to bioconda is that it automatically creates biocontainers (ie Docker images with the single tool installed). These biocontainers are strongly preferred by nf-core compared to self-maintained Docker images.

However, even if we submit to conda-forge instead of bioconda, we can still automatically build biocontainers for the conda binaries by submitting a PR to https://github.com/BioContainers/mulled

@jdblischak
Collaborator Author

Another twist: building htslib as part of the superbuild was laborious. Building for all the variants of libdeflate and openssl required many builds (TileDB-Inc/tiledb-vcf-feedstock#70; purely for the sake of htslib), the CMake commands to install htslib weren't properly linking libdeflate (TileDB-Inc/tiledb-vcf-feedstock#74), and vendoring a dynamically linked library can easily break a conda env (TileDB-Inc/tiledb-vcf-feedstock#76).

For now we are building htslib in a dedicated TileDB-Inc feedstock and uploading it to the tiledb channel (TileDB-Inc/htslib-feedstock). This gives us full control of the build variants (eg libdeflate, openssl), and opens up the possibility of building for arm in the future (TileDB-Inc/tiledb-vcf-feedstock#42; TileDB-Inc/tiledb-vcf-feedstock#66), which bioconda doesn't support. And given that we already have to maintain our bespoke build of m2w64-htslib, it makes sense to also continue maintaining our own build of htslib.
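
To illustrate why building for every variant of libdeflate and openssl adds up quickly: by default each combination of variant values becomes its own build. The sketch below only counts combinations; the version lists are illustrative assumptions rather than the feedstock's actual variant configuration, and `build_matrix` is a hypothetical helper.

```python
from itertools import product

# Illustrative variant values only (assumptions, not the real feedstock config).
variants = {
    "libdeflate": ["1.12", "1.13", "1.14", "1.16", "1.17"],
    "openssl": ["1.1.1", "3"],
}


def build_matrix(variants):
    """Return one dict per combination of variant values (the cross product)."""
    keys = list(variants)
    return [
        dict(zip(keys, combo))
        for combo in product(*(variants[key] for key in keys))
    ]


matrix = build_matrix(variants)
print(len(matrix))  # 10 builds per platform for these example lists
```

Each additional platform multiplies that count again, which is why controlling the variant list in a dedicated feedstock matters.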

Overall, this conda recipe doesn't fit well into the requirements of either bioconda or conda-forge, so unless there is a very compelling motivation, I think it should remain in the tiledb channel.

@nextgenusfs

New user here. I appreciate the pain of the builds in bioconda, etc -- however, in the current setup it's difficult to build an environment with other reasonable dependencies, ie it's natural to think you'd want a Conda environment that has recent builds of htslib tools (samtools, bcftools, etc) alongside tiledbvcf, general variant calling packages, etc. While I did manage to solve an environment with the dependencies I needed, the latest tiledbvcf version I could get was 0.20.2.

$ mamba create -n myenv -c conda-forge -c bioconda -c tiledb tiledbvcf-py libtiledbvcf nextgenmap minimap2 gfftk pyfastx pandas bigtree numpy interlap loguru mosdepth samtools bedtools bcftools samclip sniffles requests natsort

If I try to upgrade I run into dependency issues. Unsurprisingly, it's ultimately the openssl dependency, but this is likely due to the older htslib 1.16. I didn't look at exactly how these are defined in the recipe or whether there is some reason that you need to pin htslib 1.16 (August 2022). I thought all of the bioconda openssl issues with samtools/htslib had more recently been fixed (I don't recall exactly which versions this impacted).

$ mamba upgrade -c conda-forge -c bioconda -c tiledb -n myenv "libtiledbvcf>=0.26.6" "tiledbvcf-py>=0.26.6"
Could not solve for environment specs
The following package could not be installed
└─ libtiledbvcf >=0.26.6  is installable and it requires
   ├─ htslib >=1.16,<1.17.0a0  with the potential options
   │  ├─ htslib 1.16 would require
   │  │  └─ openssl >=1.1.1q,<1.1.2a , which can be installed;
   │  ├─ htslib 1.16 would require
   │  │  └─ libdeflate >=1.14,<1.15.0a0 , which can be installed;
   │  ├─ htslib 1.16 would require
   │  │  └─ libdeflate >=1.12,<1.13.0a0 , which can be installed;
   │  ├─ htslib 1.16 would require
   │  │  └─ libdeflate >=1.13,<1.14.0a0 , which can be installed;
   │  ├─ htslib 1.16 would require
   │  │  └─ libdeflate >=1.16,<1.17.0a0 , which can be installed;
   │  ├─ htslib 1.16 would require
   │  │  └─ libdeflate >=1.17,<1.18.0a0 , which can be installed;
   │  ├─ htslib 1.16 would require
   │  │  └─ openssl >=1.1.1t,<1.1.2a , which can be installed;
   │  └─ htslib 1.16 would require
   │     └─ openssl >=1.1.1u,<1.1.2a , which can be installed;
   └─ tiledb >=2.17.4,<2.18.0a0  with the potential options
      ├─ tiledb 2.17.4 would require
      │  └─ openssl >=1.1.1w,<1.1.2a , which can be installed;
      └─ tiledb 2.17.4 would require
         └─ openssl >=3.1.4,<4.0a0 , which conflicts with any installable versions previously reported.

I guess I can try to build from source in an environment that has the latest htslib, etc and see if that works.

@jdblischak
Collaborator Author

I can reproduce the solver error. The problem is that we build libtiledbvcf against the latest htslib created by TileDB-Inc/htslib-feedstock, which is still at 1.16. Because of the run exports of the htslib recipe:

build:
  number: 0
  run_exports:
    - {{ pin_subpackage('htslib', max_pin='x.x') }}

the runtime requirement is pinned to htslib >=1.16,<1.17.0a0, which is very restrictive. Thus for now it is only possible to install libtiledbvcf along with htslib/samtools/bcftools 1.16.
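
As a rough illustration of how a `max_pin='x.x'` pin turns a build-time version into that runtime range, here is a minimal sketch. It assumes purely numeric, dot-separated versions and is not conda-build's actual implementation; `pin_range` and `satisfies` are hypothetical helper names.

```python
from typing import Tuple


def as_tuple(version: str) -> Tuple[int, ...]:
    """Split a numeric dotted version such as '1.16' into (1, 16)."""
    return tuple(int(part) for part in version.split("."))


def upper_bound(built_against: str, max_pin: str = "x.x") -> Tuple[int, ...]:
    """Keep as many fields as max_pin has x's, then bump the last kept field."""
    keep = len(max_pin.split("."))
    upper = list(as_tuple(built_against)[:keep])
    upper[-1] += 1
    return tuple(upper)


def pin_range(built_against: str, max_pin: str = "x.x") -> str:
    """Format the runtime requirement implied by the pin."""
    upper = ".".join(str(n) for n in upper_bound(built_against, max_pin))
    return f">={built_against},<{upper}.0a0"


def satisfies(candidate: str, built_against: str, max_pin: str = "x.x") -> bool:
    """Check whether a candidate version falls inside the pinned range."""
    lower = as_tuple(built_against)
    return lower <= as_tuple(candidate) < upper_bound(built_against, max_pin)


print(pin_range("1.16"))          # >=1.16,<1.17.0a0
print(satisfies("1.16", "1.16"))  # True
print(satisfies("1.19", "1.16"))  # False
```

Under these assumptions, any environment that needs htslib 1.19 falls outside the range exported by a 1.16 build, which matches the solver error above.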

The good news is that this can be fixed. I'll update our htslib recipe to 1.19, and then we can rebuild libtiledbvcf. In the short term, you'll need to use the 1.16 versions.

@nextgenusfs

Thanks @jdblischak!

@jdblischak
Collaborator Author

I've created a new issue to track updating the htslib version linked to libtiledbvcf: TileDB-Inc/tiledb-vcf-feedstock#106
