These are upgrade ideas that were proposed by the [Kembellab](https://kembellab.ca/) or the [LaboBioinfoUQAM](https://diallolab.com/), or that were thought of while reading documentation for other projects.

* R bindings to Python functions and classes using the [Reticulate](https://rstudio.github.io/reticulate/) package
* Pipeline refinement using [Nextflow](https://www.nextflow.io/) or [Snakemake](https://snakemake.readthedocs.io/en/stable/)
* Discovery of MAGs using clustering methods
* Classification refinement
* Extraction of significant k-mer profiles using Kevolve methods adapted to bacteria, or methods similar to Clark
* Use of the [Mesh tensor data representation](https://www.tensorflow.org/api_docs/python/tf/experimental/dtensor/Mesh)
* Streaming k-mer extraction into training/classification, one sequence at a time
* Outputs in [BIOM format](https://biom-format.org/)
* Outputs in [CAMI format](https://github.com/CAMI-challenge/contest_information/blob/master/file_formats/CAMI_TP_specification.mkd)
* Software engineering to improve the code base, for example refactoring the code to make functions smaller and more readable, writing docstrings and tests, and setting up CI/CD workflows to automate linting, testing, etc.
* Mixture of experts ([MoE](https://github.com/drawbridge/keras-mmoe)) implementation to split training data by upper taxonomic level classes
* Integration of Ray with Apache Spark and Apache Hive, using [RayDP](https://github.com/oap-project/raydp), to handle data more efficiently
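
To make the streaming k-mer extraction idea from the list above more concrete, here is a minimal sketch in plain Python. It is not how the project currently extracts k-mers: the function names `iter_kmers` and `stream_kmer_profiles`, the `(sequence_id, sequence)` input shape, and the choice to skip k-mers containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Assumption: only unambiguous DNA bases are counted.
VALID_BASES = frozenset("ACGT")


def iter_kmers(sequence: str, k: int) -> Iterator[str]:
    """Lazily yield each k-mer of `sequence` made only of A, C, G or T."""
    sequence = sequence.upper()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if VALID_BASES.issuperset(kmer):
            yield kmer


def stream_kmer_profiles(
    records: Iterable[Tuple[str, str]], k: int
) -> Iterator[Tuple[str, Counter]]:
    """Yield (sequence_id, k-mer counts) one sequence at a time.

    `records` is a hypothetical iterable of (sequence_id, sequence) pairs,
    for example produced by a FASTA parser. Because this is a generator,
    only one profile is held in memory at any moment, so each profile can
    be handed to a training or classification step as soon as it is built.
    """
    for sequence_id, sequence in records:
        yield sequence_id, Counter(iter_kmers(sequence, k))


if __name__ == "__main__":
    toy_records = [("seq1", "ACGTAC"), ("seq2", "ACNGT")]
    for sequence_id, profile in stream_kmer_profiles(toy_records, k=3):
        print(sequence_id, dict(profile))
```

In a real pipeline the consumer of `stream_kmer_profiles` would be the model's training or prediction step rather than `print`, and the counts would likely be mapped onto a fixed k-mer vocabulary first.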