### Dependencies

The Caribou analysis pipeline is packaged with executables for all dependencies that cannot be installed through the Python wheel. These dependencies are:
- Process : Executes instruction scripts when using the container for production on a compute cluster managed by a scheduler (ex: Slurm, Torque, PBS, etc.). Instructions for using the exec command are provided in the [documentation](https://apptainer.org/docs/user/main/cli/apptainer_exec.html), and an applied example on Compute Canada clusters using the Slurm Workload Manager can be found on their [wiki](https://docs.computecanada.ca/wiki/Singularity#Running_a_single_command). Usage may differ slightly depending on the HPC cluster and workload manager used.-->

### [Recommended] GPU acceleration

The learning process of machine learning models can be accelerated by using a GPU, especially for neural networks, and GPU acceleration is strongly recommended should the user want to retrain a model.

<!--If using a container, these dependencies are already installed. Otherwise they should be installed prior to analysis accelerated by GPU.-->

To install GPU dependencies on your machine, refer to the following tutorials for installation:
- [GPU for TensorFlow](https://www.tensorflow.org/install/gpu)

### [Recommended] Python virtual environment

It is recommended to use the analysis pipeline in a virtual environment to ensure that no other installed package can interfere. \
Here is an example of Linux shell commands to install Caribou in a new virtual environment, modifying the paths as needed:
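A minimal sketch, assuming a Unix-like shell with `python3` and the standard `venv` module available; the environment location `~/venvs/caribou` and the `path/to/Caribou/` clone location are placeholders to adapt:

```shell
# Create a fresh virtual environment (path is a placeholder)
python3 -m venv ~/venvs/caribou

# Activate it for the current shell session
source ~/venvs/caribou/bin/activate

# Install Caribou from the local clone (placeholder path)
pip install path/to/Caribou/
```

Once activated, `pip` and `python` resolve to the environment's own copies, so Caribou's dependencies stay isolated from system-wide packages.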

Caribou was developed with the intent that models should be trained on the [GTDB taxonomy database](https://gtdb.ecogenomic.org/). \
Theoretically, any database could be used to train and classify with Caribou, but the data must follow a specific structure to be fed to the program. The structure of the database files required for training is explained in more detail in the [database section of the wiki](https://github.com/bioinfoUQAM/Caribou/wiki/Building-database).

<!--### GTDB pre-extracted K-mers-->
<!--Extracted K-mers profile files for the [GTDB representatives version 202](https://data.gtdb.ecogenomic.org/releases/release202/202.0/) with a length of k = 6 can be found on Canada's [FRDR]().-->

### Building GTDB database

Should the user want to build the training database from the GTDB taxonomy, this can be done using the template script, which gathers the data into one large fasta file and extracts the classes into a csv file. The template must be modified by the user to insert file paths and to comment out the host section if there is no host to be used.

The modified template can be submitted to an HPC cluster managed by Slurm (ex: Compute Canada) using the following command:
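For instance, a modified template saved as `build_gtdb_database.sh` (a hypothetical filename, substitute your own) could be submitted with:

```shell
# Submit the modified template script to the Slurm scheduler;
# sbatch queues the job and prints its job ID
sbatch build_gtdb_database.sh
```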