
Commit 695bb4f: update README
1 parent 3f5f4f1

File tree

1 file changed: +14 −16 lines


README.md

Lines changed: 14 additions & 16 deletions
@@ -17,11 +17,9 @@ pip install path/to/Caribou/
 
 ### Dependencies
 The Caribou analysis pipeline ships with executables for the dependencies that cannot be installed through the Python wheel. These dependencies are:
-- [faSplit](https://github.com/ucscGenomeBrowser/kent/blob/8b379e58f89d4a779e768f8c41b042bda714d101/src/utils/faSplit/faSplit.c)
-- [KMC](https://github.com/refresh-bio/KMC)
 - [KronaTools](https://github.com/marbl/Krona/tree/master/KronaTools)
 
-### [Recommanded] Containers
+<!-- ### [Recommended] Containers
 Containers with the Caribou package and all dependencies already installed can be found in the folder `Caribou/containers`.
 It is recommended to execute the Caribou pipeline inside a container to ease usage and reproducibility.
@@ -64,20 +62,20 @@ As with docker, the environment can be used in two ways :
 ```
 singularity shell --nv -B a/folder/containing/data/to/bind/in/the/environment path/to/Caribou/containers/Caribou_singularity.sif
 ```
-- Process : Execute scripts when using the container for production on a compute cluster managed by a scheduler (e.g. Slurm, Torque, PBS). Instructions for the exec command are provided in the [documentation](https://apptainer.org/docs/user/main/cli/apptainer_exec.html), and an applied example on Compute Canada clusters using the Slurm Workload Manager can be found on their [wiki](https://docs.computecanada.ca/wiki/Singularity#Running_a_single_command). Usage may differ slightly depending on the HPC cluster and workload manager used.
+- Process : Execute scripts when using the container for production on a compute cluster managed by a scheduler (e.g. Slurm, Torque, PBS). Instructions for the exec command are provided in the [documentation](https://apptainer.org/docs/user/main/cli/apptainer_exec.html), and an applied example on Compute Canada clusters using the Slurm Workload Manager can be found on their [wiki](https://docs.computecanada.ca/wiki/Singularity#Running_a_single_command). Usage may differ slightly depending on the HPC cluster and workload manager used. -->
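For the process mode described above, a minimal sketch (the image path, bind mount, and inner command are all placeholders; `singularity exec` is the standard non-interactive entry point):

```shell
# Process-mode sketch: run a single command inside the container and exit.
# The image path and bind mount are placeholders; the sketch is guarded so it
# degrades gracefully on machines without singularity installed.
if command -v singularity >/dev/null 2>&1; then
    result=$(singularity exec --nv -B /data:/data \
        path/to/Caribou/containers/Caribou_singularity.sif \
        echo "ran inside container" 2>/dev/null) \
        || result="container run failed (placeholder image path)"
else
    result="singularity not available on this machine"
fi
echo "$result" | tee singularity_check.txt
```

On a Slurm cluster this command would typically sit inside an `sbatch` script rather than be run interactively.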
 
-### [Optional] GPU acceleration
-Usage of machine learning models can be accelerated by using a GPU but it is not necessary.
-If using a container, these dependencies are already installed. Otherwise they should be installed prior to analysis accelerated by GPU.
+### [Recommended] GPU acceleration
+The learning process of machine learning models can be accelerated with a GPU, especially for neural networks, and a GPU is strongly recommended should the user want to retrain a model.
+<!-- If using a container, these dependencies are already installed. Otherwise they should be installed before running GPU-accelerated analyses. -->
 
 To install GPU dependencies on your machine, refer to the following installation tutorials:
 - [CUDA](https://developer.nvidia.com/cuda-downloads)
 - [cudnn](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html)
 - [GPU for tensorflow](https://www.tensorflow.org/install/gpu)

-### [Optional] Python virtual environment
-If not using a container, it is recommended to use the analysis pipeline in a virtual environment to be sure that no other installed package can interfere. \
-Here is an example of Unix-like command shell to install Caribou in a new virtual environment by modifying the paths:
+### [Recommended] Python virtual environment
+It is recommended to run the analysis pipeline in a virtual environment so that no other installed package can interfere. \
+Here is an example, for a Linux shell, of installing Caribou in a new virtual environment (modify the paths as needed):
 
 ```
 python3 -m venv /path/to/your/environment/folder
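Filled in with concrete paths, the whole sequence might look like the following; the environment location and the Caribou checkout path are both placeholders:

```shell
# Create and activate a fresh virtual environment (location is a placeholder).
python3 -m venv ./caribou-env
. ./caribou-env/bin/activate
# Install Caribou from a local checkout of the repository (path is a placeholder):
# pip install path/to/Caribou/
python -c 'import sys; print(sys.prefix)'   # confirms the venv is active
```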
@@ -99,23 +97,23 @@ source /path/to/your/environment/folder/bin/activate
 Caribou was developed with the [GTDB taxonomy database](https://gtdb.ecogenomic.org/) in mind as the training source for its models. \
 Theoretically, any database could be used to train and classify with Caribou, but the data must follow a specific structure when fed to the program. The structure of the database files required for training is explained in more detail in the [database section of the wiki](https://github.com/bioinfoUQAM/Caribou/wiki/Building-database).
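The exact layout is defined in the wiki page linked above; purely as an illustration of the idea (one large fasta plus a csv of classes), with file and column names invented for the example:

```shell
# Toy illustration only: the real file structure and column names are
# defined in the Caribou wiki (Building-database page).
printf '>seq1\nACGTACGTACGT\n>seq2\nTTGGCCAATTGG\n' > database.fna
printf 'id,class\nseq1,bacteria\nseq2,host\n' > classes.csv
grep -c '^>' database.fna   # prints 2 (number of sequences in the fasta)
```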

-### GTDB pre-extracted K-mers
-Extracted K-mers profile files for the [GTDB representatives version 202](https://data.gtdb.ecogenomic.org/releases/release202/202.0/) with a length of k = 20 can be found on Canada's [FRDR]().
+<!-- ### GTDB pre-extracted K-mers -->
+<!-- Extracted K-mers profile files for the [GTDB representatives version 202](https://data.gtdb.ecogenomic.org/releases/release202/202.0/) with a length of k = 6 can be found on Canada's [FRDR](). -->

-### Building GTDB from another release
-Should the user want to use a more recent release of the GTDB taxonomy, this can be done using the template script to build data in one large fasta file and extract classes into a csv file. This template must be modified by the user to insert filepaths and comment the host section if there is no host to be used.
+### Building GTDB database
+Should the user want to build the training database from the GTDB taxonomy, this can be done with the template script, which gathers the data into one large fasta file and extracts the classes into a csv file. The user must modify this template to insert file paths, and comment out the host section if no host is used.
 
 The modified template can be submitted to an HPC cluster managed by Slurm (e.g. Compute Canada) using the following command:
 ```
 sbatch Caribou/data/build_data_scripts/template_slurm_datagen.sh
 ```
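The real template ships with the repository; as a generic illustration of the shape such a Slurm batch script takes (the directives and resource values here are invented, not taken from the actual template):

```shell
#!/bin/bash
# Generic Slurm batch-script shape (illustrative only; the real template is
# Caribou/data/build_data_scripts/template_slurm_datagen.sh).
#SBATCH --job-name=caribou-datagen
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

# The template would run the data-generation steps here; outside Slurm the
# #SBATCH lines are plain comments, so the script also runs as ordinary shell.
echo "datagen steps would run here" | tee datagen.log
```

Since `#SBATCH` directives are shell comments, the same file can be dry-run locally with `sh` before submitting it with `sbatch`.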
 
-The modified template can also be ran in a Unix-like command shell by running the following command :
+The modified template can also be run in a Linux shell with the following command:
 ```
 sh Caribou/data/build_data_scripts/template_slurm_datagen.sh
 ```
 
-Finally each script used by the template can be used alone in Unix-like command shell by running the following commands :
+Finally, each script used by the template can also be run on its own in a Linux shell with the following commands:
 ```
 # Generate a list of all fastas to be merged
 sh Caribou/data/build_data_scripts/generateFastaList.sh -d [directory] -o [outputFile]
 ```
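The `-d`/`-o` flags suggest the script walks a directory and writes one fasta path per line; a minimal stand-in under that assumption (the matched extensions are also an assumption about the real script) could be:

```shell
# Minimal stand-in for generateFastaList.sh (illustrative; the matched
# extensions are an assumption about the real script's behaviour).
mkdir -p genomes
touch genomes/a.fa genomes/b.fasta genomes/readme.txt
find genomes -type f \( -name '*.fa' -o -name '*.fasta' -o -name '*.fna' \) > fastaList.txt
wc -l < fastaList.txt   # prints 2 (only the fasta files are listed)
```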
