API reference
Usage descriptions for each script.
For a description of each step, see the analysis description section of this wiki.
Run the entire Caribou analysis Pipeline
Script:
Caribou_pipeline.py
Arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
PATH to a configuration file containing the choices made by the user. Please refer to the wiki for further details : https://github.com/bioinfoUQAM/Caribou/wiki
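A minimal sketch of launching the pipeline from Python. Only the `--config` flag comes from the help text above; the helper name `pipeline_command` and the file name `my_config.ini` are hypothetical, and the script is assumed to be reachable from the current directory.

```python
import shlex
import sys


def pipeline_command(config_path, script="Caribou_pipeline.py"):
    """Build the argument vector for a pipeline run (nothing is executed here)."""
    return [sys.executable, script, "--config", str(config_path)]


# Hypothetical configuration file name, used only for illustration.
cmd = pipeline_command("my_config.ini")

# Show the command without the interpreter path.
print(shlex.join(cmd[1:]))

# To actually launch the run: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when the configuration path contains spaces.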
Extract the k-mers profile of a given dataset and save it to disk.
Script:
Caribou_kmers.py
Arguments:
-h, --help show this help message and exit
-s SEQ_FILE, --seq_file SEQ_FILE
PATH to a fasta file containing bacterial genomes to build k-mers from or a folder containing fasta files with one sequence per file
-c CLS_FILE, --cls_file CLS_FILE
PATH to a csv file containing classes of the corresponding fasta
-dt DATASET_NAME, --dataset_name DATASET_NAME
Name of the dataset used to name files
-sh SEQ_FILE_HOST, --seq_file_host SEQ_FILE_HOST
PATH to a fasta file containing host genomes to build k-mers from or a folder containing fasta files with one sequence per file
-ch CLS_FILE_HOST, --cls_file_host CLS_FILE_HOST
PATH to a csv file containing classes of the corresponding host fasta
-dh HOST_NAME, --host_name HOST_NAME
Name of the host used to name files
-k K_LENGTH, --k_length K_LENGTH
Length of k-mers to extract
-l KMERS_LIST, --kmers_list KMERS_LIST
PATH to a file containing a list of k-mers to be extracted if the dataset is not a training database
-o OUTDIR, --outdir OUTDIR
PATH to a directory where outputs will be saved
-wd WORKDIR, --workdir WORKDIR
Optional. Path to a working directory where tuning data will be spilled
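To illustrate what a k-mers profile contains, the sketch below counts overlapping k-mers in a single sequence and optionally restricts the result to a given list, as `--kmers_list` suggests for datasets that are not a training database. This is not Caribou's implementation, which the help text does not describe; the function name `kmer_profile` and the toy sequence are assumptions.

```python
from collections import Counter


def kmer_profile(sequence, k, kmers_list=None):
    """Count overlapping k-mers of length k in one sequence.

    If kmers_list is given, only those k-mers are reported and absent ones
    get a count of 0, so the columns match a previously built profile.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    if kmers_list is None:
        return dict(counts)
    return {kmer: counts.get(kmer, 0) for kmer in kmers_list}


# Toy sequence, for illustration only.
profile = kmer_profile("ACGTACGT", k=4)
restricted = kmer_profile("ACGTACGT", k=4, kmers_list=["ACGT", "TTTT"])
```

Restricting to a fixed list keeps the feature columns identical between the training database and the data to classify, which is presumably why the option exists.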
Train a model and extract bacteria / host sequences.
Script: Caribou_extraction.py
Arguments:
-h, --help show this help message and exit
-db DATA_BACTERIA, --data_bacteria DATA_BACTERIA
PATH to a npz file containing the data corresponding to the k-mers profile for the bacteria database
-dh DATA_HOST, --data_host DATA_HOST
PATH to a npz file containing the data corresponding to the k-mers profile for the host
-dn DATABASE_NAME, --database_name DATABASE_NAME
Name of the bacteria database used to name files
-hn HOST_NAME, --host_name HOST_NAME
Name of the host database used to name files
-dm DATA_METAGENOME, --data_metagenome DATA_METAGENOME
PATH to a npz file containing the data corresponding to the k-mers profile for the metagenome to classify
-mn METAGENOME_NAME, --metagenome_name METAGENOME_NAME
Name of the metagenome to classify used to name files
-m MERGED, --merged MERGED
PATH to a npz file containing the k-mers profile for the merged bacteria and host databases
-v VALIDATION, --validation VALIDATION
PATH to a npz file containing the k-mers profile for the validation dataset
-model {None,onesvm,linearsvm,attention,lstm,deeplstm}, --model_type {None,onesvm,linearsvm,attention,lstm,deeplstm}
The type of model to train
-bs BATCH_SIZE, --batch_size BATCH_SIZE
Batch size to use, defaults to 32
-e TRAINING_EPOCHS, --training_epochs TRAINING_EPOCHS
The number of training iterations for the neural network models if one is chosen, defaults to 100
-o OUTDIR, --outdir OUTDIR
PATH to a directory where outputs will be saved
-wd WORKDIR, --workdir WORKDIR
Optional. Path to a working directory where Ray Tune will output and spill tuning data
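The model choices and defaults listed above can be reproduced with a small `argparse` parser. This is a reconstruction from the help text covering three options only, not the script's actual source; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Reconstruct a subset of the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="Caribou_extraction.py")
    parser.add_argument(
        "-model", "--model_type",
        choices=[None, "onesvm", "linearsvm", "attention", "lstm", "deeplstm"],
        default=None,
        help="The type of model to train",
    )
    # Defaults taken from the help text: batch size 32, 100 training epochs.
    parser.add_argument("-bs", "--batch_size", type=int, default=32)
    parser.add_argument("-e", "--training_epochs", type=int, default=100)
    return parser


# Override the model and the epochs; the batch size keeps its default.
args = build_parser().parse_args(["-model", "lstm", "-e", "50"])
defaults = build_parser().parse_args([])
```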
Train and cross-validate a model for the bacteria extraction / host removal step.
Script: Caribou_extraction_train_cv.py
Arguments:
-h, --help show this help message and exit
-db DATA_BACTERIA, --data_bacteria DATA_BACTERIA
PATH to a npz file containing the data corresponding to the k-mers profile for the bacteria database
-dh DATA_HOST, --data_host DATA_HOST
PATH to a npz file containing the data corresponding to the k-mers profile for the host
-dn DATABASE_NAME, --database_name DATABASE_NAME
Name of the bacteria database used to name files
-hn HOST_NAME, --host_name HOST_NAME
Name of the host database used to name files
-m MERGED, --merged MERGED
PATH to a npz file containing the k-mers profile for the merged bacteria and host databases
-v VALIDATION, --validation VALIDATION
PATH to a npz file containing the k-mers profile for the validation dataset
-t TEST, --test TEST
PATH to a npz file containing the k-mers profile for the test dataset
-model {onesvm,linearsvm,attention,lstm,deeplstm}, --model_type {onesvm,linearsvm,attention,lstm,deeplstm}
The type of model to train
-bs BATCH_SIZE, --batch_size BATCH_SIZE
Batch size to use, defaults to 32
-e TRAINING_EPOCHS, --training_epochs TRAINING_EPOCHS
The number of training iterations for the neural networks models if one is chosen, defaults to 100
-o OUTDIR, --outdir OUTDIR
PATH to a directory where outputs will be saved
-wd WORKDIR, --workdir WORKDIR
Optional. Path to a working directory where Ray Tune will output and spill tuning data
Train a model and classify bacteria sequences iteratively over known taxonomic ranks.
Script: Caribou_classification.py
Arguments:
-h, --help show this help message and exit
-db DATA_BACTERIA, --data_bacteria DATA_BACTERIA
PATH to a npz file containing the data corresponding to the k-mers profile for the bacteria database
-dt DATABASE_NAME, --database_name DATABASE_NAME
Name of the bacteria database used to name files
-mg DATA_METAGENOME, --data_metagenome DATA_METAGENOME
PATH to a npz file containing the data corresponding to the k-mers profile for the metagenome to classify
-mn METAGENOME_NAME, --metagenome_name METAGENOME_NAME
Name of the metagenome to classify used to name files
-v VALIDATION, --validation VALIDATION
PATH to a npz file containing the k-mers profile for the validation dataset
-model {sgd,mnb,lstm_attention,cnn,widecnn}, --model_type {sgd,mnb,lstm_attention,cnn,widecnn}
The type of model to train
-tx TAXA, --taxa TAXA
The taxonomic level to use for the classification, defaults to species. Can be one level or a list of levels separated by commas.
-bs BATCH_SIZE, --batch_size BATCH_SIZE
Batch size to use, defaults to 32
-e TRAINING_EPOCHS, --training_epochs TRAINING_EPOCHS
The number of training iterations for the neural network models if one is chosen, defaults to 100
-o OUTDIR, --outdir OUTDIR
PATH to a directory where outputs will be saved
-wd WORKDIR, --workdir WORKDIR
Optional. Path to a working directory where Ray Tune will output and spill tuning data
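The `--taxa` option accepts one level or several levels separated by commas. A minimal sketch of how such a value can be split into a list is shown below; the function name `parse_taxa` and the whitespace stripping are assumptions, and the level names other than `species` are examples rather than a confirmed list of accepted values.

```python
def parse_taxa(taxa="species"):
    """Split a comma-separated --taxa value into a list of level names."""
    return [level.strip() for level in taxa.split(",") if level.strip()]


single = parse_taxa()                    # documented default for this script
several = parse_taxa("genus,species")    # hypothetical multi-level request
```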
Train and cross-validate a model for the bacteria classification step.
Script: Caribou_classification_train_cv.py
Arguments:
-h, --help show this help message and exit
-db DATA_BACTERIA, --data_bacteria DATA_BACTERIA
PATH to a npz file containing the data corresponding to the k-mers profile for the bacteria database
-dn DATABASE_NAME, --database_name DATABASE_NAME
Name of the bacteria database used to name files
-v VALIDATION, --validation VALIDATION
PATH to a npz file containing the k-mers profile for the validation dataset
-t TEST, --test TEST
PATH to a npz file containing the k-mers profile for the test dataset
-model {sgd,mnb,lstm_attention,cnn,widecnn}, --model_type {sgd,mnb,lstm_attention,cnn,widecnn}
The type of model to train
-tx TAXA, --taxa TAXA
The taxonomic level to use for the classification, defaults to None. Can be one level or a list of levels separated by commas.
-bs BATCH_SIZE, --batch_size BATCH_SIZE
Batch size to use, defaults to 32
-e TRAINING_EPOCHS, --training_epochs TRAINING_EPOCHS
The number of training iterations for the neural network models if one is chosen, defaults to 100
-o OUTDIR, --outdir OUTDIR
PATH to a directory where outputs will be saved
-wd WORKDIR, --workdir WORKDIR
Optional. Path to a working directory where Ray Tune will output and spill tuning data
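One way to compare the documented classification models is to generate one cross-validation command per model type. The sketch below only builds the command strings. The helper name `cv_commands`, the file names, and the per-model output sub-directories are hypothetical, and the help text does not say which options are mandatory, so a real run may need more of them.

```python
import shlex

# Model types as listed in the help text for --model_type.
MODEL_TYPES = ["sgd", "mnb", "lstm_attention", "cnn", "widecnn"]


def cv_commands(data_bacteria, database_name, outdir, taxa="species"):
    """Return one command line per documented model type (nothing is run)."""
    commands = []
    for model in MODEL_TYPES:
        argv = [
            "python", "Caribou_classification_train_cv.py",
            "--data_bacteria", data_bacteria,
            "--database_name", database_name,
            "--model_type", model,
            "--taxa", taxa,
            # Assumption: a separate output directory per model.
            "--outdir", f"{outdir}/{model}",
        ]
        commands.append(shlex.join(argv))
    return commands


# Hypothetical file and database names.
for line in cv_commands("db.npz", "mydb", "out"):
    print(line)
```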
Produce outputs from data classified by Caribou.
Script: Caribou_outputs.py
Arguments:
-h, --help show this help message and exit
-db DATA_BACTERIA, --data_bacteria DATA_BACTERIA
PATH to a npz file containing the data corresponding to the k-mers profile for the bacteria database
-cd CLASSIFIED_DATA, --classified_data CLASSIFIED_DATA
PATH to a npz file containing the data classified by Caribou
-model {sgd,mnb,lstm_attention,cnn,widecnn}, --model_type {sgd,mnb,lstm_attention,cnn,widecnn}
The type of model used for classification
-dt DATASET_NAME, --dataset_name DATASET_NAME
Name of the classified dataset used to name files
-dh HOST_NAME, --host_name HOST_NAME
Name of the host database used to name files
-m, --mpa Should the mpa-style output be generated?
-k, --kronagram Should the interactive kronagram be generated?
-r, --report Should the abundance report be generated?
-wd WORKDIR, --workdir WORKDIR
Optional. Path to a working directory where tuning data will be spilled
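The `-m`, `-k` and `-r` options are listed without a value, which suggests they act as on/off switches. Under that assumption, the requested outputs can be translated into flags as sketched below; the function name `outputs_flags` is hypothetical.

```python
def outputs_flags(mpa=False, kronagram=False, report=False):
    """Translate boolean choices into the documented output switches."""
    selected = {"--mpa": mpa, "--kronagram": kronagram, "--report": report}
    # Keep only the switches that were requested, in a stable order.
    return [flag for flag, wanted in selected.items() if wanted]


flags = outputs_flags(mpa=True, report=True)
```

The resulting list can be appended to the rest of the `Caribou_outputs.py` arguments.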
Fit a feature decomposition to a given k-mers dataset, then apply it.
Script:
Caribou_dimensions_decomposition.py
Arguments:
-h, --help show this help message and exit
-db DATASET, --dataset DATASET
PATH to a npz file containing the data corresponding to the k-mers profile for the bacteria database
-l KMERS_LIST, --kmers_list KMERS_LIST
PATH to a file containing a list of k-mers that will be reduced
-n NB_COMPONENTS, --nb_components NB_COMPONENTS
Number of components to decompose data into
-o OUTDIR, --outdir OUTDIR
PATH to a directory where outputs will be saved
-wd WORKDIR, --workdir WORKDIR
Optional. Path to a working directory where tuning data will be spilled
Fit a feature reduction to a given k-mers dataset, then apply it.
Script:
Caribou_reduce_features.py
Arguments:
-h, --help show this help message and exit
-db DATASET, --dataset DATASET
PATH to a npz file containing the data corresponding to the k-mers profile for the bacteria database
-dt DATASET_NAME, --dataset_name DATASET_NAME
Name of the dataset used to name files
-l KMERS_LIST, --kmers_list KMERS_LIST
PATH to a file containing a list of k-mers that will be reduced
-t TAXA, --taxa TAXA
The taxonomic level to use for the classification, defaults to Phylum.
-o OUTDIR, --outdir OUTDIR
PATH to a directory where outputs will be saved
-wd WORKDIR, --workdir WORKDIR
Optional. Path to a working directory where tuning data will be spilled