Simulate data
Nicolas de Montigny edited this page Dec 11, 2024 · 1 revision
Caribou implements a wrapper class and a script for using the InSilicoSeq package.
This script can be used to easily generate reads from a collection of genomes in FASTA
format. It can also produce validation and test subsets.
Simulate sequencing reads for validation and/or testing dataset(s) from a whole genome dataset
Script:
Caribou_simulate_test_val.py
Arguments:
-h, --help show this help message and exit
-db DATASET, --dataset DATASET
PATH to a npz file containing the data corresponding to the k-mers profile for the bacteria database
-dt DATASET_NAME, --dataset_name DATASET_NAME
Name of the dataset used to name files
-dh HOSTSET, --hostset HOSTSET
Path to .npz data for extracted k-mers profile of host
-ds HOSTSET_NAME, --hostset_name HOSTSET_NAME
Name of the host database used to name files
-v, --validation Flag argument for making a "validation"-named simulated dataset
-t, --test Flag argument for making a "test"-named simulated dataset
-l KMERS_LIST, --kmers_list KMERS_LIST
Optional. PATH to a file containing a list of k-mers to be extracted after the simulation. Should be the same as the reference database
-o OUTDIR, --outdir OUTDIR
Path to folder for outputting tuning results
-wd WORKDIR, --workdir WORKDIR
Optional. Path to a working directory where tuning data will be spilled
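Example:
The arguments above can be assembled into a command line programmatically. The following is a minimal sketch in Python, not part of Caribou itself: the helper name `build_simulation_command` and all file paths are hypothetical, and it assumes the script is callable by name from the shell. Only the long option names listed above are used.

```python
import shlex


def build_simulation_command(dataset, dataset_name, outdir, *,
                             validation=False, test=False,
                             hostset=None, hostset_name=None,
                             kmers_list=None, workdir=None,
                             script="Caribou_simulate_test_val.py"):
    """Return the argument list for one call to the simulation script.

    Hypothetical helper: it only builds the command and does not run it.
    Assumes the script is reachable by name (for example, on PATH).
    """
    cmd = [script, "--dataset", str(dataset), "--dataset_name", dataset_name]
    # Host k-mers profile and its name are only added when provided.
    if hostset is not None:
        cmd += ["--hostset", str(hostset)]
    if hostset_name is not None:
        cmd += ["--hostset_name", hostset_name]
    # Flag arguments take no value.
    if validation:
        cmd.append("--validation")
    if test:
        cmd.append("--test")
    # Optional list of k-mers to extract after the simulation.
    if kmers_list is not None:
        cmd += ["--kmers_list", str(kmers_list)]
    cmd += ["--outdir", str(outdir)]
    # Optional working directory.
    if workdir is not None:
        cmd += ["--workdir", str(workdir)]
    return cmd


# Placeholder paths, for illustration only.
example = build_simulation_command(
    "data/bacteria_kmers.npz", "bacteria", "results/simulation",
    validation=True, test=True,
)
print(shlex.join(example))
```

The returned list can be passed to `subprocess.run(example, check=True)` to launch the simulation, or printed with `shlex.join` as above to copy into a terminal.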