Caribou implements a wrapper class and a script for using the [InSilicoSeq](https://insilicoseq.readthedocs.io/en/latest/) package. This script can be used to easily generate reads from a collection of genomes in `fasta` format. It can also produce validation and test subsets.

## API reference

Simulate sequencing reads for validation and/or testing dataset(s) from a whole genome dataset

**Script**: ``Caribou_simulate_test_val.py``

**Arguments**:

```
  -h, --help            show this help message and exit
  -db DATASET, --dataset DATASET
                        PATH to a npz file containing the data corresponding to the k-mers profile for the bacteria database
  -dt DATASET_NAME, --dataset_name DATASET_NAME
                        Name of the dataset used to name files
  -dh HOSTSET, --hostset HOSTSET
                        Path to .npz data for extracted k-mers profile of host
  -ds HOSTSET_NAME, --hostset_name HOSTSET_NAME
                        Name of the host database used to name files
  -v, --validation      Flag argument for making a "validation"-named simulated dataset
  -t, --test            Flag argument for making a "test"-named simulated dataset
  -l KMERS_LIST, --kmers_list KMERS_LIST
                        Optional. PATH to a file containing a list of k-mers to be extracted after the simulation. Should be the same as the reference database
  -o OUTDIR, --outdir OUTDIR
                        Path to folder for outputing tuning results
  -wd WORKDIR, --workdir WORKDIR
                        Optional. Path to a working directory where tuning data will be spilled
```
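
As an illustration of how the arguments above fit together, the following sketch assembles a command line for the script from Python. It is not part of Caribou: the helper `build_simulation_command` and all file paths are hypothetical, and it assumes the script is reachable on your `PATH` under the name shown. Only the flags listed in the help text are used. The check that at least one of `--validation` or `--test` is requested is a choice made for this sketch, not documented behaviour of the script.

```python
import shlex
from typing import List, Optional


def build_simulation_command(
    dataset: str,
    dataset_name: str,
    outdir: str,
    validation: bool = False,
    test: bool = False,
    hostset: Optional[str] = None,
    hostset_name: Optional[str] = None,
    kmers_list: Optional[str] = None,
    workdir: Optional[str] = None,
    script: str = "Caribou_simulate_test_val.py",  # assumed to be on PATH
) -> List[str]:
    """Hypothetical helper: build the argument list for the simulation script."""
    # Assumption of this sketch: a run without either subset flag is not useful.
    if not (validation or test):
        raise ValueError("request at least one of validation or test")

    cmd = [script, "--dataset", dataset, "--dataset_name", dataset_name]

    # Host arguments are only added when provided.
    if hostset is not None:
        cmd += ["--hostset", hostset]
    if hostset_name is not None:
        cmd += ["--hostset_name", hostset_name]

    # Flag arguments take no value.
    if validation:
        cmd.append("--validation")
    if test:
        cmd.append("--test")

    # Optional k-mers list, documented as matching the reference database.
    if kmers_list is not None:
        cmd += ["--kmers_list", kmers_list]

    cmd += ["--outdir", outdir]

    # Optional working directory.
    if workdir is not None:
        cmd += ["--workdir", workdir]

    return cmd


if __name__ == "__main__":
    # All paths and names below are placeholders.
    command = build_simulation_command(
        dataset="data/bacteria_kmers.npz",
        dataset_name="bacteria",
        outdir="results/simulation",
        validation=True,
        test=True,
    )
    # Print a shell-quoted command that can be copied into a terminal.
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues when paths contain spaces.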