Home
#Welcome to the Halvade wiki!
##Getting started
Halvade is run through a script, runHalvade.py, which reads its configuration from two files:
- halvade.conf - contains the configuration for the cluster (details here) or for Amazon EMR (details here)
- halvade_run.conf - a list of options for a DNA-seq run (details here)
To set an option, remove the # at the start of its line and, if the option takes an argument, add it (between "..." if the argument is a string).
After all options are set, run runHalvade.py and wait until completion.
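The commenting convention described above can be illustrated with a small reader. This is only a sketch and not the parser used by runHalvade.py: the `name=value` separator and the function name `parse_options` are assumptions made for illustration, and the paths in the example are placeholders.

```python
def parse_options(text):
    """Return {name: value} for every enabled (uncommented) option line."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' are disabled options.
        if not line or line.startswith("#"):
            continue
        # ASSUMPTION: an option is written as name=value; a bare name is a flag.
        name, sep, value = line.partition("=")
        value = value.strip()
        # String arguments are written between double quotes.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        options[name.strip()] = value if sep else True
    return options


# Hypothetical file content: three enabled options and one disabled option.
example = '''
nodes=4
vcores=8
B="/path/to/bin/"
#emr_type="c3.8xlarge"
'''

print(parse_options(example))
```

Only the three uncommented options appear in the result; the line that still starts with # is ignored.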
###Local cluster
To configure a cluster, set the options in halvade.conf. For a local cluster these are:
- nodes: sets the number of nodes in the cluster
- vcores: sets the number of threads that can run on each node
- B: sets the absolute path to the directory containing the bin.tar.gz file
- D: sets the absolute path to the SNP database file
- R: sets the absolute path of the fasta file of the reference; all other reference files should be in the same folder, with this path as prefix
Make sure that all options for Amazon EMR are disabled by commenting out their lines (add # before each line).
Once this is set for your cluster, you only need to change halvade_run.conf for each job you want to run. Two options are mandatory: the input I, which gives the path to the input directory, and the output O, which gives the path to the output directory. Once these are set, Halvade can be run.
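Before starting a local run, it can help to check the settings against the requirements listed in this section. The sketch below is hypothetical and not part of Halvade: it assumes the enabled options of each file have already been loaded into dictionaries, treats every option whose name starts with `emr_` as an Amazon EMR option, and uses placeholder paths.

```python
# Options this section lists for a local cluster (halvade.conf).
CLUSTER_OPTIONS = ("nodes", "vcores", "B", "D", "R")
# Options this section names as mandatory for a run (halvade_run.conf).
RUN_OPTIONS = ("I", "O")


def check_local_setup(cluster_opts, run_opts):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for name in CLUSTER_OPTIONS:
        if name not in cluster_opts:
            problems.append("halvade.conf: option %s is not set" % name)
    # ASSUMPTION: Amazon EMR options are recognised by their emr_ prefix.
    for name in sorted(cluster_opts):
        if name.startswith("emr_"):
            problems.append("halvade.conf: option %s should be commented out" % name)
    for name in RUN_OPTIONS:
        if name not in run_opts:
            problems.append("halvade_run.conf: option %s is not set" % name)
    return problems


# Hypothetical settings: one EMR option left enabled, output O missing.
cluster = {"nodes": "4", "vcores": "8", "B": "/path/to/bin/",
           "D": "/path/to/snp_database", "R": "/path/to/reference",
           "emr_type": "c3.8xlarge"}
run = {"I": "/path/to/input/"}

for problem in check_local_setup(cluster, run):
    print(problem)
```

For these example settings the check reports two problems: the enabled emr_type option and the missing output option O.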
###Amazon EMR
To run on Amazon EMR, the Amazon EMR command line interface (instructions from Amazon) needs to be installed. To configure a cluster, set the options in halvade.conf. For Amazon EMR these are:
- nodes: sets the number of nodes in the cluster
- vcores: sets the number of threads that can run on each node
- B: sets the absolute path to the directory containing the bin.tar.gz file
- D: sets the absolute path to the SNP database file on S3
- R: sets the absolute path of the fasta file of the reference; all other reference files should be in the same folder, with this path as prefix
- emr_jar: sets the absolute path of _HalvadeWithLibs.jar on S3
- emr_script: sets the absolute path of halvade_bootstrap.sh on S3
- emr_type: sets the Amazon EMR instance type (e.g. "c3.8xlarge")
- emr_ami_v: sets the AMI version for Amazon EMR; should be set to "3.1.0" or newer
- tmp: this should be set to "/mnt/halvade/"
For locations on S3, a URI of this form should be used: s3://bucketname/directory/to/file
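A quick way to catch a mistyped S3 location is to test it against the form given above. This is a minimal sketch using Python's standard `urllib.parse`; the helper name `is_s3_uri` is hypothetical, and the check only looks at the shape of the URI, not at whether the bucket or file exists.

```python
from urllib.parse import urlparse


def is_s3_uri(uri):
    """True if uri has the form s3://bucketname/directory/to/file."""
    parsed = urlparse(uri)
    # Require the s3 scheme, a bucket name, and a non-empty path after it.
    return parsed.scheme == "s3" and bool(parsed.netloc) and len(parsed.path) > 1


# Options that this section describes as locations on S3.
emr_options = {
    "D": "s3://bucketname/directory/to/file",
    "emr_jar": "s3://bucketname/directory/to/file",
    "emr_script": "/local/directory/to/file",  # wrong on purpose: not an S3 URI
}

for name, value in emr_options.items():
    if not is_s3_uri(value):
        print("option %s is not an S3 URI: %s" % (name, value))
```

With the values above, only emr_script is reported.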
Once this is set for your cluster, you only need to change halvade_run.conf for each job you want to run. Two options are mandatory: the input I, which gives the path to the input directory, and the output O, which gives the path to the output directory. Once these are set, Halvade can be run.