
This project builds RAxML gene trees from UCE data generated by the Phyluce pipeline, then runs an Astral-III analysis on them to generate species trees.


cyhkim/RAxML_Astral_trees

 
 


RAxML_Astral_trees from a Phyluce UCE pipeline

This project generates Astral-III species trees from RAxML gene trees inferred from UCE data produced by the Phyluce pipeline.

There are currently four scripts: three shell scripts and one R script. You will only run astral_prep.sh and astral_run.sh at the command line.

The first script, astral_prep.sh, creates a directory for each UCE alignment and calls the R script, RCmds. RCmds uses the 'ips' and 'parallel' libraries to convert the alignments from nexus to phylip format. It then calls a third script, run_RAxML.sh, which launches RAxML analyses in parallel. The number of simultaneous runs depends on how many threads you allocate in RCmds.

The fourth script, astral_run.sh, does the following:
- creates new directories;
- moves the RAxML trees into one of them;
- merges all of the trees into a single file;
- moves the bootstrap files into a directory;
- creates a file listing the path to each bootstrap file;
- launches Astral-III.
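As a rough illustration of the first step, the sketch below creates one directory per alignment and copies the alignment into it. It is a hypothetical helper (make_locus_dirs), not the code in astral_prep.sh. It assumes the Phyluce nexus alignments end in .nexus. The real script may name or move files differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a per-alignment setup step; astral_prep.sh
# may name, copy, or move files differently.
make_locus_dirs() {
  local nexus id
  for nexus in "$@"; do
    id="$(basename "$nexus" .nexus)"   # uce-1234.nexus -> uce-1234
    mkdir -p "$id"
    cp "$nexus" "$id/"
  done
}
```

From the working directory you would call it as make_locus_dirs *.nexus.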

If your UCE alignments contain many identical sequences, RAxML writes a .reduced alignment. This file has duplicate sequences and entirely undetermined columns removed. You can use it to rerun the RAxML ML and bootstrap analyses and then rerun Astral-III.

There are two additional scripts with the suffix "_reduced" for running the RAxML and Astral analyses on the reduced alignments: RCmds_reduced and Astral_run_reduced.sh.
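To see which loci produced a reduced alignment, and are therefore candidates for the _reduced scripts, you can search for the .reduced files. This hypothetical helper assumes they sit somewhere under the directory you pass in. RAxML writes each one next to its input alignment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every ".reduced" alignment under a directory
# (default: the current directory), searching recursively.
list_reduced_alignments() {
  find "${1:-.}" -type f -name '*.reduced' | sort
}
```

For example, list_reduced_alignments ~/ABySS/mafft-nexus-min75-taxa | wc -l counts them.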

Installation and prep for Ubuntu 16.04:

  1. If you do not have the latest parallel version of RAxML installed, follow the "Compiling RAxML" directions in the RAxML v8 manual. Be sure to compile one of the two parallel options, MPI or PTHREADS.

     PTHREADS comes in several builds: SSE3, AVX and AVX2. Check which of these instruction sets your CPU supports. SSE3 is the slowest and AVX2 the fastest. As of April 2018, newer Xeon chips support AVX2.
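A quick way to check your CPU's vector instructions on Linux is the flags line in /proc/cpuinfo. The hypothetical helper below prints the fastest of AVX2, AVX or SSE3 that it finds. Note that the kernel lists SSE3 under the name pni.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fastest RAxML vector build this CPU supports.
# Pass a flags string to test it; with no argument it reads /proc/cpuinfo (Linux only).
best_raxml_simd() {
  local flags="${1:-$(grep -m1 '^flags' /proc/cpuinfo)}"
  case " $flags " in
    *" avx2 "*) echo AVX2 ;;
    *" avx "*)  echo AVX ;;
    *" pni "*)  echo SSE3 ;;   # SSE3 appears as "pni" in /proc/cpuinfo
    *)          echo none ;;
  esac
}
```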
  2. Clone the RAxML_Astral_trees git files to your home directory (or wherever you like) (Note: '$' is your prompt, do not enter this character):
$ git clone https://github.com/calacademy-research/RAxML_Astral_trees.git
  1. You will now have a directory named 'RAxML_Astral_trees' with all required scripts:
~/RAxML_Astral_trees
  1. Make sure R is installed and that the R libraries ips and parallel are installed. Launch R at the command prompt (you should see a greater than symbol as your new prompt '>') and enter install.packages(c("ips", "parallel")):
$ R
> install.packages(c("ips", "parallel"))

The installation progress will scroll on screen. You may get a message that one or both packages already exist, which should be OK. The parallel package ships with base R, so R may simply warn that it is a base package. If you get errors that you cannot resolve, consult your system administrator.
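Before going further, it can help to confirm that the tools the pipeline calls are on your PATH. The function below is a small hypothetical check. The command names you pass it are your choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report each command that is not on PATH.
# Returns non-zero if at least one is missing.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run require_cmds R Rscript java raxmlHPC-PTHREADS-AVX2, substituting the name of the RAxML binary you actually compiled. You can confirm that both R packages load with Rscript -e 'library(ips); library(parallel)'.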

  1. There are many ways to organize your data, but the following is what I do. I create symlinks for all of the scripts in the Phyluce alignment directory (my "working directory"). This directory contains all of the final UCE nexus alignments, for example ~/ABySS/mafft-nexus-min75-taxa:
$ cd ~/ABySS/mafft-nexus-min75-taxa
~/ABySS/mafft-nexus-min75-taxa$ ln -s ~/RAxML_Astral_trees/astral_prep.sh astral_prep.sh
~/ABySS/mafft-nexus-min75-taxa$ ln -s ~/RAxML_Astral_trees/astral_run.sh astral_run.sh
~/ABySS/mafft-nexus-min75-taxa$ ln -s ~/RAxML_Astral_trees/run_RAxML.sh run_RAxML.sh
~/ABySS/mafft-nexus-min75-taxa$ ln -s ~/RAxML_Astral_trees/RCmds RCmds
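You can also make the four links in a loop. This sketch assumes the repository was cloned to ~/RAxML_Astral_trees, as above, and that you run it from your working directory. link_scripts is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: symlink the pipeline scripts from a clone of the
# repository (default ~/RAxML_Astral_trees) into the current directory.
link_scripts() {
  local src="${1:-$HOME/RAxML_Astral_trees}" script
  for script in astral_prep.sh astral_run.sh run_RAxML.sh RCmds; do
    ln -s "$src/$script" "$script"
  done
}
```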
  1. You must edit the RCmds R script to set your working directory. Our example here is ~/ABySS/mafft-nexus-min75-taxa. Open RCmds in your favorite editor (I use vi or nano) and change the line setwd("/my/working/directory/") to match your working directory. For example:
setwd("~/ABySS/mafft-nexus-min75-taxa")
  1. You must also edit RCmds to set the number of threads available on your server. The default line is 'final_raxml = mclapply(cmd, system, mc.cores=getOption("mc.cores", 48)) ### 48 threads'. If your server has 16 threads available, replace 48 with 16:
final_raxml = mclapply(cmd, system, mc.cores=getOption("mc.cores", 16))  ### 16 threads

If you are not sure how many threads are available on your system, run:

$ cat /proc/cpuinfo | grep processor | wc -l

Or launch htop, which shows all threads. If you do not have htop installed, run:

$ sudo apt-get install htop
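If you prefer not to edit RCmds by hand, you can make both changes with sed. This sketch assumes three things:
- RCmds still contains the setwd(...) and getOption("mc.cores", 48) lines quoted above;
- you are using GNU sed (the Ubuntu default), which supports -i;
- the directory path contains no | or & characters.

If no thread count is given, the helper falls back to nproc. The trailing "### 48 threads" comment is left as it is. configure_rcmds is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the working directory and thread count in RCmds.
# Usage: configure_rcmds RCmds /path/to/workdir [threads]
# Requires GNU sed (-i); the path must not contain '|' or '&'.
configure_rcmds() {
  local rfile="$1" workdir="$2" threads="${3:-$(nproc)}"
  sed -i \
    -e "s|setwd(\"[^\"]*\")|setwd(\"$workdir\")|" \
    -e "s|getOption(\"mc.cores\", *[0-9]*)|getOption(\"mc.cores\", $threads)|" \
    "$rfile"
}
```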
  1. Edit the last line of astral_run.sh to set your preferred final tree name and the amount of memory Java may use:
     - The -Xmx100G flag tells Java to use up to 100 GB of RAM; change this to suit your system.
     - The value after -o names your final Astral species tree; edit it as you like.
java -Xmx100G -jar ~/ASTRAL/astral.5.5.6.jar -i tree_files/RAx_genetrees_merge.tre -b boot_trees/bootstrap.filedir.list.txt -r 100 -o My_AstralIII_sp_tree.tre
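Instead of picking the -Xmx value by hand, you can derive it from the Mem: total reported by free -m. The helper below is a sketch built on assumptions. By default it leaves 10% of RAM for the operating system. That margin is an arbitrary choice, not a recommendation from the Astral or Java documentation. suggest_xmx is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a java -Xmx value in whole GB.
# $1 = total RAM in MB (default: the Mem: total from `free -m`)
# $2 = percent of RAM to give Java (default 90, an arbitrary assumption)
suggest_xmx() {
  local total_mb="${1:-$(free -m | awk '/^Mem:/ {print $2}')}"
  local percent="${2:-90}"
  printf '%s\n' "-Xmx$(( total_mb * percent / 100 / 1024 ))G"
}
```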

If you are not sure how much RAM your system has, run:

$ free -m

This will print to screen something like this:

bsimison@tdobz:~$ free -m
             total       used       free     shared    buffers     cached
Mem:       1032052     992002      40050         10        230     941896
-/+ buffers/cache:      49875     982176
Swap:        76293        682      75611

This particular system has a total of 1,032,052 MB (roughly 1 TB) of RAM.

  1. Edit run_RAxML.sh to match the RAxML version you compiled. The line to change is "~/standard-RAxML/raxmlHPC-AVX -f a -m GTRGAMMA -N 100 -x 12345 -p 25258 -n ${id}.txt -s $phy"; update it to the location and version of your RAxML binary. For example, if you have raxml-PTHREADS-AVX2, replace ~/standard-RAxML/raxmlHPC-AVX with the path to that binary.
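You can also swap the binary path with sed. This hypothetical helper assumes run_RAxML.sh still contains the default ~/standard-RAxML/raxmlHPC-AVX path followed by a space. PTHREADS builds also take a -T flag for the number of threads per run, which you would add to that line yourself. Each of the mc.cores parallel runs set in RCmds would then use that many threads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the default RAxML binary path in run_RAxML.sh.
# Requires GNU sed (-i); the new path must not contain '|' or '&'.
set_raxml_binary() {
  local script="$1" binary="$2"
  sed -i "s|~/standard-RAxML/raxmlHPC-AVX |$binary |" "$script"
}
```

For example: set_raxml_binary run_RAxML.sh /opt/raxml/raxmlHPC-PTHREADS-AVX2, where /opt/raxml is a placeholder for wherever you compiled RAxML.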

Run the scripts

  1. The first step is to run astral_prep.sh from your working directory (this example uses ~/ABySS/mafft-nexus-min75-taxa). First, make sure each script is executable by running the following command for each one (you may need sudo privileges). Note: "~/ABySS/mafft-nexus-min75-taxa$" is your prompt; do not enter it.
~/ABySS/mafft-nexus-min75-taxa$ chmod +x astral_prep.sh

Then run:

~/ABySS/mafft-nexus-min75-taxa$ ./astral_prep.sh
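All three shell scripts need the same chmod, so a small loop saves typing. Because the scripts are symlinks, chmod changes the permissions of the files in the cloned repository. make_scripts_executable is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make the pipeline's shell scripts executable.
# chmod follows symlinks, so the targets in the repository clone are changed.
make_scripts_executable() {
  local dir="${1:-.}" script
  for script in astral_prep.sh astral_run.sh run_RAxML.sh; do
    if [ -e "$dir/$script" ]; then
      chmod +x "$dir/$script"
    else
      echo "not found: $dir/$script" >&2
    fi
  done
}
```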

Depending on your server and the number of alignments and taxa, this could take hours to days. If you are running it remotely, I recommend running it inside 'screen' and detaching the session. That way the run continues even if your local terminal or computer crashes or disconnects.
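One way to do this is to start the script in a detached, named screen session. This sketch assumes screen is installed, and start_in_screen is a hypothetical name. The session closes when the script exits, so wrapping the call in bash -c with a redirect keeps a log of the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command in a detached, named screen session.
# Usage: start_in_screen SESSION_NAME COMMAND [ARGS...]
start_in_screen() {
  local session="$1"; shift
  if ! command -v screen >/dev/null 2>&1; then
    echo "screen is not installed; try: sudo apt-get install screen" >&2
    return 1
  fi
  screen -dmS "$session" "$@"   # -d -m: start detached; -S: name the session
  echo "started '$session'; reattach with: screen -r $session"
}
```

For example: start_in_screen astral_prep bash -c './astral_prep.sh > astral_prep.log 2>&1'. List sessions with screen -ls and reattach with screen -r astral_prep.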

  1. Once astral_prep.sh finishes, run the astral_run.sh script:
~/ABySS/mafft-nexus-min75-taxa$ ./astral_run.sh
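For orientation, the merge and bootstrap-list steps of astral_run.sh look roughly like the sketch below. It is not the script's actual code, and it makes two assumptions:
- the per-locus trees are RAxML's RAxML_bestTree.* files;
- the replicates are RAxML_bootstrap.* files (the names RAxML -f a produces).

It writes the two inputs that the java command above reads. Astral's -b option expects a file that lists one bootstrap file path per line. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of astral_run.sh's merge and list steps.

# Concatenate every per-locus best tree in SRC into one gene-tree file (-i input).
merge_gene_trees() {
  local src="$1" out="$2"
  mkdir -p "$(dirname "$out")"
  cat "$src"/RAxML_bestTree.* > "$out"
}

# Write one absolute bootstrap-file path per line (-b input).
list_bootstrap_files() {
  local bootdir="$1" out="$2"
  find "$(cd "$bootdir" && pwd)" -maxdepth 1 -type f -name 'RAxML_bootstrap.*' \
    | sort > "$out"
}
```

For example: merge_gene_trees . tree_files/RAx_genetrees_merge.tre and list_bootstrap_files boot_trees boot_trees/bootstrap.filedir.list.txt, adjusting the source directories to wherever your RAxML output actually lives.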
