 # **ERVmap**
ERVmap is one part curated database of human proviral ERV loci and one part stringent algorithm for determining which ERVs are transcribed in RNA-seq data.
[![Actions Status](https://github.com/eipm/ERVmap/workflows/Docker/badge.svg)](https://github.com/eipm/ERVmap/actions) [![Github](https://img.shields.io/badge/github-latest-green?style=flat&logo=github)](https://github.com/eipm/ERVmap) [![Docker Hub](https://img.shields.io/badge/docker%20hub-latest-blue?style=flat&logo=docker)](https://hub.docker.com/repository/docker/eipm/ervmap) [![GitHub Container Registry](https://img.shields.io/badge/GitHub%20Container%20Registry-latest-blue?style=flat&logo=docker)](https://github.com/orgs/eipm/packages/container/package/ervmap)
## Citation
Tokuyama M. et al., ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. Proc Natl Acad Sci USA 2018 Dec 11;115(50):12565-12572. [doi: 10.1073/pnas.1814589115](https://doi.org/10.1073/pnas.1814589115).
 ## **How to use it**
 ### Install
This version of the tool consists of 2 steps: 1. alignment to the human genome (GRCh38) and 2. quantification of the ERV regions. To download and install the latest version of ERVmap, provided as a Docker image, type:
 docker pull eipm/ervmap:latest
 **NOTE**: for a specific version replace `latest` with the release version. 
 ### **How to run ERVmap**
To run ERVmap, you need: 1. an indexed genome reference for STAR; 2. a BED file with the curated ERV regions on the human genome (see `ERVmap.bed`); 3. the input FASTQ data (gzipped). Assuming that your sample is called `SAMPLE` and has 2 FASTQ files (one per read) in the folder `/path/to/input/data`, that the reference genome is in `/path/to/genome`, and that the ERV BED file is in `/path/to/erv/file`, the command is:
 docker run --rm  \
     -u $(id -u):$(id -g) \
     -v /path/to/input/data:/data:ro \
     --output SAMPLE/SAMPLE. \
     --mode ALL
 This command will generate the alignment files (BAMs) in the `/path/to/output/SAMPLE/` folder and all files will have the prefix `SAMPLE.`. The generated files will be:
(See the [STAR documentation](https://github.com/alexdobin/STAR) for a description of the output files of the STAR aligner.)
The results of the ERV quantification will be in the `SAMPLE.ERVresults.txt` file. This is a tab-delimited file with 7 columns produced by [bedtools](https://bedtools.readthedocs.io/en/latest/). For example:
 1       896176  898458  5803    500     +       70
 1       1412251 1418852 5804    500     +       36
 1       3801730 3806808 5807    500     +       6
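As a minimal sketch of how this file might be read downstream, the Python snippet below parses the rows into typed records. The field names (`chrom`, `start`, `end`, `erv_id`, `score`, `strand`, `count`) are assumptions based on the usual BED layout and are not defined by ERVmap itself; in particular, treating the last column as the read count is an assumption, and the threshold of 10 is arbitrary.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class ErvRecord(NamedTuple):
    # Field names are assumed (BED-like layout plus a trailing count).
    chrom: str
    start: int
    end: int
    erv_id: str
    score: int
    strand: str
    count: int


def parse_erv_results(lines: Iterable[str]) -> List[ErvRecord]:
    """Parse tab-delimited ERV quantification rows into typed records."""
    records = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        if len(row) != 7:
            raise ValueError(f"expected 7 columns, got {len(row)}: {row!r}")
        chrom, start, end, erv_id, score, strand, count = row
        records.append(
            ErvRecord(chrom, int(start), int(end), erv_id, int(score), strand, int(count))
        )
    return records


# The three example rows shown above, written with explicit tabs.
example = (
    "1\t896176\t898458\t5803\t500\t+\t70\n"
    "1\t1412251\t1418852\t5804\t500\t+\t36\n"
    "1\t3801730\t3806808\t5807\t500\t+\t6\n"
)
records = parse_erv_results(io.StringIO(example))

# Keep loci whose (assumed) count column reaches an arbitrary threshold.
expressed = [r.erv_id for r in records if r.count >= 10]
print(expressed)
```

To read a real results file, pass an open file handle instead of the in-memory string, e.g. `parse_erv_results(open("SAMPLE.ERVresults.txt", newline=""))`.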
 ## The **`--mode`** option
 This option can only have 3 values: { `ALL`, `STAR`, `BED` }:
 * `ALL` to run both the STAR aligner and the ERV quantification from start to finish; 
 * `STAR` to only perform the alignment;
 * `BED` to only run the ERV quantification.
 ### <a id='optparam'></a>Optional parameters (*recommended*) 
 There are a few parameters that can be added to the ERVmap image to make the process more efficient.
 * `--cpus 20`: if you have a multi-core system (and you should have one), you can specify the number of CPUs to use (e.g. 20);
* `--limit-ram 48000000000`: this limits the amount of RAM used to avoid overusing the resources
 You can see the full set of parameters by typing: `docker run --rm ervmap`.
There are also other parameters from Docker that should be included before `ervmap` in the command line, e.g.
     --memory 50G \
     --memory-swap 100G
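To make the role of `--mode` and the optional parameters concrete, here is a small Python sketch that assembles the ERVmap argument list for one sample. The helper name `ervmap_args` is hypothetical; only the flags `--output`, `--mode`, `--cpus`, and `--limit-ram` come from this README, and the Docker-specific parts of the command (image name, volume mounts, memory limits) are deliberately left out.

```python
from typing import List, Optional

# The three values accepted by --mode, as listed in this README.
VALID_MODES = ("ALL", "STAR", "BED")


def ervmap_args(
    sample: str,
    mode: str = "ALL",
    cpus: Optional[int] = None,
    limit_ram: Optional[int] = None,
) -> List[str]:
    """Build the ERVmap arguments for one sample (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"--mode must be one of {VALID_MODES}, got {mode!r}")
    # Output prefix follows the SAMPLE/SAMPLE. pattern used in this README.
    args = ["--output", f"{sample}/{sample}.", "--mode", mode]
    if cpus is not None:
        args += ["--cpus", str(cpus)]
    if limit_ram is not None:
        args += ["--limit-ram", str(limit_ram)]
    return args


full_run = ervmap_args("SAMPLE", "ALL", cpus=20, limit_ram=48000000000)
print(" ".join(full_run))
```

Validating the mode before launching the container avoids starting a long alignment job with a mistyped option.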
## Nextflow version
 To run this pipeline using [Nextflow](https://www.nextflow.io/), simply run the following:
 `nextflow -C nextflow.config run main.nf`
where `nextflow.config` includes the minimum set of parameters to run ERVmap within the Docker container. Specifically:
 params {
     genome='/path/to/genome'               # external path to the indexed genome for the STAR aligner
     inputDir='path/to/input/folder'        # external path of the input data
 ## Published version
 ## **Installing**
### Install dependencies
 ### Install .pl and r files
 This step will yield raw counts for cellular genes and ERVmap loci as separate files.
### For single-end sequences
 erv_genome.pl -stage 1 -stage2 6 -fastq /${i}_SS.fastq.gz
### For paired-end sequences
 interleaved.pl --read1  ${i}_R1.fastq.gz  --read2 ${i}_R2.fastq.gz > ${i}.fastq.gz
 erv_genome.pl -stage 1 -stage2 6 -fastq /${i}.fastq.gz
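For readers unfamiliar with interleaving, the sketch below illustrates the general idea in Python: records from the two mate files are written alternately (R1, R2, R1, R2, and so on) into a single stream. This is an illustration of the technique, not the implementation of `interleaved.pl`, whose exact behavior (for example, read-name handling) is not described here. The function names are hypothetical, and the code assumes 4-line FASTQ records.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO


def read_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped FASTQ)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave_fastq(read1: TextIO, read2: TextIO, out: TextIO) -> int:
    """Write mate pairs alternately to `out`; return the number of pairs."""
    pairs = 0
    r2_records = read_records(read2)
    for rec1 in read_records(read1):
        rec2 = next(r2_records, None)
        if rec2 is None:
            raise ValueError("read2 has fewer records than read1")
        out.writelines(rec1)
        out.writelines(rec2)
        pairs += 1
    if next(r2_records, None) is not None:
        raise ValueError("read2 has more records than read1")
    return pairs


# Tiny in-memory example with a single read pair.
r1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@read1/2\nTTGG\n+\nIIII\n")
merged = io.StringIO()
n_pairs = interleave_fastq(r1, r2, merged)
print(n_pairs)
print(merged.getvalue(), end="")
```

For gzipped inputs such as `${i}_R1.fastq.gz`, the handles could be opened with `gzip.open(path, "rt")` from the standard library.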
 ### Store output files
 mv ./sample/herv_coverage_GRCh38_genome.txt ./output/erv/${i}.e
 mv ./sample/GRCh38/htseq.cnt ./output/cellular/${i}.c
 ## **Clean up data, merge, and normalize**
These steps will yield normalized ERV read counts based on size factors obtained through DESeq2 analysis.
Use the output files from above.
 run_clean_htseq.pl ./output/cellular c c2 __
 merge_count.pl 3 6 e ./output/erv > ./output/erv/merged_erv.txt
 merge_count.pl 0 1 c2 ./output/cellular > ./output/cellular/merged_cellular.txt
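The normalization idea can be sketched as follows. DESeq2 estimates per-sample size factors with the median-of-ratios method; the sketch assumes (this README does not spell it out) that the size factors are estimated from the merged cellular gene counts and then applied to the merged ERV counts. It is a plain-Python illustration, not the authors' R code, and all function names and the toy counts are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors; `counts` maps gene -> per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # genes with a zero count have no usable geometric mean
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene is expressed in every sample")
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: Dict[str, List[int]], factors: List[float]) -> Dict[str, List[float]]:
    """Divide each sample's raw counts by that sample's size factor."""
    return {k: [c / f for c, f in zip(row, factors)] for k, row in counts.items()}


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
cellular = {"geneA": [100, 200], "geneB": [50, 100], "geneC": [10, 20], "geneD": [0, 5]}
erv = {"5803": [70, 140], "5804": [36, 30]}

factors = size_factors(cellular)
normalized_erv = normalize(erv, factors)
print([round(f, 3) for f in factors])
print({k: [round(v, 1) for v in row] for k, row in normalized_erv.items()})
```

In this toy case locus `5803` ends up with the same normalized value in both samples, because its raw counts differ only by the sequencing-depth factor, whereas `5804` still differs after normalization.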
 * Maria Tokuyama
 * Yong Kong