diff --git a/README.md b/README.md
index b4a76b3..1c153de 100644
--- a/README.md
+++ b/README.md
@@ -7,11 +7,11 @@ CryProcessor is a high-troughtput tool for the Cry toxins mining from the fasta-
## About CryProcessor
-CryProcessor is a python-written tool for searching and extracting Cry toxins from illumina sequence data or from the protein fasta files. It includes several parts: an hmm-based scanning for potential Cry toxins, obtaining information about the domains, extracting Cry toxins with 3 domains only and comparing found toxins with Bt nomenclature.
The mode for performing the toxins search directly from the illumina reads implies building an assembly graph (using SPAdes) and the subsequent mining toxins directly from the obtained assebmly graph.
+CryProcessor is a Python tool for searching for and extracting Cry toxins from Illumina sequence data or from protein FASTA files. It consists of several parts: HMM-based scanning for potential Cry toxins; annotation and mapping of the detected domains; extraction of Cry toxins with the proper domain content; and, finally, comparison of the toxins with the Bt nomenclature.
The read-imputed search mode builds an assembly graph (using SPAdes) and then mines toxins directly from the resulting assembly graph.
## CryProcessor Pipeline
-The following text stands for the full pipeline description (for the illumina reads). To start, SPAdes (http://cab.spbu.ru/software/spades/) or metaSPAdes (http://cab.spbu.ru/software/meta-spades/) are implemented to get the assembly graph from the fastq-files. After that, the potential Cry toxins (with at least 30% identity to the hmm-consensus) are extracted from the assembly paths via PathRacer (http://cab.spbu.ru/software/pathracer/). Then hmmsearch (http://hmmer.org/) is used to find Cry toxin domains in the obtained sequences. In the next step, the results of hmmsearch are combined to get the toxins that posses all three domains.
The coordinates of the domains are used to cut flanking sequences and save the domains with the corresponding linkers. The full sequences (without processing procedure) are used to compare the obtained toxins with the Bt nomenclature database via diamond blastp (https://github.com/bbuchfink/diamond). The non-identical sequences are extracted and marked as the potentially new toxins.
For all the found sequences (both identical to presented in Bt nomenclature and the novel sequences) an online ipg-annotation (Identical Protein Group) is performed (to see the annotation output read the annotation output section below). Finally, nucleotide sequences, corresponding to the protein sequences of the found toxins, are downloaded. Metadata will be uploaded only if the accession numbers are present in the query.
+The following is a description of the full pipeline (for Illumina reads). First, SPAdes (http://cab.spbu.ru/software/spades/) or metaSPAdes (http://cab.spbu.ru/software/meta-spades/) is used to build an assembly graph from the FASTQ files. Next, potential Cry toxins (with at least 30% identity to the HMM consensus) are extracted from the assembly graph paths with PathRacer (http://cab.spbu.ru/software/pathracer/). Then hmmsearch (http://hmmer.org/) is used to find Cry toxin domains in the obtained sequences. In the next step, the hmmsearch results are combined to select the toxins that possess all three domains.
The domain coordinates are used to trim the flanking sequences, keeping the domains together with the linkers between them. The full, untrimmed sequences are compared with the Bt nomenclature database via diamond blastp (https://github.com/bbuchfink/diamond). Sequences that are not identical to any nomenclature entry are extracted and marked as potentially novel toxins.
+For all found sequences (both those identical to entries in the Bt nomenclature and the novel ones), an online IPG (Identical Protein Groups) annotation is performed; the output format is described in the Annotation output section below. Finally, the nucleotide sequences corresponding to the protein sequences of the found toxins are downloaded. Metadata are retrieved only if accession numbers are present in the query.
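The domain-filtering and flank-trimming steps described above can be sketched in Python as follows. This is a minimal illustration, not CryProcessor's actual code, and it rests on several assumptions. It assumes hmmsearch was run with `--domtblout`. It assumes the three domains are reported under the Pfam names `Endotoxin_N`, `Endotoxin_M` and `Endotoxin_C`. It also replaces the diamond blastp comparison with a plain exact-match lookup. The names `parse_domtblout`, `trim_to_domains` and `process` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: Pfam names for the three Cry toxin domains; the HMM profiles
# actually bundled with CryProcessor may be named differently.
REQUIRED_DOMAINS = ("Endotoxin_N", "Endotoxin_M", "Endotoxin_C")

Hit = Tuple[str, int, int]  # (domain name, ali_from, ali_to), 1-based inclusive


def parse_domtblout(lines: List[str]) -> Dict[str, List[Hit]]:
    """Group domain hits by target sequence from hmmsearch --domtblout text."""
    hits: Dict[str, List[Hit]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, domain = fields[0], fields[3]  # target name, query (HMM) name
        ali_from, ali_to = int(fields[17]), int(fields[18])  # alignment coords
        hits.setdefault(target, []).append((domain, ali_from, ali_to))
    return hits


def trim_to_domains(seq: str, domain_hits: List[Hit]) -> str:
    """Cut the flanks: keep everything from the first domain start to the
    last domain end, i.e. the domains plus the linkers between them."""
    start = min(h[1] for h in domain_hits)
    end = max(h[2] for h in domain_hits)
    return seq[start - 1:end]  # 1-based inclusive -> Python slice


def process(seqs: Dict[str, str], domtbl: List[str], known: Set[str]) -> Dict[str, dict]:
    """Keep three-domain toxins, trim them, and flag ones absent from `known`."""
    hits = parse_domtblout(domtbl)
    result: Dict[str, dict] = {}
    for name, seq in seqs.items():
        found = [h for h in hits.get(name, []) if h[0] in REQUIRED_DOMAINS]
        if {h[0] for h in found} != set(REQUIRED_DOMAINS):
            continue  # not all three domains present
        result[name] = {
            "processed": trim_to_domains(seq, found),
            # Simplification: exact identity of the FULL (untrimmed) sequence
            # stands in for the diamond blastp search against the nomenclature.
            "potentially_novel": seq not in known,
        }
    return result
```

Trimming from the first domain start to the last domain end keeps the inter-domain linkers. The identity check deliberately uses the untrimmed sequence, following the order of steps described above.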
## Installation and Usage
### Prerequisites