docs + version update
mikolmogorov committed Mar 3, 2020
1 parent 322541b commit 0e0641a
Showing 5 changed files with 63 additions and 17 deletions.
21 changes: 16 additions & 5 deletions README.md
@@ -3,18 +3,29 @@ Flye assembler

[![BioConda Install](https://img.shields.io/conda/dn/bioconda/flye.svg?style=flag&label=BioConda%20install)](https://anaconda.org/bioconda/flye)

### Version: 2.7b
### Version: 2.7

Flye is a de novo assembler for single molecule sequencing reads,
such as those produced by PacBio and Oxford Nanopore Technologies.
It is designed for a wide range of datasets, from small bacterial projects
to large mammalian-scale assemblies. The package represents a complete
pipeline: it takes raw PB / ONT reads as input and outputs polished contigs.
Flye also includes a special mode for metagenome assembly.
pipeline: it takes raw PacBio / ONT reads as input and outputs polished contigs.
Flye also has a special mode for metagenome assembly.

Latest updates
--------------

### Flye 2.7 release (03 Mar 2020)
* Better assemblies of real (and complex) metagenomes
* New option to retain alternative haplotypes, rather than collapsing them (`--keep-haplotypes`)
* PacBio HiFi mode
* Using BAM instead of SAM to reduce storage requirements and I/O load
* Improved human assemblies
* Annotation of alternative contigs
* Better polishing quality for the newest ONT datasets
* Trestle module is disabled by default (use `--trestle` to enable)
* Many bug fixes and improvements

### Flye 2.6 release (19 Sep 2019)
* This release introduces Python 3 support (no other changes)

@@ -155,7 +166,7 @@ Before posting an issue/question, consider to look through the FAQ
and existing issues (opened and closed) - it is possible that your question
has already been answered.

If you are reporting a problem, please include the `flye.log` file and provide some
details about your dataset (if possible).
If you are reporting a problem, please include the `flye.log` file and provide
details about your dataset.

In case you prefer personal communication, please contact Mikhail at fenderglass@gmail.com.
12 changes: 12 additions & 0 deletions docs/NEWS.md
@@ -1,3 +1,15 @@
Flye 2.7 release (03 Mar 2020)
==============================
* Better assemblies of real (and complex) metagenomes
* New option to retain alternative haplotypes, rather than collapsing them (`--keep-haplotypes`)
* PacBio HiFi mode
* Using BAM instead of SAM to reduce storage requirements and I/O load
* Improved human assemblies
* Annotation of alternative contigs
* Better polishing quality for the newest ONT datasets
* Trestle module is disabled by default (use `--trestle` to enable)
* Many bug fixes and improvements

Flye 2.6 release (19 Sep 2019)
==============================
* This release introduces Python 3 support (no other changes)
43 changes: 33 additions & 10 deletions docs/USAGE.md
@@ -105,7 +105,7 @@ The original dataset is available at the
We converted the raw `bas.h5` file to FASTA format for convenience.

wget https://zenodo.org/record/1172816/files/E.coli_PacBio_40x.fasta
flye --pacbio-raw E.coli_PacBio_40x.fasta --out-dir out_pacbio --genome-size 5m --threads 4

with `5m` being the expected genome size, the threads argument being optional
(you may adjust it for your environment), and `out_pacbio` being the directory
@@ -117,7 +117,7 @@ The dataset was originally released by the
[Loman lab](http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/).

wget https://zenodo.org/record/1172816/files/Loman_E.coli_MAP006-1_2D_50x.fasta
flye --nano-raw Loman_E.coli_MAP006-1_2D_50x.fasta --out-dir out_nano --genome-size 5m --threads 4


## <a name="inputdata"></a> Supported Input Data
@@ -132,6 +132,9 @@ however we saw examples of incorrect third-party raw -> fastq conversions,
which resulted in incorrectly trimmed data. In case Flye is failing to
get reasonable assemblies, make sure that your reads are properly preprocessed.

Flye now supports assembly of PacBio HiFi reads via the `--pacbio-hifi` option.
The expected read error rate is <1%.

### Oxford Nanopore data

We performed our benchmarks with raw ONT reads (R7-R9) with error rate ~15%.
@@ -142,7 +145,7 @@ ONT data than with PacBio data, especially in homopolymer regions.

While Flye was designed for assembly of raw reads (and this is the recommended way),
it also supports error-corrected PacBio/ONT reads as input (use the ```corr``` option).
The parameters are optimized for error rates <2%. If you are getting highly
The parameters are optimized for error rates <3%. If you are getting highly
fragmented assembly, the error rates in your reads are most likely higher. In this case,
consider assembling the raw reads instead.

@@ -181,19 +184,37 @@ based on the read length distribution (reads N90) and does not require manual se
Typical value is 3k-5k (and down to 1k for datasets with shorter read length).
Intuitively, we want to set this parameter as high as possible, so the
repeat graph is less tangled. However, higher values might lead to assembly gaps.
In some *rare* cases (for example in case of biased read length distribution)
it makes sense to set this parameter manually.

In some *rare* cases it makes sense to manually increase minimum overlap
for assemblies of big genomes with long reads and high coverage.

### Metagenome mode

Metagenome assembly mode is designed for highly non-uniform coverage and
is sensitive to underrepresented sequence at low coverage (as low as 2x).
In some examples of simple metagenomes, we observed that the normal (isolate)
Flye mode assembled more contiguous bacterial
consensus sequence, while the metagenome mode was slightly more fragmented, but
revealed strain mixtures. For relatively complex metagenomes, `--meta` mode
is the recommended way.

### Haplotype mode

By default, Flye (and metaFlye) collapses graph structures caused by
alternative haplotypes (bubbles, superbubbles, roundabouts) to produce
longer consensus contigs. The option `--keep-haplotypes` retains
the alternative paths in the graph, producing a less contiguous but
more detailed assembly.

### Trestle

Trestle is an extra module that resolves simple repeats of
multiplicity 2 that were not bridged by reads. Depending on the
dataset, it might resolve a few extra repeats, which is helpful
for small (bacterial) genomes. Use the `--trestle` option to enable the module.
On large genomes, the contiguity improvements are usually minimal,
but the computation might take a lot of time.
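
The mode switches described above can be combined on one command line. The following is a minimal Python sketch that builds such a command as an argument list. The helper name `build_flye_command` and its parameters are hypothetical and not part of Flye; only flags mentioned in this document are used, and the command is constructed but not executed.

```python
from typing import List

# Read-type flags that appear in this document; the set is illustrative, not exhaustive.
READ_TYPE_FLAGS = {"--pacbio-raw", "--nano-raw", "--pacbio-hifi"}


def build_flye_command(
    reads: str,
    out_dir: str,
    genome_size: str,
    read_type: str = "--pacbio-raw",
    threads: int = 4,
    meta: bool = False,
    keep_haplotypes: bool = False,
    trestle: bool = False,
) -> List[str]:
    """Return a flye argument list (hypothetical helper, not part of Flye)."""
    if read_type not in READ_TYPE_FLAGS:
        raise ValueError(f"unsupported read type flag: {read_type}")
    cmd = [
        "flye", read_type, reads,
        "--out-dir", out_dir,
        "--genome-size", genome_size,
        "--threads", str(threads),
    ]
    # Optional modes are appended only when requested.
    if meta:
        cmd.append("--meta")
    if keep_haplotypes:
        cmd.append("--keep-haplotypes")
    if trestle:
        cmd.append("--trestle")
    return cmd
```

For example, `build_flye_command("reads.fasta", "out", "5m", keep_haplotypes=True)` yields the equivalent of `flye --pacbio-raw reads.fasta --out-dir out --genome-size 5m --threads 4 --keep-haplotypes`; the list could then be passed to `subprocess.run`.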

### Reduced contig assembly coverage

Typically, assemblies of large genomes at high coverage require
@@ -253,12 +274,14 @@ It is a tab-delimited table with the columns as follows:
* Is circular (representing circular sequence, such as bacterial chromosome or plasmid)
* Is repetitive (represents repeated, rather than unique sequence)
* Multiplicity (inferred multiplicity based on coverage)
* Alternative group
* Graph path (repeat graph path corresponding to this contig/scaffold).

Scaffold gaps are marked with `??` symbols, and `*` symbol denotes a
terminal graph node.

`scaffolds.fasta` file is a symlink to `assembly.fasta`, which is
retained for the backward compatibility.
Alternative contigs (representing alternative haplotypes) will have the same
alternative group ID. Primary contigs are marked by `*`.
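
As an illustration, the alternative group column could be used to separate primary contigs from alternative ones. The sketch below rests on several assumptions that are not confirmed here: the contig name is in the first column, the alternative group is the second-to-last column (just before the graph path), header lines start with `#`, and primary contigs carry `*` in the alternative group column. The function name `split_by_alt_group` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_by_alt_group(lines: Iterable[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split contig names into primary contigs and alternative groups.

    Assumed layout (not confirmed by the text above): tab-delimited rows,
    name in the first column, alternative group in the second-to-last column.
    """
    primary: List[str] = []
    alternative: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blank and header lines
            continue
        fields = line.split("\t")
        if len(fields) < 3:  # too short to hold name, alt group and graph path
            continue
        name, alt_group = fields[0], fields[-2]
        if alt_group == "*":  # assumed marker for primary contigs
            primary.append(name)
        else:
            alternative[alt_group].append(name)
    return primary, dict(alternative)
```

Indexing the alternative group from the end of the row keeps the sketch independent of how many columns precede it.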

## <a name="graph"></a> Repeat graph

2 changes: 1 addition & 1 deletion flye/__build__.py
@@ -1 +1 @@
__build__ = 1582
__build__ = 1583
2 changes: 1 addition & 1 deletion flye/__version__.py
@@ -1 +1 @@
__version__ = "2.7b"
__version__ = "2.7"
