@@ -58,7 +58,7 @@ from PacBio and ONT are supported. The expected error rates are
<30% for raw and <2% for corrected reads. Additionally,
```--subassemblies``` option performs a consensus assembly of multiple
sets of high-quality contigs. You may specify multiple
- fles with reads (separated by spaces). Mixing different read
+ files with reads (separated by spaces). Mixing different read
types is not yet supported.

You must provide an estimate of the genome size as input,
@@ -117,8 +117,8 @@ ONT data than with PacBio data, especially in homopolymer regions.

### Error-corrected reads input

- While Flye was designed for assembly of raw reads (and this is the recommended option),
- it also supports error-corrected PacBio/ONT reads as input (use the correpsonding option).
+ While Flye was designed for assembly of raw reads (and this is the recommended way),
+ it also supports error-corrected PacBio/ONT reads as input (use the ```corr``` option).
The parameters are optimized for error rates <2%. If you are getting highly
fragmented assembly - most likely error rates in your reads are higher. In this case,
consider to assemble using the raw reads instead.
@@ -171,10 +171,15 @@ errors (due to improvements on how reads may align to the corrected assembly;
especially for ONT datasets). If the parameter is set to 0, the polishing will
not be performed.

- ### Resuming existing jobs
+ ### Starting from a particular assembly stage

- Use --resume to resume a previous run of the assembler that may have terminated
- prematurely. The assembly will continue from the last previously completed step.
+ Use ```--resume``` to resume a previous run of the assembler that may have terminated
+ prematurely (using the same output directory).
+ The assembly will continue from the last previously completed step.
+
+ You might also resume from a particular stage with ```--resume-from stage_name```,
+ where ```stage_name``` is a choice of ```assembly, consensus, repeat, polishing```.
+ For example, you might supply different sets of reads for different stages.

## <a name="graph"></a> Assembly graph

@@ -256,16 +261,16 @@ for more detailed information. The assembly pipeline is organized as follows:

* Kmer counting / erroneous kmer pre-filtering
* Solid kmer selection (kmers with sufficient frequency, which are unlikely to be erroneous)
- * Finding read overlaps based on the A-Bruijn graph
- * Detection of chimeric sequences
- * Contig assembly by read extension
+ * Contig extension. The algorithm starts from a single read and extends it
+ with a next overlapping read (overlaps are dynamically detected using the selected
+ solid k-mers).

- The resulting contig assembly is now simply a concatenation of read parts
- and is error-prone. Flye then aligns the reads on the draft contigs using minimap2 and
- calls a rough consensus. Afterwards, the algorithm performs additional repeat analysis
- as follows:
+ Note that we do not attempt to resolve repeats at this stage, thus
+ the reconstructed contigs might contain misassemblies.
+ Flye then aligns the reads on these draft contigs using minimap2 and
+ calls a consensus. Afterwards, Flye performs repeat analysis as follows:

- * Repeat graph is reconstructed from the assembled sequence
+ * Repeat graph is constructed from the (possibly misassembled) contigs
* In this graph all repeats longer than minimum overlap are collapsed
* The algorithm resolves repeats using the read information and graph structure
* The unbranching paths in the graph are output as contigs
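
As an aside to the pipeline description in the last hunk, the k-mer counting, solid k-mer selection, and k-mer-based overlap detection steps can be illustrated with a toy Python sketch. This is not Flye's implementation: the function names, the `min_freq` threshold, and the sample reads are assumptions made for illustration, and the sketch uses exact k-mer matching on error-free strings, which real long reads would not allow.

```python
from collections import Counter


def kmers_of(read, k):
    """Return the set of distinct k-mers in one read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def select_solid_kmers(counts, min_freq):
    """Keep k-mers seen at least min_freq times (hypothetical threshold)."""
    return {kmer for kmer, n in counts.items() if n >= min_freq}


def shared_solid_kmers(read_a, read_b, solid, k):
    """Solid k-mers present in both reads: a crude signal that they may overlap."""
    return kmers_of(read_a, k) & kmers_of(read_b, k) & solid


# Made-up reads; the third differs from the first by one base.
reads = ["ACGTACGA", "CGTACGAT", "ACGTTCGA"]
k = 4
counts = count_kmers(reads, k)
solid = select_solid_kmers(counts, min_freq=2)
```

With these inputs, k-mers that occur only once (such as those spanning the differing base in the third read) are filtered out, and `shared_solid_kmers(reads[0], reads[1], solid, k)` returns the k-mers that would suggest the first two reads overlap.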