ENH update read the docs
psj1997 committed Feb 27, 2024
1 parent d2e8efb commit 3c4f79a
Showing 2 changed files with 30 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/subcommands.md
@@ -19,6 +19,7 @@ Reconstruct bins with single or co-assembly binning using one command.
* `-i/--input-fasta` : Path to the input contig fasta file (`gzip` and `bzip2` compression are accepted).
* `-b/--input-bam`: Path to the input BAM (`.bam` extension) or CRAM (`.cram`) files. You can pass multiple BAM files, one per sample.
* `-o/--output`: Output directory (will be created if non-existent).
* `-a/--abundance`: Path to the abundance file from strobealign-aemb. This can only be used when at least 5 samples are used in binning.

#### Recommended arguments

@@ -126,6 +127,7 @@ These are the same as for `single_easy_bin`.
* `--ml-threshold`
* `--taxonomy-annotation-table`
* `--tmpdir`
* `-a/--abundance`

These are the same as for `single_easy_bin`.

@@ -138,6 +140,7 @@ The subcommand `generate_sequence_features_single` requires the contig file and
* `-i/--input-fasta`
* `-b/--input-bam`
* `-o/--output`
* `-a/--abundance`

These are the same as for `single_easy_bin`.

@@ -161,6 +164,7 @@ The subcommand `generate_sequence_features_multi` requires the combined contig f
* `-i/--input-fasta`
* `-o/--output`
* `-b/--input-bam`
* `-a/--abundance`

These are the same as for `multi_easy_bin`.

26 changes: 26 additions & 0 deletions docs/usage.md
@@ -419,3 +419,29 @@ SemiBin2 generate_cannot_links -i S5.fa -o S5_output

See the comment above about how you can bypass most of the computation if you have run `mmseqs2` to annotate your contigs against GTDB already.


## Running SemiBin with strobealign-aemb

Strobealign-aemb is a fast abundance estimation method for metagenomic binning.
Because strobealign-aemb does not provide mapping information for every position of a contig, SemiBin2 cannot be run with strobealign-aemb in binning modes that use fewer than 5 samples, and the contigs must be split beforehand to generate the must-link constraints.

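The sample-count restriction can be checked before launching a run. The following is a minimal sketch, not part of SemiBin2: the function name `check_abundance_count` and the variable `min_samples` are hypothetical, and the file names in the example call are placeholders for the per-sample abundance files produced in the steps below.

```bash
# Hypothetical pre-flight check (not a SemiBin2 feature):
# refuse to continue when fewer than 5 abundance files are given.
min_samples=5

check_abundance_count() {
    # "$@" holds the abundance file paths, one per sample
    if [ "$#" -lt "$min_samples" ]; then
        echo "error: need at least $min_samples abundance files, got $#" >&2
        return 1
    fi
    echo "ok: $# abundance files"
}

# Example call with explicit placeholder names
check_abundance_count sample1.txt sample2.txt sample3.txt sample4.txt sample5.txt
```

Passing the file names explicitly avoids a subtle problem with globs: an unmatched pattern such as `sample*.txt` is left as a literal string by the shell and would be counted as one file.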

1. Split the fasta files
```bash
python script/generate_split.py -c contig.fa -o output
```
2. Map reads using [strobealign-aemb](https://github.com/ksahlin/strobealign) to generate the abundance information
```bash
strobealign --aemb output/split.fa read1_1.fq read1_2.fq -R 6 > sample1.txt
strobealign --aemb output/split.fa read2_1.fq read2_2.fq -R 6 > sample2.txt
strobealign --aemb output/split.fa read3_1.fq read3_2.fq -R 6 > sample3.txt
strobealign --aemb output/split.fa read4_1.fq read4_2.fq -R 6 > sample4.txt
strobealign --aemb output/split.fa read5_1.fq read5_2.fq -R 6 > sample5.txt
```
3. Run SemiBin2 (like running SemiBin with BAM files)
```bash
SemiBin2 generate_sequence_features_single -i contig.fa -a *.txt -o output
SemiBin2 generate_sequence_features_multi -i contig.fa -a *.txt -s : -o output
SemiBin2 single_easy_bin -i contig.fa -a *.txt -o output
SemiBin2 multi_easy_bin -i contig.fa -a *.txt -s : -o output
```

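The five per-sample mapping commands in step 2 can also be written as a loop. This is a sketch under stated assumptions: the read files follow the `readN_1.fq`/`readN_2.fq` naming used above, and the `DRY_RUN` switch, `n_samples` variable, and `run_aemb` function are hypothetical names introduced here for illustration. With `DRY_RUN=1` (the default in this sketch) the commands are only printed, so they can be inspected before anything is run.

```bash
# Sketch: map every sample against the split contigs in a loop.
# DRY_RUN and run_aemb are illustrative names, not part of strobealign or SemiBin2.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them
n_samples=5

run_aemb() {
    i="$1"
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command that would be executed for sample $i
        echo "strobealign --aemb output/split.fa read${i}_1.fq read${i}_2.fq -R 6 > sample${i}.txt"
    else
        # Same flags as in step 2; output goes to one abundance file per sample
        strobealign --aemb output/split.fa "read${i}_1.fq" "read${i}_2.fq" -R 6 > "sample${i}.txt"
    fi
}

i=1
while [ "$i" -le "$n_samples" ]; do
    run_aemb "$i"
    i=$((i + 1))
done
```

Setting `DRY_RUN=0` in the environment runs the real commands, which requires `strobealign` on the `PATH` and the read files in the working directory.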