Release 0.8.3
farchaab authored May 24, 2024
2 parents 8e02ed1 + b06c09f commit 4587e12
Showing 65 changed files with 1,924 additions and 1,948 deletions.
20 changes: 20 additions & 0 deletions .dockerignore
@@ -0,0 +1,20 @@
docs/

mess.egg-info/
mess/__pycache__
build/

htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
tests/__pycache__
.pytest_cache

.snakemake
mess/workflow/conda
mess/workflow/taxonkit
68 changes: 68 additions & 0 deletions .github/workflows/docker-publish.yml
@@ -0,0 +1,68 @@
name: Docker publish

on:
  push:
    branches: ["main"]
    tags: ["v*.*.*"]
  pull_request:
    branches: ["main"]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Install cosign
        if: github.event_name != 'pull_request'
        uses: sigstore/cosign-installer@v3.5.0
        with:
          cosign-release: "v2.2.4"

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log into registry ${{ env.REGISTRY }}
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Sign the published Docker image
        if: ${{ github.event_name != 'pull_request' }}
        env:
          TAGS: ${{ steps.meta.outputs.tags }}
          DIGEST: ${{ steps.build-and-push.outputs.digest }}
        run: echo "${TAGS}" | xargs -I {} cosign sign --yes {}@${DIGEST}
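
The signing step pipes the tag list through `xargs -I {}`, so `cosign sign` runs once per tag line with the image digest appended. The following Python sketch only illustrates that expansion; the tag and digest values are hypothetical placeholders, and the real values come from the `meta` and `build-and-push` step outputs.

```python
def signing_targets(tags: str, digest: str) -> list[str]:
    """Return one `tag@digest` reference per non-empty line of `tags`."""
    return [f"{tag.strip()}@{digest}" for tag in tags.splitlines() if tag.strip()]


# Hypothetical values for illustration only.
example_tags = "ghcr.io/example/image:main\nghcr.io/example/image:v0.0.0"
example_digest = "sha256:" + "0" * 64

for target in signing_targets(example_tags, example_digest):
    # Mirrors the command template used in the workflow's `run:` line.
    print(f"cosign sign --yes {target}")
```

Pinning the signature to the digest rather than the tag alone means each signature refers to the exact image content that was pushed.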
17 changes: 17 additions & 0 deletions Dockerfile
@@ -0,0 +1,17 @@
FROM mambaorg/micromamba
LABEL org.opencontainers.image.source=https://github.com/metagenlab/MeSS
LABEL org.opencontainers.image.description="Snakemake pipeline for simulating shotgun metagenomic samples"
LABEL org.opencontainers.image.licenses=MIT
ADD . /tmp/repo
WORKDIR /tmp/repo
ENV LANG C.UTF-8
ENV SHELL /bin/bash
USER root

RUN micromamba install -q -y -c bioconda -c conda-forge -n base \
    mess --only-deps && \
    micromamba install -q -y -c conda-forge -n base mamba && \
    micromamba clean -afy

ENV PATH /opt/conda/bin:${PATH}
RUN pip install .
1 change: 0 additions & 1 deletion docs/benchmarking.md

This file was deleted.

9 changes: 9 additions & 0 deletions docs/benchmarks/index.md
@@ -0,0 +1,9 @@
# Benchmarks
We benchmarked MeSS against CAMISIM, the state-of-the-art metagenome simulator, in terms of species composition and resource usage.

We demonstrated that MeSS generates the same species composition as CAMISIM while being 10x faster.

## [Species composition](species-composition.md)

## [Resource usage](resource-usage.md)

29 changes: 29 additions & 0 deletions docs/benchmarks/resource-usage.md
@@ -0,0 +1,29 @@
Sixteen samples were used to benchmark the resource usage of MeSS and CAMISIM.

Samples were created by subsampling 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40, 80, 160, 320, and 640 genomes from a total of 2000 complete bacterial genomes (downloaded with [assembly_finder](https://github.com/metagenlab/assembly_finder)).
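
As an illustration of the sampling design, the sketch below builds the 16 sample sizes and draws that many genomes from a pool of 2000. Drawing without replacement, drawing each sample independently, the fixed seed, and the `genome_XXXX` labels are all assumptions made for the example; the benchmark pipeline may subsample differently.

```python
import random

# 1 to 10 in steps of one, then doubling from 20 up to 640 (16 sizes in total).
sample_sizes = list(range(1, 11)) + [20 * 2**i for i in range(6)]


def subsample(genomes, sizes, seed=0):
    """Draw one random subset (without replacement) per requested size."""
    rng = random.Random(seed)
    return {n: rng.sample(genomes, n) for n in sizes}


# Hypothetical labels standing in for the 2000 downloaded genomes.
genome_pool = [f"genome_{i:04d}" for i in range(2000)]
samples = subsample(genome_pool, sample_sizes)
print(len(samples), max(samples))
```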

Each genome was covered at 1x using art_illumina with CAMISIM's custom MBARC error model.
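
For intuition, a coverage target translates into a read count through `reads = coverage * genome_length / read_length`. The sketch below applies that relation; the 5 Mbp genome size and 150 bp read length are hypothetical values chosen for the example and are not taken from the benchmark.

```python
import math


def reads_for_coverage(genome_length: int, coverage: float, read_length: int) -> int:
    """Reads needed so that total sequenced bases reach coverage x genome length."""
    return math.ceil(coverage * genome_length / read_length)


# Hypothetical genome size and read length, for illustration only.
print(reads_for_coverage(5_000_000, 1.0, 150))  # 33334
```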

See [this nextflow pipeline](https://github.com/farchaab/benchmark-MeSS-CAMISIM) to run the benchmark.
## Results
### Physical RAM usage

![ram](../images/ram-usage.svg)

### CPU usage

![cpu-usage](../images/cpu-usage.svg)

### CPU time

![cpu-time](../images/cpu-time.svg)

!!! warning
    To simulate a sample with 2.4 Gbp using one CPU, CAMISIM takes 32 hours, while MeSS takes 3 hours.

## Conclusions
MeSS vs CAMISIM on average:

- [x] 5x more parallel (CPU usage)
- [x] 10x faster using one CPU (CPU time)
- [x] Uses 16.7x less memory (physical RAM)
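
The single-CPU timings quoted in the warning give a ratio close to the 10x average reported here. A quick arithmetic check, using only the figures stated in that warning:

```python
camisim_hours = 32   # single-CPU run time quoted in the warning
mess_hours = 3       # single-CPU run time quoted in the warning
total_bases = 2.4e9  # sample size quoted in the warning

speedup = camisim_hours / mess_hours
print(f"speedup: {speedup:.1f}x")
print(f"MeSS throughput: {total_bases / mess_hours:.2e} bases per hour")
print(f"CAMISIM throughput: {total_bases / camisim_hours:.2e} bases per hour")
```

This is one data point, so it need not match the averaged figure exactly.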
61 changes: 61 additions & 0 deletions docs/benchmarks/species-composition.md
@@ -0,0 +1,61 @@
Five samples from the [Human Microbiome Project](https://www.hmpdacc.org/hmp/) were classified with [kraken2](https://github.com/DerrickWood/kraken2) and [bracken](https://github.com/jenniferlu717/Bracken). Taxa with at least 200 reads were kept and used as input to both MeSS and CAMISIM.
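
The read-count filter can be sketched as follows. The taxon names and counts are hypothetical, and the dictionary input is an assumption made for brevity; the benchmark itself works from bracken output files.

```python
MIN_READS = 200  # threshold stated in the text


def filter_taxa(read_counts: dict[str, int], min_reads: int = MIN_READS) -> dict[str, int]:
    """Keep only taxa supported by at least `min_reads` reads."""
    return {taxon: n for taxon, n in read_counts.items() if n >= min_reads}


# Hypothetical per-taxon read counts for one sample.
counts = {"taxon_a": 1500, "taxon_b": 200, "taxon_c": 199}
print(filter_taxa(counts))
```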

Use [this nextflow pipeline](https://github.com/farchaab/benchmark-MeSS-CAMISIM) to generate the fastqs.

## Results

[microViz](https://github.com/david-barnett/microViz/) was used for the ordination plots and statistical tests.

### Bray-Curtis

![bray](../images/species-bray-NMDS.svg)

:material-arrow-right: Samples from the same body site cluster together. In addition, simulated samples cluster well with real samples (gold_standard and gs_filtered).
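
For reference, the Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances. The sketch below implements that textbook definition on hypothetical counts; it is not the microViz code path used to produce the plot.

```python
def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared taxa."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Hypothetical abundance profiles for a real and a simulated sample.
real = {"taxon_a": 60, "taxon_b": 40}
simulated = {"taxon_a": 55, "taxon_b": 40, "taxon_c": 5}
print(bray_curtis(real, simulated))
```

Values near zero correspond to samples that lie close together in the ordination.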

### PERMANOVA

:simple-hypothesis: **Null hypothesis**: No significant difference in species composition between simulated and non-simulated samples.

??? info "**Code**"
    ```R
    perm <- dist_permanova(mdist,
      variables = "origin:simulated+body_site",
      n_perms = 999,
      n_processes = 3
    )
    ```

```R
                 Df SumOfSqs      R2       F Pr(>F)
body_site         3   12.153 0.37843 15.6933  0.001 ***
origin:simulated  3    1.117 0.03479  1.4429  0.067 .
Residual         73   18.844 0.58678
Total            79   32.115 1.00000
```

:material-arrow-right: Significant difference between body sites (p = 0.001). No significant difference between simulated and real samples (p = 0.067).
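
The R2 and F columns of the PERMANOVA table follow from the sums of squares and degrees of freedom: R2 is a term's share of the total sum of squares, and the pseudo-F is the term's mean square divided by the residual mean square. The sketch below recomputes both from the rounded table values, so small rounding differences are expected. The p-values come from permutations and cannot be recovered this way.

```python
def r_squared(ss_term: float, ss_total: float) -> float:
    """Share of the total sum of squares explained by one term."""
    return ss_term / ss_total


def pseudo_f(ss_term: float, df_term: int, ss_resid: float, df_resid: int) -> float:
    """Mean square of the term divided by the residual mean square."""
    return (ss_term / df_term) / (ss_resid / df_resid)


# Values copied from the PERMANOVA table (rounded as printed).
SS_RESID, DF_RESID, SS_TOTAL = 18.844, 73, 32.115
terms = {"body_site": (12.153, 3), "origin:simulated": (1.117, 3)}

for name, (ss, df) in terms.items():
    print(name, round(r_squared(ss, SS_TOTAL), 4), round(pseudo_f(ss, df, SS_RESID, DF_RESID), 3))
```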

### Beta dispersion

:simple-hypothesis: **Null hypothesis** : No significant difference in dispersion between samples of different origin

```R
Fit: aov(formula = distances ~ group, data = df)

$group
                                   diff         lwr        upr     p adj
gs_filtered-gold_standard  2.249163e-03 -0.03593552 0.04043384 0.9986690
camisim-gold_standard     -2.310968e-02 -0.06129435 0.01507500 0.3905351
mess-gold_standard        -2.308946e-02 -0.06127414 0.01509522 0.3913195
camisim-gs_filtered       -2.535884e-02 -0.06354352 0.01282584 0.3082419
mess-gs_filtered          -2.533862e-02 -0.06352330 0.01284606 0.3089344
mess-camisim               2.021632e-05 -0.03816446 0.03820490 1.0000000
```

:material-arrow-right: No significant difference in dispersion between filtered and non-filtered samples, or between simulated and real samples.
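
One way to read the pairwise output: a comparison is non-significant when its confidence interval spans zero and its adjusted p-value exceeds the chosen threshold. The sketch below applies that reading to the values printed in the table. The 0.05 threshold is a conventional choice assumed here, not one stated in the source.

```python
ALPHA = 0.05  # assumed significance threshold (not stated in the source)

# (comparison, lower bound, upper bound, adjusted p), copied from the table.
tukey = [
    ("gs_filtered-gold_standard", -0.03593552, 0.04043384, 0.9986690),
    ("camisim-gold_standard", -0.06129435, 0.01507500, 0.3905351),
    ("mess-gold_standard", -0.06127414, 0.01509522, 0.3913195),
    ("camisim-gs_filtered", -0.06354352, 0.01282584, 0.3082419),
    ("mess-gs_filtered", -0.06352330, 0.01284606, 0.3089344),
    ("mess-camisim", -0.03816446, 0.03820490, 1.0000000),
]

# A pair counts as significant if its interval excludes zero or p falls below ALPHA.
significant = [
    name for name, lwr, upr, p_adj in tukey
    if p_adj < ALPHA or not (lwr < 0 < upr)
]
print(significant or "no significant pairwise differences")
```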

## Conclusions

- [x] Same species composition between original and filtered samples
- [x] Same species composition between MeSS and CAMISIM

2 changes: 1 addition & 1 deletion docs/citation.md
@@ -1,3 +1,3 @@
# Citation

-![`mess citation`](docs/images/mess-citation.svg)
+![`mess citation`](images/mess-citation.svg)