Commit

Merge pull request #25 from bioinfoUQAM/dev
Dev
nicdemon authored Mar 17, 2023
2 parents 3e7b3e1 + c932de9 commit 23e9cf6
Showing 14 changed files with 2,159 additions and 26 deletions.
133 changes: 133 additions & 0 deletions .gitignore
@@ -1,3 +1,136 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Output data from example script
output/

build/
.vscode/
__pycache__/
7 changes: 7 additions & 0 deletions README.md
@@ -1,6 +1,13 @@
# Caribou
Alignment-free bacterial identification and classification in metagenomics sequencing data using machine learning.

## Proof of Concept
The Jupyter notebook `workflow_example.ipynb` shows the workflow and its output using example data. Each step is labeled in the notebook for easier understanding.

Data used in the `workflow_example.ipynb` is located in the `example_data/` folder.

This data was also used to test and debug the Caribou analysis pipeline.

## Installation
The Caribou analysis pipeline was developed in Python 3 and can be easily installed through the Python wheel. The repo must be cloned first; the package can then be installed by running the following commands in the desired folder:
```
31 changes: 31 additions & 0 deletions example_data/30_genomes.csv
@@ -0,0 +1,31 @@
id,species,genus,family,order,class,phylum,domain
VBOR01000009.1,WS-7 sp005893165,WS-7,SZUA-252,SZUA-252,RBG-16-71-46,Eisenbacteria,Bacteria
PMOP01000016.1,Palsa-360 sp003161495,Palsa-360,UBA7541,UBA7541,Acidobacteriae,Acidobacteriota,Bacteria
DHUT01000069.1,Sedimentibacter sp002409285,Sedimentibacter,Sedimentibacteraceae,Tissierellales,Clostridia,Firmicutes_A,Bacteria
JAAZAC010000025.1,Actinotalea sp012514545,Actinotalea,Cellulomonadaceae,Actinomycetales,Actinomycetia,Actinobacteriota,Bacteria
URUG01000300.1,Faecalimonas sp900550975,Faecalimonas,Lachnospiraceae,Lachnospirales,Clostridia,Firmicutes_A,Bacteria
CAJCBY010000033.1,Aquirufa sp903960725,Aquirufa,Spirosomaceae,Cytophagales,Bacteroidia,Bacteroidota,Bacteria
JABXKY010000147.1,RBG-16-57-9 sp013619005,RBG-16-57-9,TCS64,TCS64,Bathyarchaeia,Thermoproteota,Archaea
JABBOX010000109.1,UBA4719 sp012927555,UBA4719,Dermatophilaceae,Actinomycetales,Actinomycetia,Actinobacteriota,Bacteria
JAABRC010000419.1,JAABRC01 sp011391115,JAABRC01,Burkholderiaceae,Burkholderiales,Gammaproteobacteria,Proteobacteria,Bacteria
CAAFRK010000216.1,Veillonella sp900765235,Veillonella,Veillonellaceae,Veillonellales,Negativicutes,Firmicutes_C,Bacteria
URSE01000035.1,Veillonella sp900550455,Veillonella,Veillonellaceae,Veillonellales,Negativicutes,Firmicutes_C,Bacteria
UQEY01000009.1,Eubacterium_R sp900540235,Eubacterium_R,Acutalibacteraceae,Oscillospirales,Clostridia,Firmicutes_A,Bacteria
NZ_QEST01000278.1,Streptomyces sp003311645,Streptomyces,Streptomycetaceae,Streptomycetales,Actinomycetia,Actinobacteriota,Bacteria
NZ_LRTR01000260.1,Streptomyces europaeiscabiei,Streptomyces,Streptomycetaceae,Streptomycetales,Actinomycetia,Actinobacteriota,Bacteria
JACMKV010000045.1,JACMKV01 sp014379915,JACMKV01,JACMKV01,Burkholderiales,Gammaproteobacteria,Proteobacteria,Bacteria
CACNVV010000042.1,Pelagibacter sp902624015,Pelagibacter,Pelagibacteraceae,Pelagibacterales,Alphaproteobacteria,Proteobacteria,Bacteria
WLHF01000026.1,Planktophila sp009702835,Planktophila,Nanopelagicaceae,Nanopelagicales,Actinomycetia,Actinobacteriota,Bacteria
DKBA01000026.1,UBA6912 sp002450985,UBA6912,UBA5794,UBA5794,Acidimicrobiia,Actinobacteriota,Bacteria
WBXD01000017.1,UBA1315 sp008932935,UBA1315,Akkermansiaceae,Verrucomicrobiales,Verrucomicrobiae,Verrucomicrobiota,Bacteria
PBSX01000072.1,CABZJG01 sp002726375,CABZJG01,Rhodobacteraceae,Rhodobacterales,Alphaproteobacteria,Proteobacteria,Bacteria
JAAYQI010000217.1,JAAYQI01 sp012519385,JAAYQI01,Anaerotignaceae,Lachnospirales,Clostridia,Firmicutes_A,Bacteria
PMSQ01000054.1,Sulfotelmatobacter sp003168355,Sulfotelmatobacter,Koribacteraceae,Acidobacteriales,Acidobacteriae,Acidobacteriota,Bacteria
NZ_LCZE01000023.1,Pseudomonas_E fluorescens_N,Pseudomonas_E,Pseudomonadaceae,Pseudomonadales,Gammaproteobacteria,Proteobacteria,Bacteria
DNMQ01000225.1,Pseudomonas_A sp003488145,Pseudomonas_A,Pseudomonadaceae,Pseudomonadales,Gammaproteobacteria,Proteobacteria,Bacteria
CAIXRL010000197.1,CAIXRL01 sp903921835,CAIXRL01,RBG-16-71-46,RBG-16-71-46,RBG-16-71-46,Eisenbacteria,Bacteria
JAAYXU010000041.1,JAAYXU01 sp012515725,JAAYXU01,UMGS416,Christensenellales,Clostridia_A,Firmicutes_A,Bacteria
DHMB01000127.1,CAG-841 sp002405565,CAG-841,CAG-272,Oscillospirales,Clostridia,Firmicutes_A,Bacteria
JACNFQ010000081.1,NIOZ-UU106 sp014384545,NIOZ-UU106,UBA6624,UBA6624,UBA6624,UBP7,Bacteria
CAJCHR010000269.1,Novosphingobium sp903970225,Novosphingobium,Sphingomonadaceae,Sphingomonadales,Alphaproteobacteria,Proteobacteria,Bacteria
QMMC01000579.1,B10-G4 sp003647065,B10-G4,SG8-38,Polyangiales,Polyangia,Myxococcota,Bacteria
Binary file added example_data/30_genomes.fna.gz
Binary file not shown.
4 changes: 4 additions & 0 deletions example_data/cucurbita_sample_3.csv
@@ -0,0 +1,4 @@
id,species,domain
NW_019663258,Cucurbita,host
NW_019657536,Cucurbita,host
NEWN01002765,Cucurbita,host
Binary file added example_data/cucurbita_sample_3.fna.gz
Binary file not shown.
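
The class files added in this commit are plain CSVs keyed by sequence `id`, with one column per taxonomic rank. As a rough illustration of how such a file could be read, here is a minimal sketch using only the Python standard library. It is not taken from the Caribou code base: the helper names `load_classes` and `count_by_rank` are hypothetical, and the three data rows are copied from `example_data/30_genomes.csv` above so that the block runs on its own.

```python
import csv
import io
from collections import Counter

# Header and three rows copied from example_data/30_genomes.csv in this commit.
SAMPLE = """\
id,species,genus,family,order,class,phylum,domain
VBOR01000009.1,WS-7 sp005893165,WS-7,SZUA-252,SZUA-252,RBG-16-71-46,Eisenbacteria,Bacteria
CAAFRK010000216.1,Veillonella sp900765235,Veillonella,Veillonellaceae,Veillonellales,Negativicutes,Firmicutes_C,Bacteria
URSE01000035.1,Veillonella sp900550455,Veillonella,Veillonellaceae,Veillonellales,Negativicutes,Firmicutes_C,Bacteria
"""


def load_classes(handle):
    """Hypothetical helper: map each sequence id to its {rank: label} row."""
    reader = csv.DictReader(handle)
    return {
        row["id"]: {rank: label for rank, label in row.items() if rank != "id"}
        for row in reader
    }


def count_by_rank(classes, rank):
    """Hypothetical helper: count sequences per label at one taxonomic rank."""
    return Counter(taxa[rank] for taxa in classes.values())


classes = load_classes(io.StringIO(SAMPLE))
genus_counts = count_by_rank(classes, "genus")
print(genus_counts)
```

To read the real file instead of the embedded sample, pass `open("example_data/30_genomes.csv", newline="")` to `load_classes`. Note that `example_data/cucurbita_sample_3.csv` only has the `id`, `species`, and `domain` columns, so `count_by_rank` would need one of those ranks for that file.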