18-10-2024-10-09
BrownAdrien committed Oct 18, 2024
1 parent 9ebf155 commit 8f47d18
Showing 17 changed files with 382 additions and 265 deletions.
116 changes: 66 additions & 50 deletions README.md
@@ -10,7 +10,7 @@ Before setting up your own Brownotate server, you can try a ***demo version*** w

## Installation

# Clone the Repository
### Clone the Repository

First, clone the Brownotate repository:

@@ -19,11 +19,11 @@ git clone https://github.com/LSMBO/Brownotate.git
cd Brownotate
```

# Install Conda
### Install Conda

If you do not have Conda installed, follow these steps to install it:

1. ***Download Anaconda***:
1. **Download Anaconda**:

```
wget https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh
@@ -37,13 +37,13 @@ Follow the instructions to complete the installation:
-Choose the default installation location
-Confirm updating your shell profile to initialize Conda automatically

2. ***Initialize Conda***:
2. **Initialize Conda**:

```
conda init
```

# Create and Activate Conda Environments
### Create and Activate Conda Environments

Create and activate the required Conda environments:

@@ -61,44 +61,48 @@ cd /path/to/Brownotate
conda env create -f environment_sra_download.yml
```

# Configure MongoDB
### Configure MongoDB

1. ***Download MongoDB Community Server:*** Go to [MongoDB Community Download](https://www.mongodb.com/try/download/community), select:
1. **Download MongoDB Community Server:**

-**Version**: 7.0.14 (current)
-**Platform**: Ubuntu 22.04 x64
-**Package**: Server
Go to [MongoDB Community Download](https://www.mongodb.com/try/download/community), select:

Click on ***Download***.
- **Version**: 7.0.14 (current)
- **Platform**: Ubuntu 22.04 x64
- **Package**: Server

2. ***Install MongoDB:***
Click on **Download**.

2. **Install MongoDB:**

```
sudo dpkg -i mongodb-org-server_7.0.14_amd64.deb
```

3. ***Start MongoDB:***
3. **Start MongoDB:**

```
sudo systemctl start mongod
sudo systemctl status mongod
```

4. ***Download MongoDB Shell:*** Go to [MongoDB Shell Download](https://www.mongodb.com/try/download/shell), select:
4. **Download MongoDB Shell:**

Go to [MongoDB Shell Download](https://www.mongodb.com/try/download/shell), select:

-**Version:** 2.3.0
-**Platform:** Debian (10+) / Ubuntu (18.04+) x64
-**Package:** deb

Click on ***Download***.
Click on **Download**.

5. ***Install MongoDB Shell:***
5. **Install MongoDB Shell:**

```
sudo dpkg -i mongodb-mongosh_2.3.0_amd64.deb
```

6. ***Configure MongoDB:***
6. **Configure MongoDB:**

```
mongosh
@@ -109,12 +113,11 @@ db.createCollection("runs")
db.createCollection("processes")
```

# Configure `config.json`
### Configure `config.json`

Edit the `config.json` file located in the root directory of the project:

```
json
{
"email": "",
"MONGO_URI": "",
@@ -125,13 +128,13 @@ json
```

- **`MONGO_URI`**: This follows the format `mongodb://<ip>:<port>/brownotate-db`. You can find the correct URI by running the `mongosh` command in your terminal. The IP is typically localhost and the port is usually 27017.
-**`BROWNOTATE_PATH`**: This should be the directory where you cloned the Brownotate repository.
-**`BROWNOTATE_ENV_PATH`**: Use the command `conda info --envs` to locate the path to the br Conda environment.
-**`SRA_DOWNLOAD_ENV_PATH`**: Use the command `conda info --envs` to locate the path to the sra-download Conda environment.
- **`BROWNOTATE_PATH`**: This should be the directory where you cloned the Brownotate repository.
- **`BROWNOTATE_ENV_PATH`**: Use the command `conda info --envs` to locate the path to the br Conda environment.
- **`SRA_DOWNLOAD_ENV_PATH`**: Use the command `conda info --envs` to locate the path to the sra-download Conda environment.
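
The constraints listed for `config.json` can be checked before the server is launched. The following is a minimal sketch and not part of Brownotate: `check_config` and `DOCUMENTED_KEYS` are hypothetical names, and the rules only mirror the bullet points above (a URI of the form `mongodb://<ip>:<port>/brownotate-db` and non-empty path entries).

```python
import json
from urllib.parse import urlparse

# Keys described in the README bullets; other keys in config.json are not checked.
DOCUMENTED_KEYS = (
    "MONGO_URI",
    "BROWNOTATE_PATH",
    "BROWNOTATE_ENV_PATH",
    "SRA_DOWNLOAD_ENV_PATH",
)


def check_config(config):
    """Return a list of problems found in a config dict (empty list if none)."""
    problems = [f"{key} is missing or empty" for key in DOCUMENTED_KEYS if not config.get(key)]
    uri = config.get("MONGO_URI", "")
    if uri:
        parsed = urlparse(uri)
        if parsed.scheme != "mongodb":
            problems.append("MONGO_URI should start with mongodb://")
        if not parsed.hostname or parsed.port is None:
            problems.append("MONGO_URI should include <ip>:<port>")
        if parsed.path != "/brownotate-db":
            problems.append("MONGO_URI should end with /brownotate-db")
    return problems


def load_config(path="config.json"):
    """Read config.json from disk (path is relative to the repository root)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Placeholder values for illustration only.
example = {
    "MONGO_URI": "mongodb://localhost:27017/brownotate-db",
    "BROWNOTATE_PATH": "/path/to/Brownotate",
    "BROWNOTATE_ENV_PATH": "/path/to/envs/br",
    "SRA_DOWNLOAD_ENV_PATH": "/path/to/envs/sra-download",
}
print(check_config(example))
```

Calling `check_config(load_config())` from the repository root would apply the same checks to the real file.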

# Running Brownotate
## Running Brownotate

1. ***Web Application:***
1. **Web Application:**

To set up the Brownotate web application, you need to configure both the client (https://github.com/LSMBO/brownotate-app) and the backend (this repository, https://github.com/LSMBO/Brownotate).

@@ -146,26 +149,8 @@ conda activate br
gunicorn -w 4 --worker-class eventlet --bind 0.0.0.0:8800 --timeout 2592000 run_flask:app
```

- The IP **0.0.0.0** allows the server to accept requests from any IP address. The port **8800** corresponds to the port on wich the server listens for requests from the cleint. Ensure that the port you choose is the same one configured in the web client's **config.js** file (see example below).

***Example: ***

You have a server with the public IP address **1.2.3.4**. This server hosts the Brownotate client, the web application will be accessible via the URL **http://1.2.3.4:80** because the client is hosted on an Apache server using port 80 (for more details, see https://github.com/LSMBO/brownotate-app).

You configure the **config.js** file like this:

```
const CONFIG = {
API_BASE_URL: 'http://5.6.7.8:8800'
};
export default CONFIG;
```

This setup means the client will send requests to the server **5.6.7.8** on port **8800**.
The IP **0.0.0.0** allows the server to accept requests from any IP address. The port **8800** corresponds to the port on which the server listens for requests from the client.

On the server **5.6.7.8** where Brownotate backend is installed, you need to launch the Flask application with the gunicorn command with **0.0.0.0:8800**. This will ensure that the server can receive requests from any client on port **8800** (as **0.0.0.0** allows connections from any IP).

NB: It is also possible to host both the Brownotate client and the Brownotate backend on the same server. In this case, you can configure the **brownotate-app/config.js** file with **API_BASE_URL: 'http://localhost:8800'**

2. ***Command-Line Interface:***

@@ -227,35 +212,66 @@ Brownotate offers a flexible command-line interface for genome annotation and pr
**Run in automatic mode for species "Homo sapiens":**

```
python /path/to/Brownota/main.py -s "Homo sapiens" -a
python /path/to/Brownotate/main.py -s "Homo sapiens" -a
```

**Run only the database search (DBS) for "Homo sapiens":**

```
python /path/to/Brownota/main.py -s "Homo sapiens" --dbs-only
python /path/to/Brownotate/main.py -s "Homo sapiens" --dbs-only
```

**Run the database search (DBS) for "Mus musculus" by searching for sequencing only, and only Illumina sequencing:**

```
python /path/to/Brownota/main.py -s "Mus musculus" --dbs-only --no-genome --no-proteins --only-illumina
python /path/to/Brownotate/main.py -s "Mus musculus" --dbs-only --no-genome --no-proteins --only-illumina
```

**Run for Mus musculus with a custom genome assembly, skipping busco:**

```
python /path/to/Brownota/main.py -s "Mus musculus" -g /path/to/mus_musculus_genome.fasta --skip-busco
python /path/to/Brownotate/main.py -s "Mus musculus" -g /path/to/mus_musculus_genome.fasta --skip-busco
```

**Run for Drosophila melanogaster (taxid: 7227) with 2 sequencing datasets from NCBI SRA database:**

```
python /path/to/Brownota/main.py -s 7227 -d SRR30623762 -d SRR30623766
python /path/to/Brownotate/main.py -s 7227 -d SRR30623762 -d SRR30623766
```

**Resume a previous run:**

```
python /path/to/Brownota/main.py --resume run_id
```
python /path/to/Brownotate/main.py --resume run_id
```
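
When runs are launched from another script, the invocations above can be assembled as argument lists rather than shell strings. This is a sketch under stated assumptions: `build_command` is a hypothetical helper, it only uses flags that appear in the examples above, and it assumes `--resume` is used on its own, as in the last example.

```python
import shlex


def build_command(main_py, species=None, flags=(), genome=None, sra_ids=(), resume=None):
    """Build the argument list for one Brownotate run (hypothetical helper)."""
    command = ["python", main_py]
    if resume is not None:
        # Assumption: a resumed run only needs its run id.
        return command + ["--resume", resume]
    if species is None:
        raise ValueError("species (name or taxid) is required for a new run")
    command += ["-s", str(species)]
    if genome is not None:
        command += ["-g", genome]
    for sra_id in sra_ids:
        command += ["-d", sra_id]  # -d is repeated once per dataset
    command += list(flags)  # e.g. "-a", "--dbs-only", "--skip-busco"
    return command


cmd = build_command(
    "/path/to/Brownotate/main.py",
    species=7227,
    sra_ids=["SRR30623762", "SRR30623766"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building a list avoids quoting problems with species names that contain spaces.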


## Other scripts

- check_species_exists.py

Searches for the species in the UniProtKB Taxonomy database. If it exists, the script prints its name and taxID in the form "Staphylococcus aureus;1280". If it does not exist, it raises an error.

Example:
```
python /path/to/Brownotate/check_species_exists.py -s "staphylococcus aureus"
```
- database_admin.py

Adds a user to the MongoDB database. Takes `-email` and `-password`. If the email is already in the database, the password is updated instead.

Example:
```
python /path/to/Brownotate/database_admin.py -email test@email.com -password 48141514
```

Note: The password is hashed with bcrypt before being stored in the database for added security.

- clear_working_dir.py

Offers to delete old run working directories. Data can accumulate quickly, so occasional cleanup is worthwhile. The script prompts (via `input()`) for each run in turn and deletes only the ones you confirm.

Example:
```
python /path/to/Brownotate/clear_working_dir.py
```
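
The prompt-per-run behaviour described for `clear_working_dir.py` could look roughly like the sketch below. It is not the script's actual code: `clear_runs` is a hypothetical name, the layout (one sub-directory per run) is an assumption, and the prompt function is injectable so the demo can run without a terminal.

```python
import shutil
import tempfile
from pathlib import Path


def clear_runs(runs_root, ask=input):
    """Offer to delete each sub-directory of runs_root; return the deleted names."""
    deleted = []
    # sorted() materializes the listing first, so deleting while looping is safe.
    for run_dir in sorted(Path(runs_root).iterdir()):
        if not run_dir.is_dir():
            continue
        answer = ask(f"Delete {run_dir.name}? [y/N] ").strip().lower()
        if answer == "y":
            shutil.rmtree(run_dir)
            deleted.append(run_dir.name)
    return deleted


# Demo on a throwaway directory, with scripted answers instead of keyboard input.
with tempfile.TemporaryDirectory() as root:
    for name in ("run_a", "run_b"):
        (Path(root) / name).mkdir()
    answers = iter(["y", "n"])
    print(clear_runs(root, ask=lambda prompt: next(answers)))
```

Defaulting to "no" on anything other than `y` keeps an accidental Enter key from removing a run.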
34 changes: 27 additions & 7 deletions database_search/better_data.py
@@ -1,4 +1,5 @@
from database_search.uniprot import UniprotTaxo
import sys

def betterData(search_data_res):
    if 'genome' in search_data_res:
@@ -23,16 +24,35 @@ def betterData(search_data_res):

def betterEvidence(evidence, taxo):
    ensembl_evidence_score = -1
    if evidence["ensembl"]:
    if evidence["ensembl"] and evidence["ensembl"]["taxonId"]:
        ensembl_evidence = evidence["ensembl"]
        ensembl_evidence_score = getEvidenceScore(ensembl_evidence, taxo)
    uniprot_proteome_evidence = evidence["uniprot_proteome"]
    uniprot_proteome_evidence_score = getEvidenceScore(uniprot_proteome_evidence, taxo)
    refseq_evidence = evidence["refseq"]
    refseq_evidence_score = getEvidenceScore(refseq_evidence, taxo)
    genbank_evidence = evidence["genbank"]
    genbank_evidence_score = getEvidenceScore(genbank_evidence, taxo)

    uniprot_proteome_evidence_score = -1
    if evidence["uniprot_proteome"] and evidence["uniprot_proteome"]["taxonId"]:
        uniprot_proteome_evidence = evidence["uniprot_proteome"]
        uniprot_proteome_evidence_score = getEvidenceScore(uniprot_proteome_evidence, taxo)
    else:
        print(f"Warning: uniprot_proteome_evidence = {evidence['uniprot_proteome']}")

    refseq_evidence_score = -1
    if evidence["refseq"] and evidence["refseq"]["taxonId"]:
        refseq_evidence = evidence["refseq"]
        refseq_evidence_score = getEvidenceScore(refseq_evidence, taxo)
    else:
        print(f"Warning: refseq_evidence = {evidence['refseq']}")

    genbank_evidence_score = -1
    if evidence["genbank"] and evidence["genbank"]["taxonId"]:
        genbank_evidence = evidence["genbank"]
        genbank_evidence_score = getEvidenceScore(genbank_evidence, taxo)
    else:
        print(f"Warning: genbank_evidence = {evidence['genbank']}")

    if (ensembl_evidence_score == -1 and uniprot_proteome_evidence_score == -1 and refseq_evidence_score == -1 and genbank_evidence_score == -1):
        print("Error: No evidences found. Please try again with custom evidence.")
        sys.exit(1)

    best_evidence = None
    best_score = -1
    if ensembl_evidence_score > best_score:
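
The guard added in this hunk repeats the same pattern four times: score a source only if it has an entry with a `taxonId`, otherwise keep `-1`. A loop-based sketch of that pattern is shown below for illustration. It is not the committed code: `pick_best_evidence`, `SOURCES`, and the `score_fn` parameter are hypothetical, the tie-breaking order after `ensembl` is an assumption (the hunk is truncated), and the sketch returns `None` where the commit prints an error and calls `sys.exit(1)`.

```python
# Source names taken from the hunk; their priority order is an assumption.
SOURCES = ("ensembl", "uniprot_proteome", "refseq", "genbank")


def pick_best_evidence(evidence, score_fn):
    """Return (best_entry, scores); best_entry is None if no source is usable."""
    scores = {}
    for source in SOURCES:
        entry = evidence.get(source)
        # Same guard as in the commit: the entry must exist and carry a taxonId.
        if entry and entry.get("taxonId"):
            scores[source] = score_fn(entry)
        else:
            scores[source] = -1
    if all(score == -1 for score in scores.values()):
        return None, scores
    # max() keeps the first source on ties, matching a chain of strict '>' checks.
    best_source = max(SOURCES, key=lambda source: scores[source])
    return evidence[best_source], scores


# Stand-in scoring function and made-up entries, for illustration only.
demo = {
    "ensembl": {},
    "uniprot_proteome": {"taxonId": 9606, "quality": 3},
    "refseq": {"taxonId": 9606, "quality": 5},
    "genbank": None,
}
best, scores = pick_best_evidence(demo, lambda entry: entry["quality"])
print(best, scores)
```

Unlike the committed code, which indexes `evidence[...]` directly, the sketch uses `.get()` so a missing key is treated the same as an empty entry.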
6 changes: 3 additions & 3 deletions database_search/ensembl.py
@@ -2,10 +2,10 @@
from database_search.uniprot import UniprotTaxo
from . import ncbi

def getBetterEnsembl(scientific_name, taxonomy, data_type, search_similar_species=False):
def getBetterEnsembl(scientific_name, taxonomy, data_type, search_similar_species=False, config=None):
    results = ensembl.getDataFromFTP(data_type, [scientific_name])
    if results:
        taxonId = ncbi.getTaxonID(results["scientific_name"])
        taxonId = ncbi.getTaxonID(results["scientific_name"], config)
        results["taxonId"] = taxonId
        return results
    if search_similar_species == False:
@@ -37,7 +37,7 @@ def getBetterEnsembl(scientific_name, taxonomy, data_type, search_similar_specie
if results:
taxonId = UniprotTaxo.fetch_taxon_id(results["scientific_name"])
if not taxonId:
taxonId = ncbi.getTaxonID(results["scientific_name"])
taxonId = ncbi.getTaxonID(results["scientific_name"], config)
results["taxonId"] = taxonId
return results
return {}
32 changes: 23 additions & 9 deletions database_search/genome.py
@@ -1,29 +1,43 @@
from . import ensembl
from . import ensembl as dbs_ensembl
from ftp import ensembl as ftp_ensembl
from . import ncbi

def getGenomes(synonyms_scientific_names, taxonomy, search_similar_species):
def getGenomes(synonyms_scientific_names, taxonomy, search_similar_species, proteins_data, config):
    # ENSEMBL
    json_ensembl = {}
    if not isProkaryotaOrArchaea(taxonomy):
        if proteins_data and 'ensembl' in proteins_data and 'url' in proteins_data['ensembl'] and proteins_data['ensembl']['scientific_name'] in synonyms_scientific_names:
            json_ensembl = ftp_ensembl.getAssemblyFTPrepository(proteins_data['ensembl']['url'], proteins_data['ensembl']['scientific_name'])
        i = 0
        while not json_ensembl and i < len(synonyms_scientific_names):
            json_ensembl = ensembl.getBetterEnsembl(synonyms_scientific_names[i], taxonomy, 'dna', False)
            json_ensembl = dbs_ensembl.getBetterEnsembl(synonyms_scientific_names[i], taxonomy, 'dna', False, config)
            i += 1
        if not json_ensembl and search_similar_species:
            json_ensembl = ensembl.getBetterEnsembl(synonyms_scientific_names[0], taxonomy, 'dna', True)
            json_ensembl = dbs_ensembl.getBetterEnsembl(synonyms_scientific_names[0], taxonomy, 'dna', True, config)

    # REFSEQ
    json_refseq = {}
    json_genbank = {}
    if proteins_data and 'refseq' in proteins_data and 'url' in proteins_data['refseq'] and proteins_data['refseq']['scientific_name'] in synonyms_scientific_names:
        json_refseq = ncbi.fetchAssemblyDetails(proteins_data['refseq']['entrez_id'], 'genome', 'refseq')
    i = 0
    while not json_refseq and i < len(synonyms_scientific_names):
        json_refseq = ncbi.getBetterNCBI(synonyms_scientific_names[i], taxonomy, 'refseq', 'genome', False)
        json_refseq = ncbi.getBetterNCBI(synonyms_scientific_names[i], taxonomy, 'refseq', 'genome', False, config)
        i += 1
    if json_refseq and json_refseq['scientific_name'] in synonyms_scientific_names:
        json_genbank = ncbi.fetchAssemblyDetails(json_refseq['entrez_id'], 'genome', 'genbank')
    if not json_refseq and search_similar_species:
        json_refseq = ncbi.getBetterNCBI(synonyms_scientific_names[0], taxonomy, 'refseq', 'genome', True)
    json_genbank = {}
        json_refseq = ncbi.getBetterNCBI(synonyms_scientific_names[0], taxonomy, 'refseq', 'genome', True, config)

    # GENBANK
    if proteins_data and 'genbank' in proteins_data and 'url' in proteins_data['genbank'] and proteins_data['genbank']['scientific_name'] in synonyms_scientific_names:
        json_genbank = ncbi.fetchAssemblyDetails(proteins_data['genbank']['entrez_id'], 'genome', 'genbank')
    i = 0
    while not json_genbank and i < len(synonyms_scientific_names):
        json_genbank = ncbi.getBetterNCBI(synonyms_scientific_names[i], taxonomy, 'genbank', 'genome', False)
        json_genbank = ncbi.getBetterNCBI(synonyms_scientific_names[i], taxonomy, 'genbank', 'genome', False, config)
        i += 1
    if not json_genbank and search_similar_species:
        json_genbank = ncbi.getBetterNCBI(synonyms_scientific_names[0], taxonomy, 'genbank', 'genome', True)
        json_genbank = ncbi.getBetterNCBI(synonyms_scientific_names[0], taxonomy, 'genbank', 'genome', True, config)

    return {
        "ensembl": json_ensembl,
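
Each section of `getGenomes` in this hunk follows the same cascade: reuse the hit from the protein search if there is one, otherwise try each synonym in turn, and finally fall back to a similar-species search. The sketch below isolates that control flow with stub lookups. `first_hit` and its parameters are hypothetical names and do not exist in the repository; passing `similar=None` plays the role of `search_similar_species` being false.

```python
def first_hit(names, by_name, from_proteins=None, similar=None):
    """Return the first non-empty record produced by the lookup cascade."""
    # 1. Reuse the record already found during the protein search, if any.
    record = from_proteins() if from_proteins is not None else {}
    # 2. Otherwise try each synonym until one lookup returns something.
    for name in names:
        if record:
            break
        record = by_name(name)
    # 3. Last resort: search related species, starting from the first name.
    if not record and similar is not None and names:
        record = similar(names[0])
    return record


# Stub lookup standing in for the Ensembl/NCBI calls; the record is made up.
known = {"Mus musculus": {"scientific_name": "Mus musculus", "entrez_id": "demo"}}
print(first_hit(["House mouse", "Mus musculus"], by_name=lambda name: known.get(name, {})))
```

Factoring the cascade out this way is only one possible design; the committed code keeps the three sections explicit, which makes the source-specific calls easier to read at the cost of some repetition.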