DOC: Improve documentation for v0.3.0
- Added docstrings and comments to functions in lib.py, crude_db_harmonisation.py and get_mapping_table.py
- Added whatsnew.md to docs directory (read the docs)
- Moved unreleased to top of CHANGELOG.md
Vedanth-Ramji authored and luispedro committed Apr 27, 2024
1 parent d10424a commit 1f0788c
Showing 6 changed files with 85 additions and 23 deletions.
44 changes: 22 additions & 22 deletions CHANGELOG.md
@@ -1,5 +1,27 @@
# Changelog

## Unreleased

### Handling gene clusters & reverse complements in resfinder
- Resfinder has gene clusters which can't be passed through RGI using 'contig' mode.
- Gene clusters were identified and were manually assigned ARO numbers.
- A separate file with manual curation for gene clusters and RCs was created, and their AROs were updated after concatenating RGI results and genes not in RGI results.
- 40 gene clusters present.
- 9 genes in reverse complement form also present.
- RC genes were manually curated.
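
For readers unfamiliar with the term, the following is a minimal sketch of what "reverse complement form" means for a nucleotide sequence. It is illustrative only and is not taken from argNorm's code; the function name `reverse_complement` is an assumption made for this example.

```python
# Illustrative helper (hypothetical name, not part of argNorm).
_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Complement each base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement('ATGC'))  # GCAT
```

A gene stored in this form has to be reverse-complemented before its coding sequence reads in the expected direction, which is presumably why these entries needed separate curation.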

### Using amino acid file for argannot & resfinder rather than nucleotide file
- ARG-ANNOT and Resfinder consist of coding sequences. Previously the data was handled incorrectly because contig mode was used when passing coding sequences to RGI. Now, the amino acid versions of ARG-ANNOT & Resfinder are run through RGI in protein mode.
- The ARG-ANNOT AA file is available online. The Resfinder AA file is generated using biopython.
- One-to-many ARO mappings, such as NG_047831:101-955 to Erm(K) and almG in ARG-ANNOT, are eliminated because protein mode is used.
- A total of 10 ARO mappings changed in ARG-ANNOT.
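
The nucleotide-to-amino-acid step mentioned above can be sketched as follows. The changelog says biopython is used for this; the version below is a standard-library stand-in using the standard genetic code, so that it runs without extra dependencies. The names `CODON_TABLE` and `translate` are assumptions for this example and do not come from argNorm.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = 'TCAG'
_AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {
    ''.join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

def translate(nucleotides, to_stop=False):
    """Translate a coding sequence codon by codon (hypothetical helper)."""
    seq = nucleotides.upper()
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], 'X')  # 'X' for ambiguous codons
        if aa == '*' and to_stop:
            break
        protein.append(aa)
    return ''.join(protein)

print(translate('ATGGCCTAA'))                # MA*
print(translate('ATGGCCTAA', to_stop=True))  # MA
```

In practice a library routine such as Biopython's `Seq.translate` is likely the better choice, since it also handles alternative genetic codes.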

### argnorm.lib: Making argNorm more usable as a library
- A file called `lib.py` will be introduced so that users can use argNorm as a library more easily.
- Users can import the `map_to_aro` function using `from argnorm.lib import map_to_aro`. The function takes a gene name and a database name as input, maps the gene to the ARO and returns a pronto term object with the ARO mapping.
- The `get_aro_mapping_table` function, previously within the BaseNormalizer class, has also been moved to `lib.py` to give users the ability to access the mapping tables being used for normalization.
- With the introduction of `lib.py`, users will be able to access core mapping utilities through `argnorm.lib`, drug categorization through `argnorm.drug_categorization`, and the traditional normalizers through `argnorm.normalizers`.
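
A minimal sketch of how a caller might consume the result of `map_to_aro`, assuming the `map_to_aro(gene, database)` signature and the `id` and `name` attributes described in the docstring added by this commit. The helpers `describe_aro` and `lookup`, and the stand-in `fake_mapper` with its placeholder values, are hypothetical and exist only so the example runs without argNorm installed.

```python
from types import SimpleNamespace

def describe_aro(term):
    """Format any object exposing .id and .name as 'id<TAB>name'."""
    return f'{term.id}\t{term.name}'

def lookup(gene, database, mapper):
    """Call a map_to_aro-style function and format the returned term."""
    return describe_aro(mapper(gene, database))

# With argNorm installed, the real function would be passed in:
#     from argnorm.lib import map_to_aro
#     print(lookup('<gene ID from the source database>', 'ncbi', map_to_aro))

# Stand-in mapper; the ID and name below are placeholders, not real ARO data.
def fake_mapper(gene, database):
    return SimpleNamespace(id='ARO:0000000', name='placeholder gene')

print(lookup('some_gene', 'ncbi', fake_mapper))
```

Passing the mapper as an argument keeps the formatting logic testable without the package data files.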

## 0.2.0 - 26 March 2024

#### ARO Mapping & Normalization
@@ -32,25 +54,3 @@
- Initial source code started
- Normalizers: added BaseNormalizer, ARGSOAPNormalizer, DeepARGNormalizer, AbricateNormalizer
- Testing: added basic ARO column test

21 changes: 21 additions & 0 deletions argnorm/lib.py
@@ -9,6 +9,16 @@
_ROOT = os.path.abspath(os.path.dirname(__file__))

def get_aro_mapping_table(database):
    """
    Description: Returns the ARO mapping table for a specific supported database.
    Parameters:
        database (str): name of database. Can be: argannot, deeparg, megares, ncbi, resfinderfg and sarg
    Returns:
        aro_mapping_table (DataFrame): A pandas dataframe with ARGs mapped to AROs.
    """

    aro_mapping_table = pd.read_csv(
        os.path.join(_ROOT, 'data', f'{database}_ARO_mapping.tsv'),
        sep='\t')
@@ -26,6 +36,17 @@ def get_aro_mapping_table(database):
    return aro_mapping_table

def map_to_aro(gene, database):
    """
    Description: Gets ARO mapping for a specific gene in a database.
    Parameters:
        gene (str): The original ID of the gene as mentioned in source database.
        database (str): name of database. Can be: argannot, deeparg, megares, ncbi, resfinderfg and sarg
    Returns:
        ARO[result] (pronto.term.Term): A pronto term with the ARO number of input gene. ARO number can be accessed using 'id' attribute and gene name can be accessed using 'name' attribute.
    """

    if database not in ['ncbi', 'deeparg', 'resfinder', 'sarg', 'megares', 'argannot']:
        raise Exception(f'{database} is not a supported database.')

4 changes: 4 additions & 0 deletions db_harmonisation/crude_db_harmonisation.py
@@ -58,6 +58,7 @@ def get_megares_db():
    url = 'https://www.meglab.org/downloads/megares_v3.00/megares_database_v3.00.fasta'
    return download_file(url, 'dbs/megares.fna')

# NCBI db has '*' at end of each protein sequence. RGI can't handle that, so '*' is removed
@TaskGenerator
def fix_ncbi(ncbi_amr_faa):
    ofile = './dbs/ncbi.faa'
@@ -69,6 +70,7 @@ def fix_ncbi(ncbi_amr_faa):

    return ofile

# Needed when nucleotide database (eg. resfinder) needs to be run through RGI with protein mode
@TaskGenerator
def fna_to_faa(ifile):
    ofile = ifile.replace('.fna', '.faa')
@@ -111,10 +113,12 @@ def run_rgi(fa):
    get_aro_for_hits(fa, rgi_ofile + '.txt', db_name).to_csv(ofile, sep='\t', index=False)
    return ofile

# Moving ARO mapping tables over to argnorm/data
@TaskGenerator
def move_mappings_to_argnorm(aro_mapping):
    shutil.copy(aro_mapping, '../argnorm/data')

# Calling tasks
create_out_dirs()
barrier()
for db in [
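
The comment added above `fix_ncbi` says the trailing `*` is removed from each NCBI protein sequence because RGI cannot handle it. The body of `fix_ncbi` is not shown in this diff, so the following is only a sketch of that idea using plain string handling; the function name `strip_terminal_stop` and the sample records are made up for illustration.

```python
def strip_terminal_stop(fasta_text):
    """Remove trailing '*' characters from sequence lines of a protein FASTA string."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith('>'):
            cleaned.append(line)               # header lines are kept unchanged
        else:
            cleaned.append(line.rstrip('*'))   # drop stop symbol(s) at line end
    return '\n'.join(cleaned) + '\n'

# Made-up records, for illustration only.
example = '>protA\nMKT*\n>protB\nMAL\n'
print(strip_terminal_stop(example), end='')
```

This sketch assumes the `*` only ever appears at the end of a sequence line; a FASTA parser would be more robust for wrapped or unusual records.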
3 changes: 3 additions & 0 deletions db_harmonisation/get_mapping_table.py
@@ -15,6 +15,9 @@ def check_file(path):
        raise argparse.ArgumentTypeError(f"{path} can't be read")

def get_aro_for_hits(fa, rgi_output, database):
    """
    Generates ARO mapping tables by copying Best_Hit_ARO, ARO and ORF_ID/Contig columns from RGI output
    """
    database_entries = []
    for record in SeqIO.parse(str(fa), 'fasta'):
        if 'mutation' not in record.id:
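
The docstring above describes copying the `Best_Hit_ARO`, `ARO` and `ORF_ID`/`Contig` columns from RGI output. The rest of `get_aro_for_hits` is not visible in this diff, so the following is only a sketch of the column-selection step using the standard `csv` module. The helper name `select_mapping_columns` and the sample row are invented for illustration, and the sketch assumes protein-mode output keyed by `ORF_ID`.

```python
import csv
import io

# Columns named in the docstring (protein mode assumed, so ORF_ID rather than Contig).
RGI_COLUMNS = ['ORF_ID', 'Best_Hit_ARO', 'ARO']

def select_mapping_columns(rgi_tsv_text, columns=RGI_COLUMNS):
    """Keep only the requested columns from tab-separated RGI-style output."""
    reader = csv.DictReader(io.StringIO(rgi_tsv_text), delimiter='\t')
    return [{col: row[col] for col in columns} for row in reader]

# Made-up sample; real RGI output has many more columns.
sample = (
    'ORF_ID\tBest_Hit_ARO\tARO\tCut_Off\n'
    'gene_1\tplaceholder hit\t0000000\tStrict\n'
)
rows = select_mapping_columns(sample)
print(rows)
```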
34 changes: 34 additions & 0 deletions docs/images/whatsnew.md
@@ -0,0 +1,34 @@
## What's New

## 0.2.0 - 26 March 2024

#### ARO Mapping & Normalization

- Updated mappings and manual curation tables for latest RGI
- Hamronized ResFinderFG support
- Removed python syntax in output

#### Drug Categorization

- Improved drug categorization by using superclasses whenever direct drug categorization is not possible
- Added better column headings for drug categorization (confers_resistance_to and resistance_to_drug_class)

#### Internal Improvements: Testing

- Improved pytest testing
- Added integration tests

## 0.1.0 - 20 December, 2023

- Added hamronized support for AMRFinderPlus
- Fixed ARO:nan issue (added manually curated mapping tables and integrated it with normalizers)
- Added drug categorization feature and integrated it with normalizers
- Added AMRFinderPlusNormalizer, ResFinderNormalizer
- Added specific smoke tests for ARGSOAPNormalizer, DeepARGNormalizer, AbricateNormalizer, AMRFinderPlusNormalizer and ResFinderNormalizer

## 0.0.1 - 13 June, 2022

- First release
- Initial source code started
- Normalizers: added BaseNormalizer, ARGSOAPNormalizer, DeepARGNormalizer, AbricateNormalizer
- Testing: added basic ARO column test
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -10,4 +10,4 @@ theme:

nav:
  - 'argNorm': index.md

  - "What's New": whatsnew.md
