Releases: BigDataBiology/argNorm

Version 0.6.0

26 Aug 15:15

The main change is GROOT support.

Full Changelog:

  • argNorm supports the GROOT v1.1.2 ARG annotation tool: https://github.com/will-rowe/groot
  • GROOT support is provided through the GrootNormalizer (for use in Python scripts) and, in the CLI, through the groot tool parameter together with one of the groot-db, groot-core-db, groot-argannot, groot-card, or groot-resfinder db parameters.

Other

funcscan integration

DB harmonisation

Version 0.5.0

02 Aug 16:38

Updated the drug categorization and improved manual curation

USER-FACING CHANGES

Improved drug categorization

  • drugs_to_drug_classes() now also uses the 'has_part' ARO relationship to get drug classes for antibiotic mixtures. For an antibiotic mixture, the drug classes of the drugs linked through 'has_part' are returned rather than 'antibiotic mixture' (ARO:3000707).
  • 'antibiotic mixture' is no longer reported as a drug class; the classes of the individual antibiotics making up the mixture are reported instead.
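
The 'has_part' expansion can be illustrated with a minimal sketch. It assumes a tiny, hypothetical ontology fragment held in plain dictionaries (DRUG_CLASS, HAS_PART) and a hypothetical helper drugs_to_drug_classes_sketch; the terms are simplified and this is not argNorm's implementation, which works on the bundled ARO.

```python
# Hypothetical toy ontology fragment, loosely modeled on the ARO (not real ARO data).
# Each single drug maps to the drug class it belongs to.
DRUG_CLASS = {
    "amoxicillin": "penam",
    "clavulanic acid": "beta-lactamase inhibitor",
}

# Antibiotic mixtures list their components through 'has_part'.
HAS_PART = {
    "amoxicillin-clavulanic acid": ["amoxicillin", "clavulanic acid"],
}


def drugs_to_drug_classes_sketch(drugs):
    """Return drug classes, expanding antibiotic mixtures into their parts first."""
    classes = []
    for drug in drugs:
        # A mixture is replaced by its components; a single drug stays as-is.
        components = HAS_PART.get(drug, [drug])
        for component in components:
            drug_class = DRUG_CLASS[component]
            if drug_class not in classes:  # keep order, avoid duplicates
                classes.append(drug_class)
    return classes


print(drugs_to_drug_classes_sketch(["amoxicillin-clavulanic acid"]))
# ['penam', 'beta-lactamase inhibitor']
```

The mixture never appears as a class of its own; only the classes of its parts are reported.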

Improved manual curation

  • manual curation (argannot): (Tet)tetH:EF460464:6286-7839:1554 was incorrectly annotated as ARO:3004797, a beta-lactamase, because of a loose RGI hit. It was manually curated to ARO:3000175.
  • Improved curation:
    • resfinder_curation: grdA_1_QJX10702 -> 3007380 & EstDL136_1_JN242251 -> 3000557
    • megares_curation: MEG_2865|Drugs|Phenicol|Chloramphenicol_hydrolase|ESTD -> 3000557
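
Conceptually, manual curation acts as an override layer on top of the automated RGI-derived mapping. The sketch below assumes plain dictionaries and a hypothetical apply_curation helper; the curated accessions come from the entries above (written with an "ARO:" prefix), while hypothetical_gene is made up for illustration. argNorm's own curation files and merge logic may differ.

```python
# Automated (RGI-derived) mapping: gene name -> ARO accession.
# The (Tet)tetH entry is the loose hit described above; hypothetical_gene is
# an invented, unmapped entry used only to show that untouched genes survive.
rgi_mapping = {
    "(Tet)tetH:EF460464:6286-7839:1554": "ARO:3004797",
    "hypothetical_gene": None,
}

# Manual curation entries taken from the release notes.
manual_curation = {
    "(Tet)tetH:EF460464:6286-7839:1554": "ARO:3000175",
    "grdA_1_QJX10702": "ARO:3007380",
    "EstDL136_1_JN242251": "ARO:3000557",
}


def apply_curation(automated, curated):
    """Return a new mapping in which curated entries override automated ones."""
    merged = dict(automated)  # copy so the automated table is left unchanged
    merged.update(curated)    # curated values win on conflicting keys
    return merged


merged = apply_curation(rgi_mapping, manual_curation)
print(merged["(Tet)tetH:EF460464:6286-7839:1554"])  # ARO:3000175
```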

Bugfixes

  • confers_resistance_to() now retrieves drug information even when it is encoded at a higher level in the ARO. For example, OXA-19 previously returned only cephalosporin and penam, but now also returns oxacillin (from its AMR gene family).
  • drugs_to_drug_classes() now correctly only returns the immediate child of 'antibiotic molecule' as the drug class (this was previously not the case for certain corner cases).
  • inconsistent ARO versions in the deeparg, megares, resfinderfg & sarg curation: ARO:3004445 -> ARO:3005440. The ARO number of the RSA2 gene changed in the ARO, but the ARO version bundled with argNorm was out of sync.
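
Both confers_resistance_to() and drugs_to_drug_classes() fixes amount to walking the ontology's is_a hierarchy. The sketch uses a small, hypothetical graph loosely modeled on the ARO (terms and edges are simplified and may not match the real ARO) and two hypothetical helpers: confers_resistance_to_sketch collects drugs attached at any ancestor level, and drug_class_sketch returns the ancestor that is an immediate child of 'antibiotic molecule'.

```python
# Hypothetical, simplified is_a edges (child -> parent); None marks a top node.
IS_A = {
    "OXA-19": "OXA beta-lactamase",
    "OXA beta-lactamase": None,
    "oxacillin": "penam",
    "penam": "beta-lactam antibiotic",
    "cephalosporin": "beta-lactam antibiotic",
    "beta-lactam antibiotic": "antibiotic molecule",
    "antibiotic molecule": None,
}

# Resistance edges attached at different levels (gene vs. gene family).
CONFERS = {
    "OXA-19": ["cephalosporin", "penam"],
    "OXA beta-lactamase": ["oxacillin"],
}


def ancestors(term):
    """Yield the term itself and then its is_a ancestors, nearest first."""
    while term is not None:
        yield term
        term = IS_A.get(term)


def confers_resistance_to_sketch(gene):
    """Collect drugs from the gene and from every ancestor, without duplicates."""
    drugs = []
    for term in ancestors(gene):
        for drug in CONFERS.get(term, []):
            if drug not in drugs:
                drugs.append(drug)
    return drugs


def drug_class_sketch(drug, root="antibiotic molecule"):
    """Return the ancestor that is an immediate child of `root`, or None."""
    chain = list(ancestors(drug))
    if root not in chain:
        return None
    idx = chain.index(root)
    return chain[idx - 1] if idx > 0 else None


print(confers_resistance_to_sketch("OXA-19"))  # ['cephalosporin', 'penam', 'oxacillin']
print(drug_class_sketch("oxacillin"))          # beta-lactam antibiotic
```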

INTERNAL CHANGES

  • AROs were previously handled as integers in the get_aro_mapping_table() function, which caused problems for ARO numbers with leading zeros such as 'ARO:0010004'. To fix this, AROs are now treated as strings so that leading zeros are preserved.
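
The leading-zero problem is easy to reproduce in plain Python. This sketch only mimics an integer round trip and a string-keyed lookup table (the table contents are a placeholder); it is not the actual code of get_aro_mapping_table().

```python
aro = "ARO:0010004"

# Integer handling: parsing the numeric part drops the leading zeros.
as_int = int(aro.split(":")[1])
rebuilt = f"ARO:{as_int}"

# String handling: the accession is kept verbatim.
as_str = aro

# A lookup keyed by the original accession (placeholder value).
table = {"ARO:0010004": "placeholder term"}

print(rebuilt, as_str)  # ARO:10004 ARO:0010004
print(rebuilt in table, as_str in table)  # False True
```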

Version 0.4.0

10 Jun 10:09

Major changes:

  • Bundle a specific version of ARO with the package instead of downloading it from the internet (ensures reproducibility)
  • Add missing ARO mappings to manual curation.
  • Command line tool accepts database/tool names case-insensitively (by @sebastianLedzianowski)
  • lib.map_to_aro returns None if there is no mapping (and raises an exception if the name is missing)
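
The difference between "no mapping" and "missing name" can be sketched with a hypothetical mapping table and a hypothetical map_to_aro_sketch function. The real lib.map_to_aro signature and the exception type it raises are not given in these notes, so the single-table argument and KeyError here are assumptions.

```python
# Hypothetical mapping table: gene name -> ARO accession.
# None means the gene is known but has no ARO mapping.
MAPPING_TABLE = {
    "geneA": "ARO:0000000",  # placeholder accession, not a real mapping
    "geneB": None,
}


def map_to_aro_sketch(gene):
    """Return the ARO for `gene`, None if unmapped; raise if the name is unknown."""
    if gene not in MAPPING_TABLE:
        # Assumed exception type; the release notes only say "an exception".
        raise KeyError(f"{gene!r} is not in the mapping table")
    return MAPPING_TABLE[gene]  # may be None for a known but unmapped gene


print(map_to_aro_sketch("geneA"))  # ARO:0000000
print(map_to_aro_sketch("geneB"))  # None
```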

Version 0.3.0

27 Apr 04:14

Main changes are updates to the Resfinder and ARG-ANNOT mappings

Detailed changes

Handling gene clusters & reverse complements in resfinder

  • Resfinder has gene clusters which can't be passed through RGI using 'contig' mode.
  • Gene clusters were identified and were manually assigned ARO numbers.
  • A separate file with manual curation for gene clusters and RCs was created; their AROs were updated after concatenating the RGI results with the genes absent from the RGI results.
  • 40 gene clusters are present.
  • 9 genes in reverse-complement form are also present.
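
A gene stored in reverse-complement form is written on the opposite DNA strand; its coding orientation is recovered by complementing each base and reversing the sequence. The dependency-free sketch below handles uppercase A/C/G/T only and uses a made-up sequence. The release notes say these genes were curated manually, so this is illustrative and not part of argNorm.

```python
# Watson-Crick complement table for uppercase nucleotides.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Made-up sequence stored on the opposite strand: its reverse complement
# starts with the ATG start codon and ends with the TAA stop codon.
stored = "TTACATCAT"
print(reverse_complement(stored))  # ATGATGTAA
```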

Using amino acid file for argannot & resfinder rather than nucleotide file

  • ARG-ANNOT and Resfinder consist of coding sequences. Previously the data was not handled properly, because contig mode was used when passing these coding sequences to RGI. Now the amino acid versions of ARG-ANNOT and Resfinder are run through RGI in protein mode.
  • The ARG-ANNOT AA file is available online; the Resfinder AA file is generated using Biopython.
  • One-to-many ARO mappings, such as NG_047831:101-955 mapping to both Erm(K) and almG in ARG-ANNOT, were eliminated by using protein mode.
  • A total of 10 ARO mappings changed in ARG-ANNOT

argnorm.lib: Making argNorm more usable as a library

  • Introduce argnorm.lib module
  • Users can import the map_to_aro function from argnorm.lib. The function takes a gene name as input, maps the gene to the ARO and returns a pronto term object with the ARO mapping.
  • The get_aro_mapping_table function, previously within the BaseNormalizer class, has also been moved to lib.py to give users the ability to access the mapping tables being used for normalization.
  • With the introduction of lib.py, users will be able to access core mapping utilities through argnorm.lib, drug categorization through argnorm.drug_categorization, and the traditional normalizers through argnorm.normalizers.

Version 0.2.0

12 Apr 05:51

ARO Mapping & Normalization

  • Updated mappings and manual curation tables for latest RGI
  • Hamronized ResFinderFG support
  • Removed Python syntax from output

Drug Categorization

  • Improved drug categorization by using superclasses whenever direct drug categorization is not possible
  • Added better column headings for drug categorization (confers_resistance_to and resistance_to_drug_class)

Testing

  • Improved pytest testing
  • Added integration tests