Search (#79)

* feat: adds sg class parent to assembly class; gbk outputs
* fix: no mstart when scatter=T
* fix: async + threading option
* feat: bgc search

Also many other updates/functionality
socialgene · Dec 21, 2023 · commit 17968f9
1 parent 42bdcb8

Showing 49 changed files with 1,607 additions and 993 deletions.
diff --git a/Take an input BGC.md b/Take an input BGC.md
@@ -11,7 +11,7 @@
     - max_outdegree (int): HMM model annotations with an outdegree higher than this will be dropped
     - scatter (bool, optional): Choose a random subset of proteins to search that are spread across the length of the input BGC. Defaults to False.
     - bypass (List[str], optional): List of locus tags that will bypass filtering. This is the ID found in a GenBank file "/locus_tag=" field. Defaults to None.
-    - bypass_eid (List[str], optional): Less preferred than `bypass`. List of external protein IDs that will bypass filtering. This is the ID found in a GenBank file "/protein_id=" field. Defaults to None.
+    - protein_id_bypass_list (List[str], optional): Less preferred than `bypass`. List of external protein IDs that will bypass filtering. This is the ID found in a GenBank file "/protein_id=" field. Defaults to None.
 7. Search the database for all proteins that have the same HMM model annotations as the input BGC proteins
     - Output from database is a data frame with columns: ['assembly_uid', 'nucleotide_uid', 'target', 'n_start', 'n_end', 'query']
 8. The initial hits output is filtered based on the following criteria:
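
Reviewer note: the interaction between `max_outdegree`, `bypass`, and the renamed `protein_id_bypass_list` can be sketched roughly as below. This is a minimal illustration, not the socialgene implementation: the `Protein` and `Annotation` classes, the `filter_annotations` function, and the reading of "bypass filtering" as "keep every annotation for listed proteins" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """Hypothetical HMM annotation with the outdegree of its HMM node."""
    hmm_id: str
    outdegree: int


@dataclass
class Protein:
    """Hypothetical input-BGC protein carrying both GenBank identifiers."""
    locus_tag: str   # GenBank "/locus_tag=" value
    protein_id: str  # GenBank "/protein_id=" value
    annotations: List[Annotation] = field(default_factory=list)


def filter_annotations(
    proteins: List[Protein],
    max_outdegree: int,
    bypass: Optional[List[str]] = None,
    protein_id_bypass_list: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Drop annotations above max_outdegree unless the protein is bypassed.

    Assumption: a bypassed protein keeps all of its annotations.
    """
    bypass_tags = set(bypass or [])
    bypass_ids = set(protein_id_bypass_list or [])
    kept: Dict[str, List[str]] = {}
    for protein in proteins:
        bypassed = (
            protein.locus_tag in bypass_tags
            or protein.protein_id in bypass_ids
        )
        kept[protein.locus_tag] = [
            a.hmm_id
            for a in protein.annotations
            if bypassed or a.outdegree <= max_outdegree
        ]
    return kept


# Made-up identifiers, for illustration only
proteins = [
    Protein("tagA", "example_id_1", [Annotation("hmm1", 5), Annotation("hmm2", 500)]),
    Protein("tagB", "example_id_2", [Annotation("hmm3", 900)]),
]
result = filter_annotations(
    proteins, max_outdegree=100, protein_id_bypass_list=["example_id_2"]
)
print(result)
```

In this sketch either identifier is enough to bypass the filter; the locus-tag list is checked first only because the docstring calls it the preferred option.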

diff --git a/environment.yml b/environment.yml
@@ -6,7 +6,7 @@ channels:
   - defaults
 
 dependencies:
-  - conda-forge::python==3.10
+  - conda-forge::python==3.12
   - conda-forge::pip>=23.1.2
   - conda-forge::biopython>=1.79
   - conda-forge::numpy

diff --git a/new_search.py b/new_search.py
diff --git a/new_search2.py b/new_search2.py
diff --git a/pyproject.toml b/pyproject.toml
@@ -64,6 +64,7 @@ sg_get_goterms          = "socialgene.utils.goterms:main"
 # search
 sg_mm_create            = "socialgene.mmseqs.create_database:main"
 sg_mm_search            = "socialgene.mmseqs.search:main"
+sg_search_gc            = "socialgene.cli.search.gene_cluster:main"
 # Modify database
 sgdb_import_classyfire = "socialgene.dbmodifiers.classyfire.import:main"
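
Reviewer note: the new `sg_search_gc` entry uses the standard `module:attribute` entry-point form, which an installer turns into a console command that imports the module and calls the named object. The sketch below shows roughly how such a string resolves. The helper `resolve_entry_point` is hypothetical, and the example resolves `json:dumps` from the standard library so that it runs without socialgene installed.

```python
from importlib import import_module
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attr' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj: Any = import_module(module_name)
    if attr_path:
        # Walk dotted attribute paths such as "Class.method"
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return obj


# Stand-in target from the standard library; with socialgene installed the
# spec would be "socialgene.cli.search.gene_cluster:main".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```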
 

diff --git a/socialgene/base/compare_protein.py b/socialgene/base/compare_protein.py
@@ -3,7 +3,7 @@
 
 import pandas as pd
 
-from socialgene.compare_proteins.hmm.scoring import mod_score
+from socialgene.compare_proteins.hmm_scoring import mod_score
 from socialgene.neo4j.neo4j import Neo4jQuery
 from socialgene.utils.logging import log