Search (#79)
* feat: adds sg class parent to assembly class; gbk outputs

* fix: no mstart when scatter=T

* fix: async +threading option

* feat: bgc search

Also many other updates/functionality
chasemc authored Dec 21, 2023
1 parent 42bdcb8 commit 17968f9
Showing 49 changed files with 1,607 additions and 993 deletions.
2 changes: 1 addition & 1 deletion Take an input BGC.md
@@ -11,7 +11,7 @@
- max_outdegree (int): HMM model annotations with an outdegree higher than this will be dropped
- scatter (bool, optional): Choose a random subset of proteins to search that are spread across the length of the input BGC. Defaults to False.
- bypass (List[str], optional): List of locus tags that will bypass filtering. This is the ID found in a GenBank file "/locus_tag=" field. Defaults to None.
-- bypass_eid (List[str], optional): Less preferred than `bypass`. List of external protein IDs that will bypass filtering. This is the ID found in a GenBank file "/protein_id=" field. Defaults to None.
+- protein_id_bypass_list (List[str], optional): Less preferred than `bypass`. List of external protein IDs that will bypass filtering. This is the ID found in a GenBank file "/protein_id=" field. Defaults to None.
7. Search the database for all proteins that have the same HMM model annotations as the input BGC proteins
- Output from database is a data frame with columns: ['assembly_uid', 'nucleotide_uid', 'target', 'n_start', 'n_end', 'query']
8. The initial hits output is filtered based on the following criteria:
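The parameters documented in the hunk above (`max_outdegree`, `scatter`, `bypass`) can be illustrated with a small sketch. This is not SocialGene's implementation: the `Protein` record and the `drop_promiscuous_hmms` and `scatter_sample` functions are hypothetical, and the binning strategy used for scatter (sort by start coordinate, cut into near-equal bins, draw one protein per bin) is only an assumption about what "spread across the length of the input BGC" could mean.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Protein:
    """Hypothetical record for one protein in the input BGC (illustration only)."""

    locus_tag: str  # value of the GenBank "/locus_tag=" field
    start: int  # start coordinate within the BGC
    hmms: list[str] = field(default_factory=list)  # HMM model annotations


def drop_promiscuous_hmms(proteins, hmm_outdegree, max_outdegree, bypass=None):
    """Return copies of `proteins` without HMM annotations whose outdegree
    exceeds `max_outdegree` (hypothetical helper).

    Proteins whose locus tag is listed in `bypass` keep every annotation.
    HMMs missing from `hmm_outdegree` are treated as outdegree 0 and kept.
    """
    bypass = set(bypass or [])
    filtered = []
    for p in proteins:
        if p.locus_tag in bypass:
            hmms = list(p.hmms)
        else:
            hmms = [h for h in p.hmms if hmm_outdegree.get(h, 0) <= max_outdegree]
        # Build a new record so the caller's input is not mutated.
        filtered.append(Protein(p.locus_tag, p.start, hmms))
    return filtered


def scatter_sample(proteins, n, seed=None):
    """Pick at most `n` proteins spread along the BGC (hypothetical helper).

    Assumed strategy: sort by start coordinate, cut the ordered list into
    `n` contiguous, near-equal bins, and draw one protein at random per bin.
    """
    rng = random.Random(seed)
    ordered = sorted(proteins, key=lambda p: p.start)
    n = min(n, len(ordered))
    samples = []
    for i in range(n):
        # Bin i covers [i*L/n, (i+1)*L/n); it is never empty because n <= L.
        lo = i * len(ordered) // n
        hi = (i + 1) * len(ordered) // n
        samples.append(rng.choice(ordered[lo:hi]))
    return samples


# Usage with made-up proteins and outdegree counts.
bgc = [
    Protein("orf1", 100, ["hmm_a", "hmm_b"]),
    Protein("orf2", 900, ["hmm_b"]),
    Protein("orf3", 500, ["hmm_a"]),
]
outdegree = {"hmm_a": 12, "hmm_b": 40_000}
query = drop_promiscuous_hmms(bgc, outdegree, max_outdegree=1000, bypass=["orf2"])
print([(p.locus_tag, p.hmms) for p in query])
# [('orf1', ['hmm_a']), ('orf2', ['hmm_b']), ('orf3', ['hmm_a'])]
print([p.locus_tag for p in scatter_sample(query, 3, seed=0)])
# ['orf1', 'orf3', 'orf2']
```

Binning by position, rather than sampling uniformly from the whole list, guarantees that no stretch of the cluster is skipped entirely; a uniform draw could by chance pick several neighbouring genes.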
2 changes: 1 addition & 1 deletion environment.yml
@@ -6,7 +6,7 @@ channels:
- defaults

dependencies:
-- conda-forge::python==3.10
+- conda-forge::python==3.12
- conda-forge::pip>=23.1.2
- conda-forge::biopython>=1.79
- conda-forge::numpy
88 changes: 0 additions & 88 deletions new_search.py

This file was deleted.

88 changes: 0 additions & 88 deletions new_search2.py

This file was deleted.

1 change: 1 addition & 0 deletions pyproject.toml
@@ -64,6 +64,7 @@ sg_get_goterms = "socialgene.utils.goterms:main"
# search
sg_mm_create = "socialgene.mmseqs.create_database:main"
sg_mm_search = "socialgene.mmseqs.search:main"
+sg_search_gc = "socialgene.cli.search.gene_cluster:main"
# Modify database
sgdb_import_classyfire = "socialgene.dbmodifiers.classyfire.import:main"

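The new `sg_search_gc` line in `pyproject.toml` uses the `module:attribute` entry-point convention. Assuming these lines sit in the project's scripts table (the section header is outside the hunk), installing the package generates an `sg_search_gc` command that imports `socialgene.cli.search.gene_cluster` and calls its `main()`. The sketch below uses the standard library's `importlib.metadata.EntryPoint` (Python 3.9+) to show how that string is split and resolved. The `console_scripts` group name is an assumption, and `json:dumps` stands in as a loadable target so the snippet runs without SocialGene installed.

```python
from importlib.metadata import EntryPoint

# The line this commit adds to pyproject.toml, modelled as an entry point.
# The "console_scripts" group is an assumption; the table header is not in the hunk.
ep = EntryPoint(
    name="sg_search_gc",
    value="socialgene.cli.search.gene_cluster:main",
    group="console_scripts",
)
print(ep.module)  # socialgene.cli.search.gene_cluster
print(ep.attr)  # main

# ep.load() would import the module and return its main() function, which
# requires SocialGene to be installed. A stdlib target shows the same mechanism.
demo = EntryPoint(name="demo", value="json:dumps", group="console_scripts")
to_json = demo.load()  # imports json and returns json.dumps
print(to_json({"a": 1}))  # {"a": 1}
```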
2 changes: 1 addition & 1 deletion socialgene/base/compare_protein.py
@@ -3,7 +3,7 @@

import pandas as pd

-from socialgene.compare_proteins.hmm.scoring import mod_score
+from socialgene.compare_proteins.hmm_scoring import mod_score
from socialgene.neo4j.neo4j import Neo4jQuery
from socialgene.utils.logging import log
