Use UUID for singleton molecular family? #144
I'd be in favour of UUIDs for singleton families - or maybe even all families?
CunliangGeng added a commit that referenced this issue on Jul 3, 2023
* change BGC attributes type from list to tuple
  The following BGC attributes are updated:
  - product_prediction
  - mibig_bgc_class
  - smiles
* use positional-only parameter in BGC and GCF
  Parameters before "/" are positional-only parameters, see https://docs.python.org/3/glossary.html#term-parameter.
* update BGC's `__eq__` and `__hash__`
* update GCF's `__eq__` and `__hash__`
* Update gcf.py
* update Spectrum's `__eq__` and `__hash__`
* update MolecularFamily `__eq__` and `__hash__`
* Update molecular_family.py
* update Strain `__eq__` and `__hash__`
* update StrainCollection `__eq__`
* add TODO comments to ObjectLink
* add parameter type check for `add_alias`
* add `__contains__` to Strain class
* update `lookup` method of StrainCollection
* update `__contains__` in StrainCollection
* remove from __eq__
* update `__eq__` logic for Strain
* rename `_strain_dict_id` to `_strain_dict_name` in StrainCollection
* add comments to `get_common_strains`
* add comments and rename variables for DataLinks
* add comment about `met_only` parameter
* add todo comments to LinkFinder
* add comments to GNPSSpectrumLoader to figure out how `spectrum_id` is set
* change Spectrum.spectrum_id from type int to str
* update spec_dict
* Update tests
* update `__eq__` in MolecularFamily
* change `MolecularFamily.family_id` from type int to str
* add method `has_strain` to MolecularFamily
* Update metcalf_scoring.py
* change array to dataframe in DataLinks
  1. Change array to dataframe:
     - self.M_gcf_strain -> self.gcf_strain_occurrence
     - self.M_spec_strain -> self.spec_strain_occurrence
     - self.M_fam_strain -> mf_strain_occurrence
  2. update relevant methods to get the new dataframes
  3. update logic of method `common_strains` using the new dataframes
* update references of the new dataframes from DataLinks
* update logic of `get_links` in NPLinker class
* Update test_nplinker.py
  - add code to remove cached results
* move SCORING_METHODS to LinkFinder
* update method name to `get_common_strains`
* refactor mapping dataframes in DataLinks
* add TODOs and deprecation to LinkFinder
* refactor cooccurrence in DataLinks
* merge `load_data` and `find_correlations` to init in DataLinks
* refactor DataLinks attributes
  - Move assignment of attributes to `__init__`
  - Rename attributes
  - Replace `fam` or `molfam` with `mf` to refer to molecular family
  - Add docstrings
* Delete test_data_links.py
* update get_common_strains methods
  - update parameters to be more clear and specific
  - change strain id in returned dict to strain objects
  - update docstrings
* remove lookup_index method from StrainCollection (#90)
  - remove method `lookup_index`
  - remove attribute `_strain_dict_index`
* Remove integer id from GCF
* update lookup methods and attributes in NPLinker class
* change cooccurrence from array to DataFrame in DataLinks
* format link_finder.py
* temp replace array with dataframe in LinkFinder for metcalf scoring
* refactor `LinkFinder.get_scores` method
* refactor `LinkFinder.metcalf_scoring` method
  - rename parameter name
  - wrap parameters for weights to one parameter
  - extract private method `_cal_mean_std`
* refactor get_links
* remove unused methods and scorings from LinkFinder
  - remove unused `likescore` and `hg` scoring types
  - remove all unused methods
* refactor returned type of `LinkFinder.get_links` method
* add `lookup_mf` method in NPLinker class
* refactor MetcalfScoring class
* add deprecation to LinkLikelihood class
* add `__init__.py` to linking module
* rename `data_linking.py` to `data_links.py`
* rename `data_linking_functions.py` to `utils.py`
* rename `test_data_linking_functions.py` to `test_linking_utils.py`
* Delete test_scoring.py
* add dtype to DataLinks dataframes
* remove mapping dataframes and relevant method from DataLinks
  Removed:
  - self.mapping_spec
  - self.mapping_gcf
  - self.mapping_fam
  - self.mapping_strain
  - _get_mappings_from_occurrence() method
* Create test_data_links.py
* add `conftest.py` for scoring tests
* update LinkFinder's attribute and private method
  - refactor method `_cal_mean_std`
  - rename attribute `raw_score_fam_gcf` to `raw_score_mf_gcf`
* Create test_link_finder.py
* Update vscode plugin autodocstring template
  - fix indentation bug in autodocstring
  - remove `Examples:` section
* add scope for fixtures
* Create test_metcalf_scoring.py
* add docstrings and type hints to `MetcalfScoring` class
* add util func `isinstance_all`
* replace `_isinstance` with util func `isinstance_all`
* update validation of args for `DataLinks`
* Update test_data_links.py
  - add docstrings
  - add more tests
* add type hints for returned values to unit tests
* update exception types for invalid input
* add docstrings and type hints to `LinkFinder` class
* add more unit tests for `LinkFinder`
* fix input type bug for `DataLinks.get_common_strains`
* Create test_nplinker_scoring.py
* add todo comments to `NPLinker` class
* remove local integration tests for scoring part of `NPLinker`
  - rename `test_nplinker.py` to `test_nplinker_local.py`
* remove unused imports
* Fix mypy warnings as much as possible
* check strain existence using strain dict
* change calculate abbreviation from "cal" to "calc"
* remove resolved TODO comment
* move shared fixtures to conftest.py
* remove unnecessary type hints
* update docstrings for cooccurrences
* use uuid for singleton molecular families #144
* add TODO comment for GNPSLoader
* update type hints for `*args` parameter
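Several items in this commit message concern positional-only parameters and paired `__eq__`/`__hash__` updates. As a rough illustration of how the two fit together, here is a minimal sketch. The `GCF` class below is a hypothetical, simplified stand-in written for this example, not NPLinker's actual class, and the choice to compare on `gcf_id` alone is an assumption.

```python
class GCF:
    """Hypothetical, simplified stand-in; not NPLinker's actual GCF class."""

    def __init__(self, gcf_id: str, /) -> None:
        # Parameters before "/" are positional-only, so GCF(gcf_id="x")
        # raises TypeError while GCF("x") works.
        self.gcf_id = gcf_id

    def __eq__(self, other: object) -> bool:
        # Assumption for this sketch: two GCFs are equal when their ids match.
        if isinstance(other, GCF):
            return self.gcf_id == other.gcf_id
        return NotImplemented

    def __hash__(self) -> int:
        # Keep the hash consistent with __eq__: equal objects must hash equally.
        return hash(self.gcf_id)


a, b, c = GCF("gcf_1"), GCF("gcf_1"), GCF("gcf_2")
unique = {a, b, c}  # a and b collapse into one entry
```

Defining `__eq__` without `__hash__` makes a class unhashable in Python, which is presumably why the two methods are updated together for each class that is stored in sets or used as a dict key.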
The current family ID is good enough to support NPLinker's current features. I'd like to keep it as it is until we see a requirement for UUIDs.
CunliangGeng added a commit that referenced this issue on Jul 4, 2023
This commit message repeats the Jul 3, 2023 commit message above, item for item, through `add TODO comment for GNPSLoader`, and then continues:
* fix typos
* remove useless parameter `met_only`
  The `met_only` is useless. NPLinker will stop working if met_only=True.
* update exception type
* refactor the usage of PODPDownloader
  1. create instance in the private method, only when it's needed
  2. change the scope of the instance from global to local
* rename private config attributes in class DatasetLoader
  - add prefix `_config` for all config attributes
  - add comments to restructure `__init__` code
* change the variable of app data dir to be global
  this variable is independent of DatasetLoader and other classes, so it should be a global variable
* change two public methods to variables
* change one public method to attribute for DatasetLoader
* add value validation to Config
  - move the validation of antismash format config in DatasetLoader to Config class
  - refactor the config data validations into a private method
* add TODO comments about init and validate paths
* remove unused attribute `growth_media`
* remove commented code
* add TODO comments
* remove unused imports
* format the code
* reorder methods in loader.py
CunliangGeng added a commit that referenced this issue on Jul 4, 2023
This commit message repeats the previous Jul 4, 2023 commit message above in full (through `reorder methods in loader.py`) and then adds:
* add function `generate_genome_bgc_mappings_file`
* Update __init__.py
* add tests for `generate_genome_bgc_mappings_file`
* update strain test when 113 issue is closed
This issue was created from #142 (comment).
- check the influence of using UUIDs, e.g. how to verify the singletons
- if feasible, use UUIDs for singletons
- does it make sense to use UUIDs for all types of MFs? Not for now.
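To make the first two points concrete, here is a minimal sketch of one way singleton families could receive UUID-based ids while remaining verifiable. Everything in it is an assumption for illustration: the `MolecularFamily` dataclass, the `build_families` helper, and the `"-1"` singleton marker in the input mapping are hypothetical and are not taken from NPLinker's code.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class MolecularFamily:
    """Hypothetical stand-in for a molecular family (not NPLinker's class)."""

    family_id: str
    spectra_ids: set[str] = field(default_factory=set)

    def is_singleton(self) -> bool:
        # Verify singletons by content rather than by the format of the id.
        return len(self.spectra_ids) == 1


def build_families(
    spectrum_to_component: dict[str, str],
    singleton_marker: str = "-1",  # assumed marker for singletons in the input
) -> list[MolecularFamily]:
    """Group spectra into families; each singleton gets its own UUID id."""
    grouped: dict[str, MolecularFamily] = {}
    singletons: list[MolecularFamily] = []
    for spectrum_id, component in spectrum_to_component.items():
        if component == singleton_marker:
            # uuid4 gives each singleton a unique, collision-resistant id.
            singletons.append(MolecularFamily(str(uuid.uuid4()), {spectrum_id}))
        else:
            family = grouped.setdefault(component, MolecularFamily(component))
            family.spectra_ids.add(spectrum_id)
    return list(grouped.values()) + singletons


families = build_families({"s1": "1", "s2": "1", "s3": "-1", "s4": "-1"})
singleton_families = [mf for mf in families if mf.is_singleton()]
```

In this sketch, checking singletons through `is_singleton()` avoids having to parse ids to tell a UUID from a regular family id, which is one possible answer to the "how to verify the singletons" question. Non-singleton families keep their original ids, in line with the "not for now" decision for other MF types.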