Implement removing the strain also from self._lookup indices. [Bug] #90

Closed
Tracked by #134
hechth opened this issue Nov 7, 2022 · 0 comments · Fixed by #142

hechth commented Nov 7, 2022

        TODO: Implement removing the strain also from self._lookup indices.

https://github.com/hechth/nplinker/blob/ae713862655d7308c1597bd6c4dfd56e3f6a88da/src/nplinker/strain_collection.py#L46
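The TODO above can be illustrated with a minimal sketch. It assumes a collection that keeps a list of strains plus two lookup tables, one keyed by name and one keyed by integer index. The classes below are hypothetical stand-ins, not the actual nplinker code; only the attribute names `_lookup` and `_lookup_indices` come from this issue.

```python
class Strain:
    """Hypothetical stand-in for a strain: an id plus optional aliases."""

    def __init__(self, strain_id: str) -> None:
        self.id = strain_id
        self.aliases: set[str] = set()

    def add_alias(self, alias: str) -> None:
        self.aliases.add(alias)


class StrainCollection:
    """Hypothetical collection that keeps a list and two lookup tables in sync."""

    def __init__(self) -> None:
        self._strains: list[Strain] = []
        self._lookup: dict[str, Strain] = {}          # name or alias -> strain
        self._lookup_indices: dict[int, Strain] = {}  # list position -> strain

    def add(self, strain: Strain) -> None:
        if strain.id in self._lookup:
            return
        self._lookup_indices[len(self._strains)] = strain
        self._strains.append(strain)
        for name in {strain.id, *strain.aliases}:
            self._lookup[name] = strain

    def remove(self, strain: Strain) -> None:
        stored = self._lookup.get(strain.id)
        if stored is None:
            return
        self._strains.remove(stored)
        # Drop every name that still points at the removed strain.
        for name in {stored.id, *stored.aliases}:
            if self._lookup.get(name) is stored:
                del self._lookup[name]
        # Positions shift after a removal, so rebuild the index table
        # instead of deleting a single key.
        self._lookup_indices = dict(enumerate(self._strains))
```

Rebuilding the index table costs O(n) per removal. That cost is one reason a design with a single name-keyed dictionary can be simpler to keep consistent.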

CunliangGeng added a commit that referenced this issue Mar 31, 2023
self._lookup -> self._strain_dict_id
self._lookup_indices -> self._strain_dict_index

fix bug in remove method (#90)

solve issue #90
@CunliangGeng CunliangGeng self-assigned this Mar 31, 2023
CunliangGeng added a commit that referenced this issue May 12, 2023
- remove method `lookup_index`
- remove attribute `_strain_dict_index`
@CunliangGeng CunliangGeng linked a pull request May 12, 2023 that will close this issue
CunliangGeng added a commit that referenced this issue Jul 3, 2023
* change BGC attributes type from list to tuple

The following BGC attributes are updated:
- product_prediction
- mibig_bgc_class
- smiles

* use positional-only parameter in BGC and GCF

Parameters before "/" are positional-only parameters, see https://docs.python.org/3/glossary.html#term-parameter.

* update BGC's `__eq__` and `__hash__`

* update GCF's `__eq__` and `__hash__`

* Update gcf.py

* update Spectrum's `__eq__` and `__hash__`

* update MolecularFamily `__eq__` and `__hash__`

* Update molecular_family.py

* update Strain `__eq__` and `__hash__`

* update StrainCollection `__eq__`

* add TODO comments to ObjectLink

* add parameter type check for `add_alias`

* add `__contains__` to Strain class

* update `lookup` method of StrainCollection

* update `__contains__` in StrainCollection

* remove from __eq__

* update `__eq__` logic for Strain

* rename `_strain_dict_id` to `_strain_dict_name` in StrainCollection

* add comments to `get_common_strains`

* add comments and rename variables for DataLinks

* add comment about `met_only` parameter

* add todo comments to LinkFinder

* add comments to GNPSSpectrumLoader

to figure out how `spectrum_id` is set

* change Spectrum.spectrum_id from type int to str

* update spec_dict

* Update tests

* update `__eq__` in MolecularFamily

* change `MolecularFamily.family_id` from type int to str

* add method `has_strain` to MolecularFamily

* Update metcalf_scoring.py

* change array to dataframe in DataLinks

1. Change array to dataframe:
- self.M_gcf_strain -> self.gcf_strain_occurrence
- self.M_spec_strain -> self.spec_strain_occurrence
- self.M_fam_strain -> self.mf_strain_occurrence
2. update relevant methods to get the new dataframes
3. update logic of method `common_strains` using the new dataframes

* update references of the new dataframes from DataLinks

* update logic of `get_links` in NPLinker class

* Update test_nplinker.py

- add code to remove cached results

* move SCORING_METHODS to LinkFinder

* update method name to `get_common_strains`

* refactor mapping dataframes in DataLinks

* add TODOs and deprecation to LinkFinder

* refactor cooccurrence in DataLinks

* merge `load_data` and `find_correlations` to init in DataLinks

* refactor DataLinks attributes

- Move assignment of attributes to `__init__`
- Rename attributes
- Replace `fam` or `molfam` with `mf` to refer to molecular family
- Add docstrings

* Delete test_data_links.py

* update get_common_strains methods

- update parameters to be more clear and specific
- change strain id in returned dict to strain objects
-  update docstrings

* remove lookup_index method from StrainCollection (#90)

- remove method `lookup_index`
- remove attribute `_strain_dict_index`

* Remove integer id from GCF

* update lookup methods and attributes in NPLinker class

* change cooccurrence from array to DataFrame in DataLinks

* format link_finder.py

* temp replace array with dataframe in LinkFinder for metcalf scoring

* refactor `LinkFinder.get_scores` method

* refactor `LinkFinder.metcalf_scoring` method

- rename parameter name
- wrap parameters for weights to one parameter
- extract private method `_cal_mean_std`

* refactor get_links

* remove unused methods and scorings from LinkFinder

-  remove unused `likescore` and `hg` scoring types
- remove all unused methods

* refactor returned type of `LinkFinder.get_links` method

* add `lookup_mf` method in NPLinker class

* refactor MetcalfScoring class

* add deprecation to LinkLikelihood class

* add `__init__.py` to linking module

* rename `data_linking.py` to `data_links.py`

* rename `data_linking_functions.py` to `utils.py`

* rename `test_data_linking_functions.py` to `test_linking_utils.py`

* Delete test_scoring.py

* add dtype to DataLinks dataframes

* remove mapping dataframes and relevant method from DataLinks

Removed:
- self.mapping_spec
- self.mapping_gcf
- self.mapping_fam
- self.mapping_strain
- _get_mappings_from_occurrence() method

* Create test_data_links.py

* add `conftest.py` for scoring tests

* update LinkFinder's attribute and private method

- refactor method `_cal_mean_std`
- rename attribute `raw_score_fam_gcf` to `raw_score_mf_gcf`

* Create test_link_finder.py

* Update vscode plugin autodocstring template

- fix indentation bug in autodocstring
- remove `Examples:` section

* add scope for fixtures

* Create test_metcalf_scoring.py

* add docstrings and type hints to `MetcalfScoring` class

* add util func `isinstance_all`

* replace `_isinstance` with util func `isinstance_all`

* update validation of args for `DataLinks`

* Update test_data_links.py

- add docstrings
- add more tests

* add type hints for returned values to unit tests

* update exception types for invalid input

* add docstrings and type hints to `LinkFinder` class

* add more unit tests for `LinkFinder`

* fix input type bug for `DataLinks.get_common_strains`

* Create test_nplinker_scoring.py

* add todo comments to `NPLinker` class

* remove local integration tests for scoring part of `NPLinker`

- rename `test_nplinker.py` to `test_nplinker_local.py`

* remove unused imports

* Fix mypy warnings as much as possible

* check strain existence using strain dict

* change calculate abbreviation from "cal" to "calc"

* remove resolved TODO comment

* move shared fixtures to conftest.py

* remove unnecessary type hints

* update docstrings for cooccurrences

* use uuid for singleton molecular families #144

* add TODO comment for GNPSLoader

* update type hints for `*args` parameter
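
The positional-only parameter and `__eq__`/`__hash__` items in the commit message above can be sketched as follows. This is a hypothetical, trimmed-down class, not the real nplinker `BGC`. The constructor signature and the attributes used for equality are assumptions made for illustration. It also shows why a tuple attribute is convenient here: tuples are hashable, lists are not.

```python
class BGC:
    """Hypothetical BGC with a positional-only id and value-based equality."""

    def __init__(self, bgc_id: str, /, *product_prediction: str) -> None:
        # Parameters before "/" can only be passed by position.
        self.bgc_id = bgc_id
        # *args arrive as a tuple, so the attribute is immutable and hashable.
        self.product_prediction = product_prediction

    def __eq__(self, other: object) -> bool:
        if isinstance(other, BGC):
            return (self.bgc_id, self.product_prediction) == (
                other.bgc_id,
                other.product_prediction,
            )
        return NotImplemented

    def __hash__(self) -> int:
        # Hash the same fields that __eq__ compares, so equal objects
        # always have equal hashes.
        return hash((self.bgc_id, self.product_prediction))


first = BGC("BGC0000001", "NRPS")
second = BGC("BGC0000001", "NRPS")
unique = {first, second}  # equal objects collapse to a single set member
```

Calling `BGC(bgc_id="BGC0000001")` raises `TypeError`, because `bgc_id` is positional-only.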
CunliangGeng added a commit that referenced this issue Jul 4, 2023
* fix typos

* remove useless parameter `met_only`

The `met_only` parameter is useless: NPLinker stops working if `met_only=True`.

* update exception type

* refactor the usage of PODPDownloader

1. create instance in the private method, only when it's needed
2. change the scope of the instance from global to local

* rename private config attributes in class DatasetLoader

-  add prefix `_config` for all config attributes
- add comments to restructure `__init__` code

* change the variable of app data dir to be global

this variable is independent of DatasetLoader and other classes, so it should be a global variable

* change two public methods to variables

* change one public method to attribute for DatasetLoader

* add value validation to Config

- move the validation of antismash format config in DatasetLoader to Config class
- refactor the config data validations into a private method

* add TODO comments about init and validate paths

* remove unused attribute `growth_media`

* remove commented code

* add TODO comments

* remove unused imports

* format the code

* reorder methods in loader.py
CunliangGeng added a commit that referenced this issue Jul 4, 2023
* add function `generate_genome_bgc_mappings_file`

* Update __init__.py

* add tests for `generate_genome_bgc_mappings_file`

* update strain test when issue #113 is closed