Fixed bug for writer initialized by Chem.SDWriter(...). #9929

Merged 1 commit on Jan 9, 2025

@drivanov drivanov commented Jan 9, 2025

Without a call to writer.close(), the file written by the writer is not closed properly, so output still held in the writer's buffer never reaches disk. As a result, in our test the end of the file /workspace/data/MoleculeGPT/raw/molecules.sdf is missing. This is how the file ends:

472184
     RDKit          2D

  1  0  0  0  0  0  0  0  0  0999 V2000
    2.0000    0.0000    0.0000 Os  0  0  0  0  0 15  0  0  0  0  0  0
M  CHG  1   1   4
M  END
>  <PUBCHEM_COMPOUND_CID>  (4303)
472184

>  <PUBCHEM_COMPOUND_CANONICALIZED>  (4303)
1

>  <PUBCHEM_CACTVS_COMPLEXITY>  (4303)
0

>  <PUBCHEM_CACTVS_HBOND_ACCEPTOR>  (4303)
0

>  <PUBCHEM_CACTVS_HBOND_DONOR>  (4303)
0

>  <PUBCHEM_CACTVS_ROTATABLE_BOND>  (4303)
0

>  <PUBCHEM_CACTVS_SUBSKEYS>  (4303)
AAADcQAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==

>  <PUBCHEM_IUPAC_OPENEYE_NAME

Even the <PUBCHEM_IUPAC_OPENEYE_NAME tag is cut off before its closing > character, and the last molecule (#4303) is missing. As a result, the test crashes later, when process() looks up a property that the truncated file no longer contains:

Traceback (most recent call last):
  File "/workspace/examples/llm/molecule_gpt.py", line 187, in <module>
    train(
  File "/workspace/examples/llm/molecule_gpt.py", line 69, in train
    dataset = MoleculeGPTDataset(path)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/datasets/molecule_gpt_dataset.py", line 217, in __init__
    super().__init__(root, transform, pre_transform, pre_filter,
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/data/in_memory_dataset.py", line 81, in __init__
    super().__init__(root, transform, pre_transform, pre_filter, log,
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/data/dataset.py", line 115, in __init__
    self._process()
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/data/dataset.py", line 262, in _process
    self.process()
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/datasets/molecule_gpt_dataset.py", line 436, in process
    CAN_SMILES = mol.GetProp("PUBCHEM_OPENEYE_CAN_SMILES")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'PUBCHEM_OPENEYE_CAN_SMILES'
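
The failure mode and the fix can be reproduced without RDKit. The sketch below is only an analogy: a plain buffered Python file stands in for the writer, and the record contents, the 4096-byte buffer size, and the helper names are all hypothetical. It measures the file once while the writer is still open and once after contextlib.closing has guaranteed the close() call.

```python
import os
import tempfile
from contextlib import closing

# Hypothetical records; a real SD file would hold molecule blocks instead.
RECORDS = [f"record {i}\n$$$$\n".encode() for i in range(1000)]
EXPECTED_SIZE = sum(len(r) for r in RECORDS)


def size_while_open(path):
    """Write all records, then measure the file BEFORE closing the writer."""
    writer = open(path, "wb", buffering=4096)  # stand-in for the SD writer
    for record in RECORDS:
        writer.write(record)
    size = os.path.getsize(path)  # the tail is still in the writer's buffer
    writer.close()
    return size


def size_after_close(path):
    """Write all records inside closing(), then measure the file."""
    with closing(open(path, "wb", buffering=4096)) as writer:
        for record in RECORDS:
            writer.write(record)
    # closing() has called writer.close(), which flushes the buffer.
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    truncated = size_while_open(os.path.join(tmp, "open.sdf"))
    complete = size_after_close(os.path.join(tmp, "closed.sdf"))

print("bytes on disk while writer is open:", truncated)
print("bytes on disk after close():", complete, "of", EXPECTED_SIZE)
```

Since contextlib.closing only requires an object with a close() method, the same shape should carry over to the real writer, for example `with closing(Chem.SDWriter(path)) as writer:`, so that close() runs even if writing raises an exception.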

@puririshi98 puririshi98 left a comment

thanks for finding this fix

@puririshi98 puririshi98 merged commit ef02854 into pyg-team:master Jan 9, 2025
16 of 17 checks passed
@drivanov drivanov deleted the molecule_gpt branch January 10, 2025 00:23