Adjust Formal Grammar to simplify no leading zeros #23

Open
vfscalfani opened this issue Sep 29, 2021 · 3 comments

@vfscalfani
Member

In the current IUPAC SMILES+ draft document, leading zeros are not allowed for atom properties including isotope, H count, charge, atom class, and ring bonds.

In an effort to clarify this in the formal grammar, digit notation was added:

digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
digit_nonzero ::= '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

We then added notation like this below to specify no leading zeros (The isotope specification supports up to 3 digits, nnn):

isotope ::= digit | digit_nonzero digit | digit_nonzero digit digit
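For illustration only (this is not part of the draft), the isotope production above can be mirrored by a regular expression. The following is a minimal Python sketch; the names `ISOTOPE_RE` and `is_valid_isotope` are hypothetical and chosen just for this example:

```python
import re

# Hypothetical regex mirroring the production
#   isotope ::= digit | digit_nonzero digit | digit_nonzero digit digit
# A single digit (including '0'), or two to three digits with a nonzero first digit.
ISOTOPE_RE = re.compile(r"[0-9]|[1-9][0-9]{1,2}")


def is_valid_isotope(text: str) -> bool:
    """Return True if the whole string matches the isotope production."""
    return ISOTOPE_RE.fullmatch(text) is not None
```

Under this sketch, strings such as `0`, `13`, and `999` are accepted, while `013`, `00`, and `1000` are rejected.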

In hindsight, there is probably a cleaner way to do this by defining a number that cannot have leading zeros. If you have ideas on how best to do this in the formal grammar, please comment!

@merkys
merkys commented Sep 30, 2021

In an effort to clarify this in the formal grammar, digit notation was added:

digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
digit_nonzero ::= '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

We then added notation like this below to specify no leading zeros (The isotope specification supports up to 3 digits, nnn):

isotope ::= digit | digit_nonzero digit | digit_nonzero digit digit

This indeed looks nice and clear. I am not sure if there is chemical sense in allowing isotope 0, though.

In hindsight, there is probably a cleaner way to do this by defining a number that cannot have leading zeros. If you have ideas on how best to do this in the formal grammar, please comment!

This could be done as:

natural_number ::= digit_nonzero | natural_number digit

However, with such a rule there is no way to limit the number of digits, so I prefer your grammar notation instead.
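To make the trade-off concrete, here is a rough Python sketch of what the recursive `natural_number` rule accepts, next to a length-limited variant. Both function names and the `max_digits` parameter are hypothetical; the limit of 3 is taken from the isotope rule quoted above. Note that, as written, `natural_number` also rejects a lone `0`, which the isotope production accepts.

```python
DIGITS = "0123456789"
NONZERO_DIGITS = "123456789"


def is_natural_number(text: str) -> bool:
    """natural_number ::= digit_nonzero | natural_number digit

    The left recursion unrolls to: one nonzero digit followed by any
    number of further digits, so there is no upper bound on the length.
    """
    return (
        len(text) >= 1
        and text[0] in NONZERO_DIGITS
        and all(ch in DIGITS for ch in text[1:])
    )


def is_bounded_natural_number(text: str, max_digits: int = 3) -> bool:
    """Same rule, with the length limit enforced outside the grammar."""
    return len(text) <= max_digits and is_natural_number(text)
```

The second function shows the cost of the shorter rule: the digit limit has to be stated somewhere other than the production itself.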

@vfscalfani
Member Author

Great, thanks for the feedback. I'm glad the new notation makes sense. Yes, I agree with you that isotope 0 does not make chemical sense, but we still need to define what it means when parsing SMILES. This is actually one of the changes we made compared to OpenSMILES. In OpenSMILES an isotope value of 0 is a zero isotope, while the IUPAC SMILES+ draft states:

A 0 isotope specification is equivalent to undefined, and the atom is assumed to have the naturally-occurring isotopic ratios. For example, [0S] is equivalent to [S].
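As a rough illustration of that sentence, a parser could map both a missing isotope and an isotope of 0 to the same "unspecified" value. The sketch below assumes the digits have already been validated against the grammar; `normalize_isotope` is a hypothetical helper, not something defined in the draft:

```python
from typing import Optional


def normalize_isotope(digits: str) -> Optional[int]:
    """Map the isotope digits of a bracket atom to a mass number.

    Assumes `digits` is either empty (no isotope written) or a string
    already validated against the isotope production. Returns None for
    "unspecified", which covers both the empty case and '0'.
    """
    if digits == "" or digits == "0":
        return None
    return int(digits)
```

With this helper, the isotope parts of `[0S]` and `[S]` both normalize to `None`, while `[34S]` yields `34`.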

@merkys
merkys commented Oct 1, 2021

Thanks for pointing out the description of isotope 0 in the IUPAC SMILES+ draft. Nevertheless, I think isotope 0 should not be allowed, as I cannot see the benefit of writing [0S] instead of just [S].
