Add bonds for structure type entries #465
base: develop
It has been pointed out that bonds might cross the unit cell boundary, and this should be reflected as well. Edit: Added a way to describe translations.
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Pinging @d-beltran for comments on how this proposal suits macromolecules.
Pinging @utf who participated in the discussions.
Thanks for this @merkys, here are my initial comments. I wonder if we also want this field to capture "non-chemical" connectivity, but I can't think of a good use case.
optimade.rst (outdated)
- **Examples**:

  - :val:`[ {"sites": [1, 2]} ]`: a structure with a bond between sites 1 and 2.
  - :val:`[ {"sites": [1, 1], "translations": [ [0, 0, 0], [0, 0, 1] ]} ]`: a 1D polymer.
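A minimal Python sketch of how a client might read these examples. It assumes that each translation is an integer multiple of the `lattice_vectors`, added to the `cartesian_site_positions` exactly as given, and that the i-th translation belongs to the i-th entry of `sites`. `image_position` and `bond_length` are hypothetical helpers, not part of any OPTIMADE library:

```python
import math
from typing import Sequence

Vector = Sequence[float]


def image_position(cartesian: Vector, lattice: Sequence[Vector], translation: Sequence[int]) -> list[float]:
    """Cartesian position of a site shifted by an integer lattice translation."""
    # position + n1*a + n2*b + n3*c, applied to the position exactly as given
    return [
        cartesian[k] + sum(translation[i] * lattice[i][k] for i in range(3))
        for k in range(3)
    ]


def bond_length(bond: dict, positions: Sequence[Vector], lattice: Sequence[Vector]) -> float:
    """Length of a bond entry of the form {"sites": [i, j], "translations": [t_i, t_j]}."""
    i, j = bond["sites"]
    # Assumption: a missing "translations" key means both sites stay where they are given.
    t_i, t_j = bond.get("translations", [[0, 0, 0], [0, 0, 0]])
    p_i = image_position(positions[i], lattice, t_i)
    p_j = image_position(positions[j], lattice, t_j)
    return math.dist(p_i, p_j)
```

For the 1D polymer example, the bond joins site 1 to its own image one cell along the third lattice vector, so its length equals the length of that vector (5.0 in a 3 x 4 x 5 orthorhombic cell).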
In the current scheme, I think this is how primitive NaCl would be represented:
- :val:`[ {"sites": [1, 1], "translations": [ [0, 0, 0], [0, 0, 1] ]} ]`: a 1D polymer.
- :val:`[ {"sites": [0, 1], "translations": [ [0, 0, 0], [-1, -1, 0] ]},
  {"sites": [0, 1], "translations": [ [0, 0, 0], [-1, 0, -1] ]},
  {"sites": [0, 1], "translations": [ [0, 0, 0], [0, -1, -1] ]},
  {"sites": [0, 1], "translations": [ [0, 0, 0], [1, 1, 0] ]},
  {"sites": [0, 1], "translations": [ [0, 0, 0], [1, 0, 1] ]},
  {"sites": [0, 1], "translations": [ [0, 0, 0], [0, 1, 1] ]} ]`: primitive NaCl.
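As a quick sanity check of a listing like this, counting how many bonds each site takes part in should reproduce the expected coordination: in rock salt every ion has six nearest neighbours. A sketch, with a hypothetical helper name:

```python
from collections import Counter


def coordination_numbers(bonds: list[dict]) -> Counter:
    """Count how many listed bonds each site takes part in."""
    counts: Counter = Counter()
    for bond in bonds:
        i, j = bond["sites"]
        # A bond between a site and its own translated image, e.g. [1, 1],
        # adds one neighbour at each end, so that site is counted twice.
        counts[i] += 1
        counts[j] += 1
    return counts
```

With the six entries above, both site 0 and site 1 come out with a count of 6, while the 1D polymer entry gives site 1 a count of 2 (one neighbour above, one below).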
This property could potentially be used to describe crystal topology, which may involve various types of contact distances, but that seems to be a bit of a beast of its own (at least according to the work done in the TOPO_CIF dictionary [1]). I would probably limit it to chemical bonding for now, especially since we are currently trying on purpose to avoid specifying the bond type/order.
[1] https://github.com/COMCIFS/TopoCif/blob/main/dictionary/Topology_0.9.5.dic
@vaitkus Right, there have already been discussions in the workshop about having different networks for the same structure entry, but this seems to be difficult to accommodate at the moment.
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
- *sites*: a non-decreasing list of 0-based indexes of the two sites that form a chemical bond.

- If translations are needed by at least one of the sites of a bond, the following key SHOULD be used:
While the `translations` field is not mandatory in general, maybe we should require it if at least one of the sites is translated? That is, maybe we should change SHOULD to MUST?
Edit: changed the field name from `sites` to `translations`.
I assume you are talking about `translations`, as `sites` is mandatory?
@merkys Yes, I actually meant `translations` (fixed).
Works for me :)
The type of bond is not specified in most topology formats in our field, but I think this is very inconvenient. So I'm happy to know this is provisional and that the type will be specified in the future.
@eimrek suggested keeping the translation vector for only one of the sites, since at least one site will stay in the unit cell or can be translated there. How about this:
{
  "sites": [63, 64],
  "translation_site": 1, // the second of the two sites is translated
  "translation_vector": [0, 0, 1]
}
The
@merkys I would at least leave the option to provide both translations. In the COD there are multiple non-polymeric molecules that span several unit cells, so both translation vectors will be needed to correctly represent the complete molecule. I attach an example of such a molecule to this comment (1540421.cif.txt; remove the .txt extension before viewing), but @sauliusg could probably provide an even more extreme example (I seem to recall a molecule that spans 5 unit cells).
Workshop: We are happy to merge an explicit bond structure data structure, but we must consider the exact format so that the types of queries one wants to do can be performed (preferably with the present filter language). Inheriting the current CIF framework for this should be seriously considered.
It is always possible to back-translate one of the sites into the primary unit cell without losing the connectivity information. Having both non-zero translation vectors is a matter of convenience, I think, or does this retain some more information?
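The back-translation can be sketched as subtracting the first site's translation from both, which yields the single-vector form with the `translation_site`/`translation_vector` keys proposed earlier in the thread. The function name is hypothetical, and the sketch assumes the two-translation form `{"sites": [i, j], "translations": [t_i, t_j]}`:

```python
def to_relative(bond: dict) -> dict:
    """Rewrite a two-translation bond as one relative translation of the second site.

    Shifting both ends by -t_i moves the first site back into the given cell
    without changing which pair of images is bonded.
    """
    i, j = bond["sites"]
    # Assumption: a missing "translations" key means no translation for either site.
    t_i, t_j = bond.get("translations", [[0, 0, 0], [0, 0, 0]])
    relative = [b - a for a, b in zip(t_i, t_j)]
    return {"sites": [i, j], "translation_site": 1, "translation_vector": relative}
```

The relative vector is all that connectivity needs; what the conversion discards is the absolute cell in which the bonded pair sits.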
Unless we introduce some data redundancy here, the most powerful queries would be the ones based on correlated arrays (a.k.a. zips), as OPTIMADE does not support anything more intricate than that. To make
Well, if you back-translate sites into the primary unit cell, you lose some information and end up with a set of disjoint bonded fragments. You could, of course, translate these fragments from the primary unit cell back into their proper place; however, it is not yet obvious to me that this is a straightforward task. What is the drawback of allowing translations to be specified for both sites?
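One possible way to put back-translated fragments in place is a breadth-first traversal that propagates each bond's relative translation from an anchor site. This is a sketch assuming bonds in the two-translation form discussed here; `unwrap_offsets` is a hypothetical name. A fragment's offsets are only defined up to a common lattice translation, and a cycle that closes with a non-zero net shift marks a polymer, which cannot be unwrapped into a finite molecule:

```python
from collections import deque


def unwrap_offsets(n_sites: int, bonds: list[dict]):
    """Give every site an integer cell offset so that each bond joins the right images.

    Returns (offsets, periodic): offsets[k] is the lattice translation for site k, and
    periodic is True if some cycle of bonds closes with a non-zero net translation.
    """
    # Neighbour j of site i is reached by the relative shift t_j - t_i (and back by its negative).
    adjacency = {k: [] for k in range(n_sites)}
    for bond in bonds:
        i, j = bond["sites"]
        t_i, t_j = bond.get("translations", [[0, 0, 0], [0, 0, 0]])
        shift = tuple(b - a for a, b in zip(t_i, t_j))
        adjacency[i].append((j, shift))
        adjacency[j].append((i, tuple(-s for s in shift)))

    offsets: dict[int, tuple] = {}
    periodic = False
    for start in range(n_sites):
        if start in offsets:
            continue
        # Each connected fragment is anchored at its lowest-index site, left in the given cell.
        offsets[start] = (0, 0, 0)
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j, shift in adjacency[i]:
                expected = tuple(o + s for o, s in zip(offsets[i], shift))
                if j not in offsets:
                    offsets[j] = expected
                    queue.append(j)
                elif offsets[j] != expected:
                    periodic = True  # inconsistent closure: infinite (polymeric) network
    return offsets, periodic
```

For a finite molecule every site ends up with a single consistent offset; for the 1D polymer example the self-bond of site 1 immediately reports `periodic = True`.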
As we are thinking about implementing fractional coordinates in the future, only sites within an asymmetric unit will be given in such structures. Maybe we also need a `symmetry_operation` key in addition to the `translation` key?
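If only the asymmetric unit were given, a bonded partner could be a symmetry image rather than a plain translation. A sketch, assuming a hypothetical `symmetry_operation` stored as a Seitz pair {W|w} (integer rotation matrix W, fractional shift w) acting on fractional coordinates, followed by an integer translation t, so that x' = Wx + w + t. For comparison, CIF expresses the same pairing in `_geom_bond_site_symmetry_1`/`_geom_bond_site_symmetry_2` with codes such as `2_655` (operation 2, shifted by +1 along a). The function name is hypothetical:

```python
def apply_operation(frac: list[float], rotation: list[list[int]],
                    shift: list[float], translation: list[int]) -> list[float]:
    """Fractional coordinates of a site image under {W|w} followed by a lattice translation t."""
    # x'_r = sum_c W[r][c] * x_c + w_r + t_r
    return [
        sum(rotation[r][c] * frac[c] for c in range(3)) + shift[r] + translation[r]
        for r in range(3)
    ]
```

For example, an inversion through the origin followed by t = (1, 1, 1) maps (0.1, 0.2, 0.3) to (0.9, 0.8, 0.7).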
- **Type**: list of dictionary with keys:

  - :property:`sites`: a list of integers (REQUIRED)
  - :property:`translations`: a list of list of integers (OPTIONAL)
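A minimal validator for one entry, following the type above and the rule that `sites` holds two non-decreasing 0-based indexes. The extra check that a site bonded to itself must carry `translations` is an assumption of this sketch (without them both ends would coincide). Names are hypothetical:

```python
def validate_bond(bond: dict, n_sites: int) -> list[str]:
    """Return the problems found in one entry of the proposed bonds property."""
    problems = []
    sites = bond.get("sites")
    # sites: REQUIRED, exactly two integer indexes
    if not isinstance(sites, list) or len(sites) != 2 or not all(isinstance(s, int) for s in sites):
        return ["'sites' must be a list of two integers"]
    if not all(0 <= s < n_sites for s in sites):
        problems.append("site index out of range")
    if sites[0] > sites[1]:
        problems.append("'sites' must be non-decreasing")
    # translations: OPTIONAL, one integer 3-vector per site
    if "translations" in bond:
        t = bond["translations"]
        well_formed = isinstance(t, list) and len(t) == 2 and all(
            isinstance(v, list) and len(v) == 3 and all(isinstance(x, int) for x in v) for v in t
        )
        if not well_formed:
            problems.append("'translations' must be two lists of three integers")
    elif sites[0] == sites[1]:
        # Assumption: a self-bond without translations would have zero length.
        problems.append("a bond from a site to itself needs 'translations'")
    return problems
```

Both examples given earlier in the diff pass this check when the structure has enough sites.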
We should probably specify very clearly that translations must be applied to the coordinates of atoms as they are currently given, and not to, e.g., coordinates reduced to the [0;1) unit cell.
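A small illustration of why this matters, assuming a cubic cell of edge 10 and fractional coordinates for simplicity (OPTIMADE itself stores Cartesian positions). Two sites given at z = 0.9 and z = 1.1 are 2 units apart with no translations, but if the second one is first reduced to [0;1), the same bond entry describes a pair 8 units apart. Both helper names are hypothetical:

```python
import math


def reduce_to_unit_cell(frac: list[float]) -> list[float]:
    """Wrap fractional coordinates into [0, 1): the reduction the comment warns against."""
    return [x % 1.0 for x in frac]


def frac_distance(a: list[float], b: list[float], ta: list[int], tb: list[int], cell: float) -> float:
    """Distance in a cubic cell of edge `cell` between site a shifted by ta and site b shifted by tb."""
    return cell * math.dist([x + t for x, t in zip(a, ta)], [x + t for x, t in zip(b, tb)])
```

With `a = [0, 0, 0.9]`, `b = [0, 0, 1.1]` and zero translations, `frac_distance` gives about 2.0 for the coordinates as given and about 8.0 after `reduce_to_unit_cell(b)`.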
In issue #426 I proposed adding more chemical properties to OPTIMADE structures. This PR implements my suggestion on representation of chemical connectivity between pairs of sites in OPTIMADE:
I intentionally omit the bond types as this might be difficult to agree upon, whereas having just the connectivity is already beneficial.
Pinging people who have expressed their interest for comments: @eimrek @BobHanson @Austin243
Edit: I have introduced a means to express connections with translation equivalents of the sites in `sites`.