From 9ec5181e0052f8159f6a4e954c2266eff4a4d088 Mon Sep 17 00:00:00 2001 From: Andrius Merkys Date: Fri, 25 Nov 2022 16:32:14 +0200 Subject: [PATCH 1/9] Describe SMILES data type. --- optimade.rst | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/optimade.rst b/optimade.rst index 01ef3fca0..2aac163c1 100644 --- a/optimade.rst +++ b/optimade.rst @@ -211,7 +211,7 @@ Hence, entry properties are described in this proposal using context-independent types that are assumed to have some form of representation in all contexts. They are as follows: -- Basic types: **string**, **integer**, **float**, **boolean**, **timestamp**. +- Basic types: **string**, **integer**, **float**, **boolean**, **timestamp**, **smiles**. - **list**: an ordered collection of items, where all items are of the same type, unless they are unknown. A list can be empty, i.e., contain no items. - **dictionary**: an associative array of **keys** and **values**, where **keys** are pre-determined strings, i.e., for the same entry property, the **keys** remain the same among different entries whereas the **values** change. @@ -443,6 +443,7 @@ In the JSON response format, property types translate as follows: - **string**, **boolean**, **list** are represented by their similarly named counterparts in JSON. - **integer**, **float** are represented as the JSON number type. - **timestamp** uses a string representation of date and time as defined in `RFC 3339 Internet Date/Time Format `__. +- **smiles** uses a string representation of chemical structure as defined in `OpenSMILES specification `__. - **dictionary** is represented by the JSON object type. - **unknown** properties are represented by either omitting the property or by a JSON :field-val:`null` value. 
@@ -1525,7 +1526,7 @@ The following tokens are used in the filter query component: (Note that at the end of the string value above the four final backslashes represent the two terminal backslashes in the value, and the final double quote is a terminator, it is not escaped.) - String value tokens are also used to represent **timestamps** in form of the `RFC 3339 Internet Date/Time Format `__. + String value tokens are also used to represent **timestamps** in form of the `RFC 3339 Internet Date/Time Format `__ and **smiles** according to the `OpenSMILES specification `__. - **Numeric values** are represented as decimal integers or in scientific notation, using the usual programming language conventions. A regular expression giving the number syntax is given below as a `POSIX Extended Regular Expression (ERE) `__ or as a `Perl-Compatible Regular Expression (PCRE) `__: @@ -1554,7 +1555,7 @@ More examples of the number tokens and machine-readable definitions and tests ca - **Boolean values** are represented with the tokens :filter-op:`TRUE` and :filter-op:`FALSE`. - **Operator tokens** are represented by usual mathematical relation symbols or by case-sensitive keywords. - Currently the following operators are supported: :filter-op:`=`, :filter-op:`!=`, :filter-op:`<=`, :filter-op:`>=`, :filter-op:`<`, :filter-op:`>` for tests of number, string (lexicographical) or timestamp (temporal) equality, inequality, less-than, more-than, less, and more relations; :filter-op:`AND`, :filter-op:`OR`, :filter-op:`NOT` for logical conjunctions, and a number of keyword operators discussed in the next section. 
+ Currently the following operators are supported: :filter-op:`=`, :filter-op:`!=`, :filter-op:`<=`, :filter-op:`>=`, :filter-op:`<`, :filter-op:`>` for tests of number, string (lexicographical), timestamp (temporal) or SMILES representation (structural) equality, inequality, less-than, more-than, less, and more relations; :filter-op:`AND`, :filter-op:`OR`, :filter-op:`NOT` for logical conjunctions, and a number of keyword operators discussed in the next section. In future extensions, operator tokens that are words MUST contain only upper-case letters. This requirement guarantees that no operator token will ever clash with a property name. @@ -1773,11 +1774,13 @@ Type handling and conversions in comparisons The definitions of specific properties in this standard define their types. Similarly, for `database-provider-specific properties`_, the database provider decides their types. -In the syntactic constructs that can accommodate values of more than one type, types of all participating values are REQUIRED to match, with a single exception of timestamps (see below). +In the syntactic constructs that can accommodate values of more than one type, types of all participating values are REQUIRED to match, with the exception of timestamps and SMILES representations (see below). Different types of values MUST be reported as :http-error:`501 Not Implemented` errors, meaning that type conversion is not implemented in the specification. -As the filter language syntax does not define a lexical token for timestamps, values of this type are expressed using string tokens in `RFC 3339 Internet Date/Time Format `__. +As the filter language syntax does not define lexical tokens for timestamps and SMILES, values of these types are expressed using string tokens. +For timestamps `RFC 3339 Internet Date/Time Format `__ representation is used and `OpenSMILES specification `__ is used for SMILES. 
In a comparison with a timestamp property, a string token represents a timestamp value that would result from parsing the string according to RFC 3339 Internet Date/Time Format. +In a comparison with a SMILES property, a string token represents a chemical structure that would result from parsing the string according to the SMILES specification. Interpretation failures MUST be reported with error :http-error:`400 Bad Request`. Optional filter features From 775fda016d77a1f8847623d047ff9307f86086b3 Mon Sep 17 00:00:00 2001 From: Andrius Merkys Date: Fri, 25 Nov 2022 16:36:44 +0200 Subject: [PATCH 2/9] Add SMILES property as is done in #392, but using SMILES data type. --- optimade.rst | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/optimade.rst b/optimade.rst index 2aac163c1..666ef8b6c 100644 --- a/optimade.rst +++ b/optimade.rst @@ -2451,6 +2451,24 @@ chemical\_formula\_anonymous - A filter that matches an exactly given formula is :filter:`chemical_formula_anonymous="A2B"`. +smiles +~~~~~~ + +- **Description**: The SMILES (Simplified Molecular Input Line Entry System) representation of the structure. +- **Type**: smiles +- **Requirements/Conventions**: + + - **Support**: OPTIONAL support in implementations, i.e., MAY be :val:`null`. + - **Query**: Support for queries on this property is OPTIONAL. + - Value MUST adhere to the `OpenSMILES specification v1.0 `__. + - When structures or their parts cannot be unambiguously represented in SMILES according to OpenSMILES recommendations, using the guidelines from `Quirós et al. 2018 `__ is RECOMMENDED. + - Providers MAY canonicalize (i.e., use rules to establish stable order of atoms) produced SMILES representations, but this is not mandatory. + Generally, providers SHOULD NOT change the representation more frequently than the structure itself is modified. 
+ +- **Examples**: + + - caffeine: `CN1C=NC2=C1C(=O)N(C(=O)N2C)C` + dimension\_types ~~~~~~~~~~~~~~~~ From 06409e05eba888d64807dcbe8d0186c13f1f8eab Mon Sep 17 00:00:00 2001 From: Andrius Merkys Date: Fri, 25 Nov 2022 17:01:59 +0200 Subject: [PATCH 3/9] Describe comparisons involving SMILES. --- optimade.rst | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/optimade.rst b/optimade.rst index 666ef8b6c..06e3fecd9 100644 --- a/optimade.rst +++ b/optimade.rst @@ -1648,6 +1648,21 @@ Examples: - :filter:`property != FALSE` - :filter:`_exmpl_has_inversion_symmetry AND NOT _exmpl_is_primitive` +Comparisons of SMILES values +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Equality comparisons ('=' and '!=') MUST be supported for SMILES values. +When handling equality comparisons of SMILES values, an implementation SHOULD NOT regard them as simple strings. +Instead, an implementation SHOULD either compare the described chemical structures or canonicalize SMILES representations and then perform direct string matching. +In addition to the equality comparison operators, :val:`CONTAINS` MAY be supported as an operator to check whether one structure is a substructure of another. +Other comparison operators MUST NOT be supported. + +Examples: + +- :filter:`smiles = "c1ccccc1"` +- :filter:`smiles != "O"` +- :filter:`smiles CONTAINS "c1ccccc1"` + Comparisons of list properties ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 59af3faca5b01ca4dc69a516f9faeb0bb56d422a Mon Sep 17 00:00:00 2001 From: Andrius Merkys Date: Mon, 12 Dec 2022 18:09:52 +0200 Subject: [PATCH 4/9] Decouple SMILES from other data types in the enumeration of operator tokens.
--- optimade.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 06e3fecd9..6d55bac45 100644 --- a/optimade.rst +++ b/optimade.rst @@ -1555,7 +1555,8 @@ More examples of the number tokens and machine-readable definitions and tests ca - **Boolean values** are represented with the tokens :filter-op:`TRUE` and :filter-op:`FALSE`. - **Operator tokens** are represented by usual mathematical relation symbols or by case-sensitive keywords. - Currently the following operators are supported: :filter-op:`=`, :filter-op:`!=`, :filter-op:`<=`, :filter-op:`>=`, :filter-op:`<`, :filter-op:`>` for tests of number, string (lexicographical), timestamp (temporal) or SMILES representation (structural) equality, inequality, less-than, more-than, less, and more relations; :filter-op:`AND`, :filter-op:`OR`, :filter-op:`NOT` for logical conjunctions, and a number of keyword operators discussed in the next section. + Currently the following operators are supported: :filter-op:`=`, :filter-op:`!=`, :filter-op:`<=`, :filter-op:`>=`, :filter-op:`<`, :filter-op:`>` for tests of number, string (lexicographical) or timestamp (temporal) equality, inequality, less-than, more-than, less, and more relations; :filter-op:`AND`, :filter-op:`OR`, :filter-op:`NOT` for logical conjunctions, and a number of keyword operators discussed in the next section. + Of these, SMILES data type supports only equality and inequality comparison operators (:filter-op:`=` and :filter-op:`!=`). In future extensions, operator tokens that are words MUST contain only upper-case letters. This requirement guarantees that no operator token will ever clash with a property name. 
From 08997e388c74c5fd94518d25e4edbefc05a257ca Mon Sep 17 00:00:00 2001 From: Andrius Merkys Date: Mon, 12 Dec 2022 18:13:05 +0200 Subject: [PATCH 5/9] Update optimade.rst Co-authored-by: Antanas Vaitkus --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 6d55bac45..49deee721 100644 --- a/optimade.rst +++ b/optimade.rst @@ -1796,7 +1796,7 @@ Different types of values MUST be reported as :http-error:`501 Not Implemented` As the filter language syntax does not define lexical tokens for timestamps and SMILES, values of these types are expressed using string tokens. For timestamps `RFC 3339 Internet Date/Time Format `__ representation is used and `OpenSMILES specification `__ is used for SMILES. In a comparison with a timestamp property, a string token represents a timestamp value that would result from parsing the string according to RFC 3339 Internet Date/Time Format. -In a comparison with a SMILES property, a string token represents a chemical structure that would result from parsing the string according to the SMILES specification. +In a comparison with a SMILES property, a string token represents a chemical structure that would result from parsing the string according to the OpenSMILES specification v1.0. Interpretation failures MUST be reported with error :http-error:`400 Bad Request`. 
Optional filter features From bcc69c170a15af3326ea7ad10f0679f451ed502f Mon Sep 17 00:00:00 2001 From: Andrius Merkys Date: Mon, 12 Dec 2022 18:13:47 +0200 Subject: [PATCH 6/9] Update optimade.rst Co-authored-by: Antanas Vaitkus --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 49deee721..09d893f0b 100644 --- a/optimade.rst +++ b/optimade.rst @@ -1793,7 +1793,7 @@ Similarly, for `database-provider-specific properties`_, the database provider d In the syntactic constructs that can accommodate values of more than one type, types of all participating values are REQUIRED to match, with the exception of timestamps and SMILES representations (see below). Different types of values MUST be reported as :http-error:`501 Not Implemented` errors, meaning that type conversion is not implemented in the specification. -As the filter language syntax does not define lexical tokens for timestamps and SMILES, values of these types are expressed using string tokens. +As the filter language syntax does not define lexical tokens for timestamps or SMILES, values of these types are expressed using string tokens. For timestamps `RFC 3339 Internet Date/Time Format `__ representation is used and `OpenSMILES specification `__ is used for SMILES. In a comparison with a timestamp property, a string token represents a timestamp value that would result from parsing the string according to RFC 3339 Internet Date/Time Format. In a comparison with a SMILES property, a string token represents a chemical structure that would result from parsing the string according to the OpenSMILES specification v1.0. From 57b6ca847d71cdd7c2bf3f4a058b87cde9a71add Mon Sep 17 00:00:00 2001 From: Andrius Merkys Date: Tue, 17 Jan 2023 10:30:52 +0200 Subject: [PATCH 7/9] Removing sentences mentioning SMILES canonicalization as they do not add much of a benefit for the specification. 
--- optimade.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 6d55bac45..9e72fb701 100644 --- a/optimade.rst +++ b/optimade.rst @@ -2478,8 +2478,6 @@ smiles - **Query**: Support for queries on this property is OPTIONAL. - Value MUST adhere to the `OpenSMILES specification v1.0 `__. - When structures or their parts cannot be unambiguously represented in SMILES according to OpenSMILES recommendations, using the guidelines from `Quirós et al. 2018 `__ is RECOMMENDED. - - Providers MAY canonicalize (i.e., use rules to establish stable order of atoms) produced SMILES representations, but this is not mandatory. - Generally, providers SHOULD NOT change the representation more frequently than the structure itself is modified. - **Examples**: From 8fb816cee87c31a4c8980d821be431b55204aebd Mon Sep 17 00:00:00 2001 From: Andrius Merkys Date: Tue, 21 Feb 2023 15:23:09 +0200 Subject: [PATCH 8/9] Define `x-optimade-type` for smiles as #457 is merged now. --- optimade.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/optimade.rst b/optimade.rst index 4007e15de..dad105753 100644 --- a/optimade.rst +++ b/optimade.rst @@ -1970,6 +1970,7 @@ The format described in this subsection forms a subset of the `JSON Schema Valid - timestamps are represented by setting the :field:`type` field to :val:`"string"` and the :field:`format` field to :val:`"date-time"`. In this case it is MANDATORY to include the field :field:`format`. + - smiles are represented by setting both fields :field:`type` and :field:`format` to :val:`"string"` and setting :field:`x-optimade-type` to :val:`"smiles"`. Output formats that represent these OPTIMADE data types in other ways have to recognize them and reinterpret the definition accordingly. 
From afb319b6e46a4884a0ce1a3d5c3434f64ed5238f Mon Sep 17 00:00:00 2001 From: Andrius Merkys Date: Mon, 5 Jun 2023 15:38:51 +0200 Subject: [PATCH 9/9] Update optimade.rst Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com> --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index ad3987023..8cb5779ea 100644 --- a/optimade.rst +++ b/optimade.rst @@ -1585,7 +1585,7 @@ More examples of the number tokens and machine-readable definitions and tests ca - **Operator tokens** are represented by usual mathematical relation symbols or by case-sensitive keywords. Currently the following operators are supported: :filter-op:`=`, :filter-op:`!=`, :filter-op:`<=`, :filter-op:`>=`, :filter-op:`<`, :filter-op:`>` for tests of number, string (lexicographical) or timestamp (temporal) equality, inequality, less-than, more-than, less, and more relations; :filter-op:`AND`, :filter-op:`OR`, :filter-op:`NOT` for logical conjunctions, and a number of keyword operators discussed in the next section. - Of these, SMILES data type supports only equality and inequality comparison operators (:filter-op:`=` and :filter-op:`!=`). + Of the comparison operators, the SMILES data type supports only the equality and inequality operators (:filter-op:`=` and :filter-op:`!=`). In future extensions, operator tokens that are words MUST contain only upper-case letters. This requirement guarantees that no operator token will ever clash with a property name.
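Reviewer note: the operator rules introduced by this series (equality and inequality for SMILES values, with CONTAINS as an optional extension, and SMILES passed as string tokens) could be exercised from a client with something like the sketch below. It is only an illustration under stated assumptions: the helper names `quote_string_token` and `smiles_filter` and the `allow_contains` flag are hypothetical and come neither from the specification nor from any OPTIMADE library. The escaping follows the string-token convention in which a backslash inside the value is written as two backslashes; this matters for SMILES because bond-direction symbols can include a backslash.

```python
# Hypothetical client-side helpers; not part of the OPTIMADE specification.

# Operators the patch series allows on SMILES values.
SMILES_COMPARISON_OPERATORS = ("=", "!=")
# CONTAINS is described as an optional substructure test.
SMILES_OPTIONAL_OPERATORS = ("CONTAINS",)


def quote_string_token(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def smiles_filter(
    operator: str,
    smiles: str,
    property_name: str = "smiles",
    allow_contains: bool = False,
) -> str:
    """Build a filter expression comparing a SMILES property with a string token.

    allow_contains is an assumed switch for servers that advertise the
    optional CONTAINS substructure test.
    """
    allowed = set(SMILES_COMPARISON_OPERATORS)
    if allow_contains:
        allowed.update(SMILES_OPTIONAL_OPERATORS)
    if operator not in allowed:
        raise ValueError(f"operator {operator!r} is not allowed for SMILES values")
    return f"{property_name} {operator} {quote_string_token(smiles)}"


if __name__ == "__main__":
    print(smiles_filter("=", "c1ccccc1"))
    print(smiles_filter("CONTAINS", "c1ccccc1", allow_contains=True))
    print(smiles_filter("=", "F/C=C\\F"))  # the backslash is doubled in the token
```

The sketch only builds the filter text. Whether two SMILES strings denote the same structure is decided by the server, which per this series should compare structures or canonicalized representations rather than raw strings.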