-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improves validation performance of types defined in the spec #286
Conversation
Codecov ReportPatch coverage is 📢 Thoughts on this report? Let us know!. |
9e119e7
to
aa64722
Compare
@@ -64,7 +64,6 @@ class TypeTest { | |||
val iter = violations.iterator() | |||
val violation = iter.next() | |||
assertEquals("type_mismatch", violation.code) | |||
assertEquals("type", violation.constraint?.fieldName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Question) Why was this removed? I couldn't find any changes in Violation
structure for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old way of creating the spec-defined types involved reading an Ion struct and then extracting the symbol from the struct. This meant that the symbols had the field name "type". See
ion-schema-kotlin/ion-schema/src/main/kotlin/com/amazon/ionschema/internal/SchemaCore.kt
Lines 86 to 118 in a3fabe9
private const val CORE_TYPES = | |
""" | |
{ type: blob } | |
{ type: bool } | |
{ type: clob } | |
{ type: decimal } | |
{ type: document } | |
{ type: float } | |
{ type: int } | |
{ type: string } | |
{ type: symbol } | |
{ type: timestamp } | |
{ type: list } | |
{ type: sexp } | |
{ type: struct } | |
""" | |
private const val ION_TYPES = | |
""" | |
{ type: ${'$'}blob } | |
{ type: ${'$'}bool } | |
{ type: ${'$'}clob } | |
{ type: ${'$'}decimal } | |
{ type: ${'$'}float } | |
{ type: ${'$'}int } | |
{ type: ${'$'}null } | |
{ type: ${'$'}string } | |
{ type: ${'$'}symbol } | |
{ type: ${'$'}timestamp } | |
{ type: ${'$'}list } | |
{ type: ${'$'}sexp } | |
{ type: ${'$'}struct } | |
""" |
Now we create the symbol directly instead, which means that the symbol no longer has a field name. (This whole thing is a weird consequence of the unfortunate decision that an IonValue
knows its own field name.) Anyway, this lack of field name causes no difference in the violation messages, and the violation constraint field name is not actually used anywhere in the library.
Issue #, if available:
None
Description of changes:
I discovered recently that the types defined in the spec are (subjectively) slow-ish for validation. I tried an alternative implementation, and ended with a pretty good improvement. Serendipitously, I discovered that this fixes #284 and partially #169.
This change:
TypeBuiltin
BuiltInTypes
, an object for holding allTypeBuiltin
instancesSchemaCore
,TypeBuiltInImpl
,TypeIon
, andTypeCore
.TypeNullable
My informal performance testing revealed some really big improvements. I chose some representative spec types that corresponded to 0, 1, "many", and "all" Ion types, and I created test cases that used the spec types to validate some sample data (one value of each Ion type except for clob and sexp). These results were measured by running warmups until the time was stable, using an "autorange" approach to determine the number of invocations in each sample, and then collecting and averaging the ops/s for 30 samples.
I tested with both ISL 1.0 and 2.0, and the results were basically the same, but this is especially impactful for ISL 1.0, because any user type definition that does not have the
type
constraint has an impliedtype: any
.Edit:
This PR also removes some unnecessary noise in the violations messages. E.g. when testing
null.symbol
against typeany
, the current message is a lot of noise, but the important thing is "not applicable for null values", which is rather cryptic.The new violation message is a one liner that mentions the
null
-ness and the Ion type of the value.Related PRs in ion-schema, ion-schema-tests, ion-schema-schemas:
None.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.