-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evaluate assert/discriminator expressions after groupContent #987
base: main
Are you sure you want to change the base?
Conversation
This is primarily WIP as it needs more test coverage and I have been informed that I need to hold off on working on this for a little while. It is fully functional, just needs more tests for edge cases surrounding evaluating expressions after the content of sequences, groups, and choices. Also need to add tests for the setVariable change. |
According to 9.5 of the DFDL spec assert and discriminators with expressions should be processed after the content of their enclosing sequence, group, or choice. Before these expressions were always being processed before the content. This commit also moves the setVariable expression evaluaiton to the correct place, which is before the enclosing group. DAFFODIL-1971, DAFFODIL-1590
Tests for the following: - Sequence body succeeds but discriminator fails - Sequence body fails and discriminator fails - Sequence body fails but discriminator succeeds and references an element in the partial sequence body infoset - Sequence body fails and discriminator fails and references an element in the partial sequence body infoset
94fe3ae
to
ef3ec79
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same assert could be evaluated multiple times?
Also, have you run this against the regression suite? I wonder if there are any schemas that rely on the current behavior and could mess up how they use discriminators?
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLStatementMixin.scala
Outdated
Show resolved
Hide resolved
new SeqCompParser( | ||
context.runtimeData, | ||
parserChildren.toVector, | ||
assertExpressionChildren.toVector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the paserChildren
vector contains the assertExpressionChildren
parsers, which I think means the assert parsers will be evaluated twice? Once when all the parserChildren
are evaluated and again after the parserChildren finis and then we evaluate the asserts Parsers? Seems like parserChildren should not contain the assert children?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I have fixed this. I didn't realize (or forgot since starting this pull request a year ago) that we internally handle discriminator expressions as an assert expression just with a discriminator flag set.
I've changed it so that now we are filtering out just the discriminator statements from the rest of the sequence child parsers and run those after the rest of the sequence body.
testAssert.foreach { ap => | ||
pstate.withTempSuccess(ap.parse1) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I guess I now see why it's okay for childrenParsers
to also include the assertion parsers. If all the non-assert parsers succeed then we'll don't evaluate the assertion parsers.
However, I think there is still an issue. Say all the non-assertion parsers succeed, then we start evaluating the assertion parsers, and assume one of them fails. The pstate.processorStatus ne Success
will be true, and then we'll evaluate all the assertion parsers, including ones we've already evaluated.
So I think this approach still feels like it needs a tweak. Almost feels like the assert parsers need to be completely separate from the other parsers or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I Have addressed this with the changes I mentioned above by separating out the discriminator statements and running them after the rest of the sequence body. Nothing should be getting run twice anymore.
setSuccess() | ||
func(this) | ||
if (processorStatus eq Success) | ||
_processorStatus = priorProcessorStatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditionally resetting back to the previous status feels like it might be confusing? Feels like if this is temporarily ignoring status it should always reset back to whatever the status was before. Maybe the way this wants to work is it it returns the status of func
, and the caller can choose to do with it whatever they want? Something more like
def withTempSuccess(func: (PState) => Unit): ProcessorResult = {
val priorStatus = processorStatus
setSuccess()
func(this)
val funcStatus = processorStatus
_processorStatus = priorStatus
funcStatus
}
Also, Is there any value in passing in a lambda? An alternative would be to just pass in a Parser
, and then instead of calling func(this)
, it could call parser.parse1(this)
. That ensures that this is always called with a Parser
instead of a function that just happens to accept a PState
, which is probably safer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modeled withTempSuccess after the withPointOfUncertainty function, so that is why it is using a labmda here. I vaguely recall tasking with @mbeckerle about this function, but I could be mis-remembering things.
I'll make the change to not be conditionally changing the processStatus.
if (pstate.processorStatus ne Success) | ||
if (pstate.processorStatus ne Success) { | ||
if (testAssert.nonEmpty) { | ||
testAssert.foreach { ap => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had another thought about this, this will evaluate all assertions even if one fails. Is that the correct behavior, or should it stop after the first assertion failure? Seems like if one assertion fails it shouldn't continue to evaluate other assertions?
@@ -89,10 +90,19 @@ class SeqComp private (context: SchemaComponent, children: Seq[Gram]) | |||
_.isInstanceOf[NadaParser] | |||
} | |||
|
|||
lazy val assertExpressionChildren = parserChildren.filter { | |||
_.isInstanceOf[AssertExpressionEvaluationParser] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the spec some more, it says:
an attempt to evaluate a discriminator MUST be made even if preceding statements or the parse of the schema component ended in a Processing Error.
I think this implies asserts should not be evaluated if the preceding statements fails? Which kindof makes sense since an assert only causes backtracking and does not discriminate points of uncertainty?
So it feels like the logic needs to differentiate between asserts and discriminators? Both should be evaluated at the end, but only discriminators should be evaluated if the prior parses succeed?
Another question, are discriminators always evaluated before asserts? Or are they evaluated depending on the order defined in the DFDL schema? For example, if you have:
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:assert test="..." />
<dfdl:discriminator test="..." />
<dfdl:assert test="..." />
</xs:appinfo>
Should that evaluate the first assert, then the discriminiator, then second assert. I don't know if the spec clarifies that or if it's implied, but I think we always evaluate asserts before discrims regardless of how they are defined in the schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question. The way I've implemented it we will perform all dfdl:assert statements first and then do all dfdl:discriminator statements, regardless of whether or not any of the other statements or parsers have failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am raising this issue about the spec with the DFDL workgroup. (We have a call tomorrow).
The spec is completely silent about order of evaluation of multiple asserts in a single location of the schema, whether they are done in schema order, whether the first failure stops that evaluation, etc.
The spec does say that the discriminators are evaluated last, even if there are other failures (to parse, or assertions). This allows for them to be optimized and executed earlier, when they might have been evaluated to true before the parser got a failure. The spec also prevents there from being more than one discriminator at any annotation point.
I've now run it against the regression test suite and I am seeing failures in dfdl-edifact, dfdl-gif, and dfdl-jpeg2000 that I don't see in main. I'll need to spend some time looking into those to confirm that they are relying on the old behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the discriminators should be evaluated before any asserts.
Are we missing an important test case?
Also I want to be clear that if a discriminator fails then no asserts are evaluated. If any assert fails no subsequent asserts are evaluated.
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLStatementMixin.scala
Outdated
Show resolved
Hide resolved
daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/HasStatementsGrammarMixin.scala
Outdated
Show resolved
Hide resolved
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLStatementMixin.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PState.scala
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala
Outdated
Show resolved
Hide resolved
</tdml:errors> | ||
</tdml:parserTestCase> | ||
|
||
<tdml:parserTestCase name="discrimExpression_08" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking for a test where the sequence body fails, the discriminator succeeds and references things in the sequence body that in fact exist and are correct, so the behavior is as if the parse failed after the discrminator already evaluated to true.
The use case that motivates all this complexity is a bunch of record types, each a sequence of child elements, each record type has a discriminator that looks at one or a few of the fields of the record to decide if the data is in fact of this record type. Those discriminators could be hoisted and evaluated as soon as the fields of the record needed by the expressions have been parsed. If parsing of other later fields of the record then fails, the discriminator has already been set true, so those failures don't cause other record types to be attempted. Rather the whole choice fails.
This is very close conceptually to a choice by dispatch. Once the dispatch key has been computed and the branch selected, a subsequent failure does not cause backtracking of that choice, but a failure of the whole choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not what discrimExpression_07 covers?
It has a discriminator expression that references the first element of a sequence (which does parse correctly). The 2nd element of the sequence then fails, which causes a parse error instead of back tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Verified. This does appear to cover it. I added one comment about tightening up the error message since the error should be talking specifically about the int2 field being the one that failed.
@mbeckerle I think I've addressed all of your comments except the missing test case. Can you double check that discrimExpression_07 doesn't cover the scenario you were describing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 with one minor improvement to a test case
...odil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator.tdml
Show resolved
Hide resolved
</tdml:errors> | ||
</tdml:parserTestCase> | ||
|
||
<tdml:parserTestCase name="discrimExpression_08" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Verified. This does appear to cover it. I added one comment about tightening up the error message since the error should be talking specifically about the int2 field being the one that failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think discirminators will be evaluated twice, also a a little concerned about possible regressions if schemas rely on this.
val assertDiscrimExpressions = childParsers.collect { case ae: AssertExpressionEvaluationParser => ae } | ||
val discrimExpressions = assertDiscrimExpressions.filter { _.discrim } | ||
val assertExpressions = assertDiscrimExpressions.filterNot { _.discrim } | ||
val nonAssertChildren = childParsers.diff(assertExpressions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think nonAssertChildren
includes discriminator parsers so could discriminators get evaluated twice, once in the first loop and once in the discrimnator loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a base class shared between discriminators and regular asserts. They're both a kind of assertion. This scala code may be using that terminology so nonAssertChildren may also mean no discriminators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevedlawrence is right. It should be
val nonAssertChildren = childParsers.diff(assertDiscrimExpressions)
then there won't be duplicates. Will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the intention, that nonAssertChildren contains no dfdl:assert or dfdl:discrimintor parers. But I think that's not the case. I think this needs to be this?
val nonAssertChildren = childParsers.diff(assertDiscrimExpressions)
So nonAssertChildren excludes both asserts and discriminators from childParsers.
|
||
// Handle all discriminator expressions statements after sequence body | ||
i = 0 | ||
while (i < discrimExpressions.size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can only be one discriminator, right? This don't really need to be a loop I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not possible to have multiple discriminators? I had just assumed that it was but am having trouble finding something definitive in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one discriminator at each annotation point. In fact I'm not sure we enforce this, but one annotation point is supposed to have only a discriminator OR asserts (1 or more), but not both:
Section 7.6: "The resolved set of statement annotations for a schema component can contain only a single dfdl:discriminator or one or more dfdl:assert annotations, but not both. To clarify: dfdl:assert annotations and dfdl:discriminator annotations are exclusive of each other. It is a Schema Definition Error otherwise."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's true we can simplify the logic a bit and remove the filtering of asserts. We just need something like
val optDiscrimParser = ... // find an optional discrim parser
val nonDiscrimParsers = ... // remove discrim parsers if it exists
def parser() {
...
// call parse() for all nonDiscrimParsers
// call parse() for optDiscrimParser if set
...
}
If nonDiscrimParsers contains any asserts, then optDiscrimParser should be None and the order is correct.
If nonDiscrimParsers does not contain asserts, then otpDiscrimParser will always be evaluated (if it exists), and we don't have to worry about evaluating assert parsers after the discrim because they shouldn't exist.
Should we also enforce the exclusivity as part of this PR, feels important to ensuring the right behavior of assert/discrim evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just verified that we already are enforcing the exclusivity of both allowing only one dfdl:discriminator or multiple dfdl:assert statements. Doing otherwise creates an SDE.
With that in mind I will make some changes to use these expectations to simplify things.
assertExpressions(i).parse1(pstate) | ||
i += 1 | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this order confirmed by the DFDL-WG? First we do all normal parsers, then the discriminator, and then assertions? And that the order has nothing to do with the order things are defined in the schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Confirmed in a lengthly workgroup discussion yesterday in fact.
@Test def test_discrimExpression_05() = { runner.runOneTest("discrimExpression_05") } | ||
@Test def test_discrimExpression_06() = { runner.runOneTest("discrimExpression_06") } | ||
@Test def test_discrimExpression_07() = { runner.runOneTest("discrimExpression_07") } | ||
@Test def test_discrimExpression_08() = { runner.runOneTest("discrimExpression_08") } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned you ran things against the regression suite. Have you done so with the recent changes? I'm hesitant to merge anything this close to release that has significant changes to behavior that schemas might rely on. Though, I won't block it if others are fine with the regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is for the 3.10.0 release, we have to be prepared to back it out should it lead to any significant regressions.
Changing order of assertion evaluation, and discriminator evaluation (because a failing discriminator is equivalent to a failing assertion) can result in different diagnostics coming out, so tests may regress because they're expecting one error text, but are given another. But negative tests failing due to one diagnostic vs. another is a sort of lower-threat regression vs. positive tests that break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 with the fix to nonAssertChildren
Fine to merge for 3.10 if there are no major regressions, otherwise we should hold off to the next release. I'd prefer we figure out if there are regression prior to merging so we don't have to revert and restart the release process.
Given that there can only be either one discriminator or one or more asserts the logic for handling discriminator expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor cleanup
i += 1 | ||
var i = 0 | ||
|
||
// Handle all non assert/discriminator child parsers first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should say "all non discriminator child parsers", removing "assert". This will evaluate asserts if they exist.
override lazy val runtimeDependencies = Vector() | ||
override def childProcessors = childParsers | ||
|
||
override def nom = "seq" | ||
|
||
val numChildParsers = childParsers.size | ||
val optDiscrimParser = childParsers.collectFirst { case ae: AssertExpressionEvaluationParser if (ae.discrim) => ae } | ||
val nonDiscrimChildren = if (optDiscrimParser.isDefined) childParsers.diff(Seq(optDiscrimParser.get)) else childParsers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be simplified to
val nonDiscrimChildren = childParsers.diff(optDiscrimParser.toSeq)
var i = 0 | ||
|
||
// Handle all non assert/discriminator child parsers first | ||
while ((i < nonDiscrimChildren.size) && (pstate.processorStatus eq Success)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth storing nonDiscrimChildren.size
as a member val? I assume it's a constant time function so calling it in the loop probably doesn't matter, but numChildParser
used to be a member val so there might have been a reason for that?
After digging into some of the failures I was seeing in the dfdl-regression test suite, there wasn't a straightforward work around for the few schemas that were relying on the old discriminator behavior. I am going to hold off on merging this until after the release of Daffodil 3.10.0 and then this can be added to 4.0.0 and a better solution for these problematic schemas can be investigated. |
Now that 3.10 has been released and 4.0.0 is underway, any objections to merging these changes now? |
Might want to hold off until all the 2.13 changes are in since that's kindof touching everything and might make dealing with merge conflicts difficult? Also, breaking backwards compatibility in 4.0.0 is less of a concern than usualy, but we should still have backwards compatibility notes, especially explaining how to update schemas that relied on the old behavior. Did you ever come up with a generic way to update schemas to work with this change? I thought one of the issues preventing merging in 3.10 was there wasn't really a good way to do that? |
At least at first I'd like 4.0.0 branch to have a scala 2.13 or scala 3 version (or one then the other) which are 100% identical functionality and performance to 3.10.0. I want them to run the exact same schemas unmodified just for testing reasons. I would suggest these changes should be in 4.1.0, or a 3.11.0 if we truly need them out sooner. |
Fair enough. I'll re-evaluate once the 2.13 changes seem to be wrapped up. As for a fix for the failing schemas, it seemed like the correct fix was to add choiceDispatch or initiated content to some of these choice branches that were mistakenly being allowed to backtrack. I know we didn't want to force a change like that in 3.10.0, but I wasn't sure if a change like that would be OK moving into 4.0.0. |
In theory, not all formats can use choice dispatch or initiated content and might still have this behavior. We might need to think about what an alternative would look like for those kinds of schemas. |
According to 9.5 of the DFDL spec assert and discriminators with expressions should be processed after the content of their enclosing sequence, group, or choice. Before these expressions were always being processed before the content.
This commit also moves the setVariable expression evaluaiton to the correct place, which is before the enclosing group.
DAFFODIL-1971, DAFFODIL-1590