Evaluate assert/discriminator expressions after groupContent #987

jadams-tresys · 2023-03-13T19:23:54Z

According to 9.5 of the DFDL spec assert and discriminators with expressions should be processed after the content of their enclosing sequence, group, or choice. Before these expressions were always being processed before the content.

This commit also moves the setVariable expression evaluaiton to the correct place, which is before the enclosing group.

DAFFODIL-1971, DAFFODIL-1590

jadams-tresys · 2023-03-13T19:25:02Z

This is primarily WIP as it needs more test coverage and I have been informed that I need to hold off on working on this for a little while.

It is fully functional, just needs more tests for edge cases surrounding evaluating expressions after the content of sequences, groups, and choices.

Also need to add tests for the setVariable change.

According to 9.5 of the DFDL spec assert and discriminators with expressions should be processed after the content of their enclosing sequence, group, or choice. Before these expressions were always being processed before the content. This commit also moves the setVariable expression evaluaiton to the correct place, which is before the enclosing group. DAFFODIL-1971, DAFFODIL-1590

Tests for the following: - Sequence body succeeds but discriminator fails - Sequence body fails and discriminator fails - Sequence body fails but discriminator succeeds and references an element in the partial sequence body infoset - Sequence body fails and discriminator fails and references an element in the partial sequence body infoset

stevedlawrence

I think the same assert could be evaluated multiple times?

Also, have you run this against the regression suite? I wonder if there are any schemas that rely on the current behavior and could mess up how they use discriminators?

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLStatementMixin.scala

stevedlawrence · 2025-01-06T15:13:19Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/Grammar.scala

+      new SeqCompParser(
+        context.runtimeData,
+        parserChildren.toVector,
+        assertExpressionChildren.toVector,


I believe the paserChildren vector contains the assertExpressionChildren parsers, which I think means the assert parsers will be evaluated twice? Once when all the parserChildren are evaluated and again after the parserChildren finis and then we evaluate the asserts Parsers? Seems like parserChildren should not contain the assert children?

I believe I have fixed this. I didn't realize (or forgot since starting this pull request a year ago) that we internally handle discriminator expressions as an assert expression just with a discriminator flag set.

I've changed it so that now we are filtering out just the discriminator statements from the rest of the sequence child parsers and run those after the rest of the sequence body.

stevedlawrence · 2025-01-06T15:20:09Z

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala

+          testAssert.foreach { ap =>
+            pstate.withTempSuccess(ap.parse1)
+          }
+        }


Ah, I guess I now see why it's okay for childrenParsers to also include the assertion parsers. If all the non-assert parsers succeed then we'll don't evaluate the assertion parsers.

However, I think there is still an issue. Say all the non-assertion parsers succeed, then we start evaluating the assertion parsers, and assume one of them fails. The pstate.processorStatus ne Success will be true, and then we'll evaluate all the assertion parsers, including ones we've already evaluated.

So I think this approach still feels like it needs a tweak. Almost feels like the assert parsers need to be completely separate from the other parsers or something.

I think I Have addressed this with the changes I mentioned above by separating out the discriminator statements and running them after the rest of the sequence body. Nothing should be getting run twice anymore.

stevedlawrence · 2025-01-06T15:29:28Z

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PState.scala

+    setSuccess()
+    func(this)
+    if (processorStatus eq Success)
+      _processorStatus = priorProcessorStatus


Conditionally resetting back to the previous status feels like it might be confusing? Feels like if this is temporarily ignoring status it should always reset back to whatever the status was before. Maybe the way this wants to work is it it returns the status of func, and the caller can choose to do with it whatever they want? Something more like

def withTempSuccess(func: (PState) => Unit): ProcessorResult = { val priorStatus = processorStatus setSuccess() func(this) val funcStatus = processorStatus _processorStatus = priorStatus funcStatus }

Also, Is there any value in passing in a lambda? An alternative would be to just pass in a Parser, and then instead of calling func(this), it could call parser.parse1(this). That ensures that this is always called with a Parser instead of a function that just happens to accept a PState, which is probably safer?

I modeled withTempSuccess after the withPointOfUncertainty function, so that is why it is using a labmda here. I vaguely recall tasking with @mbeckerle about this function, but I could be mis-remembering things.

I'll make the change to not be conditionally changing the processStatus.

stevedlawrence · 2025-01-06T15:37:49Z

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala

-      if (pstate.processorStatus ne Success)
+      if (pstate.processorStatus ne Success) {
+        if (testAssert.nonEmpty) {
+          testAssert.foreach { ap =>


Just had another thought about this, this will evaluate all assertions even if one fails. Is that the correct behavior, or should it stop after the first assertion failure? Seems like if one assertion fails it shouldn't continue to evaluate other assertions?

stevedlawrence · 2025-01-06T15:47:33Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/Grammar.scala

@@ -89,10 +90,19 @@ class SeqComp private (context: SchemaComponent, children: Seq[Gram])
      _.isInstanceOf[NadaParser]
    }

+  lazy val assertExpressionChildren = parserChildren.filter {
+    _.isInstanceOf[AssertExpressionEvaluationParser]


Reading the spec some more, it says:

an attempt to evaluate a discriminator MUST be made even if preceding statements or the parse of the schema component ended in a Processing Error.

I think this implies asserts should not be evaluated if the preceding statements fails? Which kindof makes sense since an assert only causes backtracking and does not discriminate points of uncertainty?

So it feels like the logic needs to differentiate between asserts and discriminators? Both should be evaluated at the end, but only discriminators should be evaluated if the prior parses succeed?

Another question, are discriminators always evaluated before asserts? Or are they evaluated depending on the order defined in the DFDL schema? For example, if you have:

<xs:appinfo source="http://www.ogf.org/dfdl/"> <dfdl:assert test="..." /> <dfdl:discriminator test="..." /> <dfdl:assert test="..." /> </xs:appinfo>

Should that evaluate the first assert, then the discriminiator, then second assert. I don't know if the spec clarifies that or if it's implied, but I think we always evaluate asserts before discrims regardless of how they are defined in the schema?

This is a good question. The way I've implemented it we will perform all dfdl:assert statements first and then do all dfdl:discriminator statements, regardless of whether or not any of the other statements or parsers have failed.

I am raising this issue about the spec with the DFDL workgroup. (We have a call tomorrow).
The spec is completely silent about order of evaluation of multiple asserts in a single location of the schema, whether they are done in schema order, whether the first failure stops that evaluation, etc.

The spec does say that the discriminators are evaluated last, even if there are other failures (to parse, or assertions). This allows for them to be optimized and executed earlier, when they might have been evaluated to true before the parser got a failure. The spec also prevents there from being more than one discriminator at any annotation point.

jadams-tresys · 2025-01-07T16:41:13Z

I've now run it against the regression test suite and I am seeing failures in dfdl-edifact, dfdl-gif, and dfdl-jpeg2000 that I don't see in main. I'll need to spend some time looking into those to confirm that they are relying on the old behavior.

mbeckerle

I think the discriminators should be evaluated before any asserts.

Are we missing an important test case?

Also I want to be clear that if a discriminator fails then no asserts are evaluated. If any assert fails no subsequent asserts are evaluated.

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLStatementMixin.scala

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/HasStatementsGrammarMixin.scala

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLStatementMixin.scala

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PState.scala

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala

mbeckerle · 2025-01-08T19:07:15Z

...odil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator.tdml

+    </tdml:errors>
+  </tdml:parserTestCase>
+
+  <tdml:parserTestCase name="discrimExpression_08"


I'm looking for a test where the sequence body fails, the discriminator succeeds and references things in the sequence body that in fact exist and are correct, so the behavior is as if the parse failed after the discrminator already evaluated to true.

The use case that motivates all this complexity is a bunch of record types, each a sequence of child elements, each record type has a discriminator that looks at one or a few of the fields of the record to decide if the data is in fact of this record type. Those discriminators could be hoisted and evaluated as soon as the fields of the record needed by the expressions have been parsed. If parsing of other later fields of the record then fails, the discriminator has already been set true, so those failures don't cause other record types to be attempted. Rather the whole choice fails.

This is very close conceptually to a choice by dispatch. Once the dispatch key has been computed and the branch selected, a subsequent failure does not cause backtracking of that choice, but a failure of the whole choice.

Is this not what discrimExpression_07 covers?
It has a discriminator expression that references the first element of a sequence (which does parse correctly). The 2nd element of the sequence then fails, which causes a parse error instead of back tracking.

Yes. Verified. This does appear to cover it. I added one comment about tightening up the error message since the error should be talking specifically about the int2 field being the one that failed.

jadams-tresys · 2025-01-08T22:01:31Z

@mbeckerle I think I've addressed all of your comments except the missing test case. Can you double check that discrimExpression_07 doesn't cover the scenario you were describing?

mbeckerle

+1 with one minor improvement to a test case

...odil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator.tdml

mbeckerle · 2025-01-09T13:10:56Z

...odil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator.tdml

+    </tdml:errors>
+  </tdml:parserTestCase>
+
+  <tdml:parserTestCase name="discrimExpression_08"


Yes. Verified. This does appear to cover it. I added one comment about tightening up the error message since the error should be talking specifically about the int2 field being the one that failed.

stevedlawrence

I think discirminators will be evaluated twice, also a a little concerned about possible regressions if schemas rely on this.

stevedlawrence · 2025-01-10T18:29:25Z

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala

+  val assertDiscrimExpressions = childParsers.collect { case ae: AssertExpressionEvaluationParser => ae }
+  val discrimExpressions = assertDiscrimExpressions.filter { _.discrim }
+  val assertExpressions = assertDiscrimExpressions.filterNot { _.discrim }
+  val nonAssertChildren = childParsers.diff(assertExpressions)


I think nonAssertChildren includes discriminator parsers so could discriminators get evaluated twice, once in the first loop and once in the discrimnator loop?

There's a base class shared between discriminators and regular asserts. They're both a kind of assertion. This scala code may be using that terminology so nonAssertChildren may also mean no discriminators.

@stevedlawrence is right. It should be

val nonAssertChildren = childParsers.diff(assertDiscrimExpressions)

then there won't be duplicates. Will change.

I think that's the intention, that nonAssertChildren contains no dfdl:assert or dfdl:discrimintor parers. But I think that's not the case. I think this needs to be this?

val nonAssertChildren = childParsers.diff(assertDiscrimExpressions)

So nonAssertChildren excludes both asserts and discriminators from childParsers.

stevedlawrence · 2025-01-10T18:32:35Z

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala

+
+    // Handle all discriminator expressions statements after sequence body
+    i = 0
+    while (i < discrimExpressions.size) {


There can only be one discriminator, right? This don't really need to be a loop I think?

Is it not possible to have multiple discriminators? I had just assumed that it was but am having trouble finding something definitive in the spec.

Only one discriminator at each annotation point. In fact I'm not sure we enforce this, but one annotation point is supposed to have only a discriminator OR asserts (1 or more), but not both:

Section 7.6: "The resolved set of statement annotations for a schema component can contain only a single dfdl:discriminator or one or more dfdl:assert annotations, but not both. To clarify: dfdl:assert annotations and dfdl:discriminator annotations are exclusive of each other. It is a Schema Definition Error otherwise."

If that's true we can simplify the logic a bit and remove the filtering of asserts. We just need something like

val optDiscrimParser = ... // find an optional discrim parser val nonDiscrimParsers = ... // remove discrim parsers if it exists def parser() { ... // call parse() for all nonDiscrimParsers // call parse() for optDiscrimParser if set ... }

If nonDiscrimParsers contains any asserts, then optDiscrimParser should be None and the order is correct.

If nonDiscrimParsers does not contain asserts, then otpDiscrimParser will always be evaluated (if it exists), and we don't have to worry about evaluating assert parsers after the discrim because they shouldn't exist.

Should we also enforce the exclusivity as part of this PR, feels important to ensuring the right behavior of assert/discrim evaluation.

I just verified that we already are enforcing the exclusivity of both allowing only one dfdl:discriminator or multiple dfdl:assert statements. Doing otherwise creates an SDE.

With that in mind I will make some changes to use these expectations to simplify things.

stevedlawrence · 2025-01-10T18:34:32Z

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala

+        assertExpressions(i).parse1(pstate)
+        i += 1
+    }
+  }


Was this order confirmed by the DFDL-WG? First we do all normal parsers, then the discriminator, and then assertions? And that the order has nothing to do with the order things are defined in the schema?

Yes. Confirmed in a lengthly workgroup discussion yesterday in fact.

stevedlawrence · 2025-01-10T18:43:16Z

...il-test/src/test/scala/org/apache/daffodil/section07/discriminators/TestDiscriminators.scala

+  @Test def test_discrimExpression_05() = { runner.runOneTest("discrimExpression_05") }
+  @Test def test_discrimExpression_06() = { runner.runOneTest("discrimExpression_06") }
+  @Test def test_discrimExpression_07() = { runner.runOneTest("discrimExpression_07") }
+  @Test def test_discrimExpression_08() = { runner.runOneTest("discrimExpression_08") }


You mentioned you ran things against the regression suite. Have you done so with the recent changes? I'm hesitant to merge anything this close to release that has significant changes to behavior that schemas might rely on. Though, I won't block it if others are fine with the regressions.

If this is for the 3.10.0 release, we have to be prepared to back it out should it lead to any significant regressions.

Changing order of assertion evaluation, and discriminator evaluation (because a failing discriminator is equivalent to a failing assertion) can result in different diagnostics coming out, so tests may regress because they're expecting one error text, but are given another. But negative tests failing due to one diagnostic vs. another is a sort of lower-threat regression vs. positive tests that break.

stevedlawrence

+1 with the fix to nonAssertChildren

Fine to merge for 3.10 if there are no major regressions, otherwise we should hold off to the next release. I'd prefer we figure out if there are regression prior to merging so we don't have to revert and restart the release process.

Given that there can only be either one discriminator or one or more asserts the logic for handling discriminator expressions.

stevedlawrence

Just minor cleanup

stevedlawrence · 2025-01-11T15:21:27Z

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala

-      i += 1
+    var i = 0
+
+    // Handle all non assert/discriminator child parsers first


This comment should say "all non discriminator child parsers", removing "assert". This will evaluate asserts if they exist.

stevedlawrence · 2025-01-11T15:24:52Z

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala

  override lazy val runtimeDependencies = Vector()
  override def childProcessors = childParsers

  override def nom = "seq"

-  val numChildParsers = childParsers.size
+  val optDiscrimParser = childParsers.collectFirst { case ae: AssertExpressionEvaluationParser if (ae.discrim) => ae }
+  val nonDiscrimChildren = if (optDiscrimParser.isDefined) childParsers.diff(Seq(optDiscrimParser.get)) else childParsers


I think this could be simplified to

val nonDiscrimChildren = childParsers.diff(optDiscrimParser.toSeq)

stevedlawrence · 2025-01-11T15:33:07Z

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/Parser.scala

+    var i = 0
+
+    // Handle all non assert/discriminator child parsers first
+    while ((i < nonDiscrimChildren.size) && (pstate.processorStatus eq Success)) {


It might be worth storing nonDiscrimChildren.size as a member val? I assume it's a constant time function so calling it in the loop probably doesn't matter, but numChildParser used to be a member val so there might have been a reason for that?

jadams-tresys · 2025-01-13T15:00:19Z

After digging into some of the failures I was seeing in the dfdl-regression test suite, there wasn't a straightforward work around for the few schemas that were relying on the old discriminator behavior. I am going to hold off on merging this until after the release of Daffodil 3.10.0 and then this can be added to 4.0.0 and a better solution for these problematic schemas can be investigated.

jadams-tresys · 2025-02-10T18:22:22Z

Now that 3.10 has been released and 4.0.0 is underway, any objections to merging these changes now?

stevedlawrence · 2025-02-10T18:32:15Z

Might want to hold off until all the 2.13 changes are in since that's kindof touching everything and might make dealing with merge conflicts difficult?

Also, breaking backwards compatibility in 4.0.0 is less of a concern than usualy, but we should still have backwards compatibility notes, especially explaining how to update schemas that relied on the old behavior. Did you ever come up with a generic way to update schemas to work with this change? I thought one of the issues preventing merging in 3.10 was there wasn't really a good way to do that?

mbeckerle · 2025-02-10T18:37:53Z

At least at first I'd like 4.0.0 branch to have a scala 2.13 or scala 3 version (or one then the other) which are 100% identical functionality and performance to 3.10.0. I want them to run the exact same schemas unmodified just for testing reasons.

I would suggest these changes should be in 4.1.0, or a 3.11.0 if we truly need them out sooner.

jadams-tresys · 2025-02-10T18:43:16Z

Fair enough. I'll re-evaluate once the 2.13 changes seem to be wrapped up.

As for a fix for the failing schemas, it seemed like the correct fix was to add choiceDispatch or initiated content to some of these choice branches that were mistakenly being allowed to backtrack. I know we didn't want to force a change like that in 3.10.0, but I wasn't sure if a change like that would be OK moving into 4.0.0.

stevedlawrence · 2025-02-10T18:45:40Z

In theory, not all formats can use choice dispatch or initiated content and might still have this behavior. We might need to think about what an alternative would look like for those kinds of schemas.

stevedlawrence marked this pull request as draft September 18, 2024 11:27

Josh Adams added 3 commits January 3, 2025 10:28

Fix formatting

9f0b750

jadams-tresys force-pushed the DAFFODIL-1971 branch from 94fe3ae to ef3ec79 Compare January 6, 2025 14:51

jadams-tresys mentioned this pull request Jan 6, 2025

Reorder DFDL statement processing to match spec #1369

Closed

jadams-tresys requested review from mbeckerle and stevedlawrence January 6, 2025 14:55

stevedlawrence requested changes Jan 6, 2025

View reviewed changes

stevedlawrence reviewed Jan 6, 2025

View reviewed changes

Josh Adams added 2 commits January 8, 2025 10:23

Address issues found in review

7089c5e

Address more comments from review

c7df87a

mbeckerle requested changes Jan 8, 2025

View reviewed changes

More changes based on review

9264167

mbeckerle approved these changes Jan 9, 2025

View reviewed changes

fixup!

c743624

jadams-tresys requested review from stevedlawrence and olabusayoT January 9, 2025 13:25

stevedlawrence reviewed Jan 10, 2025

View reviewed changes

stevedlawrence approved these changes Jan 10, 2025

View reviewed changes

Simplify logic for dealing with discriminator expressions

a8c766b

Given that there can only be either one discriminator or one or more asserts the logic for handling discriminator expressions.

stevedlawrence reviewed Jan 11, 2025

View reviewed changes

fixup!

0ce22ea

fixup!

8be44b2

jadams-tresys changed the title ~~Evaluate assert/discriminator expressions after groupContent: WIP~~ Evaluate assert/discriminator expressions after groupContent Feb 10, 2025

jadams-tresys marked this pull request as ready for review February 10, 2025 18:25

Evaluate assert/discriminator expressions after groupContent #987

Are you sure you want to change the base?

Evaluate assert/discriminator expressions after groupContent #987

Conversation

jadams-tresys commented Mar 13, 2023

jadams-tresys commented Mar 13, 2023 • edited Loading

stevedlawrence left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jadams-tresys commented Jan 7, 2025

mbeckerle left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jadams-tresys commented Jan 8, 2025

mbeckerle left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stevedlawrence left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stevedlawrence left a comment

Choose a reason for hiding this comment

stevedlawrence left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jadams-tresys commented Jan 13, 2025

jadams-tresys commented Feb 10, 2025

stevedlawrence commented Feb 10, 2025

mbeckerle commented Feb 10, 2025

jadams-tresys commented Feb 10, 2025

stevedlawrence commented Feb 10, 2025

jadams-tresys commented Mar 13, 2023 •

edited

Loading