Commit
In ToSemanticTreeVisitor, create multiple children in the parent node instead of one 'container' child with multiple children

Previously, when a rule was matched multiple times (e.g. quote), ToSemanticTreeVisitor created an untyped container with multiple children. This was error-prone: first, the container was untyped (which is unusual); second, no DuraLex visitor expects such a container, since they all expect to find the multiple children directly; and third, it is ugly.

On the other hand, ToSemanticTreeVisitor needs a DuraLex container node when a Parsimonious node contains multiple DuraLex nodes during half-read operations.

With this commit, DuraLex container nodes are collected as soon as an upper DuraLex node is able to contain them, and in this case the children of the container node are put directly as children of the DuraLex parent node. If the parent Parsimonious node is still not a DuraLex node, all container nodes and non-container nodes are flattened into a single container node. A (welcome) side effect of this new behaviour is that there is no longer any risk of seeing container-of-containers DuraLex nodes, given that they are always flattened.

At the upper level, when a local DuraLex tree is attached to the global DuraLex tree, container nodes are flattened again. With this mechanism, the global DuraLex tree should never have any container nodes; they should only remain in half-read trees in ToSemanticTreeVisitor. To still be able to continue parsing even if container nodes remain, they now have the type 'parsimonious-list-container'. Obviously, if such nodes are found, ToSemanticTreeVisitor will need to be debugged.

As a result of this change, the DuraLex trees created by this visitor are slightly different from the previous ones in the case of multiple values. For instance, the following amendment:

    À l’article L100-3 du code de l’énergie est ajouté deux phrases ainsi rédigées : "Phrase 1." "Phrase 2."

was previously the following DuraLex tree:

    type: edit; editType: replace
    ╠═ type: code-reference; id: code de l'énergie
    ║  ╚═ type: article-reference; id: L100-3
    ╠═ type: sentence-reference
    ║  ╚═ type: quote; words: Phrase 1.
    ╚═ type: sentence-definition
       ╚═ type: quote; words: Phrase 2.

and it is now:

    type: edit; editType: replace
    ╠═ type: code-reference; id: code de l'énergie
    ║  ╚═ type: article-reference; id: L100-3
    ╚═ type: sentence-reference; count: 2
       ╠═ type: quote; words: Phrase 1.
       ╚═ type: quote; words: Phrase 2.

I find the new tree more meaningful, given that the number of sentences matches the number of quotes. (I know that in some amendments both sentences are grouped in the same quote; I tend to think they could be split by a later visitor, to be discussed and evaluated.)

From a reuse point of view (in SedLex), both trees could easily be read. Currently SedLex only reads the last quote in both cases; I will shortly push a small change so that SedLex always shows all quotes.

Issue: #9
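
The flattening described in this message can be illustrated with a minimal sketch. It is not the actual ToSemanticTreeVisitor code: the helper names `is_container`, `flatten`, `attach` and `wrap` are hypothetical, and nodes are assumed to be plain dicts with `type` and `children` keys. Only the type name 'parsimonious-list-container' comes from the commit message.

```python
# Type name taken from the commit message; everything else is illustrative.
CONTAINER_TYPE = 'parsimonious-list-container'


def is_container(node):
    """True if the node is a temporary container (hypothetical helper)."""
    return node.get('type') == CONTAINER_TYPE


def flatten(nodes):
    """Return the nodes with every container replaced by its own children."""
    flat = []
    for node in nodes:
        if is_container(node):
            # Recurse so that containers of containers are dissolved too.
            flat.extend(flatten(node.get('children', [])))
        else:
            flat.append(node)
    return flat


def attach(parent, nodes):
    """A typed parent is available: add the flattened nodes as its children."""
    parent.setdefault('children', []).extend(flatten(nodes))
    return parent


def wrap(nodes):
    """No typed parent yet (half-read tree): keep one flat container."""
    return {'type': CONTAINER_TYPE, 'children': flatten(nodes)}


# Two quotes found under Parsimonious nodes that are not DuraLex nodes yet.
quotes = wrap([
    {'type': 'quote', 'words': 'Phrase 1.'},
    wrap([{'type': 'quote', 'words': 'Phrase 2.'}]),
])

# Once a typed DuraLex node can hold them, the container disappears.
sentence = attach({'type': 'sentence-reference', 'count': 2}, [quotes])
```

In this sketch, `wrap` never produces a container of containers, and `attach` never leaves a container under a typed node, which mirrors the two guarantees stated above.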