From 0fa890715d3d0c93591222c28c2ad7b7cdb9b1bb Mon Sep 17 00:00:00 2001
From: 1313ou <1313ou@gmail.com>
Date: Tue, 25 Feb 2020 10:30:42 +0100
Subject: [PATCH 1/3] 1.1

---
 1.1/EWN-LMF-1.1-relax_idrefs.xsd |  12 ++
 1.1/EWN-LMF-1.1.xsd              |  12 ++
 1.1/README.md                    |  65 ++++++++++
 1.1/WN-LMF-1.1-relax_idrefs.xsd  |  12 ++
 1.1/WN-LMF-1.1.xsd               |  12 ++
 1.1/core-1.1.xsd                 | 167 +++++++++++++++++++++++++
 1.1/dc.xsd                       |  41 +++++++
 1.1/ewn-idtypes-relax_idrefs.xsd | 140 +++++++++++++++++++++
 1.1/ewn-idtypes.xsd              | 136 +++++++++++++++++++++
 1.1/ewn-wordtypes.xsd            |  35 ++++++
 1.1/idtypes-relax_idrefs.xsd     | 140 +++++++++++++++++++++
 1.1/idtypes.xsd                  | 136 +++++++++++++++++++++
 1.1/ili.xsd                      |  37 ++++++
 1.1/meta.xsd                     |  49 ++++++++
 1.1/pwn.xsd                      |  24 ++++
 1.1/types.xsd                    | 203 +++++++++++++++++++++++++++++++
 1.1/wordtypes.xsd                |  27 ++++
 17 files changed, 1248 insertions(+)
 create mode 100644 1.1/EWN-LMF-1.1-relax_idrefs.xsd
 create mode 100644 1.1/EWN-LMF-1.1.xsd
 create mode 100644 1.1/README.md
 create mode 100644 1.1/WN-LMF-1.1-relax_idrefs.xsd
 create mode 100644 1.1/WN-LMF-1.1.xsd
 create mode 100644 1.1/core-1.1.xsd
 create mode 100644 1.1/dc.xsd
 create mode 100644 1.1/ewn-idtypes-relax_idrefs.xsd
 create mode 100644 1.1/ewn-idtypes.xsd
 create mode 100644 1.1/ewn-wordtypes.xsd
 create mode 100644 1.1/idtypes-relax_idrefs.xsd
 create mode 100644 1.1/idtypes.xsd
 create mode 100644 1.1/ili.xsd
 create mode 100644 1.1/meta.xsd
 create mode 100644 1.1/pwn.xsd
 create mode 100644 1.1/types.xsd
 create mode 100644 1.1/wordtypes.xsd

diff --git a/1.1/EWN-LMF-1.1-relax_idrefs.xsd b/1.1/EWN-LMF-1.1-relax_idrefs.xsd
new file mode 100644
index 0000000..f7dcaa0
--- /dev/null
+++ b/1.1/EWN-LMF-1.1-relax_idrefs.xsd
@@ -0,0 +1,12 @@
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/1.1/EWN-LMF-1.1.xsd b/1.1/EWN-LMF-1.1.xsd
new file mode 100644
index 0000000..c2a7ee4
--- /dev/null
+++ b/1.1/EWN-LMF-1.1.xsd
@@ -0,0 +1,12 @@
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/1.1/README.md b/1.1/README.md
new file mode 100644
index 0000000..3efdb5c
--- /dev/null
+++ b/1.1/README.md
@@ -0,0 +1,65 @@
+#WordNet-LMF 1.1
+#===
+
+This is to equip WordNet with state-of-the-art validation schemas the way FrameNet did. This move is dictated by the following:
+
+- DTD does not provide fine-grained control the way XSD does. The most significant difference between DTDs and XML Schema is the capability to create and use **datatypes**. XSD schemas define datatypes for elements and attributes while DTD doesn't support them. This allows for control on what sort of data (ids, content) is expected. Leveraging datatypes gets errors to bubble up that would otherwise go unnoticed.
+
+- Incidentally the reference to Dublin Core schema is erroneous (as mentioned [here](https://github.com/globalwordnet/schemas/issues/5) ) in that the definition of elements is mistakenly applied to attributes. Any real validation against the Dublin Core definitions would fail. Besides, Dublin Core seems superimposed and unnatural and it is doubtful it is of real use here.
+
+####name spaces
+
+Namespaces are left unchanged. Beyond the current namespace, the only namespace is dc:.
+
+####modules
+
+ The design is modular:
+
+***dc.xsd*** for dc: namespace.
+***(ewn-)idtypes(-relax_idrefs).xsd*** for id types (it defines ID policy).
+***(ewn-)wordtypes.xsd*** for word types (it defines word form policy).
+***types.xsd*** for core data types.
+***pwn.xsd*** for PWN types.
+***ili.xsd*** for ili types.
+***meta.xsd*** for meta types.
+***core-1.1.xsd*** for elements and the core structure.
+
+This allows for different levels of validation to be performed.
+
+This makes it possible to bring stricter constraints to bear on the same data. But it does not mean the previous level is incompatible with the next. For example the data that satisfies EWN-LMF-1.1.xsd is a subset of data validated by WN-LMF-1.1.xsd (or WN-LMF-1.1 is a superset of EWN-LMF-1.1).
+ +Another use is different IDREF validation depending on whether you are attempting at validating merged files or not. + +####id types + +idtypes-1.1.xsd and ewn-idtypes-1.1.xsd differ in that the latter imposes extra constraints on the **well-formedness** of EWN ids. + +####relaxed id types vs strict + +This deals with **id reference** validation. + +*(ewn-)idtypes-1.1.xsd* and *(ewn-)idtypes-1.1-relax_idrefs.xsd* differ in that the latter allows some **non-local references not to have their target in the same file**. This is necessary in the case of part-of-speech cross-references such as the ones found in derivation relations (adj derived from noun, etc...) or maybe other cases (seealso, etc). The target then resides in a different file. This is useful to validate **pre-merging lexicographer files** while the strict mode must be used **to validate the merged file**, to make sure references are not left dangling. + +####some resulting combinations: + +WN-LMF-1.1-relax_idrefs.xsd +WN-LMF-1.1.xsd +EWN-LMF-1.1-relax_idrefs.xsd +EWN-LMF-1.1.xsd + +####EWN compatibility with 1.1. 
schema + +The current lexicographer files satisfy both: + +- WN-LMF-1.1-relax_idrefs.xsd +- EWN-LMF-1.1-relax_idrefs.xsd + +The current merged file satisfies both: + +- WN-LMF-1.1.xsd +- EWN-LMF-1.1.xsd + +####Validation tool + +[Preferred validation tool](https://github.com/1313ou/ewn-validate2) (based on Saxon, fast and efficient) +[Basic validation tool](https://github.com/1313ou/ewn-validate) (based on standard validation tools that come with Java8, may be slow) diff --git a/1.1/WN-LMF-1.1-relax_idrefs.xsd b/1.1/WN-LMF-1.1-relax_idrefs.xsd new file mode 100644 index 0000000..12625d3 --- /dev/null +++ b/1.1/WN-LMF-1.1-relax_idrefs.xsd @@ -0,0 +1,12 @@ + + + + + + + + + + + + diff --git a/1.1/WN-LMF-1.1.xsd b/1.1/WN-LMF-1.1.xsd new file mode 100644 index 0000000..f653640 --- /dev/null +++ b/1.1/WN-LMF-1.1.xsd @@ -0,0 +1,12 @@ + + + + + + + + + + + + diff --git a/1.1/core-1.1.xsd b/1.1/core-1.1.xsd new file mode 100644 index 0000000..5957c99 --- /dev/null +++ b/1.1/core-1.1.xsd @@ -0,0 +1,167 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/1.1/dc.xsd b/1.1/dc.xsd new file mode 100644 index 0000000..0e20135 --- /dev/null +++ b/1.1/dc.xsd @@ -0,0 +1,41 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/1.1/ewn-idtypes-relax_idrefs.xsd b/1.1/ewn-idtypes-relax_idrefs.xsd new file mode 100644 index 0000000..ec228ab --- /dev/null +++ b/1.1/ewn-idtypes-relax_idrefs.xsd @@ -0,0 +1,140 @@ + + + + + + + +]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + diff --git a/1.1/ewn-idtypes.xsd b/1.1/ewn-idtypes.xsd new file mode 100644 index 0000000..e1c923f --- /dev/null +++ b/1.1/ewn-idtypes.xsd @@ -0,0 +1,136 @@ + + + + + + + +]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/1.1/ewn-wordtypes.xsd b/1.1/ewn-wordtypes.xsd new file mode 100644 index 0000000..e485e4d --- /dev/null +++ b/1.1/ewn-wordtypes.xsd @@ -0,0 +1,35 @@ + + + + + +]> + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/1.1/idtypes-relax_idrefs.xsd b/1.1/idtypes-relax_idrefs.xsd new file mode 100644 index 0000000..593d716 --- /dev/null +++ b/1.1/idtypes-relax_idrefs.xsd @@ -0,0 +1,140 @@ + + + + + + + +]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/1.1/idtypes.xsd b/1.1/idtypes.xsd new file mode 100644 index 0000000..8dde930 --- /dev/null +++ b/1.1/idtypes.xsd @@ -0,0 +1,136 @@ + + + + + + + +]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/1.1/ili.xsd b/1.1/ili.xsd new file mode 100644 index 0000000..1068406 --- /dev/null +++ b/1.1/ili.xsd @@ -0,0 +1,37 @@ + + + + + +]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/1.1/meta.xsd b/1.1/meta.xsd new file mode 100644 index 0000000..89ee20a --- /dev/null +++ b/1.1/meta.xsd @@ -0,0 +1,49 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git 
a/1.1/pwn.xsd b/1.1/pwn.xsd new file mode 100644 index 0000000..bc37bc0 --- /dev/null +++ b/1.1/pwn.xsd @@ -0,0 +1,24 @@ + + + + + + +]> + + + + + + + + + + + + + + diff --git a/1.1/types.xsd b/1.1/types.xsd new file mode 100644 index 0000000..c3f5e99 --- /dev/null +++ b/1.1/types.xsd @@ -0,0 +1,203 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/1.1/wordtypes.xsd b/1.1/wordtypes.xsd new file mode 100644 index 0000000..adb56d4 --- /dev/null +++ b/1.1/wordtypes.xsd @@ -0,0 +1,27 @@ + + + + + +]> + + + + + + + + + + + + + + + + + + From ab6281b05cbde1371a3113837a9e3ec4f55d7777 Mon Sep 17 00:00:00 2001 From: 1313ou <1313ou@gmail.com> Date: Thu, 27 Feb 2020 14:28:37 +0100 Subject: [PATCH 2/3] Stricter local IDREF requirement on SyntacticBehaviour (no cross-lexfile reference expected) --- 1.1/core-1.1.xsd | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/1.1/core-1.1.xsd b/1.1/core-1.1.xsd index 5957c99..e4b928c 100644 --- a/1.1/core-1.1.xsd +++ b/1.1/core-1.1.xsd @@ -146,9 +146,7 @@ - - - + From 597e73ac333e32f68d6bc6bf3c301174a50a15e7 Mon Sep 17 00:00:00 2001 From: 1313ou <1313ou@gmail.com> Date: Thu, 27 Feb 2020 15:13:11 +0100 Subject: [PATCH 3/3] Fixes README --- 1.1/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/1.1/README.md b/1.1/README.md index 3efdb5c..bbb4049 100644 --- a/1.1/README.md +++ b/1.1/README.md @@ -16,7 +16,7 @@ Namespaces are left unchanged. Beyond the current namespace, the only namespace The design is modular: ***dc.xsd*** for dc: namespace. -***(ewn-)idtypes(-relax_idrefs).xsd*** for id types (it defines ID policy). 
+***(ewn-)idtypes(-relax_idrefs).xsd*** for core id types (it defines ID policy).
 ***(ewn-)wordtypes.xsd*** for word types (it defines word form policy).
 ***types.xsd*** for core data types.
 ***pwn.xsd*** for PWN types.
@@ -26,19 +26,19 @@ Namespaces are left unchanged. Beyond the current namespace, the only namespace
 
 This allows for different levels of validation to be performed.
 
-This makes it possible to bring stricter constraints to bear on the same data. But it does not mean the previous level is incompatible with the next. For example the data that satisfies EWN-LMF-1.1.xsd is a subset of data validated by WN-LMF-1.1.xsd (or WN-LMF-1.1 is a superset of EWN-LMF-1.1).
+This makes it possible to bring stricter constraints to bear on the same data. But it does not mean the previous level is incompatible with the next. For example the data that satisfies EWN-LMF-1.1.xsd is a subset of data validated by WN-LMF-1.1.xsd (or WN-LMF-1.1 is a superset of EWN-LMF-1.1).
 
 Another use is different IDREF validation depending on whether you are attempting at validating merged files or not.
 
 ####id types
 
-idtypes-1.1.xsd and ewn-idtypes-1.1.xsd differ in that the latter imposes extra constraints on the **well-formedness** of EWN ids.
+idtypes.xsd and ewn-idtypes.xsd differ in that the latter imposes extra constraints on the **well-formedness** of EWN ids.
 
 ####relaxed id types vs strict
 
 This deals with **id reference** validation.
 
-*(ewn-)idtypes-1.1.xsd* and *(ewn-)idtypes-1.1-relax_idrefs.xsd* differ in that the latter allows some **non-local references not to have their target in the same file**. This is necessary in the case of part-of-speech cross-references such as the ones found in derivation relations (adj derived from noun, etc...) or maybe other cases (seealso, etc). The target then resides in a different file. This is useful to validate **pre-merging lexicographer files** while the strict mode must be used **to validate the merged file**, to make sure references are not left dangling.
+*(ewn-)idtypes.xsd* and *(ewn-)idtypes-relax_idrefs.xsd* differ in that the latter allows some **non-local references not to have their target in the same file**. This is necessary in the case of part-of-speech cross-references such as the ones found in derivation relations (adj derived from noun, etc...) or maybe other cases (seealso, etc). The target then resides in a different file. This is useful to validate **pre-merging lexicographer files** while the strict mode must be used **to validate the merged file**, to make sure references are not left dangling.
 
 ####some resulting combinations: