Add XSD schema for validation #10

Closed
wants to merge 3 commits into from
12 changes: 12 additions & 0 deletions 1.1/EWN-LMF-1.1-relax_idrefs.xsd
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="UTF-8"?>

<!-- Copyright (c) 2020. Bernard Bou <1313ou@gmail.com>. -->

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >

<xsd:include schemaLocation='ewn-idtypes-relax_idrefs.xsd' />
<xsd:include schemaLocation='ewn-wordtypes.xsd' />
<xsd:include schemaLocation='types.xsd' />
<xsd:include schemaLocation='core-1.1.xsd' />

</xsd:schema>
12 changes: 12 additions & 0 deletions 1.1/EWN-LMF-1.1.xsd
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="UTF-8"?>

<!-- Copyright (c) 2020. Bernard Bou <1313ou@gmail.com>. -->

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >

<xsd:include schemaLocation='ewn-idtypes.xsd' />
<xsd:include schemaLocation='ewn-wordtypes.xsd' />
<xsd:include schemaLocation='types.xsd' />
<xsd:include schemaLocation='core-1.1.xsd' />

</xsd:schema>
65 changes: 65 additions & 0 deletions 1.1/README.md
@@ -0,0 +1,65 @@
# WordNet-LMF 1.1

This adds XSD validation schemas to WordNet, as FrameNet has done. The move is motivated by the following:

- A DTD does not provide the fine-grained control that XSD does. The most significant difference between DTDs and XML Schema is the ability to create and use **datatypes**: XSD schemas define datatypes for elements and attributes, while DTDs offer only a handful of fixed attribute types and none for element content. Datatypes control what sort of data (ids, content) is expected, so errors surface that would otherwise go unnoticed.

- Incidentally, the reference to the Dublin Core schema is erroneous (as mentioned [here](https://github.com/globalwordnet/schemas/issues/5)) in that the definitions of elements are mistakenly applied to attributes. Any real validation against the Dublin Core definitions would fail. Besides, Dublin Core seems superimposed and unnatural here, and it is doubtful that it is of real use.
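
As a minimal sketch of the datatype point above (not part of this PR; the element `Entry` and its attributes `frequency` and `added` are hypothetical names chosen for illustration), the following schema constrains values in ways a DTD cannot express:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element, used only for illustration -->
  <xsd:element name="Entry">
    <xsd:complexType>
      <!-- Must be an integer >= 0; a DTD could only declare CDATA here -->
      <xsd:attribute name="frequency" type="xsd:nonNegativeInteger" use="required" />
      <!-- Must be a calendar date such as 2020-05-17 -->
      <xsd:attribute name="added" type="xsd:date" use="optional" />
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Under this schema, `<Entry frequency="many"/>` would be rejected, whereas a DTD declaring `frequency` as CDATA would accept it.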

#### Namespaces

Namespaces are left unchanged. Apart from the default namespace, the only namespace used is dc:.

#### Modules

The design is modular:

- ***dc.xsd*** for the dc: namespace.
- ***(ewn-)idtypes(-relax_idrefs).xsd*** for core id types (it defines ID policy).
- ***(ewn-)wordtypes.xsd*** for word types (it defines word form policy).
- ***types.xsd*** for core data types.
- ***pwn.xsd*** for PWN types.
- ***ili.xsd*** for ILI types.
- ***meta.xsd*** for meta types.
- ***core-1.1.xsd*** for elements and the core structure.

This allows different levels of validation to be performed, so that stricter constraints can be brought to bear on the same data. It does not make one level incompatible with the next: for example, the data that satisfies EWN-LMF-1.1.xsd is a subset of the data validated by WN-LMF-1.1.xsd (that is, WN-LMF-1.1 accepts a superset of what EWN-LMF-1.1 accepts).

Another use is to apply different IDREF validation depending on whether or not you are validating merged files.
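
One way to select a validation level, sketched here under assumptions, is to name the chosen top-level schema in the instance document. All attribute values below are placeholders, the `Meta` attribute group is assumed to contain no required attributes, and the validator is assumed to honour the `xsi:noNamespaceSchemaLocation` hint (many tools also accept the schema as a separate argument instead):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Placeholder values throughout; swap the schema file name to change the validation level -->
<LexicalResource
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="WN-LMF-1.1-relax_idrefs.xsd">
  <!-- Lexicon requires id, label, language, email, license and version (see core-1.1.xsd) -->
  <Lexicon id="example-en"
           label="Example lexicon"
           language="en"
           email="maintainer@example.org"
           license="https://example.org/license"
           version="1.0">
    <!-- LexicalEntry elements, then Synset elements, would go here -->
  </Lexicon>
</LexicalResource>
```

Pointing the same file at `WN-LMF-1.1.xsd` instead would apply the strict IDREF checks intended for merged files.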

#### Id types

idtypes.xsd and ewn-idtypes.xsd differ in that the latter imposes extra constraints on the **well-formedness** of EWN ids.
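
The actual type definitions are not shown in this README, so the following is only a sketch of how such a two-level id policy could be expressed. Both type names and the pattern are hypothetical and are not taken from idtypes.xsd or ewn-idtypes.xsd:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Generic level (hypothetical name): any valid XML ID is accepted -->
  <xsd:simpleType name="GenericSynsetIdExampleType">
    <xsd:restriction base="xsd:ID" />
  </xsd:simpleType>

  <!-- Project level (hypothetical name and pattern): restricts the generic type,
       so every value it accepts is also accepted by the generic type -->
  <xsd:simpleType name="ProjectSynsetIdExampleType">
    <xsd:restriction base="GenericSynsetIdExampleType">
      <xsd:pattern value="ewn-[0-9]{8}-[nvars]" />
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Deriving the stricter type by restriction is what makes the stricter data a subset of the data accepted at the generic level.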

#### Relaxed id types vs strict

This deals with **id reference** validation.

*(ewn-)idtypes.xsd* and *(ewn-)idtypes-relax_idrefs.xsd* differ in that the latter allows some **non-local references not to have their target in the same file**. This is necessary for references that cross parts of speech, such as those found in derivation relations (an adjective derived from a noun, etc.) and possibly in other cases (seealso, etc.), where the target resides in a different file. The relaxed mode is useful for validating **pre-merging lexicographer files**, while the strict mode must be used **to validate the merged file**, to make sure no references are left dangling.
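
A minimal sketch of the difference, assuming the strict types derive from `xsd:IDREF` and the relaxed ones from a plain name type (the type names below are hypothetical; the real definitions live in the idtypes modules, which are not shown here):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Strict (hypothetical name): derives from xsd:IDREF, so the validator
       requires a matching xsd:ID value somewhere in the same document -->
  <xsd:simpleType name="StrictRefExampleType">
    <xsd:restriction base="xsd:IDREF" />
  </xsd:simpleType>

  <!-- Relaxed (hypothetical name): same lexical form (an NCName), but no
       lookup is performed, so the target may live in another file -->
  <xsd:simpleType name="RelaxedRefExampleType">
    <xsd:restriction base="xsd:NCName" />
  </xsd:simpleType>

</xsd:schema>
```

Because `xsd:IDREF` is itself derived from `xsd:NCName`, the two types accept the same spelling of a reference and differ only in whether the target must exist in the validated document.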

#### Some resulting combinations

- WN-LMF-1.1-relax_idrefs.xsd
- WN-LMF-1.1.xsd
- EWN-LMF-1.1-relax_idrefs.xsd
- EWN-LMF-1.1.xsd

#### EWN compatibility with the 1.1 schema

The current lexicographer files satisfy both:

- WN-LMF-1.1-relax_idrefs.xsd
- EWN-LMF-1.1-relax_idrefs.xsd

The current merged file satisfies both:

- WN-LMF-1.1.xsd
- EWN-LMF-1.1.xsd

#### Validation tools

- [Preferred validation tool](https://github.com/1313ou/ewn-validate2) (based on Saxon, fast and efficient)
- [Basic validation tool](https://github.com/1313ou/ewn-validate) (based on the standard validation tools that come with Java 8, may be slow)
12 changes: 12 additions & 0 deletions 1.1/WN-LMF-1.1-relax_idrefs.xsd
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="UTF-8"?>

<!-- Copyright (c) 2020. Bernard Bou <1313ou@gmail.com>. -->

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >

<xsd:include schemaLocation='idtypes-relax_idrefs.xsd' />
<xsd:include schemaLocation='wordtypes.xsd' />
<xsd:include schemaLocation='types.xsd' />
<xsd:include schemaLocation='core-1.1.xsd' />

</xsd:schema>
12 changes: 12 additions & 0 deletions 1.1/WN-LMF-1.1.xsd
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="UTF-8"?>

<!-- Copyright (c) 2020. Bernard Bou <1313ou@gmail.com>. -->

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >

<xsd:include schemaLocation='idtypes.xsd' />
<xsd:include schemaLocation='wordtypes.xsd' />
<xsd:include schemaLocation='types.xsd' />
<xsd:include schemaLocation='core-1.1.xsd' />

</xsd:schema>
165 changes: 165 additions & 0 deletions 1.1/core-1.1.xsd
@@ -0,0 +1,165 @@
<?xml version="1.0" encoding="UTF-8"?>

<!-- Copyright (c) 2020. Bernard Bou <1313ou@gmail.com>. -->

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:dc='http://purl.org/dc/elements/1.1/'
>

<xsd:import namespace='http://purl.org/dc/elements/1.1/' schemaLocation='dc.xsd' />
<xsd:include schemaLocation='pwn.xsd' />
<xsd:include schemaLocation='ili.xsd' />
<xsd:include schemaLocation='meta.xsd' />

<!-- E L E M E N T S -->

<xsd:element name='LexicalResource'>
<xsd:complexType>
<xsd:sequence>
<xsd:element ref='Lexicon' maxOccurs='unbounded' />
</xsd:sequence>
</xsd:complexType>
</xsd:element>

<xsd:element name='Lexicon'>
<xsd:complexType>
<xsd:sequence>
<xsd:element ref='LexicalEntry' minOccurs='0' maxOccurs='unbounded' />
<xsd:element ref='Synset' minOccurs='0' maxOccurs='unbounded' />
</xsd:sequence>

<xsd:attribute name='id' type='xsd:ID' use='required' />
<xsd:attribute name='label' type='xsd:string' use='required' />
<xsd:attribute name='language' type='xsd:string' use='required' />
<xsd:attribute name='email' type='xsd:string' use='required' />
<xsd:attribute name='license' type='xsd:string' use='required' />
<xsd:attribute name='version' type='xsd:string' use='required' />
<xsd:attribute name='url' type='xsd:string' use='optional' />
<xsd:attribute name='citation' type='xsd:string' use='optional' />
<xsd:attributeGroup ref='Meta' />
</xsd:complexType>
</xsd:element>

<xsd:element name='LexicalEntry'>
<xsd:complexType>
<xsd:sequence>
<xsd:element ref='Lemma' minOccurs='1' maxOccurs='1' />
<xsd:element ref='Form' minOccurs='0' maxOccurs='unbounded' />
<xsd:element ref='Sense' minOccurs='1' maxOccurs='unbounded' />
<xsd:element ref='SyntacticBehaviour' minOccurs='0' maxOccurs='unbounded' />
</xsd:sequence>

<xsd:attribute name='id' type='LexicalEntryIDType' use='required' />
<xsd:attributeGroup ref='Meta' />
</xsd:complexType>
</xsd:element>

<xsd:element name='Lemma'>
<xsd:complexType>
<xsd:sequence>
<xsd:element ref='Tag' minOccurs='0' maxOccurs='unbounded' />
</xsd:sequence>

<xsd:attribute name='writtenForm' type='WrittenFormType' use='required' />
<xsd:attribute name='script' type='ScriptType' use='optional' />
<xsd:attribute name='partOfSpeech' type='PartOfSpeechType' use='required' />
</xsd:complexType>
</xsd:element>

<xsd:element name='Form'>
<xsd:complexType>
<xsd:sequence>
<xsd:element ref='Tag' minOccurs='0' maxOccurs='unbounded' />
</xsd:sequence>

<xsd:attribute name='writtenForm' type='WrittenFormType' use='required' />
<xsd:attribute name='script' type='ScriptType' use='optional' />
</xsd:complexType>
</xsd:element>

<xsd:element name='Sense'>
<xsd:complexType>
<xsd:sequence>
<xsd:element ref='SenseRelation' minOccurs='0' maxOccurs='unbounded' />
<xsd:element ref='Example' minOccurs='0' maxOccurs='unbounded' />
<xsd:element ref='Count' minOccurs='0' maxOccurs='1' />
</xsd:sequence>

<xsd:attribute name='id' type='SenseIDType' use='required' />
<xsd:attribute name='synset' type='LocalSynsetIDREFType' use='required' />
<xsd:attribute name='n' type='NType' use='optional' />
<xsd:attribute name='lexicalized' type='xsd:boolean' default='true' use='optional' />
<xsd:attribute ref='dc:identifier' use='optional' />
<xsd:attributeGroup ref='Meta' />
</xsd:complexType>
</xsd:element>

<xsd:element name='Synset'>
<xsd:complexType>
<xsd:sequence>
<xsd:element ref='Definition' minOccurs='0' maxOccurs='1' />
<xsd:element ref='ILIDefinition' minOccurs='0' maxOccurs='1' />
<xsd:element ref='SynsetRelation' minOccurs='0' maxOccurs='unbounded' />
<xsd:element ref='Example' minOccurs='0' maxOccurs='unbounded' />
</xsd:sequence>

<xsd:attribute name='id' type='SynsetIDType' use='required' />
<xsd:attribute ref='ili' use='required' />
<xsd:attribute name='partOfSpeech' use='optional' type='PartOfSpeechType' />
<xsd:attribute ref='dc:subject' use='optional' />
<xsd:attribute name='lexicalized' type='xsd:boolean' default='true' use='optional' />
<xsd:attributeGroup ref='Meta' />
</xsd:complexType>
</xsd:element>

<xsd:element name='Definition'>
<xsd:complexType mixed='true'>
<xsd:attribute name='language' type='xsd:string' use='optional' />
<xsd:attribute name='sourceSense' type='SenseIDREFType' use='optional' />
<xsd:attributeGroup ref='Meta' />
</xsd:complexType>
</xsd:element>

<xsd:element name='Example'>
<xsd:complexType mixed='true'>
<xsd:attribute name='language' type='xsd:string' use='optional' />
<xsd:attributeGroup ref='Meta' />
</xsd:complexType>
</xsd:element>

<xsd:element name='SynsetRelation'>
<xsd:complexType>
<xsd:attribute name='target' type='SynsetIDREFType' use='required' />
<xsd:attribute name='relType' type='SynsetRelationType' use='required' />
<xsd:attributeGroup ref='Meta' />
</xsd:complexType>
</xsd:element>

<xsd:element name='SenseRelation'>
<xsd:complexType>
<xsd:attribute name='target' type='SenseIDREFType' use='required' />
<xsd:attribute name='relType' type='SenseRelationType' use='required' />
<xsd:attributeGroup ref='Meta' />
</xsd:complexType>
</xsd:element>

<xsd:element name='SyntacticBehaviour'>
<xsd:complexType>
<xsd:attribute name='subcategorizationFrame' type='xsd:string' use='required' />
<xsd:attribute name='senses' type='LocalSenseIDREFSType' use='optional' />
</xsd:complexType>
</xsd:element>

<xsd:element name='Tag'>
<xsd:complexType mixed='true'>
<xsd:attribute name='category' type='xsd:string' use='required' />
</xsd:complexType>
</xsd:element>

<xsd:element name='Count'>
<xsd:complexType mixed='true'>
<xsd:attributeGroup ref='Meta' />
</xsd:complexType>
</xsd:element>

</xsd:schema>
41 changes: 41 additions & 0 deletions 1.1/dc.xsd
@@ -0,0 +1,41 @@
<?xml version="1.0" encoding="UTF-8"?>

<!-- Copyright (c) 2020. Bernard Bou <1313ou@gmail.com>. -->

<!DOCTYPE xsd:schema
[
]>

<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'
xmlns:dc='http://purl.org/dc/elements/1.1/'
targetNamespace='http://purl.org/dc/elements/1.1/'>

<xsd:import schemaLocation='pwn.xsd' />
<xsd:import schemaLocation='types.xsd' />

<!-- A T T R I B U T E S -->

<!-- meta -->
<xsd:attribute name='contributor' type='xsd:string' />
<xsd:attribute name='coverage' type='xsd:string' />
<xsd:attribute name='creator' type='xsd:string' />
<xsd:attribute name='date' type='xsd:string' />
<xsd:attribute name='description' type='xsd:string' />
<xsd:attribute name='format' type='xsd:string' />
<xsd:attribute name='publisher' type='xsd:string' />
<xsd:attribute name='relation' type='xsd:string' />
<xsd:attribute name='rights' type='xsd:string' />
<xsd:attribute name='source' type='xsd:string' />
<xsd:attribute name='title' type='xsd:string' />
<xsd:attribute name='type' type='xsd:string' />

<!-- sensekey -->
<xsd:attribute name='identifier' type='LegacySenseKeyType' />

<!-- subject -->
<xsd:attribute name='subject' type='LexFileType' />

</xsd:schema>


