XSD 1.1 schema #45

Closed
wants to merge 4 commits into from

Conversation

@1313ou 1313ou commented Apr 24, 2021

XSD version of release 1.1
(Needs more testing, but validation of the migrated current english-wordnet data passes.)

PS:
Migrated english-wordnet data is obtained this way:

sed 's|http://purl.org/dc/elements/1.1/|https://globalwordnet.github.io/schemas/dc/|g' "$xml"
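
For readers who prefer to script the migration, the same namespace swap can be sketched in Python. This is only an illustration of the sed one-liner above: the function name `migrate_dc_namespace` and the sample element are assumptions, not part of this PR. One difference worth noting is that sed treats the dots in the old URI as regex wildcards, whereas `str.replace` matches the URI literally.

```python
# Namespace URIs taken from the sed command in the comment above.
OLD_NS = "http://purl.org/dc/elements/1.1/"
NEW_NS = "https://globalwordnet.github.io/schemas/dc/"


def migrate_dc_namespace(xml_text: str) -> str:
    """Return xml_text with every occurrence of the old DC namespace URI replaced.

    Hypothetical helper: it mirrors sed's s|old|new|g, but matches the URI
    literally instead of as a regular expression.
    """
    return xml_text.replace(OLD_NS, NEW_NS)


# Made-up sample element, used only to show the effect of the replacement.
sample = '<LexicalResource xmlns:dc="http://purl.org/dc/elements/1.1/"/>'
migrated = migrate_dc_namespace(sample)
print(migrated)
```

Like the sed command without `-i`, this leaves the input untouched and produces the migrated text as a new value.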

@1313ou changed the title from "1.1b" to "XSD 1.1" on Apr 24, 2021
@1313ou changed the title from "XSD 1.1" to "XSD 1.1 schema" on Apr 24, 2021
@goodmami
Member

Thanks! But I'm a bit confused about the goal of this PR. Firstly, while this PR is ostensibly about introducing an XSD schema for WN-LMF, it also replaces the top-level README with something about the Extended English WordNet pipeline, which is unrelated. Similarly, there are some .xsd files which appear to be relevant only for PWN or EWN which, I would think, should be managed by those respective projects (perhaps OMW in the case of the WN-LMF release of PWN).

Also:

  • Is this related to #10 (Add XSD schema for validation)? Or is #10 now outdated, and should it be closed?
  • I can see that XSD is more powerful than DTD, but it's also more complex and therefore increases the maintenance burden, so what are the practical benefits? Specifically:
    • What problem is the DTD incapable of addressing that XSD can?
    • Is the goal to replace the DTD? I.e., should we switch to XSD and ditch the DTD? We could also do something like what OpenDocument has done and have a RelaxNG schema which can be easily and automatically converted to XSD or DTD as needed (the other direction is not as straightforward).

@1313ou
Author

1313ou commented Apr 26, 2021

> it also replaces the top-level README with something about the Extended English WordNet pipeline, which is unrelated

True, this was imported by mistake (because it shares the same fork as the XEWN schemas). This has been fixed in commit #defccfc.

> Is this related to #10

Yes (it uses the same modularity and philosophy) and no (it validates different data).

> XSD is more powerful than DTD

DTDs are outdated (they survive but should be ditched). I'm not going to repeat the literature here. Suffice it to say that XSD introduces types. For example, pronunciation data could be typed as IPA, so that anything that is not IPA would be rejected (by comparison, CDATA does not validate anything).
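
To make the typing point concrete, here is a rough Python sketch of the kind of check a pattern-restricted string type would perform. Python's `re` module stands in for an XSD `xs:pattern` facet, which (like `fullmatch`) must match the whole value. The character set below is a deliberately incomplete approximation of IPA chosen for illustration; it is an assumption, not the type defined in this PR, and `is_ipa_like` is a hypothetical name.

```python
import re

# Assumed, incomplete approximation of "IPA characters" (illustration only).
IPA_LIKE = re.compile(
    "[a-z"
    "\u00e6\u00e7\u00f0\u00f8\u0127\u014b\u0153"  # a few Latin letters used in IPA
    "\u03b2\u03b8\u03c7"  # Greek beta, theta, chi
    "\u0250-\u02af"  # IPA Extensions block
    "\u02b0-\u02ff"  # Spacing Modifier Letters (stress and length marks)
    "\u0300-\u036f"  # Combining Diacritical Marks
    ". ]+"  # syllable-break dot and space
)


def is_ipa_like(value: str) -> bool:
    """True only if the whole value consists of characters from the set above."""
    return IPA_LIKE.fullmatch(value) is not None


# A DTD CDATA attribute would accept all three values; a typed value would not.
for candidate in ["\u02c8w\u025c\u02d0dn\u025bt", "WordNet!", ""]:
    print(repr(candidate), is_ipa_like(candidate))
```

The last two candidates are rejected (uppercase letters and punctuation fall outside the set, and the empty string fails the `+` quantifier), which is the sort of error a CDATA declaration lets through unnoticed.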

I'm quite happy with the validation as you want it to be. Using a stricter one has proved helpful to the projects I am conducting (and has raised errors in the current one that had otherwise gone unnoticed).

Besides, I see validation as a means of ensuring data coherence, not as the description of a form per se. You may achieve it in different ways; you may want different degrees of coherence (WN and EWN differ in the admissible characters in lemmas, EWN so far having ASCII + oddities); you may want to define supersets and subsets; you may want to have various extension mechanisms (yours should be an optional separate DTD, not mandatory core (1)).

In a word it doesn't have to be unique (2).

(1) BTW, I am not too keen on importing external data in a way that is not IDREF.
(2) Hence the building blocks.

@jmccrae
Member

jmccrae commented Apr 26, 2021

The XSD schema validation is likely very useful.

I don't think we should be supporting specific validation for individual projects like Open English WordNet and Princeton WordNet. In particular, we have not yet fully decided how EWN will adopt this schema update, so we risk this becoming out of date.

@jmccrae jmccrae closed this Nov 5, 2024