Support for old WordNet Versions < 3.0 #199
Comments
Thanks, @jmccrae, this seems like a good idea. I didn't know anyone still used pre-3.0 versions. Some concerns:
|
Following up... with the |
We could make these releases from the Open English Wordnet project. We have done this previously: https://github.com/globalwordnet/english-wordnet/releases/tag/3.1 Then they could be OEWN 2.0 etc, but also happy to see it come from OMW. I can check if my script works better... it has some issues with ILI numbers but I think it is easy to fix. |
Oh I didn't realize you published that one. First, I noticed that only the sources are published and I had to run the We can compare the english-wordnet 3.1 (ewn31) to the OMW English Wordnet 3.1 (omw-en31).
The difference in lexical entries is a bit worrying. I notice that omw-en31 does not use There are also differences in how the respective files escape characters in IDs (
Here's a sample entry from each:
ewn31
<LexicalEntry id="ewn-Aurora-n">
<Lemma writtenForm="Aurora" partOfSpeech="n" />
<Sense id="ewn-Aurora-n-09595291-01" synset="ewn-09595291-n" dc:identifier="aurora%1:18:00::" />
</LexicalEntry>
omw-en31
<LexicalEntry id="omw-en31-Aurora-n">
<Lemma writtenForm="Aurora" partOfSpeech="n" />
<Form writtenForm="aurorae" />
<Sense id="omw-en31-Aurora-09595291-n" synset="omw-en31-09595291-n" dc:identifier="aurora%1:18:00::" />
</LexicalEntry>
Diffs:
Here's a sample synset for each:
ewn31
<Synset id="ewn-00751800-v" ili="i25447" partOfSpeech="v" dc:subject="verb.communication">
<Definition>indicate the right path or direction</Definition>
<SynsetRelation relType="hypernym" target="ewn-00751382-v" />
<Example>"The sign pointed the way to London"</Example>
</Synset>
omw-en31
<Synset id="omw-en31-00751800-v" ili="i25447" partOfSpeech="v" members="omw-en31-point_the_way-00751800-v" lexfile="verb.communication" dc:identifier="point_the_way.v.01">
<Definition>indicate the right path or direction</Definition>
<SynsetRelation target="omw-en31-00751382-v" relType="hypernym" />
<Example>The sign pointed the way to London</Example>
</Synset>
Diffs:
|
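A rough way to reproduce this kind of comparison is to count the main WN-LMF elements in each file and report only the counts that differ. The sketch below is an illustration, not the script used in this thread: the function names are made up, the two inline documents are cut down to the sample entries quoted above, and the `LexicalResource`/`Lexicon` wrappers and the `dc` namespace URI are assumptions added so the snippets parse.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Elements worth counting when comparing two WN-LMF files.
COUNTED = {"LexicalEntry", "Form", "Sense", "Synset"}

def count_lmf_elements(xml_text: str) -> Counter:
    """Count selected WN-LMF elements in an XML string."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
        if tag in COUNTED:
            counts[tag] += 1
    return counts

def compare_counts(a: Counter, b: Counter) -> dict:
    """Return {tag: (count_in_a, count_in_b)} for tags whose counts differ."""
    return {t: (a[t], b[t]) for t in sorted(set(a) | set(b)) if a[t] != b[t]}

# Wrapper elements and namespace URI are assumptions so the samples parse.
EWN31 = """<LexicalResource xmlns:dc="https://globalwordnet.github.io/schemas/dc/">
<Lexicon id="ewn">
<LexicalEntry id="ewn-Aurora-n">
<Lemma writtenForm="Aurora" partOfSpeech="n" />
<Sense id="ewn-Aurora-n-09595291-01" synset="ewn-09595291-n" dc:identifier="aurora%1:18:00::" />
</LexicalEntry>
</Lexicon>
</LexicalResource>"""

OMW_EN31 = """<LexicalResource xmlns:dc="https://globalwordnet.github.io/schemas/dc/">
<Lexicon id="omw-en31">
<LexicalEntry id="omw-en31-Aurora-n">
<Lemma writtenForm="Aurora" partOfSpeech="n" />
<Form writtenForm="aurorae" />
<Sense id="omw-en31-Aurora-09595291-n" synset="omw-en31-09595291-n" dc:identifier="aurora%1:18:00::" />
</LexicalEntry>
</Lexicon>
</LexicalResource>"""

if __name__ == "__main__":
    print(compare_counts(count_lmf_elements(EWN31), count_lmf_elements(OMW_EN31)))
```

For the full files, the same functions could be fed the file contents instead of the inline samples; for very large files an incremental parser would be a better fit.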
I am pretty sure the difference in entry count is that we counted 's' adjectives as different lexical entries. Later versions of OEWN have merged these. One big difference is that we use |
same problem over and over again… amazing! I don’t like the s vs a decision on PWN, but better follow it. |
I think you're right. The fact that there are the same number of senses in both is reassuring.
@arademaker The PWN data is the PWN data; nothing has changed. It's just that @jmccrae and I processed it differently when converting WNDB to WN-LMF. Note that omw-en uses data.adj (both entries use
index.adj (only
At least, that's how I understood the WNDB documentation.
Both files use |
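To see why the entry counts can differ while the sense counts stay the same, here is a minimal sketch. It assumes an entry is keyed by (lemma, part of speech) and that the only difference between the two conversions is whether satellite adjectives ('s') are folded into 'a'. The sense list is a made-up illustration, not data from either file.

```python
def entry_key(lemma: str, pos: str, merge_satellites: bool) -> tuple:
    """Key identifying a lexical entry; optionally treat 's' as 'a'."""
    if merge_satellites and pos == "s":
        pos = "a"
    return (lemma, pos)

def count_entries(senses, merge_satellites: bool) -> int:
    """Number of distinct lexical entries implied by (lemma, pos) sense pairs."""
    return len({entry_key(lemma, pos, merge_satellites) for lemma, pos in senses})

# Hypothetical senses: one head adjective, one satellite adjective, one noun.
senses = [("good", "a"), ("good", "s"), ("good", "n")]

separate = count_entries(senses, merge_satellites=False)
merged = count_entries(senses, merge_satellites=True)
print(len(senses), separate, merged)
```

The number of senses is untouched by the choice; only the grouping into entries changes, which matches the observation above that both files have the same number of senses.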
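Another follow-up... At first I thought that pre-2.1 versions did not include syntactic frames in the data files, but on closer inspection they do; it's just that the frame descriptions are implied and not defined in a verb.Framestext file, which I had been using to build a mapping of frame numbers to frame text. Without the file I get lookup errors. Since the frames do not seem to change across versions, I just hard-coded them into the script and I was able to load 2.0, 1.7.1, 1.7, and 1.6. Version 1.5 had other issues.
I created omwn/omw-data#38 for fixing the problems with the omw-data script. |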
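The fallback described here could look roughly like the following. This is a sketch, not the omw-data script: the function names are hypothetical, the parser assumes each line of the frames file is a frame number followed by whitespace and the frame text, and the hard-coded table lists only the first three frames for brevity.

```python
from pathlib import Path
from typing import Dict, Optional

# Illustrative subset only; a real table would list every frame.
FALLBACK_FRAMES: Dict[int, str] = {
    1: "Something ----s",
    2: "Somebody ----s",
    3: "It is ----ing",
}

def parse_frames(text: str) -> Dict[int, str]:
    """Parse lines of the assumed form '<number> <frame text>'."""
    frames = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        number, _, description = line.partition(" ")
        frames[int(number)] = description.strip()
    return frames

def load_frames(path: Optional[str] = None) -> Dict[int, str]:
    """Use the frames file when it exists, else the hard-coded table."""
    if path is not None and Path(path).is_file():
        return parse_frames(Path(path).read_text())
    return dict(FALLBACK_FRAMES)

if __name__ == "__main__":
    # For a release without the file, the lookup falls back instead of failing.
    print(load_frames("no-such-dir/verb.Framestext")[2])
```

Returning a copy of the fallback table keeps callers from mutating the shared default by accident.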
Hi,
thanks Michael. I agree that it would be good to host them with omw-data.
Do you think it is worth trying to propagate the ILIs back (for example
with sense keys)?
On Tue, 1 Oct 2024 at 07:50, Michael Wayne Goodman wrote:
Another follow-up... At first I thought that pre-2.1 versions did not
include syntactic frames in the data files, but on closer inspection they
do; it's just that the frame descriptions are implied and not defined in a
verb.Framestext file, which I had been using to build a mapping of frame
numbers to frame text. Without the file I get lookup errors. Since the
frames do not seem to change across versions, I just hard-coded them into
the script and I was able to load 2.0, 1.7.1, 1.7, and 1.6. Version 1.5 had
other issues.
I created omwn/omw-data#38
for fixing the problems with the omw-data script.
--
Francis Bond <https://fcbond.github.io/>
|
There are some mappings for old versions to ILIs here: https://github.com/globalwordnet/cili/tree/master/older-wn-mappings But they were automatically constructed, so we may only want to take the high-confidence ones. |
I used the older mappings when converting to WN-LMF, but I ignored the confidence score. I'll use it for filtering and make the confidence threshold an option. |
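A confidence threshold of this kind might be sketched as below. The three-column, tab-separated layout (source synset, ILI, confidence) is an assumption for illustration and may not match the actual mapping files; the function names, the second row's ILI, and both confidence values are made up.

```python
import csv
import io

def read_mappings(tsv_text: str):
    """Read (source_synset, ili, confidence) rows from tab-separated text."""
    rows = []
    for source, ili, confidence in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        rows.append((source, ili, float(confidence)))
    return rows

def filter_mappings(rows, min_confidence: float = 0.0):
    """Keep only mappings whose confidence meets the threshold."""
    return [(source, ili) for source, ili, conf in rows if conf >= min_confidence]

# Hypothetical sample; the second ILI and both scores are placeholders.
SAMPLE = "00751800-v\ti25447\t0.95\n09595291-n\ti99999\t0.40\n"

if __name__ == "__main__":
    rows = read_mappings(SAMPLE)
    print(filter_mappings(rows))                      # default keeps everything
    print(filter_mappings(rows, min_confidence=0.9))  # high-confidence only
```

Defaulting the threshold to 0.0 preserves the earlier behaviour of ignoring the score, so the stricter filter stays opt-in.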
Is your feature request related to a problem? Please describe.
A lot of datasets use much older releases of WordNet, and it would be good to be able to work with them using this modern library.
Describe the solution you'd like
Incorporate all the previous versions listed here:
https://wordnet.princeton.edu/download/old-versions
Additional context
I can generate WN-LMF files for them using my API, so I can send them on to you to include.