Addition to NLTK migration guide w.r.t. offsets #183
Comments
Hi, if you have a wordnet derived from PWN 3.0 with the same offsets, then it can be done as follows:
Many people (including omw 1.0) treat all satellite adjectives (pos 's') as adjectives (pos 'a').
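As a minimal sketch of the ID construction described above: this assumes the lexicon is derived from PWN 3.0 and uses synset IDs of the form `omw-en-<8-digit offset>-<pos>`, the pattern used later in this thread. The helper name `offset_to_synset_ids` is hypothetical and not part of Wn. It only builds candidate ID strings and does not query any database.

```python
from typing import List


def offset_to_synset_ids(offset_id: str, lexicon_id: str = "omw-en") -> List[str]:
    """Build candidate synset IDs from an offset such as 'wn:00981304a'.

    Assumption: IDs look like '<lexicon_id>-<8-digit offset>-<pos>'.
    """
    raw = offset_id[3:] if offset_id.startswith("wn:") else offset_id
    digits, pos = raw[:-1], raw[-1]
    digits = digits.zfill(8)  # offsets are zero-padded to 8 digits
    candidates = [f"{lexicon_id}-{digits}-{pos}"]
    if pos == "a":
        # The source may label a satellite adjective as 'a', so also try 's'.
        candidates.append(f"{lexicon_id}-{digits}-s")
    return candidates


print(offset_to_synset_ids("wn:00981304a"))
```

Each candidate can then be passed to the `synset()` method of a `wn.Wordnet` object in turn, catching `wn.Error` when an ID is not found.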
@BramVanroy thanks for the good questions (here and on the https://github.com/goodmami/penman project, too 👋). I agree that the documentation could be improved in this area, possibly in the NLTK migration guide. And thanks, @fcbond, for the good description and solution. The basic problem is that synset offsets (which are specific to each wordnet version) are not an inherent part of the WN-LMF formatted lexicons that are used by Wn, but for some lexicons (mainly the Note that I also have an unmerged Lines 329 to 342 in 5092e62
@fcbond said:
This is not entirely true. Wn does conflate
First, thanks for the help! I settled for this:

    import logging
    from typing import Optional

    import wn

    def offset2omw_synset(wnet: wn.Wordnet, offset: str) -> Optional[wn.Synset]:
        # Strip the "wn:" prefix and left-pad to 8 digits plus the POS letter
        offset = offset.replace("wn:", "")
        offset = "0" * (9 - len(offset)) + offset
        wnid = f"omw-en-{offset[:-1]}-{offset[-1]}"
        wnid_s = None
        try:
            return wnet.synset(wnid)
        except wn.Error:
            # Satellite adjectives may be stored with pos "s" instead of "a"
            if wnid[-1] == "a":
                # wnid already starts with "omw-en-", so only swap the suffix
                wnid_s = f"{wnid[:-2]}-s"
                try:
                    return wnet.synset(wnid_s)
                except wn.Error:
                    pass
        logging.warning(f"Could not find offset {offset} ({wnid}{' or ' + wnid_s if wnid_s else ''}) in {wnet._lexicons}")
        return None

I looked at the NLTK branch, @goodmami, and while I think that would be very useful, I just needed a quick function that I could easily plug into my code (without having to install from GitHub). But I think it'd be a useful API to have, although I can imagine it is a lot of work! And thank you for your work. It seems a coincidence that you are providing exactly the tools that I need for my work. I am very thankful and motivated that you created these libraries, and that they work so well and are well-documented! I've also peeked at the internals/API and documentation to inspire my own work, so a big thank you!
Thanks for the kind words, @BramVanroy! And I'm glad you were able to find a solution. I'm going to keep the issue open because, as the issue title states, I think this sort of information would be useful in the documentation, so the issue should be closed when that happens.
Is your feature request related to a problem? Please describe.
Hello
I have access to WordNet synset offset IDs that I retrieve from an API (key: wnSynsetOffset). They look like this: wn:00981304a. It is relatively straightforward to get these through NLTK:
However, it is not clear to me how I can convert this approach to wn. I like the API of wn more and I would like to make use of the translate feature specifically, so that is why I want to make the transition.
Describe the solution you'd like
Perhaps a description in the documentation? I think that this section is relevant but it is not clear to me how to apply it on a use-case. So a real-world example can be helpful, I think.
Describe alternatives you've considered
I have tried the following manipulations but none of them work (yielding empty synset lists):