Proposal for improving cross-references #39

Open
Marcusjmdict opened this issue Sep 29, 2021 · 6 comments

Comments

@Marcusjmdict

Marcusjmdict commented Sep 29, 2021

I started writing a proposal on fleshing out our xref system several years ago which I last edited in 2018. I think I intended to add more examples and other category suggestions but I figure there's enough in here to start a discussion:

I believe we could improve the usefulness of many JMdict entries quite a bit by adding several new cross reference/xref categories ("types").

While we're pretty strict about which entries get the [ant] (antonym) xref, it's kind of an "anything goes" situation with [see].

[see] currently has a couple of different usages:

  1. we often use it to mean "More commonly as:"
  2. we commonly use it to point to the unabbreviated form of an abbreviated entry. (creating a bit of confusion for entries where the abbreviated form is by far the most common!)
  3. we sometimes use it for synonyms that aren't necessarily more common, and between very similar entries like xがy and xのy
  4. we sometimes use it to show etymology or the constituent parts of a compound word or phrase, e.g. [see=スレ] in the 糞スレ entry.
  5. in some cases, we use it to refer to examples of a phrase, e.g. in the 掛ける entry, the "to put on glasses" sense has an xref to 電話をかける (which is kind of the opposite of use 4, which can be confusing, and I think we should only be doing one of these things)
  6. we often use it to "explain" a Japanese term that is given as-is in Romanized Japanese in a gloss e.g. the redirect to 子の日の遊び in the 子の日の松 entry (where the gloss is "pine shoot pulled out during ne-no-hi-no-asobi")
  7. we often use it for "contrasting" use e.g. things that have opposite or contrasting meanings but perhaps not in the strict sense that they qualify for [ant]. e.g. [see=出職] in the 居職 entry
  8. we often use it in conjugated entries to point to the non-conjugated form of the word, e.g. [see=しまう] in the しまった entry
  9. we sometimes use it to xref things that are in a ranked relationship, e.g. the "highest (of a three-tier ranking system)" sense in the 松 entry has xrefs to both 梅 and 竹
  10. we sometimes use it to refer to "the official name" in entries, for example 細田派 → 清和政策研究会 (proposed), even when the less official name is the more common one.
  11. we sometimes use it to show the standard Japanese equivalent of a dialectal word, for example やろう→だろう

and there's probably other use cases I haven't covered here, as we really don't have any firm rules on when to use it.

It might not be strictly necessary to split the general [see=] xref into 10+ different things, but then again, why not? I feel that having more "granular" tags could be a major improvement to the dictionary. It would give dictionary applications many more options for formatting the xrefs and would make entries easier to understand. I think this has the potential to help us close the gap between our entries and 中辞典/GG5's in terms of helpfulness/ease-of-understanding.
(For the record, I don't think there's much of a gap when it comes to accuracy/completeness!)

Here's some suggestions for new cross reference types:

  • [commonly=X] (more commonly as X)
  • [abbrof=X] (abbreviation of X)
  • [officially=X] (officially/formally known as X)
  • [cont=X] (contrast with X)
  • [conjug=X] (conjugated form of X)
  • [ranka=X] (ranked above X/in a ranked relationship to X, where this word comes (immediately) above it)
  • [rankb=X] (ranked below X/in a ranked relationship to X, where this word comes (immediately) below it)
    (These 2 could for example be used with 松⇔竹⇔梅, SS⇔S⇔A⇔B..., 甲⇔乙, 大変よくできました⇔よくできました, 公爵⇔侯爵⇔伯爵⇔子爵⇔男爵, 大将⇔中将⇔少将, etc. Maybe even things like 原付一種⇔原付二種)
  • [precededby=X] (for chronological relations, when this word was/is preceded by X)
  • [succeededby=X] (for chronological relations, when this word was/is succeeded by X)
    (These 2 could be useful to have separately from [ranka=][rankb=] for clarity. For 年号 e.g. Heisei period → Reiwa, etc. Also for things like the 12 Zodiac entries, the Zodiac-based time entries (午の刻 etc.), the sexagenary cycle entries, 丙午⇔丁未.... Even for days of the week - not to help users understand that Tuesday comes after Monday, but as a navigation help, and to give dictionary apps the option to easily display them all together.)
  • [example=X] (to show other entries (compound nouns, phrases) containing this (specific sense of a) word. Could be used to make entries look more like they do in GG5, 中辞典.)
  • [standardequiv=X] (for things like やろう(ksb)→だろう)
  • [see=X] (could be kept as-is for usages 3 and 6, perhaps.)
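
To make the "more options for dictionary applications" point concrete, here is a minimal Python sketch of how an app might consume typed xrefs. It assumes the bracketed `[type=X]` notation used in this thread, not the actual JMdict XML or JMdictDB syntax. The tag names come from the list above; the display labels and the names `XREF_TYPES`, `Xref`, `parse_xrefs` and `render` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Tag names from the proposal above plus the existing [see] and [ant].
# The display labels are illustrative assumptions, not agreed wording.
XREF_TYPES = {
    "see": "see also",
    "ant": "antonym",
    "commonly": "more commonly as",
    "abbrof": "abbreviation of",
    "officially": "officially known as",
    "cont": "contrast with",
    "conjug": "conjugated form of",
    "ranka": "ranked above",
    "rankb": "ranked below",
    "precededby": "preceded by",
    "succeededby": "succeeded by",
    "example": "example",
    "standardequiv": "standard equivalent",
}


class Xref(NamedTuple):
    type: str
    target: str


# Matches tags written as [type=target], e.g. [see=スレ].
_TAG = re.compile(r"\[([a-z]+)=([^\]]+)\]")


def parse_xrefs(text: str) -> List[Xref]:
    """Extract [type=target] tags from text, rejecting unknown types."""
    xrefs = []
    for match in _TAG.finditer(text):
        kind, target = match.group(1), match.group(2).strip()
        if kind not in XREF_TYPES:
            raise ValueError(f"unknown xref type: {kind}")
        xrefs.append(Xref(kind, target))
    return xrefs


def render(xref: Xref) -> str:
    """Format an xref the way an app might display it."""
    return f"{XREF_TYPES[xref.type].capitalize()}: {xref.target}"
```

With this, `parse_xrefs("[conjug=しまう]")` yields one `Xref`, and an app can choose a label per type instead of showing a generic "see also" for everything.
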
@Marcusjmdict
Author

Marcusjmdict commented Oct 5, 2021

https://jisho.org/forum/615b67d6d5dda76387000000-is-there-an-easy-way-to-find-the-counterpart-in-in-transitive-verb-pairs

Wouldn't it make sense for jisho.org to point to the respective counterpart in a pair? E.g. in entry 割る something like "See also: 割れる".

This could be handled by [see=] or maybe by one or two new xref types.
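
Pair links like this are reciprocal, as are the proposed [ranka=]/[rankb=] and [precededby=]/[succeededby=] links, so a typed system could check that each one has its counterpart. A rough Python sketch, assuming links are available as (source, type, target) tuples; `vpair` is a hypothetical name for a transitive/intransitive link, and `INVERSE` and `missing_counterparts` are likewise made up for illustration:

```python
from typing import List, Tuple

# Inverse of each directional type; a symmetric type maps to itself.
# "ranka"/"rankb" and "precededby"/"succeededby" come from the proposal;
# "vpair" is a hypothetical tag for transitive/intransitive verb pairs.
INVERSE = {
    "ranka": "rankb",
    "rankb": "ranka",
    "precededby": "succeededby",
    "succeededby": "precededby",
    "vpair": "vpair",
}

Link = Tuple[str, str, str]  # (source entry, xref type, target entry)


def missing_counterparts(links: List[Link]) -> List[Link]:
    """Return the reverse links that are implied but not present."""
    present = set(links)
    missing: List[Link] = []
    for source, kind, target in links:
        inverse = INVERSE.get(kind)
        if inverse is None:
            continue  # one-way types such as "see" need no counterpart
        expected = (target, inverse, source)
        if expected not in present and expected not in missing:
            missing.append(expected)
    return missing
```

For example, if 割る links to 割れる but not the other way round, the function reports the missing 割れる → 割る link, while a one-way [see=スレ] in 糞スレ is left alone.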

@Kimtaro

Kimtaro commented Oct 5, 2021

I really like this proposal, @Marcusjmdict. I'll definitely add support for showing these cross references in Jisho if they are implemented in JMdict.

@JMdictProject
Owner

I'll try to comment later on Marcus's discussion piece. There's certainly scope to expand on the initial attributes I suggested for the new element discussed in the wiki (http://www.edrdg.org/wiki/index.php/JMdict:_Next_Generation#Cross-References). The idea of introducing some form of hyponym/hypernym linking (Marcus's ranka/rankb) is a good one.

As for the linking of transitive/intransitive verb pair entries - that would be a very good move, IMO. It could do with an attribute of its own.

@Marcusjmdict
Author

Marcusjmdict commented Dec 17, 2021

Could we consider adding some of these xrefs now, rather than waiting for a new element in JMdict NG or until we've had enough time to discuss exactly which of them to implement? It seems it shouldn't be very complicated to allow for new "TYP" in our current system?

I try to add comments like "more commonly as xref" etc. in the comment field so that they can be dug up and converted later through the jmdictdb advanced search, but having to revisit them rather than adding this type of xref from the start ends up wasting a lot of precious time.

Specifically I'd love to see "more commonly as ..." and "contrast/compare with ..." and "abbr of ..." implemented as soon as possible.

@parfait8566

I think some system of inter-linking similar-meaning homophones (異字同訓) as described in #107 would be helpful.
Also, not directly related, but it would help to have more types of [note] (grammar note, usage note, etc.)

@razasyedh

I wholeheartedly support Marcus' proposal. Incidentally, I had a draft sitting around in my email from 2020 saying essentially the same thing.

More versatile cross-linking

Currently, in JMdict we can create general references between entries ([see=]) and, less commonly, mark antonyms. However, I find that merely pointing users to other entries does not give enough information about the relationship being indicated.

Sometimes it's to a compound demonstrating a specific sense of a term. Sometimes the target is another Japanese term we use in the gloss. Sometimes we do it because the target is a more common way of saying the thing. Other times, it's to explain part of a slang term. Finally, it could simply be a closely-related term.

One project that you're likely familiar with is WordNet, which encodes the semantic relationship between words. While I don't think we need such a systematic approach, the results are quite useful.

Similarly, we could be more specific when we build up our own web of interconnections. Actual Japanese dictionaries list out compounds separately, note common usages, and cross reference (usually with an arrow) other entries. So in addition to "see", we could have "derivedFrom", "moreCommonly", "alternateForm", etc.
