Correct order of disambiguation methods #30

Merged
merged 3 commits into from
Nov 25, 2020

Conversation

rmzelle
Member

@rmzelle rmzelle commented Aug 27, 2013

In response to #25 and https://bitbucket.org/bdarcus/citeproc-test/issue/10/disambiguate_bycitedisambiguateconditiontx .

I'll accept this pull request once people have had a chance to review.

@rmzelle
Member Author

rmzelle commented Aug 27, 2013

@fbennett, I noticed that the spec currently doesn't mention how disambiguating cites by adding and expanding names affects the corresponding bibliographic entries. Can you quickly recall how this is supposed to work?

A citation like "(Doe 2000, Doe 2000)" that is expanded to "(Joe Doe 2000, Jane Doe 2000)" requires that the bibliographic entries use a similarly or more detailed name form (e.g. "J. Doe" wouldn't cut it). Is that how it works?

@fbennett
Member

Yep, that's exactly how it works. The settings in the bibliography are defaults, and will be overridden by the extended name parameters applied during disambiguation, if they are more specific.
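
As a rough illustration of that rule, here is a minimal Python sketch. The `NameForm` levels and the helper names are hypothetical and are not taken from any citeproc implementation. It only shows the idea that the bibliography's own name form acts as a default and is replaced when disambiguation asked for something more specific.

```python
from enum import IntEnum
from typing import Optional


class NameForm(IntEnum):
    # Hypothetical ordering of name detail, from least to most specific.
    FAMILY_ONLY = 0
    INITIALS = 1
    FULL_GIVEN = 2


def bibliography_form(style_default: NameForm,
                      disambiguation_form: Optional[NameForm]) -> NameForm:
    """Pick the name form for a bibliography entry.

    The style's setting is the default; a form applied during cite
    disambiguation wins only if it is more specific.
    """
    if disambiguation_form is None:
        return style_default
    return max(style_default, disambiguation_form)


def render(given: str, family: str, form: NameForm) -> str:
    """Render one name at the requested level of detail."""
    if form is NameForm.FAMILY_ONLY:
        return family
    if form is NameForm.INITIALS:
        return f"{given[0]}. {family}"
    return f"{given} {family}"


# A style that defaults to initials, with a cite that needed full given names.
form = bibliography_form(NameForm.INITIALS, NameForm.FULL_GIVEN)
print(render("Jane", "Doe", form))
```

With these assumptions, the "(Joe Doe 2000, Jane Doe 2000)" case ends up with full given names in the bibliography even though the style default is "J. Doe".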

@rmzelle
Member Author

rmzelle commented Aug 27, 2013

Thanks! I'll add something to that effect.

@rmzelle
Member Author

rmzelle commented Sep 7, 2013

I included some text relating to the effects of name disambiguation on the involved bibliographic entries.

@rmzelle
Member Author

rmzelle commented Oct 3, 2013

Another try:

In the description of disambiguation methods (1) and (2) above, we assumed that each (disambiguated) cite has an unambiguous link to its bibliographic entry. However, even when all cites are different this may not always be the case, for instance when bibliographic entries show fewer names than their cites, or show names in less detail, without initials or full given names. In this situation, disambiguation methods (1) and (2) also act on all members of a set of ambiguously cited bibliographic entries, until no more entries in the set can be unambiguously cited by adding (expanded) names. Each method only takes effect on the involved bibliographic entries after it has been used to disambiguate cites.

@fbennett?

@rmzelle
Member Author

rmzelle commented Nov 2, 2013

After another off-list chat, another version:

In the description of disambiguation methods (1) and (2) above, we assumed that each (disambiguated) cite has an unambiguous link to its bibliographic entry. To assure that each cite does in fact uniquely identify its entry in the bibliography, detail that distinguishes cites (such as names, initials, and full given names) must be shown in the corresponding bibliography entries. If this is not the case, disambiguation methods (1) and (2) also act on all members of a set of ambiguously cited bibliographic entries, until no more entries in the set can be unambiguously cited by adding (expanded) names. Each method only takes effect on the involved bibliographic entries after it has been used to disambiguate cites.

@bdarcus
Member

bdarcus commented Jun 6, 2020

Bumping.

Obviously if we merge, the merge conflict would need to be resolved. I think a `git merge master` will fix it, as it's probably just because I changed the file extension.

@bwiernik
Member

bwiernik commented Nov 4, 2020

It's not clear to me how to reconcile this paragraph:

If cites cannot be (fully) disambiguated by expanding the rendered names,
and if disambiguate-add-names is set to "true", then the names still
hidden as a result of et-al abbreviation after the disambiguation attempt of
disambiguate-add-names are added one by one to all members of a set of
ambiguous cites, until no more cites in the set can be disambiguated by
adding expanded names.

With the statement that adding names always takes place before expanding names:

A cite is ambiguous when it matches multiple bibliographic entries [#]_. Four
methods are available to eliminate such ambiguity, which are always tried in the
following order:

  1. Show more names
  2. Expand names (adding initials or full given names)

With the ordering rule, that would imply that there is never a condition under which you expand names.

The order of the disambiguation methods is wrong, I think. The first method should be Expand names.
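
To make the proposed order concrete, here is a toy Python sketch that tries name expansion first and only then adds names hidden by et-al abbreviation. It is a simplification under stated assumptions: references are plain lists of hypothetical `(given, family)` tuples, years are ignored, the whole ambiguous set is handled at once, and added names are rendered in the expanded form. It is not how any particular processor implements the spec.

```python
def render_cite(names, shown, form):
    """Render the first `shown` names; form is 'family', 'initials' or 'full'."""
    parts = []
    for given, family in names[:shown]:
        if form == "family":
            parts.append(family)
        elif form == "initials":
            parts.append(f"{given[0]}. {family}")
        else:
            parts.append(f"{given} {family}")
    suffix = " et al." if shown < len(names) else ""
    return ", ".join(parts) + suffix


def is_unique(cites):
    """True when no two rendered cites are identical."""
    return len(set(cites)) == len(cites)


def disambiguate(refs, shown=1):
    """Return rendered cites for a set of ambiguous references."""
    # Method tried first: expand the names that are already rendered.
    for form in ("family", "initials", "full"):
        cites = [render_cite(r, shown, form) for r in refs]
        if is_unique(cites):
            return cites
    # Method tried second: add names hidden by et-al, one at a time.
    longest = max(len(r) for r in refs)
    for n in range(shown + 1, longest + 1):
        cites = [render_cite(r, n, "full") for r in refs]
        if is_unique(cites):
            return cites
    return cites  # still ambiguous; later methods would have to take over


refs = [
    [("John", "Doe"), ("Ann", "Smith")],
    [("Jane", "Doe"), ("Ann", "Smith")],
]
print(disambiguate(refs))
```

In this example the two cites differ only in the first author's given name, so expansion alone resolves them and no extra names are added. With the opposite order, adding "Smith" to both cites would be tried first and would not help.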

Member

@bwiernik bwiernik left a comment

A few places where I am unsure of the meaning or if the ordering is correct. @rmzelle @cormacrelf @adam3smith @fbennett, can you take a look?

Comment on lines 1556 to 1557
1. Show more names
2. Expand names (adding initials or full given names)
Member

Suggested change
- 1. Show more names
- 2. Expand names (adding initials or full given names)
+ 1. Expand names (adding initials or full given names)
+ 2. Show more names

If expand names is used, that should take place before adding names, no? Otherwise, this paragraph never activates:

If cites cannot be (fully) disambiguated by expanding the rendered names, and if ``disambiguate-add-names`` is set to "true", then the names still hidden as a result of et-al abbreviation after the disambiguation attempt of ``disambiguate-add-names`` are added one by one to all members of a set of ambiguous cites, until no more cites in the set can be disambiguated by adding expanded names.

Comment on lines +1651 to +1663
In the description of disambiguation methods (1) and (2) above, we assumed that
each (disambiguated) cite has an unambiguous link to its bibliographic entry. To
assure that each cite does in fact uniquely identify its entry in the
bibliography, detail that distinguishes cites (such as names, initials, and full
given names) must be shown in the corresponding bibliography entries. If this is
not the case, disambiguation methods (1) and (2) also act on all members of a
set of ambiguously cited bibliographic entries, until no more entries in the set
can be unambiguously cited by adding (expanded) names. Each method only takes
effect on the involved bibliographic entries after it has been used to
disambiguate cites.

A disambiguation attempt can also be made by rendering ambiguous cites with the
``disambiguate`` condition testing "true" [Method (3)] (see `Choose`_).
Member

I don't understand what these lines mean. What are some examples here?

Contributor

@cormacrelf cormacrelf Nov 27, 2020

I'm not sold. There are two things going on. One is that if your cite has to add initials/given names to unambiguously refer to a bib entry, then the bib entry should also end up with initials/givens. The way this was ensured previously is that people would have bibliographies with full names in them. Are there even any styles that don't do this? What sort of bibliography declines to print authors' full names? Also it's a bit tricky to implement (but not impossible) -- is it a real problem that people are experiencing? Are there actual style guides out there that require cite-style name disambiguation within the bibliography itself in order to safely use initials? Because if that's not the case, don't bother reading the rest here, it's not worth your time :) But generally the interpretation I've managed to eke out of the rest seems like a lot of work before I've seen it to be a problem at all.

It's a bit of a slog to make sense of the rest, but I guess that's the game we're playing. The best reading I can do is that "The overall goal is for bib entries to be uniquely citable such that a person reading the document can tell which entry a cite refers to, so we try to expand or add names in bib entries, only insofar as the ambiguous cites that refer to them get less ambiguous." Honestly, you can stop there, you don't need to be more detailed than that in the disambiguation spec and I'd prefer we didn't. I think what's meant by "Each method only takes effect..." is that the un-expanded versions of the rendered cites (i.e. <citation>) built specifically for disambiguation against a specific entry should NOT stop being the basis for cites matching that entry during cite disambiguation, but you can just say "This does not affect the cite disambiguation process.". What this requires is:

  1. add/expand a name in a bibliography entry
  2. look up the same name in the dummy cites used to match other cites against, propagate the changes
  3. run through all the cites that matched previously again to check if any of them would be more finely disambiguated after the change. You can ignore the ones that actually refer to the entry.
  4. stop when they can't.
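
The four steps above can be sketched in Python as a loop that keeps expanding bibliography names only while doing so reduces ambiguity. This is a toy model under strong assumptions, not a real processor design: each reference is a single hypothetical `(given, family)` name, every cite uses the same detail level, and the "dummy cite" is simply the entry rendered at its current level.

```python
LEVELS = ("family", "initials", "full")  # hypothetical detail levels 0..2


def render(name, level):
    """Render a (given, family) name at detail level 0, 1 or 2."""
    given, family = name
    if level == 0:
        return family
    if level == 1:
        return f"{given[0]}. {family}"
    return f"{given} {family}"


def ambiguity(refs, bib_levels, cite_level):
    """Count (cite, other entry) pairs that a reader could confuse.

    A cite can only be told apart from entry j using detail that both the
    cite and entry j actually show, so compare at the coarser level.
    """
    count = 0
    for i, name in enumerate(refs):
        for j, other in enumerate(refs):
            if i == j:
                continue
            level = min(cite_level, bib_levels[j])
            if render(name, level) == render(other, level):
                count += 1
    return count


def expand_bibliography(refs, bib_levels, cite_level):
    """Raise bibliography detail only where it makes cites less ambiguous."""
    levels = list(bib_levels)
    current = ambiguity(refs, levels, cite_level)
    improved = True
    while improved:  # step 4: stop once no expansion helps
        improved = False
        for j in range(len(refs)):
            for target in range(levels[j] + 1, len(LEVELS)):
                trial = levels.copy()
                trial[j] = target  # steps 1 and 2: expand entry j and its dummy cite
                score = ambiguity(refs, trial, cite_level)  # step 3: re-check cites
                if score < current:
                    levels, current, improved = trial, score, True
                    break
    return levels


refs = [("John", "Doe"), ("Jane", "Doe"), ("Ann", "Smith")]
print(expand_bibliography(refs, [0, 0, 0], cite_level=2))
```

Even this toy version re-renders and re-compares on every trial expansion, which is the cost concern raised in the paragraphs that follow.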

I think the intention is that this improves the yield for cite-disambiguation. You'd have to mount a tricky argument to show me it would help, because cite disambiguation is meant to stop just before it ceases to be effective; if you've already added the names/initials/givens from all the cites referred to in step (3) already (in their "corresponding bibliography entries"), can you even devise a test case that will trigger this uniqueness failsafe? I have no energy left to do that.

Altogether, it's like... OK... but disambiguation is already slow. This not only adds a pretty big computation, it makes it more difficult to make the existing stuff fast. If you think of matching a cite against a possible matching reference as a list of normalised, rendered cites built with a dummy cite for that reference, then you are no longer so free to optimise that list. (With your weapon of choice, DFAs, RegexSet, ...) You have to have the ability to do step (2) on those dummy rendered cites, and none of those code weapons can withstand that mutation. That list was one of the very few things that never changes as long as a reference doesn't change! So -- render them again, and each time you add a single initial to the bib entry, render more? At the very least these things are not "the same disambiguation methods" as for cites. They'll have to be re-written to reflect the is-cite-ambiguity-improved test. Any way you cut it, this is a LOT of work that's already solved by using full names in your bibliographies.

@bwiernik bwiernik added the 1.0.2 label Nov 4, 2020
@bwiernik
Member

Pinging @rmzelle @cormacrelf @adam3smith @fbennett. Could you folks take a look at this?

@bwiernik
Member

@denismaier @bdarcus Could you also weigh in?

@bwiernik bwiernik merged commit 0c7426e into citation-style-language:master Nov 25, 2020
@bwiernik
Member

Current citeproc-js behavior does apply expand-names first, before add-names, and this is the logical order.

@fbennett
Member

@bwiernik: Sorry for failing to respond on this. @cormacrelf and I had a discussion about this in the citeproc-rs tracker, where we reached the same conclusion. zotero/citeproc-rs#60

@bwiernik
Member

Great! After running a bunch of tests against current citeproc-js behavior, I figured this had to be the right setup.

Did you @fbennett or @cormacrelf write tests for that?

@denismaier
Member

@bwiernik Sorry for not responding. I also think that this is more logical.
I'm wondering how @jgm's new citeproc behaves in this regard.

@adam3smith
Member

yup, I agree. If both types of disambiguation are used, add-givenname should be applied first (that's the right summary, yes?)

@cormacrelf
Contributor

I don't have a strong view on the add-givenname ordering but looks like this is a stylistic choice which y'all appear to agree is good. So it's good!

@bwiernik
Member

@adam3smith Yes. That's right.

@fbennett
Member

fbennett commented Nov 28, 2020 via email

@bwiernik
Member

@fbennett Yes? I think that's clear from the spec?

If set to "true" ("false" is the default), names that would otherwise be hidden as a result of et-al abbreviation are added one by one to all members of a set of ambiguous cites, until no more cites in the set can be disambiguated by adding names.
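
A small Python sketch of that "one by one, until no more cites can be disambiguated" behavior, under stated assumptions: references are hypothetical lists of family names, the same number of names is shown for every cite in the ambiguous set, and "can be disambiguated" is approximated by the number of distinct renderings going up.

```python
def names_to_show(refs, shown):
    """Return how many names to show for a set of ambiguous cites.

    Names hidden by et-al abbreviation are added one at a time; a count is
    kept only if it makes more cites distinguishable than before.
    """
    def distinct(n):
        # Two cites look alike if they share the first n names and the
        # same presence or absence of a trailing "et al.".
        return len({(tuple(r[:n]), len(r) > n) for r in refs})

    best_n, best = shown, distinct(shown)
    longest = max(len(r) for r in refs)
    for n in range(shown + 1, longest + 1):
        d = distinct(n)
        if d > best:
            best_n, best = n, d
    return best_n


refs = [
    ["Doe", "Smith", "Lee"],
    ["Doe", "Smith", "Kim"],
    ["Doe", "Jones", "Lee"],
]
print(names_to_show(refs, shown=1))
```

If adding names never separates any cites, the function returns the original `shown` value, so nothing is expanded needlessly.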
