representing "negative evidence" #250

Closed

thomast73
Contributor

Representing "negative evidence" or "the absence of evidence one would expect to find" is often important in forming a proof argument. We want the GEDCOM X model to support this concept.

Requirements

Documenting negative evidence typically requires the following elements:

  • a citation(s) for the body of records searched
  • a description of the search (methods, search terms, scope, party conducting the search, date of the search, etc.)

From a modeling perspective, we would like "negative evidence" to look like and use the same mechanisms used to represent "evidence".

Proposal

We think that the documentation requirements can be met with a Document object. The text of the document could be used to describe the search: the date of the search, who conducted the search, how the search was carried out, search terms, etc. The sources list could contain the source description(s) for the body of records that was searched.

Assuming Document as the documentation mechanism for a negative search, how should we associate this Document with our hypothesis as "negative evidence" supporting our hypothesis?

In the GEDCOM X model, we use EvidenceReference to associate data with a hypothesis as evidence. The resource member points to the data being used as evidence and is a required field. But in the case of "negative evidence", there is no data to point to—by definition, we have an "absence of data". It is by analysis that we have determined that there was no data where there ought to have been some.

We would like to propose that we can represent "negative evidence" with an EvidenceReference instance with the document describing the negative search as the analysis document, and an explicitly missing resource reference.

However, as currently defined, an EvidenceReference REQUIRES a resource reference (explicitly present) and that reference MUST resolve to an object that matches the type of its container (a specialization of Subject). This means that the current constraint would not allow resource to be explicitly missing. Therefore, we would like to modify the constraint to something like the following.

For resource:

REQUIRED, except when modeling "negative evidence", in which case it MUST NOT be present. If provided, MUST resolve to an instance of [http://gedcomx.org/v1/Subject].

For analysis:

OPTIONAL, except when modeling "negative evidence", in which case it MUST be present. If provided, MUST resolve to an instance of [http://gedcomx.org/v1/Document] of type http://gedcomx.org/Analysis

The above changes are included in this pull request.
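As a rough illustration of the two proposed constraints, here is a minimal Python sketch. The `EvidenceReference` class and the `check` function are hypothetical stand-ins, not part of any GEDCOM X library, and the sketch assumes that a missing `resource` is the only signal that a reference models "negative evidence". It does not attempt to verify that the URIs resolve to a Subject or to a Document of type Analysis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvidenceReference:
    """Hypothetical stand-in; field names follow the proposal text."""
    resource: Optional[str] = None  # URI of the Subject used as evidence
    analysis: Optional[str] = None  # URI of a Document of type Analysis


def check(ref: EvidenceReference) -> List[str]:
    """Return constraint violations; an empty list means the reference passes.

    Assumption: a missing resource always means "negative evidence",
    so the only invalid shape is a reference with neither field set.
    """
    problems = []
    if ref.resource is None and ref.analysis is None:
        problems.append(
            "no resource: negative evidence MUST carry an analysis"
        )
    return problems
```

Under that assumption a validator cannot tell an intentionally missing `resource` from an accidentally omitted one, as long as an analysis is attached.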

@jralls
Contributor

jralls commented Apr 30, 2013

Hmm. This rather distorts EvidenceReference from n-tier linkage to analysis Document linkage, a different model.

That's OK, though we might want a different name.

But consider that in that model (which is the one taught by the BCG lecturers), the analysis Document linkage goes between all evidence (or information, to use your lexicon from the EvidenceReference pull request) and the several conclusions. When we discussed it last summer Ryan posited that we would use a chain of SourceReferences to connect the chain of actual sources through analysis Documents to Conclusions. Has your thinking changed? Why? And why overload EvidenceReference with that job?

@stoicflame
Member

I think you may be overanalyzing it. We have a need to support negative evidence. We're thinking that we have the data structures in place to do so; we just need to be more explicit about how to use them for this purpose.

We believe that using EvidenceReference is a good option for doing so because the "lack of information" is being used as evidence to support our conclusions. Personally, I don't think it's overloading the EvidenceReference because we're still "referencing evidence". It's just that instead of referencing information as evidence (using resource property), it's referencing the lack of information as evidence.

Another option may be to cite the "failure to find information" as a source by describing that search using SourceDescription and referencing it using SourceReference. Are you saying this is the option you'd prefer? We could pursue that, but we'd probably need to add another resource type. Naming that new resource type is probably the hard part (maybe http://gedcomx.org/NegativeSearch?).

Personally, I prefer using EvidenceReference because I think it's a better fit.

@jralls
Contributor

jralls commented Apr 30, 2013

I think you may be overanalyzing it.

Probably impossible. The intention is a spec that can be realized in code, and that realization requires lots of detailed analysis.

Another option may be to cite the "failure to find information" as a source by describing that search using SourceDescription and referencing it using SourceReference. Are you saying this is the option you'd prefer? We could pursue that, but we'd probably need to add another resource type. Naming that new resource type is probably the hard part (maybe http://gedcomx.org/NegativeSearch?).

As so often of late, this is territory we've been over before, in #144 among others. The SourceReference model was the agreed solution. Why is it now "another option" when it has been in the spec for 9 months?

We have a need to support negative evidence.

Again, how does the existing solution not meet this need? What about conflicting evidence and indirect evidence?

Personally, I don't think it's overloading the EvidenceReference because we're still "referencing evidence".

Conceptually, perhaps. As a data model, not so much. Before this change EvidenceReference creates a tree of Conclusion instances constrained to be of a single subclass to base instances directly derived from some SourceDescription. With this change it's that or a link to an analysis Document which explains why the EvidenceReference doesn't point to a Conclusion. The EvidenceReference either MUST have a Conclusion pointer and MAY have an analysis Document or MUST NOT have a Conclusion pointer and MUST have an analysis Document. If you were writing something like that in C you'd use a union. How can you claim that it's not an overload?
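The "union" shape described above can be sketched in Python as two distinct types behind one alias. The class names `PositiveEvidence` and `NegativeEvidence` are hypothetical and appear nowhere in the spec; they only make the either/or structure visible.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class PositiveEvidence:
    resource: str                   # pointer to a Conclusion: required
    analysis: Optional[str] = None  # analysis Document: optional


@dataclass(frozen=True)
class NegativeEvidence:
    analysis: str                   # analysis Document: required, no resource


# One name, two mutually exclusive shapes: the Python analogue of a C union.
EvidenceRef = Union[PositiveEvidence, NegativeEvidence]


def describe(ref: EvidenceRef) -> str:
    """Every consumer has to branch on which shape it was handed."""
    if isinstance(ref, PositiveEvidence):
        return f"points to {ref.resource}"
    return f"no resource; search documented in {ref.analysis}"
```

Any code that reads such a reference must branch on the variant before it can use it, which is the sense in which the single type is doing two jobs.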

Personally, I prefer using EvidenceReference because I think it's a better fit.

You haven't yet articulated how it's a fit at all, never mind a better one.

@stoicflame
Member

As so often of late, this is territory we've been over before, in #144 among others. The SourceReference model was the agreed solution. Why is it now "another option" when it has been in the spec for 9 months?

I disagree that it's territory we've been over before. Where in that thread did we discuss how to use that model to represent negative evidence? If we did, then that's great. Let's just make it more explicit and be done with it. Help us do that. How do I use the SourceDescription to describe "negative evidence"? What would the values of its properties be?

You haven't yet articulated how it's a fit at all, never mind a better one.

I disagree; I believe I did articulate it. I believe it's a fit because "negative evidence" is evidence, not a container of information that may be used as evidence (which is what a "source" is).

@jralls
Contributor

jralls commented Apr 30, 2013

I disagree that it's territory we've been over before. Where in that thread did we discuss how to use that model to represent negative evidence? If we did, then that's great. Let's just make it more explicit and be done with it. Help us do that. How do I use the SourceDescription to describe "negative evidence"? What would the values of its properties be?

Sarah mentioned it specifically as a required element of a proof argument, which Thad later cited in his formulation of EvidenceAnalysis which eventually morphed into the Document of type Analysis (which I've taken to calling an analysis Document). Thad later explained how to connect the pieces with SourceReferences

I believe it's a fit because "negative evidence" is evidence, not a container of information that may be used as evidence (which is what a "source" is).

That's hand-waving, based solely on the class name. EvidenceReference is a data type which points to an instance of the same subclass of Subject as its referrer (I'm going to abbreviate that as Foo), seeking to build a tree of Foos each of which is based on a different set of sources and which the researcher asserts each represent the same historical foo. It's Tom Wetmore's n-tier model, generalized from Person. "Negative evidence" is indeed evidence, but it's not an instance of Foo. The existence of a Person is "evidence" of a birth event, but it can't be connected via an EvidenceReference to an Event because a Person isn't an instance of Event.

Let's dissect a commonly-used example of "negative evidence": One finds in the records of some church entries two years apart recording the birth of a child to "John Smith". It seems likely that the two children are siblings, but this is long before fully enumerated censuses so there's no direct evidence. Fortunately the county maintained annual tax records and they survive, so the researcher searches them for John Smith for a several-year period surrounding the birth years and finds only one. How do we record this?

If our researcher has learned her skills from the BCG/APG crowd, she'll create a Person for John Smith and attach an analysis Document with SourceReferences to the birth and tax records and say in it that the common name, the common church, and the fact that there appears to have been only one John Smith in the county at the time supports the conclusion that the two children have the same father.

If our researcher prefers Tom Wetmore's approach, she'll create a John Smith Person for each birth and each tax record and start combining them. When it comes time to combine the two birth fathers, she can add an EvidenceReference pointing to the tax records John Smith on the combination with an attached analysis Document explaining that this is the only John Smith in the county at the time as the justification for combining the two birth-father John Smiths. It seems that this issue proposes to also create another EvidenceReference pointing at no Person but also to the analysis Document explaining the thorough yet unsuccessful search for another John Smith to represent the one who isn't there. That doesn't seem very interesting to me.
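To make the n-tier variant concrete, here is a small Python sketch using plain dictionaries. All ids and key names are hypothetical, and it assumes the "empty" reference would point at the same analysis Document as the tax-record reference, which is one reading of the paragraph above.

```python
# Record-level persons, one per source mention (ids are hypothetical).
persons = {
    "birth1-father": {"name": "John Smith", "sources": ["church-entry-1"]},
    "birth2-father": {"name": "John Smith", "sources": ["church-entry-2"]},
    "tax-john": {"name": "John Smith", "sources": ["county-tax-rolls"]},
}

# The combined person and its evidence references.
combined = {
    "name": "John Smith",
    "evidence": [
        {"resource": "birth1-father"},
        {"resource": "birth2-father"},
        {"resource": "tax-john", "analysis": "doc-only-one-john-smith"},
        # The reference this issue would add: no resource, analysis only.
        {"analysis": "doc-only-one-john-smith"},
    ],
}


def empty_references(subject):
    """Evidence references that point at no person at all."""
    return [ref for ref in subject["evidence"] if "resource" not in ref]


def unresolved(subject, known):
    """Resource ids that do not match any known record-level person."""
    return [
        ref["resource"]
        for ref in subject["evidence"]
        if "resource" in ref and ref["resource"] not in known
    ]
```

In this reading the empty reference carries nothing that the tax-record reference does not already carry.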

Now suppose the search of tax records did turn up another John Smith, but near another church, and one finds in that church's records that there was a John Smith who lived nearby and was a member of the second church, making it unlikely that either of the two children are his. It's pretty easy to cover all of that in a prose discussion residing in an analysis Document, but how are you going to link that "negative evidence" with an EvidenceReference? With an "empty" one whose analysis Document connects the second John Smith's Person via a SourceReference? That's a simple example and it's already ugly.

@stoicflame
Member

Thank you for taking the time to elaborate. That was enlightening and you've made some great points, as always.

@thomast73
Contributor Author

As stated (and as discussed in #144), it is possible to represent all of the sources, the information contained in those sources, how this information fits in answering our research question(s) as evidence, any analysis of these (the sources, information, or evidence), and any explanation of proof in an analysis Document. This would amount to a list of sources associated with a Document and a narrative in its text. We would then associate this document with our answer (our Person or Relationship or other conclusion). A downside of representing all of this data in this manner is that we cannot see (programmatically/semantically) the association of information with its source, how the information is contributing as evidence, and the separate bits of analysis: the analysis of the source, the information and its relevance as evidence, and the explanation of proof. Modeling in this manner only highlights the answer (our Subject and subordinate Conclusions), our sources, and a big analysis bucket for everything else. An upside of this representation is that it is relatively simple.

Using the EvidenceReference and the extracted conclusion concepts in the model allows us to separate these concepts more distinctly. Whether in a 0-tier or n-tier implementation, sources and our analysis of the sources, information and our analysis of the information, how we are using the information as evidence and our analysis of its fit and relevance to answering our question(s), and our answer and the explanation of how we arrived at that answer can all be modeled distinctly. As @jralls states, this mechanism results in data that is quite a bit more complex — a possible downside. But I think there are upsides if an implementor is willing to pay the complexity cost. @ttwetmore has listed some of these on other threads. I think there are others. In particular, it makes it possible to distinctly represent the various types of data called out in an explanation of the genealogical research process that typically accompanies explanations of the Genealogical Proof Standard — sources, information, evidence, answers and their associated analysis.

Which representation is better? It depends on your implementation goals. I expect some will value being able to model the concepts distinctly and some will wish for simplicity.

For those who prefer representing all of these concepts distinctly, they will need a "negative evidence" mechanism that is distinct. Therefore the need for this proposal (or a modification of it).

@jralls
Contributor

jralls commented May 2, 2013

For those who prefer representing all of these concepts distinctly, they will need a "negative evidence" mechanism that is distinct. Therefore the need for this proposal (or a modification of it).

Is there anyone actually asking for that? Do they (or you) really believe that they need only negative evidence to model good genealogical analysis? Has anyone written code that uses EvidenceReferences this way to model a complex proof argument? For that matter, has anyone written code that can model a complex proof argument with any data model?

@mikkelee

mikkelee commented May 2, 2013

+1 @jralls I think your argument and especially your example is very clear.

I have a harder time following the argument from @thomast73 - can you perhaps give a concrete example? I know it's a bit old school but I often find it easier to wrap my head around abstractions when given an example at the same time.

@thomast73
Contributor Author

@jralls: Is there anyone actually asking for that?

A good question. The answer is, perhaps, both "Yes" and "No".

One of the (not well) stated goals of the project is to have a model that fits and supports the genealogical research process, the research process typically expounded by the Genealogical Proof Standard pundits. And there are many watching or contributing to the project who care deeply about this fit (present company included). Current software vendors are also showing signs of caring, though implementations vary as to how much the research process is manifested in their products, and as to how well they understand the concepts being put forth by these pundits. I know of at least one vendor who is selling a product that attempts to express the research process in a very strict sort of way (their product: Evidentia). So I think in this sense the answer is very much "Yes!"

@jralls: Do they (or you) really believe that they need only negative evidence to model...?

In #242, we introduced the EvidenceReference, in part, to give a clear mechanism for expressing the evidence concept in the model. This allows the modeling of both "direct" and "indirect" evidence, and gives a place to document any analysis about our use of the associated information as evidence. So we only are missing a defined mechanism for "negative" evidence. We are not proposing only a mechanism for negative evidence.

If we were to add a classifier for direct/indirect/negative evidence, the classifier would belong on EvidenceReference, and in the current model, we would expect the direct/indirect designations to be discussed in the analysis Document associated with the EvidenceReference. We would like the negative concept to fit there as well.

There are several vendors who have tried to introduce mechanisms for users to designate something as direct/indirect/negative, which is evidence that vendors will want a home for this concept.

But did someone approach us and tell us this "negative" mechanism is missing and here is how you ought to add it to the model? Here is where the answer to @jralls's first question is "No." This proposal is a result of our own analysis of the model, how it fits the genealogical research process, and its likely needs for future development.

@stoicflame
Member

I think @thomast73 might be saying the following:

Given a conclusion (e.g. Person), I can find the information I used as evidence by following the EvidenceReferences. I can get at the analysis for that information by using the analysis property of the EvidenceReference. We know that sometimes the lack of information can be used as evidence for a conclusion. How do I get at the analysis that went into finding that lack of information?

It sounds logical. The hang up for me is the practical application. @jralls's intelligent response above made me realize that I can't provide a concrete use case where I need to provide an analysis on an evidence reference that isn't referencing information. I'm hoping @thomast73 can provide it for us.

Referring to @jralls's examples above, the case where two distinct "John Smith"s were found in the tax records would mean that there would be two separate conclusions about two distinct "John Smith"s, each one referencing a church record and a tax record with appropriate analysis on each reference. The case where only one John Smith was found in the tax records means that only one conclusion about "John Smith" would be made, with three references (two to church records and one to the tax record), and the analysis would be on those references; perhaps each of them has its own analysis, or perhaps all of them reference the same analysis.

So, Thad, help us out. Can you provide a concrete practical example where I would need to use an instance of an evidence reference with only an analysis on it?

@jralls
Contributor

jralls commented May 2, 2013

In #242, we introduced the EvidenceReference, in part, to give a clear mechanism for expressing the evidence concept in the model. This allows the modeling of both "direct" and "indirect" evidence, and gives a place to document any analysis about our use of the associated information as evidence.

That might have been your intent, but the specification doesn't get you there. The EvidenceReference would have to point to another Foo instance which would contain a list of SourceReferences and an analysis Document explaining how the evidence in those sources can be interpreted to support the conclusion(s) contained in the Foo. That's the mechanism of #144, somewhat encapsulated in an n-tier structure.

So we only are missing a defined mechanism for "negative" evidence. We are not proposing only a mechanism for negative evidence.

No, there's also contradictory evidence.

This goes back to a fundamental flaw in your original presentation in #242:

 Evidence — Information selected and used to answer a Question

The most important requirement of the GPS is that all of the evidence obtained from a reasonably exhaustive search be considered (granted, "reasonably exhaustive" is subject to interpretation, with answers ranging from "economically feasible" to "taking over the basement of the courthouse for 6 months while you examine and index all of the loose papers that have been stored there for the last 150 years"). No selecting is permitted. Any contradictory evidence must be discussed and explained, which is why in my second example I brought in a second John Smith and explained why he probably wasn't the father of either of the children. One of the papers in this month's NGSQ, The Parents of Thomas Burgan of Baltimore County, Maryland, discusses the opposite case, where Burgan's birth and those of his siblings were registered in a different church because Burgan's parents were Catholic and the law of the time required that births be registered at the CofE parish because the poor laws of the day operated through the established church.

@jralls
Contributor

jralls commented May 2, 2013

So, Thad, help us out. Can you provide a concrete practical example where I would need to use an instance of an evidence reference with only an analysis on it?

I think that misses the point. The right question is "what's the difference between an EvidenceReference pointing only to an analysis Document and a SourceReference pointing to the same Document?"

@stoicflame
Member

The right question is "what's the difference between an EvidenceReference pointing only to an analysis Document and a SourceReference pointing to the same Document?"

That's a good question. Sigh. What a mess.

So I thought that the changes we applied at #244 to move the analysis property up to the Conclusion data type meant effectively that analysis documents wouldn't be referenced as "sources" anymore; they'd be referenced using that analysis property. I can see transcription/translation documents being referenced as sources, but we've got an explicit member now to attach the analysis that went into making a conclusion. I suppose I might be able to stretch to see how an analysis made by someone else may be used as a source, but for the common case where I'm providing the analysis that went into making a given conclusion, isn't that what the analysis property is for? Is that what you were asking about when you asked "when did that change"?

However, the question as it applies to this thread is still relevant, but I would re-word it: what's the difference between an EvidenceReference pointing only to an analysis Document and the analysis property pointing to the same Document?

@jralls
Contributor

jralls commented May 2, 2013

One of the (not well) stated goals of the project is to have a model that fits and supports the genealogical research process—the research process typically expounded by the Genealogical Proof Standard pundits.

I think it's both well stated and accomplished via the mechanism of #144.

I know of at least one vendor who is selling a product that attempts to express the research process in a very strict sort of way (their product: Evidentia). So I think in this sense the answer is very much "Yes!"

I had a look at it. It's much less than its web page claims, something like Clooz without the cataloging support. It gets as far as extracting evidence (which it rather strangely calls "claims") from sources and collecting your analysis, but has no provisions beyond that. Perhaps they intend to develop it further, but there's no indication of that on the website.

@jralls
Contributor

jralls commented May 2, 2013

That's a good question. Sigh. What a mess.

Yeah. :-(

We're digging ourselves into a hole and in the process blowing the schedule you laid out at RootsTech. Perhaps we should table this for a while and figure out what we really need to get done to publish a 1.0 spec, and then focus on doing that.

@thomast73
Contributor Author

This goes back to a fundamental flaw in your original presentation in #242:

 Evidence — Information selected and used to answer a Question

The most important requirement of the GPS is that all of the evidence obtained from a reasonably exhaustive search be considered (granted, "reasonably exhaustive" is subject to interpretation, with answers ranging from "economically feasible" to "taking over the basement of the courthouse for 6 months while you examine and index all of the loose papers that have been stored there for the last 150 years"). No selecting is permitted. Any contradictory evidence must be discussed and explained....

@jralls, you are misconstruing my statement(s) and are raising a fuss where there is no fuss to be made.

When I consult a tax roll looking for John Smith, I have no requirement to include all of the other persons mentioned in the tax roll. I "select" only the information about John Smith (or name variants that might be applicable) from among all of the information presented. Of course, we must consider all potentially applicable information in the source(s) we are consulting, i.e., all of the information that might belong to our "John Smith". But we have no requirement to discuss "James Smith" or "John Mackey" that appear in the same source just because they are among the information in that source (unless we think they are our "John Smith"). We "select" only the "John Smith" information as our evidence and then consider its usefulness to the question at hand.

And of course, we must explain any conflicting evidence—any of the information we thought that might belong to "John Smith" that we have "selected" as potential evidence.

"Selecting" is about combing through all of the information a source has to offer and gathering from it the information potentially applicable to our question so that it can be considered and correlated with the potentially applicable information we have gathered from other sources. It has nothing to do with being "selective", i.e., ignoring information we do not like.

... there's also contradictory evidence.

Yes, there might be a need for a fourth category for evidence (direct, indirect, negative, and conflicting), but I know of no pundit that describes things in this manner; they name only the three categories (direct, indirect, and negative). I think this is because the discussion of conflicts is not handled in the analysis of the evidence items themselves, but rather in the analysis that deals with correlating multiple items of evidence, the analysis likely to be associated with our answer, i.e., our Subject.

Can you provide a concrete practical example where I would need to use an instance of an evidence reference with only an analysis on it?

In the paper @jralls points out in this month's NGSQ, The Parents of Thomas Burgan of Baltimore County, Maryland, look at the part of the narrative associated with footnotes 7 and 9, and the footnotes themselves. Here is the verbiage and footnote associated with 9 (because 7 is more complex and I don't want to transcribe it all):

Baltimore County rent rolls 1741-75 should mention Burgin's Folly, but don't.9
-=-=-=-
9. Maryland Land Office, Rent Rolls 1-2 and 18-21.

This is the situation we are trying to address with this proposal.

We looked through a set of records that ought to have contained information on our research subject, but found none. Because we are demonstrating the elements of the GPS, we need to demonstrate our "exhaustive search". Therefore, we need to document our unproductive search(es) as well as the searches that were productive.

Per this proposal, the text of our Document would contain an explanation of our search; in this case, it might contain something like: "looked for 'Burgin's Folly' on rolls 1-2 and 18-21 on 29-Apr-2013 where we should have been able to find it ..., but did not find it." Our Document would refer to a SourceDescription instance(s) describing the source(s) we consulted; in this case, our SourceDescription instance(s) would describe the Baltimore County rent rolls that we searched.

We would then associate this analysis with our research Subject via an EvidenceReference. There is no Subject representing "extracted" information because we found none, so the resource is explicitly missing. The analysis would refer to the Document described in the previous paragraph.
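The Burgin's Folly case can be laid out as data in a small Python sketch. The key names (`citation`, `sources`, `evidence`, `text`) and the ids are illustrative assumptions, not the GEDCOM X serialization; only the Analysis type URI and the footnote wording come from this thread.

```python
ANALYSIS = "http://gedcomx.org/Analysis"

# The body of records searched (citation taken from footnote 9).
sources = {
    "S1": {"citation": "Maryland Land Office, Rent Rolls 1-2 and 18-21."},
}

# The Document describing the unproductive search.
documents = {
    "D1": {
        "type": ANALYSIS,
        "text": "looked for 'Burgin's Folly' on rolls 1-2 and 18-21 on "
                "29-Apr-2013 where we should have been able to find it ..., "
                "but did not find it.",
        "sources": ["S1"],
    },
}

# The research Subject: one evidence reference, resource deliberately absent.
subject = {"id": "subject-1", "evidence": [{"analysis": "D1"}]}


def negative_searches(subject, documents, sources):
    """Collect (search text, citations) for references that have no resource."""
    results = []
    for ref in subject["evidence"]:
        if "resource" in ref:
            continue  # ordinary evidence: it points at extracted information
        doc = documents[ref["analysis"]]
        citations = [sources[s]["citation"] for s in doc["sources"]]
        results.append((doc["text"], citations))
    return results
```

A consumer can then list the documented unproductive searches for a Subject by calling `negative_searches(subject, documents, sources)`.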

@jralls
Contributor

jralls commented May 3, 2013

@jralls, you are misconstruing my statement(s) and are raising a fuss where there is no fuss to be made.

When I consult a tax roll looking for John Smith, I have no requirement to include all of the other persons mentioned in the tax roll. I "select" only the information about John Smith (or name variants that might be applicable) from among all of the information presented. Of course, we must consider all potentially applicable information in the source(s) we are consulting, i.e., all of the information that might belong to our "John Smith". But we have no requirement to discuss "James Smith" or "John Mackey" that appear in the same source just because they are among the information in that source (unless we think they are our "John Smith"). We "select" only the "John Smith" information as our evidence and then consider its usefulness to the question at hand.

If you explain it that way, OK. But you didn't.

Yes, there might be a need for a fourth category for evidence (direct, indirect, negative, and conflicting), but I know of no pundit that describes things in this manner; they name only the three categories (direct, indirect, and negative).

I don't know what "pundits" you've been listening to, but at all of the lectures and workshops I've been to in the last 13 years, ESM, Dr. Jones, Barbara Little, and Christine Rose among many have all emphasized the importance of finding and dealing with conflicting evidence. A good half of the papers in any particular issue of NGSQ will have conflicting evidence to deal with. Perhaps we can connect at Las Vegas next week and ask a couple of them how much importance they attach to it.

@jralls
Contributor

jralls commented May 3, 2013

In the paper @jralls points out in this month's NGSQ, The Parents of Thomas Burgan of Baltimore County, Maryland, look at the part of the narrative associated with footnotes 7 and 9, and the footnotes themselves. Here is the verbiage and footnote associated with 9 (because 7 is more complex and I don't want to transcribe it all):

Baltimore County rent rolls 1741-75 should mention Burgin's Folly, but don't.9
-=-=-=-
9. Maryland Land Office, Rent Rolls 1-2 and 18-21.

This is the situation we are trying to address with this proposal.

Roger.

We looked through a set of records that ought to have contained information on our research subject, but found none. Because we are demonstrating the elements of the GPS, we need to demonstrate our "exhaustive search". Therefore, we need to document our unproductive search(es) as well as the searches that were productive.

Yup.

Per this proposal, the text of our Document would contain an explanation of our search; in this case, it might contain something like: "looked for 'Burgin's Folly' on rolls 1-2 and 18-21 on 29-Apr-2013 where we should have been able to find it ..., but did not find it." Our Document would refer to a SourceDescription instance(s) describing the source(s) we consulted; in this case, our SourceDescription instance(s) would describe the Baltimore County rent rolls that we searched.

There's where your case begins to fall apart. That's the process discussed in #144 and already implemented in the spec.

We would then associate this analysis with our research Subject via an EvidenceReference. There is no Subject representing "extracted" information because we found none, so the resource is explicitly missing. The analysis would refer to the Document described in the previous paragraph.

And that's the crux of the first question: What's the benefit of (mis)using an EvidenceReference with no evidence referenced over a SourceReference?

The question isn't "how do we model this". We already model it. The SourceReference chain of #144 already covers this as well as contradictory and indirect evidence. EvidenceReference as it currently exists in the spec supports only direct evidence and relies on SourceReference for everything else, and this proposal doesn't change that, it just proposes stuffing a no-op EvidenceReference into the middle.
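To make the two options concrete, here is a minimal sketch of the rent-roll case. The classes are hypothetical stand-ins named after the terms used in this thread, not the actual GEDCOM X API, and it assumes (per the #144 discussion as described above) that a SourceReference may point at an analysis Document as well as at a SourceDescription.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class SourceDescription:
    """Describes a body of records that was consulted."""
    citation: str


@dataclass
class Document:
    """Free-text analysis; here, the write-up of an unproductive search."""
    text: str
    sources: List["SourceReference"] = field(default_factory=list)


@dataclass
class SourceReference:
    """Pointer to a source. Assumption: it may also point at a Document."""
    description: Union[SourceDescription, Document]


@dataclass
class EvidenceReference:
    """Pointer to evidence. `resource` stays None in the proposed negative case."""
    resource: Optional[object] = None
    analysis: Optional[Document] = None


@dataclass
class Subject:
    """Hypothetical research subject holding both kinds of reference."""
    name: str
    sources: List[SourceReference] = field(default_factory=list)
    evidence: List[EvidenceReference] = field(default_factory=list)


rent_rolls = SourceDescription("Maryland Land Office, Rent Rolls 1-2 and 18-21")
search = Document(
    text="Looked for 'Burgin's Folly' in rolls 1-2 and 18-21 on 29-Apr-2013; not found.",
    sources=[SourceReference(rent_rolls)],
)

# Option A (this proposal): an EvidenceReference with no resource.
via_evidence = Subject("research subject")
via_evidence.evidence.append(EvidenceReference(resource=None, analysis=search))

# Option B (the #144 chain): a plain SourceReference to the same Document.
via_source = Subject("research subject")
via_source.sources.append(SourceReference(search))


def searched_records(doc: Document) -> List[str]:
    """Citations of the record sets that a search Document refers to."""
    return [
        ref.description.citation
        for ref in doc.sources
        if isinstance(ref.description, SourceDescription)
    ]
```

In this sketch both routes reach the same Document and the same rent-roll citation; the only difference is which list on the subject holds the pointer, which is the point under dispute.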

@jralls
Contributor

jralls commented May 3, 2013

A broader comment: There is no extant code which implements this. I'm not talking about the trivial serialization/deserialization code in the XML and JSON implementations. I mean an actual program that genealogists use to help them collect, extract, and analyze evidence, generate conclusions, and lineage-link the conclusions into a family history.

Developing a "scratch" data model is the first step to writing such a program, but only a noob would proceed to write the program without reconsidering the data model frequently during implementation and early testing. @ttwetmore at least has some experience with the n-tier approach and could steer that part of the model -- though #242 generalized the concept beyond his experience -- and apparently his interest, since he's dropped out of the discussion. Nobody here (or anywhere else AFAICT) has implementation experience with this proposal, so in my view it's premature to include it in a spec. I'd really like to see some code implement this model and gain some market share, even if it's only a few thousand users, before putting it in the spec.

I'd actually like to see the same for n-tier, though it might be too late. The only known implementation of that is DeadEnds which seems unlikely at this point to see the light of day.

It's absolutely true that the same argument can be made against the SourceReference model, but I can at least flesh out a way to get there with Gramps. Having time to write it is another matter.... Much was made at RootsTech about how FS FamilyTree is based on GedcomX and how Roger Buzbee's RootsMagic has implemented the FSFT API and is therefore implementing GedcomX. I didn't get a chance to cross-check that with him at RootsTech, but I'll try next week at NGS.

Where I'm going with all of this is that I think we're trying to stuff too much into GedcomX 1.0 and therefore setting ourselves up to fail. We're certainly setting ourselves up to not have a usable 1.0 release this year, which a lot of folks are going to interpret as failure even if we don't.

@thomast73
Contributor Author

I don't know what "pundits" you've been listening to, but at all of the lectures and workshops I've been to in the last 13 years, ESM, Dr. Jones, Barbara Little, and Christine Rose among many have all emphasized the importance of finding and dealing with conflicting evidence. A good half of the papers in any particular issue of NGSQ will have conflicting evidence to deal with. Perhaps we can connect at Las Vegas next week and ask a couple of them how much importance they attach to it.

Again, you are missing the point.

I am not saying that conflicting evidence is not important. And yes, the pundits all discuss it.

But when analyzing a single item of evidence, it is not possible to say whether it is "conflicting". To identify conflict, we have to have something to compare with—at least two items of evidence that speak to the question at hand.

It is possible to categorize a single item of evidence as direct, indirect, or negative.

Therefore, when considering an item of evidence in isolation, the pundits talk about categorizing it as direct, indirect, or negative. When they discuss correlating several evidence items, they discuss the need to resolve conflicting evidence (if it exists).

@thomast73
Contributor Author

Per this proposal, the text of our Document would contain an explanation of our search. In this case, it might contain something like: "looked for 'Burgin's Folly' on rolls 1-2 and 18-21 on 29-Apr-2013 where we should have been able to find it ..., but did not find it." Our Document would refer to SourceDescription instance(s) describing the source(s) we consulted; in this case, our SourceDescription instance(s) would describe the Baltimore County rent rolls that we searched.

There's where your case begins to fall apart. That's the process discussed in #144 and already implemented in the spec.

We would then associate this analysis with our research Subject via an EvidenceReference. There is no Subject representing "extracted" information because we found none, so the resource is explicitly missing. The analysis would refer to the Document described in the previous paragraph.

And that's the crux of the first question: What's the benefit of (mis)using an EvidenceReference with no evidence referenced over a SourceReference?

The benefit is that we have distinct ways to model sources, information, evidence and answers to research questions, and that we can associate the types of analysis that are distinct to each concept with the entities in the model that represent those concepts.

The question isn't "how do we model this". We already model it. The SourceReference chain of #144 already covers this ....

The disadvantage of using the mechanism defined in #144 is that many of the above concepts get lumped together into a single bucket and there are no semantics left to sort it out again. While it is possible to get all of the elements into the bucket, its usefulness has probably been diminished.

EvidenceReference as it currently exists in the spec supports only direct evidence....

The current proposal allows "extracted" information to be associated with an answer as evidence. The information could contribute to the answer either directly or indirectly. The model (as presently constituted) does not disallow an association of "extracted" information that indirectly answers our question as evidence. If you believe it does, we have yet another misunderstanding somewhere.

A broader comment: There is no extant code which implements this....

You argue that there is no extant code that addresses (very well) the research process the project has set out to support. I do not disagree.

You argue that because no extant code exists, the topic is not relevant, and therefore should be dropped. The implication of such an argument is that one of the project's foundational goals ought to be thrown out—that the lack of extant implementations of the research process makes the research process we wish to emulate irrelevant. Do you really believe that? I do not!!! Nor do I believe that you do.

If we believe in the research process (which I do), then we ought to try to express it in our model. If there are concepts that are important in the process, then we would like to have a home for them in the model. When we release the model, we would like to be able to explain how the important concepts in the research process are expressed in the model.

@jralls
Contributor

jralls commented May 3, 2013

You argue that because no extant code exists, the topic is not relevant, and therefore should be dropped. The implication of such an argument is that one of the project's foundational goals ought to be thrown out—that the lack of extant implementations of the research process makes the research process we wish to emulate irrelevant. Do you really believe that? I do not!!! Nor do I believe that you do.

No, I argue that because there's no implementation, the data model is probably wrong, and turning it into a specification is premature. A variant on the Army adage "No battle plan survives contact with the enemy". Once the model has been implemented, tested, and has gained some traction among users it might be appropriate to include in an interchange specification.

If we believe in the research process (which I do), then we ought to try to express it in our model. If there are concepts that are important in the process, then we would like to have a home for them in the model. When we release the model, we would like to be able to explain how the important concepts in the research process are expressed in the model.

That's exactly the reasoning behind the Gentech Data Model. No one adopted it. Is that what you want for GedcomX?

@jralls
Contributor

jralls commented May 3, 2013

EvidenceReference as it currently exists in the spec supports only direct evidence....

The current proposal allows "extracted" information to be associated with an answer as evidence. The information could contribute to the answer either directly or indirectly. The model (as presently constituted) does not disallow an association of "extracted" information that indirectly answers our question as evidence. If you believe it does, we have yet another misunderstanding somewhere.

The current model (not this proposal) allows connecting a Foo to another Foo by an EvidenceReference. I'll call the pointed-to Foo the sourceFoo and the one doing the pointing the derivedFoo. The sourceFoo may have its extracted flag set, meaning that all of its SourceReferences point to a single SourceDescription, and is therefore direct evidence.

In order to support indirect evidence or to address a conflict of evidence, a sourceFoo must have SourceReferences pointing to more than one SourceDescription and should have an analysis Document explaining the conclusion and the reasoning that got the researcher there. Thus, indirect evidence is not directly supported by the EvidenceReference model: It merely imports the capability provided by SourceReference.

This proposal just says that instead of calling a pointer to an analysis Document a SourceReference you can use an empty EvidenceReference. That doesn't add anything except confusion.
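The reading above can be sketched roughly as follows. This is only an illustration of the argument as stated in this comment, not the spec's behavior: `Foo`, its fields, and both helper functions are hypothetical, and the source ids are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Foo:
    """Stand-in for any conclusion-like object (the 'Foo' of this comment)."""
    label: str
    extracted: bool = False
    # Ids of the SourceDescriptions reached through this Foo's SourceReferences.
    source_ids: List[str] = field(default_factory=list)
    # Foos pointed to through EvidenceReferences (the sourceFoos).
    evidence: List["Foo"] = field(default_factory=list)
    # Text of an analysis Document, if one is attached.
    analysis: Optional[str] = None


def is_direct(foo: Foo) -> bool:
    """Direct, on this reading: extracted and backed by exactly one source."""
    return foo.extracted and len(set(foo.source_ids)) == 1


def needs_analysis(foo: Foo) -> bool:
    """Several sources but no analysis Document explaining the reasoning."""
    return len(set(foo.source_ids)) > 1 and foo.analysis is None


# Placeholder ids; they do not refer to real SourceDescriptions.
source_foo = Foo("sourceFoo", extracted=True, source_ids=["sd-1", "sd-1"])
indirect_foo = Foo("indirect sourceFoo", source_ids=["sd-1", "sd-2"])
derived_foo = Foo("derivedFoo", evidence=[source_foo, indirect_foo])
```

Under these assumptions, everything that distinguishes the indirect case lives in `source_ids` and `analysis`, that is, on the SourceReference side rather than in the EvidenceReference link itself.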

@jralls
Contributor

jralls commented May 3, 2013

And that's the crux of the first question: What's the benefit of (mis)using an EvidenceReference with no evidence referenced over a SourceReference?

The benefit is that we have distinct ways to model sources, information, evidence and answers to research questions, and that we can associate the types of analysis that are distinct to each concept with the entities in the model that represent those concepts.

Rubbish. There are no Information or Evidence classes. I'll be generous and say that Conclusion is the "answer" class.

Despite all of your arm waving and misdirection, EvidenceReference links conclusions, not evidence. There is no Evidence for them to link.

The question isn't "how do we model this". We already model it. The SourceReference chain of #144 already covers this ....

The disadvantage of using the mechanism defined in #144 is that many of the above concepts get lumped together into a single bucket and there are no semantics left to sort it out again. While it is possible to get all of the elements into the bucket, its usefulness has probably been diminished.

How does "list of http://gedcom.org/SourceReference. Order is preserved" translate into "single bucket"?
What semantics are added by linking an analysis Document by the evidence field instead of the sources field? Particularly when the former is otherwise specified to link Foos?

@thomast73
Contributor Author

It seems to me that not all that is needed is in place for us to align on these issues. We will set this aside for now.
