representing "negative evidence" #250
Conversation
Hmm. This rather distorts That's OK, though we might want a different name. But consider that in that model (which is the one taught by the BCG lecturers), the
I think you may be overanalyzing it. We have a need to support negative evidence. We're thinking that we have the data structures in place to do so; we just need to be more explicit about how to use them for this purpose. We believe that using Another option may be to cite the "failure to find information" as a source by describing that search using Personally, I prefer using
Probably impossible. The intention is a spec that can be realized in code, and that realization requires lots of detailed analysis.
As so often of late, this is territory we've been over before, in #144 among others. The
Again, how does the existing solution not meet this need? What about conflicting evidence and indirect evidence?
Conceptually, perhaps. As a data model, not so much. Before this change
You haven't yet articulated how it's a fit at all, never mind a better one.
I disagree that it's territory we've been over before. Where in that thread did we discuss how to use that model to represent negative evidence? If we did, then that's great. Let's just make it more explicit and be done with it. Help us do that. How do I use the
I disagree; I believe I did articulate it. I believe it's a fit because "negative evidence" is evidence, not a container of information that may be used as evidence (which is what a "source" is).
Sarah mentioned it specifically as a required element of a proof argument, which Thad later cited in his formulation of
That's hand-waving, based solely on the class name. Let's dissect a commonly-used example of "negative evidence": One finds, in the records of some church, entries two years apart recording the birth of a child to "John Smith". It seems likely that the two children are siblings, but this is long before fully enumerated censuses, so there's no direct evidence. Fortunately the county maintained annual tax records and they survive, so the researcher searches them for John Smith for a several-year period surrounding the birth years and finds only one. How do we record this?

If our researcher has learned her skills from the BCG/APG crowd, she'll create a

If our researcher prefers Tom Wetmore's approach, she'll create a John Smith

Now suppose the search of tax records did turn up another John Smith, but near another church, and one finds in that church's records that there was a John Smith who lived nearby and was a member of the second church, making it unlikely that either of the two children is his. It's pretty easy to cover all of that in a prose discussion residing in an
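For concreteness, here is a minimal sketch of the "cite the failure to find as a source" approach using the existing structures. All ids, citation text, and analysis text are hypothetical, and the JSON member shapes are my best understanding of the serialization: a SourceDescription for the searched tax rolls carries an analysis Document describing the unproductive search, and the person cites it like any other source.

```json
{
  "sourceDescriptions" : [ {
    "id" : "tax-rolls",
    "citations" : [ { "value" : "Hypothetical: Some County, personal property tax rolls, 1820-1828." } ],
    "analysis" : { "resource" : "#tax-roll-search" }
  } ],
  "documents" : [ {
    "id" : "tax-roll-search",
    "type" : "http://gedcomx.org/Analysis",
    "text" : "Searched the 1820-1828 rolls for John Smith and name variants; only one John Smith appears in the county during the period surrounding the two baptisms."
  } ],
  "persons" : [ {
    "id" : "john-smith",
    "sources" : [ { "description" : "#tax-rolls" } ]
  } ]
}
```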
Thank you for taking the time to elaborate. That was enlightening and you've made some great points, as always.
As stated (and as discussed in #144), it is possible to represent all of the sources, the information contained in those sources, how this information fits in answering our research question(s) as evidence, any analysis of these (the sources, information, or evidence), and any explanation of proof in a Using the Which representation is better? It depends on your implementation goals. I expect some will value being able to model the concepts distinctly and some will wish for simplicity. Those who prefer representing all of these concepts distinctly will need a "negative evidence" mechanism that is distinct. Therefore the need for this proposal (or a modification of it).
Is there anyone actually asking for that? Do they (or you) really believe that they need only negative evidence to model good genealogical analysis? Has anyone written code that uses
+1 @jralls I think your argument and especially your example are very clear. I have a harder time following the argument from @thomast73 - can you perhaps give a concrete example? I know it's a bit old school but I often find it easier to wrap my head around abstractions when given an example at the same time.
A good question. The answer is, perhaps, both "Yes" and "No". One of the (not well) stated goals of the project is to have a model that fits and supports the genealogical research process—the research process typically expounded by the Genealogical Proof Standard pundits. And there are many watching or contributing to the project who care deeply about this fit (present company included). Current software vendors are also showing signs of caring, though implementations vary as to how much the research process is manifested in their products and how much they understand the concepts being put forth by these pundits. I know of at least one vendor who is selling a product that attempts to express the research process in a very strict sort of way (their product: Evidentia). So I think in this sense the answer is very much "Yes!"
In #242, we introduced the If we were to add a classifier for direct/indirect/negative evidence, the classifier would belong on There are several vendors who have tried to introduce mechanisms for users to designate something as direct/indirect/negative—evidence that vendors will want a home for this concept. But did someone approach us and tell us this "negative" mechanism is missing and here is how you ought to add it to the model? Here is where the answer to @jralls' first question is "No." This proposal is a result of our own analysis of the model, how it fits the genealogical research process, and its likely needs for future development.
I think @thomast73 might be saying the following: Given a conclusion (e.g. It sounds logical. The hang-up for me is the practical application. @jralls's intelligent response above made me realize that I can't provide a concrete use case where I need to provide an analysis on an evidence reference that isn't referencing information. I'm hoping @thomast73 can provide it for us.

Referring to @jralls's examples above, the case where two distinct "John Smith"s were found in the tax records would mean that there would be two separate conclusions about two distinct "John Smith"s, each one referencing a church record and a tax record with appropriate analysis on each reference. The case where only one John Smith was found in the tax records means that only one conclusion about "John Smith" would be made, with three references (two to church records and one to the tax record), and the analysis would be on those references; perhaps each of them has its own reference or perhaps all of them reference the same analysis.

So, Thad, help us out. Can you provide a concrete practical example where I would need to use an instance of an evidence reference with only an analysis on it?
That might have been your intent, but the specification doesn't get you there. The
No, there's also contradictory evidence. This goes back to a fundamental flaw in your original presentation in #242:
The most important requirement of the GPS is that all of the evidence obtained from a reasonably exhaustive search be considered (granted, "reasonably exhaustive" is subject to interpretation, with answers ranging from "economically feasible" to "taking over the basement of the courthouse for 6 months while you examine and index all of the loose papers that have been stored there for the last 150 years"). No selecting is permitted. Any contradictory evidence must be discussed and explained, which is why in my second example I brought in a second John Smith and explained why he probably wasn't the father of either of the children. One of the papers in this month's NGSQ, The Parents of Thomas Burgan of Baltimore County, Maryland, discusses the opposite case, where Burgan's birth and those of his siblings were registered in a different church because Burgan's parents were Catholic and the law of the time required that births be registered at the CofE parish because the poor laws of the day operated through the established church.
I think that misses the point. The right question is "what's the difference between an
That's a good question. Sigh. What a mess. So I thought that with the changes we applied at #244 to move the However, the question as it applies to this thread is still relevant, but I would re-word it: what's the difference between an
I think it's both well stated and accomplished via the mechanism of #144.
I had a look at it. It's much less than its web page claims, something like Clooz without the cataloging support. It gets as far as extracting evidence (which it rather strangely calls "claims") from sources and collecting your analysis, but has no provisions beyond that. Perhaps they intend to develop it further, but there's no indication of that on the website.
Yeah. :-( We're digging ourselves into a hole and in the process blowing the schedule you laid out at RootsTech. Perhaps we should table this for a while and figure out what we really need to get done to publish a 1.0 spec, and then focus on doing that.
@jralls, you are misconstruing my statement(s) and are raising a fuss where there is no fuss to be made. When I consult a tax roll looking for John Smith, I have no requirement to include all of the other persons mentioned in the tax roll. I "select" only the information about John Smith (or name variants that might be applicable) from among all of the information presented. Of course, we must consider all potentially applicable information in the source(s) we are consulting—i.e., all of the information that might belong to our "John Smith". But we have no requirement to discuss "James Smith" or "John Mackey" that appear in the same source just because they are among the information in that source (unless we think they are our "John Smith"). We "select" only the "John Smith" information as our evidence and then consider its usefulness to the question at hand. And of course, we must explain any conflicting evidence—any of the information we thought might belong to "John Smith" that we have "selected" as potential evidence. "Selecting" is about combing through all of the information a source has to offer and gathering from it the information potentially applicable to our question so that it can be considered and correlated with the potentially applicable information we have gathered from other sources. It has nothing to do with being "selective", i.e., ignoring information we do not like.
Yes, there might be a need for a fourth category for evidence (direct, indirect, negative, and conflicting), but I know of no pundit who describes things in this manner; they name only the three categories (direct, indirect, and negative). I think this is because the discussion of conflicts is not handled in the analysis of the evidence items themselves, but rather in the analysis that deals with correlating multiple items of evidence—the
In the paper @jralls points out in this month's NGSQ, The Parents of Thomas Burgan of Baltimore County, Maryland, look at the part of the narrative associated with footnotes 7 and 9, and the footnotes themselves. Here is the verbiage and footnote associated with 9 (because 7 is more complex and I don't want to transcribe it all):
This is the situation we are trying to address with this proposal. We looked through a set of records that ought to have contained information on our research subject, but found none. Because we are demonstrating the elements of the GPS, we need to demonstrate our "exhaustive search". Therefore, we need to document our unproductive search(es) as well as the searches that were productive. Per this proposal, the We would then associate this analysis with our research
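As a rough sketch of that documentation piece (hypothetical ids and wording; JSON member shapes assumed from the serialization), the unproductive search itself could be captured as an analysis Document whose sources list points to the description of the records that were searched:

```json
{
  "documents" : [ {
    "id" : "negative-search",
    "type" : "http://gedcomx.org/Analysis",
    "text" : "Hypothetical: searched the registers cited below, which ought to have recorded the research subject during this period; no relevant entries were found.",
    "sources" : [ { "description" : "#searched-registers" } ]
  } ],
  "sourceDescriptions" : [ {
    "id" : "searched-registers",
    "citations" : [ { "value" : "Hypothetical citation for the body of records that was searched." } ]
  } ]
}
```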
If you explain it that way, OK. But you didn't.
I don't know what "pundits" you've been listening to, but at all of the lectures and workshops I've been to in the last 13 years, ESM, Dr. Jones, Barbara Little, and Christine Rose among many others have all emphasized the importance of finding and dealing with conflicting evidence. A good half of the papers in any particular issue of NGSQ will have conflicting evidence to deal with. Perhaps we can connect at Las Vegas next week and ask a couple of them how much importance they attach to it.
Roger.
Yup.
There's where your case begins to fall apart. That's the process discussed in #144 and already implemented in the spec.
And that's the crux of the first question: What's the benefit of (mis)using an The question isn't "how do we model this". We already model it. The
A broader comment: There is no extant code which implements this. I'm not talking about the trivial serialization/deserialization code in the XML and JSON implementations. I mean an actual program that genealogists use to help them collect, extract, and analyze evidence, generate conclusions, and lineage-link the conclusions into a family history. Developing a "scratch" data model is the first step to writing such a program, but only a noob would proceed to write the program without reconsidering the data model frequently during implementation and early testing. @ttwetmore at least has some experience with the n-tier approach and could steer that part of the model -- though #242 generalized the concept beyond his experience -- and apparently his interest, since he's dropped out of the discussion. Nobody here (or anywhere else AFAICT) has implementation experience with this proposal, so in my view it's premature to include it in a spec.

I'd really like to see some code implement this model and gain some market share, even if it's only a few thousand users, before putting it in the spec. I'd actually like to see the same for n-tier, though it might be too late. The only known implementation of that is DeadEnds, which seems unlikely at this point to see the light of day. It's absolutely true that the same argument can be made against the

Where I'm going with all of this is that I think we're trying to stuff too much into GedcomX 1.0 and therefore setting ourselves up to fail. We're certainly setting ourselves up to not have a usable 1.0 release this year, which a lot of folks are going to interpret as failure even if we don't.
Again, you are missing the point. I am not saying that conflicting evidence is not important. And yes, the pundits all discuss it. But when analyzing a single item of evidence, it is not possible to say whether it is "conflicting". To identify conflict, we have to have something to compare with—at least two items of evidence that speak to the question at hand. It is possible to categorize a single item of evidence as direct, indirect, or negative. Therefore, when considering an item of evidence in isolation, the pundits talk about categorizing it as direct, indirect, or negative. When they discuss correlating several evidence items, they discuss the need to resolve conflicting evidence (if it exists).
The benefit is that we have distinct ways to model sources, information, evidence and answers to research questions, and that we can associate the types of
The disadvantage of using the mechanism defined in #144 is that many of the above concepts get lumped together into a single bucket and there are no semantics left to sort it out again. While it is possible to get all of the elements into the bucket, its usefulness has probably been diminished.
The current proposal allows "extracted" information to be associated with an answer as evidence. The information could contribute to the answer either directly or indirectly. The model (as presently constituted) does not disallow an association of "extracted" information that indirectly answers our question as evidence. If you believe it does, we have yet another misunderstanding somewhere.
You argue that there is no extant code that addresses (very well) the research process the project has set out to support. I do not disagree. You argue that because no extant code exists, the topic is not relevant, and therefore should be dropped. The implication of such an argument is that one of the project's foundational goals ought to be thrown out—that the lack of extant implementations of the research process makes the research process we wish to emulate irrelevant. Do you really believe that? I do not!!! Nor do I believe that you do. If we believe in the research process (which I do), then we ought to try and express it in our model. If there are concepts that are important in the process, then we would like to have a home for them in the model. When we release the model, we would like to be able to explain how the important concepts in the research process are expressed in the model.
No, I argue that because there's no implementation, the data model is probably wrong, and turning it into a specification is premature. A variant on the Army adage "No battle plan survives contact with the enemy". Once the model has been implemented, tested, and has gained some traction among users, it might be appropriate to include it in an interchange specification.
That's exactly the reasoning behind the Gentech Data Model. No one adopted it. Is that what you want for GedcomX?
The current model (not this proposal) allows connecting a In order to support indirect evidence or to address a conflict of evidence, a This proposal just says that instead of calling a pointer to an
Rubbish. There are no Despite all of your arm waving and misdirection,
How does "list of
It seems to me that not everything needed for us to align on these issues is in place. We will set this aside for now.
Representing "negative evidence" or "the absence of evidence one would expect to find" is often important in forming a proof argument. We want the GEDCOM X model to support this concept.
Requirements
Documenting negative evidence typically requires the following elements:

- a description of the body of records that was searched;
- a description of the search itself (the date of the search, who conducted it, how it was carried out, search terms, etc.);
- the analysis by which we concluded that information we expected to find is absent; and
- a way to associate that conclusion with our hypothesis as evidence.
From a modeling perspective, we would like "negative evidence" to look like "evidence" and to use the same mechanisms used to represent it.
Proposal
We think that the documentation requirements can be met with a Document object. The text of the document could be used to describe the search—the date of the search, who conducted the search, how the search was carried out, search terms, etc. The sources list could contain the source description(s) for the body of records that was searched.

Assuming Document as the documentation mechanism for a negative search, how should we associate this Document with our hypothesis as "negative evidence" supporting our hypothesis?

In the GEDCOM X model, we use EvidenceReference to associate data with a hypothesis as evidence. The resource member points to the data being used as evidence and is a required field. But in the case of "negative evidence", there is no data to point to—by definition, we have an "absence of data". It is by analysis that we have determined that there was no data where there ought to have been some.

We would like to propose that we can represent "negative evidence" with an EvidenceReference instance with the document describing the negative search as the analysis document, and an explicitly missing resource reference.

However, as currently defined, an EvidenceReference REQUIRES a resource reference (explicitly present) and that reference MUST resolve to an object that matches the type of its container (a specialization of Subject). This means that the current constraint would not allow resource to be explicitly missing. Therefore, we would like to modify the constraint to something like the following.

For resource:

REQUIRED, except when modeling "negative evidence", in which case it MUST NOT be present. If provided, MUST resolve to an instance of [http://gedcomx.org/v1/Subject].

For analysis:

OPTIONAL, except when modeling "negative evidence", in which case it MUST be present. If provided, MUST resolve to an instance of [http://gedcomx.org/v1/Document] of type http://gedcomx.org/Analysis.

The above changes are included in this pull request.
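To make the proposed shape concrete, here is a minimal sketch in the JSON serialization. The ids and text are hypothetical, and the member shapes (in particular the analysis member of EvidenceReference) are assumed to follow the constraints proposed above: the person's evidence list carries an EvidenceReference with no resource, and its analysis resolves to the Document that describes the negative search.

```json
{
  "persons" : [ {
    "id" : "research-subject",
    "evidence" : [ {
      "analysis" : { "resource" : "#negative-search" }
    } ]
  } ],
  "documents" : [ {
    "id" : "negative-search",
    "type" : "http://gedcomx.org/Analysis",
    "text" : "Hypothetical: records the date, scope, and terms of the search and notes that no relevant entries were found.",
    "sources" : [ { "description" : "#searched-records" } ]
  } ],
  "sourceDescriptions" : [ {
    "id" : "searched-records",
    "citations" : [ { "value" : "Hypothetical citation for the body of records that was searched." } ]
  } ]
}
```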