Need mechanism to distinguish "extracted conclusions" from "working conclusions" #202
Ryan, would you elaborate on what you mean by metadata? ISTM all of the Conclusion class's parameters are metadata. |
I don't believe there is a need for this ... if a Conclusion references a single Source then surely it can only have been "extracted" from that source in some way? If it references multiple sources then it must be some form of "worked" conclusion. If it references none, then it is simply the imagination/knowledge of the researcher. Why do we need to make the method by which the conclusion was created explicit? And if we do need to then are "extracted" and "working" the right/only terms? |
No, even a single source will yield inferred evidence. An 1841 census says that Thomas Hartley is 25, so an age "Fact" is extracted evidence. One can infer that he was therefore born between 7 June 1815 and 6 June 1816, but that would not be extracted. The reason behind making the distinction goes back to the Gentech GDM and arrives here via Tom Wetmore's N-Tier model, where conclusions are built up by aggregating "lower-level" conclusions. I don't think many genealogists actually work that way -- they seem to prefer to take a more single-level approach, collecting all of the evidence they can and then writing a single essay (proof argument) -- but database architects have trouble modeling that, so they resort to this layered approach. |
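The inference in the census example is plain date arithmetic, and a minimal sketch may make the extracted/inferred distinction concrete. It assumes the age was recorded exactly as of census night (6 June 1841) and ignores any rounding of ages by the enumerator. `shift_years` and `birth_range` are hypothetical helper names, not part of GedcomX.

```python
from datetime import date, timedelta


def shift_years(d, years):
    """Move a date by whole years, falling back to 28 Feb for 29 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def birth_range(recorded_on, age):
    """Return the (earliest, latest) birth dates consistent with a stated age."""
    # Latest: the person turned `age` on the very day of the record.
    latest = shift_years(recorded_on, -age)
    # Earliest: the person turns `age + 1` the day after the record.
    earliest = shift_years(recorded_on, -(age + 1)) + timedelta(days=1)
    return earliest, latest


# Extracted evidence: "age 25" on 6 June 1841. Inferred: the range below.
earliest, latest = birth_range(date(1841, 6, 6), 25)
print(earliest.isoformat(), latest.isoformat())
```

The age is what the source says; the date range exists only because the researcher computed it, which is the distinction being argued about.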
Ah I see - you mean you want to differentiate between explicit and implicit/inferred information? That's different in my mind to differentiating between "extracted" and "working" conclusions. .... OK ... I'd go along with some sort of "Method" field. So the valid values would be something like, maybe: Transcribed, Translated, Implied, Inferred? |
Not me. I think atomizing sources into conclusion snippets is a waste of time. I want to link a bunch of sources to an AnalysisDocument which lays out the case for a group of conclusions (perhaps the place, date, facts, and participants of an Event) and then link that as a source to the (minimally) atomized conclusions. "Laying out the case" includes explaining one's inferences. The N-tier model is what demands a distinction between the types of evidence. |
Why? How-so? |
As I understand Tom's model, the idea is that you start off with a Source, extract the explicit evidence into what Thad has designated as ExtractedConclusion attached to a Person, then aggregate those Persons into composites as you demonstrate that the component Persons are representations of the same historical person -- the latter demonstrations being designated WorkingConclusions. (The "working" part comes from your assertion that conclusions aren't conclusive.) |
My understanding of the N-tier approach is simply that conclusions can include and reference other conclusions ad infinitum. I don't see why that necessitates differentiating conclusions by the method used to create them - it seems to me to make it harder to implement N-tier since you then have to worry about the validity of ExtractedConclusions interlinking/interacting with WorkingConclusions.
I said: "I strongly believe that there is no "Conclusion" in genealogy". I was arguing for the allowance of multiple and contradicting hypotheses in the research process not for making some "conclusions" conclusive and others not! @thomast73 - since you created the post can you clarify what the issue is here? |
If extracted conclusions are extractions of evidence found in a single source, then perhaps the extracted conclusion should have optional fields related to evidence analysis.
A working conclusion would not have such fields, as it is based on looking at all available evidence together. It just needs to be linked to the Analysis Document or the extracted evidence or both.
I think there could be some useful tools created to help in evidence analysis with atomizing sources. I would like software that helps the research and analysis process, not just documents the results, personally. If it exists please let me know. Right now I use FTM2012 but have been looking for something better. Ended up here hoping someone will make it someday... |
I would have said this thus: ...you start off with a Source, extract the explicit evidence into a Person -- designated an "extracted conclusion", then aggregate those Persons into composites as you demonstrate that the component ("extracted") Persons are representations of the same historical person -- the aggregate Persons being designated "working conclusions".
The issue is that the same objects are being used to model "extracted" and "working" conclusions. In the above example, a |
+1 In particular I think comparison of the conclusions of a source with other conclusions is critical and this is easier if they are the same type of object (regardless of whether or not they are "extracted" or "working" or whatever).
I totally agree (and am in exactly the same situation albeit using FH and FTM and dipping into a few others along the way) but I strongly believe that any meaningful analysis is dependent upon having cohesive data .. this is why I do not agree with the chuck-all-the-conflicts-in-one-person-and-sort-it-out-later approach and why I would like to see the ability to have Hypotheses (cohesive sets of conclusions). That way it gives the researcher the freedom to follow multiple trails and their respective probabilities until/if a "conclusion" (in the general sense of the word) is reached.
I would have said thus: ... you start off with a Source and interpret it into a number of Hypotheses (frequently only one but sometimes sources are vague and allow for multiple interpretations). Each Hypothesis has a number of "extracted" Conclusions. You then go back to your original Hypotheses (from previous research) and for each you compare the relevant "working" Conclusions with each of the "extracted" Conclusions (in each of the Hypotheses) to see how well they fit. You link them together with +ve/-ve evidence. You then re-evaluate your original hypotheses, set some new goals and go off in search of further evidence to prove/disprove them.
An "extracted" Person will be contained within a single source (that which it was extracted from). The parent source of a "working" Person is the GEDCOM-X file itself (or in my case the Hypothesis). |
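A rough sketch of the hypothesis workflow described above, just to show one possible shape for the data. Every name here (`Conclusion`, `Hypothesis`, `EvidenceLink`, `tally`) is hypothetical and nothing like it exists in the GedcomX model; the sample values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Conclusion:
    id: str
    statement: str


@dataclass
class Hypothesis:
    """A cohesive set of working conclusions the researcher is testing."""
    id: str
    conclusions: list = field(default_factory=list)


@dataclass
class EvidenceLink:
    """Links one working conclusion to one extracted conclusion."""
    working: str      # id of the working conclusion
    extracted: str    # id of the extracted conclusion
    positive: bool    # True for +ve evidence, False for -ve


def tally(hypothesis, links):
    """Count (+ve, -ve) links that touch the hypothesis's conclusions."""
    ids = {c.id for c in hypothesis.conclusions}
    relevant = [link for link in links if link.working in ids]
    positive = sum(1 for link in relevant if link.positive)
    return positive, len(relevant) - positive


# Illustrative data only.
h1 = Hypothesis("H1", [Conclusion("W1", "born 1888")])
links = [
    EvidenceLink("W1", "E1", positive=True),    # extracted: born 1888
    EvidenceLink("W1", "E2", positive=False),   # extracted: born 1890
    EvidenceLink("W9", "E3", positive=True),    # belongs to another hypothesis
]
print(tally(h1, links))
```

A simple count is only a placeholder for the re-evaluation step; how a researcher actually weighs +ve against -ve evidence is not something this sketch tries to settle.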
OK. But is that justification for putting it in a results-oriented interchange format like GedcomX?
So would most of us. It's not what "here" is about, though. |
And how do you structure an "Hypothesis" object? My preference is for hypotheses to be documented in a textual argument, but it appears that you have something else in mind. +ve/-ve? |
To me it's just a source object with "working" "conclusions" in it (=Persons)
Positive or negative evidence |
Is GedcomX results-oriented? By results-oriented do you mean used to store completed research, and not necessarily in-progress research? The use case of allowing migration from one application to another without data loss would not be supported then?
The logical model of gedcomX has the potential to affect a lot of what will become available as software features. I agree it is not what "here" is about, but needs some consideration when doing what here is about. Look at how many apps have directly used gedcom's logical model as their own.. |
I agree with @nilsbrummond :) however ...
Completely loss-less migration is unlikely since every app will necessarily have its own features/data in order to compete in the market. But what GEDCOM-X can/needs to do is extend the "core" that generally is/will be supported. |
No. By results-oriented I mean that GedcomX reflects the search-source-argument-conclusion model of the Genealogical Proof Standard. It is not AFAICT a use-case to capture everything that any future genealogical program might want to store.
Roger, and that's why Sarah, Tom, and I have been working so hard to move the GedcomX model towards supporting the full range of the GPS's proof requirements. There is no extant software that does, but we hope that by having a data model with LDS backing (they are, after all, the largest cohesive genealogy market) that does, the software will follow. I also recognize that most extant genealogy software does atomize evidence, and that GedcomX needs to support that atomization in a way that permits that extant software to use it or GedcomX will be stillborn. You've taken my earlier statement about my dislike of atomizing evidence, which was in the context of explaining why this issue is here, and turned it into a declaration that GedcomX shouldn't support it. The resident expert on analysis-support software is Tom Wetmore. Go read through #134 for some really interesting explanations of what he wants to do. See if you think his vision lines up well with the GPS, if it's "in-scope" for GedcomX, and if anyone else is likely to write similar programs for DeadEnds to exchange data with. |
Are we meaning different things here ... I don't know any software which does this ... there is no equivalent in GEDCOM 5 or any software I have seen for dissecting a Source into Persons, Events etc whilst maintaining the context of the original source. |
GEDCOM-X is still struggling to support the GPS. It hardly "reflects" it - see #191 |
Okay, folks, sorry for the response delay. Been busy. The flag on the conclusion makes me nervous because it mixes the concept of how the data is to be used with the data itself. It smells to me. I guess I'd like something that conceptually contains the extracted conclusion(s) so that the context can be applied at the level of the container. I don't know, something like a new object, maybe And don't bother bringing up the record model. I get it. You told me so. :-) The difference is that my suggestion here is a reference model and not an encapsulation model. |
Surely this is the GEDCOM-X file itself (which is itself a source)? (Assuming we agree that the GEDCOM-X file may be used as a source and therefore should be defined as one) |
The file will contain both "extracted" conclusions and "working" conclusions (as we've been using the terms). I'm talking about something more fine-grained than the file. A container for all conclusions (persons, relationships, events) that are extracted from a single source. |
The way I see it a set of conclusions always resides within a source ... this source might be a simple document; or a book; or a project; or a tree; or a branch; or a set of conclusions about people with the same surname (1-name research); they might have several projects/trees/whatever within one GEDCOM-X file or they might just lump them all together in one GEDCOM-X file with no structure at all. It doesn't really matter since the GEDCOM-X file is itself a source so forms the outermost source. Some of these "sets of conclusions" may contradict one another (e.g. I would create different sources to cater for different hypotheses ie to follow parallel trails to resolve a scenario with multiple possibilities) and some might complement each other. The way the sources are used by the researcher determines this ... e.g. a hypothesis that resulted in an impossibility might be used as negative evidence; whilst another might be used as proof. This way we have a very flexible way of building and linking together sources and conclusions. If we try to provide a different type of object then we lose that flexibility and create instead a rigid hierarchy. |
Maybe. But "dissecting a Source into Persons" isn't what I was trying to describe. Most genealogical software and Gedcom5 have little atoms of conclusion: a "fact" or an "event" which have attributes of date, place, one or more linked persons ("individuals"), and a list of "sources" which are really citations. It's a rare source that contains only one "fact", so one creates a bunch of "fact" records, each of which has a pointer to the same "source" record. If you find sources which disagree about something, you have to create multiple "facts" with different values and mark one of them "preferred" or something. That's what I mean by "atomizing evidence". |
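To make "atomizing evidence" concrete, here is a small sketch of the record layout described above: several "fact" records pointing at the same "source" record, with conflicting values resolved by a "preferred" marker. The class and field names are hypothetical and the values are invented for illustration; this is not GEDCOM 5 syntax or any particular program's schema.

```python
from dataclasses import dataclass


@dataclass
class Source:
    id: str
    title: str


@dataclass
class Fact:
    type: str
    value: str
    persons: list            # ids of the linked individuals
    sources: list            # ids of the cited Source records
    preferred: bool = False


# Illustrative records only.
census = Source("S1", "1841 census entry")
other = Source("S2", "some other record")

facts = [
    # One source usually yields several facts, all citing the same record.
    Fact("Age", "25", ["I1"], ["S1"]),
    Fact("Birth", "1816", ["I1"], ["S1"], preferred=True),
    # A disagreeing source forces a second fact of the same type.
    Fact("Birth", "1814", ["I1"], ["S2"]),
]


def preferred_fact(facts, fact_type):
    """Return the fact of the given type marked preferred, if any."""
    for fact in facts:
        if fact.type == fact_type and fact.preferred:
            return fact
    return None


print(preferred_fact(facts, "Birth").value)
print(sum(1 for f in facts if "S1" in f.sources), "facts cite S1")
```

Note that nothing in these records says why one birth value is preferred, which is the gap a proof argument is supposed to fill.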
I agree that a GedcomX file may be used as a source and cited appropriately. In that case it's no different from a compiled genealogy in print. I don't think that that has anything to do with how the file is defined, nor has any bearing on the question at hand.
I'd say "depends upon one or more sources". A conclusion based on only one source is weak. "Resides within" implies that the conclusion is contained in the source, which might be true (if the source is, say, a compiled genealogy) but then the conclusion is extracted. I didn't think it up myself, I just copied it from the source. How can the interesting sort of conclusion, one that I think up after thorough research and analysis, "reside within" any one of the sources? |
You misunderstand me ... by "resides in" I mean the containing source defines the author(s)/editor(s)/transcriber(s)/interpreter(s) etc of the conclusions... I totally agree that any decent conclusion will also reference many sources in order to provide proof/evidence. |
Here's the sort of thing I mean in pseudo data: S1 My Family Tree by Sarah Green, last edited 1st August 2012
S2 Birth Certificate for J Bloggs, GRO ref 12345, copy dated 1st Jan 2011 held/interpreted by Sarah Green
S3 Bloggs Family Tree, created by F Jones, last edited 15th August 2010
I've deliberately shown S3 as being out of date here ... just as it could have been if they had been referencing say an external web site. Maybe that was a complication too far and will prolly just highlight other problems but hey ho. |
OK. That reinforces my point that using "My Family Tree" as a source has as much to do with its structure as it does with how the GRO formats their birth certificates: None at all. It's utterly irrelevant to both this discussion and #192. |
So if I published "My Family Tree" you would refuse to recognise it as a source would you? And what's your beef with the GRO?? Their format is pretty sensible when you look at the process involved. How would you go about organising millions of BMD certificates then? Can you at least express why you think my points are irrelevant? |
The problem with this is that it implies that a Conclusion can be extracted from multiple sources. This muddies what we mean by "extracted from" ... If Source S1 and Source S2 have Person P1 in their list then P1 is not an extraction but a compound amalgamation (more like a "working" conclusion). Could we not instead have a Conclusion attribute which is a pointer to the source it was "extracted from"? |
Here is what I am thinking:
<gedcomx-header>
<sourceDescription id="S0" title="My Family Tree">
<!-- source definition for this gedcom-x itself for external referrers to use. -->
</sourceDescription>
</gedcomx-header>
<sourceDescription id="S1" title="1900 United States Federal Census Record">
<!-- description of source 1 as defined by work done for #144 -->
<sourceDescription id="S2" title="Record for Joshua Amis">
<!-- description of source 2 as defined by work done for #144 -->
<sourceDescription id="S3" title="Interpretation by Sarah Green">
<!-- description of source 3 as defined by work done for #144 -->
<person id="P2" title="Joshua Amis">
<!-- data for person interpretation of person 2 contained in image of source 2 -->
<fact id="F1" type=".../Birth" Value="1888" />
</person>
</sourceDescription>
</sourceDescription>
</sourceDescription>
<sourceDescription id="S4" title="Some other source">
<!-- description of source 4 as defined by work done for #144 -->
...
</sourceDescription>
<!-- Working Conclusions follow as top level objects -->
<person id="P3" title="Josh Amis">
<!-- data for working person -->
<source resource="P2" />
<fact id="F2" type=".../Birth" Value="1888">
<!-- Reference to atomized extracted conclusion -->
<source resource="F1" />
<!-- Reference to non-atomized conclusions: ref the source or the analysis -->
<source resource="S4" />
</fact>
</person>
|
3 back-ticks and the language name (XML in this case) to open; 3 back-ticks to close |
Many thanks @nilsbrummond :) I would prefer all Persons to be in a sourceDescription because this allows the researcher to then refer to them in the same way as other Persons .. For example, say I'm researching John Milsom (WP1) in one family tree (FT1) and come across two possible Census entries for him. Taking aside the interpretation of the actual Censuses for a moment, I want to be able to create 2 new family trees (FT2 and FT3) to investigate each of them (with Persons WP2 and WP3 respectively). At some point I may decide that WP2=WP1 or WP3=WP1 or neither of them or both of them. It makes life much easier if I can just cite my own research/investigation just like I would any other source. |
Okay, let me repeat back to you what I think you're trying to do using the XML as specified right now. (Note it's a lot flatter than what you've got in mind, but I think it means the same thing.)
<sourceDescription id="S0" about="(reference to the file itself)">
<!-- this is the description of this file so the file
itself can be cited as a source. Note it's not referenced
anywhere, so I'm uncertain as to the value of it... -->
<displayName>My Family Tree</displayName>
</sourceDescription>
<!-- now we're going to describe the census record. -->
<sourceDescription id="S1" about="http://ancestry.com/path/to/census/record">
<!-- this is the description of the census record -->
<displayName>1900 United States Federal Census Record for Joshua Amis</displayName>
<mediator resource="/path/to/description/of/ancestry/dot/com"/>
</sourceDescription>
<!-- now you've got another source in there entitled "Interpretation by Sarah Green"
and I have no idea what that is, but I'll include it here for the sake of
completeness. -->
<sourceDescription id="S2" about="???whatisthisdescribing???">
<displayName>Interpretation by Sarah Green</displayName>
<!-- this source (whatever it is) was derived from S1 -->
<source resource="S1"/>
</sourceDescription>
<!-- okay, now I've got my extracted person, Joshua Amis -->
<person id="P2">
<name>...Joshua Amis...</name>
<source resource="S2"/>
</person>
<!-- and now my "working" conclusion of Joshua Amis... -->
<person id="P3">
<name>...Joshua Amis...</name>
<!-- and I want to reference P2 as a source, but I can't do it directly
because the source reference MUST resolve to a source
description according to the spec. So what I have to
do is describe P2 with yet another source description,
S3, and reference that. -->
<source resource="S3"/>
</person>
<sourceDescription id="S3" about="P2">
<displayName>Conclusion about Josh Amis Extracted From Sarah's interpretation of the 1900...</displayName>
...
</sourceDescription>
Okay, so (like it or not), that's how it would be done with the spec as it is right now. So back to the question at hand. There is no way to determine that P2 is an extracted conclusion of a single source. How do we need to modify the spec in order to make that determination? The proposal I made above was to allow a new property of
<sourceDescription id="S2" about="???">
<displayName>Interpretation by Sarah Green</displayName>
<!-- this source (whatever it is) was derived from S1 -->
<source resource="S1"/>
<extractedConclusion resource="P2"/>
</sourceDescription>
John's question still needs to be addressed, too: how does P3 cite P2 as a source given that a source reference MUST resolve to a source description and cannot resolve to a person? And I want to know why "S2" is even needed. What purpose does it serve? Why not just use S1? |
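The rule that a source reference must resolve to a source description can be checked mechanically. The sketch below is only an illustration under assumptions: the `<gedcomx>` wrapper element and the function name `unresolved_source_refs` are made up, and the sample is a cut-down version of the XML above in which P3 tries to cite P2 directly.

```python
import xml.etree.ElementTree as ET

# Cut-down sample; the <gedcomx> root element is an assumption.
DOC = """
<gedcomx>
  <sourceDescription id="S1" about="http://ancestry.com/path/to/census/record"/>
  <sourceDescription id="S2" about="???">
    <source resource="S1"/>
  </sourceDescription>
  <person id="P2">
    <source resource="S2"/>
  </person>
  <person id="P3">
    <source resource="P2"/>
  </person>
</gedcomx>
"""


def unresolved_source_refs(xml_text):
    """List (referrer id, target id) pairs whose target is not a sourceDescription."""
    root = ET.fromstring(xml_text)
    described = {sd.get("id") for sd in root.iter("sourceDescription")}
    problems = []
    for parent in root.iter():
        # Only direct <source> children belong to this parent.
        for ref in parent.findall("source"):
            target = ref.get("resource")
            if target not in described:
                problems.append((parent.get("id"), target))
    return problems


print(unresolved_source_refs(DOC))
```

Here the only offending reference is P3 citing P2, which is exactly the case that forces the extra S3 description in the example above.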
It looks to me like your S3 is the same as Sarah's S2. The point it serves is that there can be multiple interpretations. For example, the Ancestry.com record may come with its own extracted conclusions, via the import of the record model or whatever. The researcher may then be unhappy with that interpretation and create their own.
<sourceDescription id="S1" about="http://ancestry.com/path/to/census/record">
<!-- this is the description of the census record -->
<displayName>1900 United States Federal Census Record for Joshua Amis</displayName>
<mediator resource="/path/to/description/of/ancestry/dot/com"/>
</sourceDescription>
<!-- now you've got another source in there entitled "Interpretation by Sarah Green"
and I have no idea what that is, but I'll include it here for the sake of
completeness. -->
<sourceDescription id="S2" about="???whatisthisdescribing???">
<displayName>Interpretation by Sarah Green</displayName>
<!-- this source (whatever it is) was derived from S1 -->
<source resource="S1"/>
<extractedConclusion resource="P2"/>
</sourceDescription>
<!-- now you've got another source in there entitled "Interpretation by Sarah Green"
and I have no idea what that is, but I'll include it here for the sake of
completeness. -->
<sourceDescription id="S3" about="???whatisthisdescribing???">
<displayName>Imported Interpretation from Ancestry.com</displayName>
<!-- this source (whatever it is) was derived from S1 -->
<source resource="S1"/>
<extractedConclusion resource="P3"/>
</sourceDescription>
|
Okay, so you're saying that Sarah is describing the same source in a different way? So that would be a different description about the same source. S2 would look just like S1 with perhaps a different display name, e.g. "Sarah's Interpretation of 1900 United States Federal Census Record for Joshua Amis". That's fine. What got me confused is that she embedded it within the other source description, implying there was some relationship between the two other than that they were describing the same thing. |
Exactly so :)
I am interpreting the image copy source supplied by Ancestry but ignoring the Conclusions supplied with it in favour of my own Conclusions.
I embedded it within the source for the image copy to show that it was a source of my own creation derived from it. In some situations it might be fine to just have an all-in-one source/interpretation ... but suppose the source was vague and could be interpreted different ways ... I would want the Ancestry source and 2 derivative sources, one for each interpretation - both of the derivatives are "derived from" the image copy. |
I agree but that is a problem with the model as it is at the moment isn't it? It's not a new problem I've introduced ... in my view a Person is a citable object so there isn't a problem. |
@stoicflame - try this - and see embedded comments
<sourceDescription id="S0" about="Research undertaken by Sarah Green on behalf of John Jones">
<displayName>Family Tree of John Jones</displayName>
</sourceDescription>
<sourceDescription id="S1" about="http://ancestry.com/path/to/census/record">
<displayName>1900 United States Federal Census Record for Joshua Amis</displayName>
<mediator resource="/path/to/description/of/ancestry/dot/com"/>
</sourceDescription>
<sourceDescription id="S2" about="Preferred interpretation of S1">
<displayName>Interpretation of 1900 United States Federal Census Record for Joshua Amis by Sarah Green</displayName>
<source resource="S1"/> <!-- *** HOW DO I SAY IT'S A DERIVATIVE OF S1 (NOT JUST SOMEHOW REFERENCES IT)? *** -->
</sourceDescription>
<person id="P2">
<name>...Joshua Amis...</name>
<source resource="S2"/> <!-- *** HOW DO I SAY THAT THIS IS EXTRACTED FROM S2 (NOT JUST REFERENCING IT)? *** -->
</person>
<person id="P1">
<name>...Josh Amis...</name>
<!-- *** HOW DO I SAY THAT THIS IS INCLUDED IN S0 or S100? *** -->
<!-- and I want to reference P2 as a source, but I can't do it directly
because the source reference MUST resolve to a source
description according to the spec. So what I have to
do is describe P2 with yet another source description,
S3, and reference that. *** I AGREE THAT IS NOT GOOD *** -->
<source resource="S3"/>
</person>
<sourceDescription id="S3" about="P2">
<displayName>Conclusion about Josh Amis Extracted From Sarah's interpretation of the 1900...</displayName>
...
</sourceDescription>
<sourceDescription id="S100" about="Research undertaken by Sarah Green on behalf of Bob Smith">
<displayName>Family Tree of Bob Smith</displayName>
</sourceDescription> |
Every object with a I personally think the AnalysisDocument should be able to replace the interpretation sourceDescriptions in the last few examples, as long as the interpretation is by the GEDCOM-X author. I think of a SourceDescription as a description of an external source referenced; Every |
+1 Absolutely :) .... but what if we import a Conclusion from Ancestry or elsewhere? Is this still a Conclusion or is it now an external source (albeit in the format of a Conclusion)?
Yup I'd go along with that tho' I think that it's important to retain the context e.g. a Role shouldn't be cited out of context of its Event; a Person should retain the context of its Relationships etc (but I think it may be difficult/impossible to enforce this at the data structure).
I don't think the AnalysisDocument contains Conclusions and I would prefer to be able to keep the "extracted" Conclusions clean from other sources and distinct from the "compound/working" conclusion(s) if that were wished by the researcher. My reasoning is that I often find it necessary to go back and compare my working vs extracted Conclusions if there is a problem further down the line. |
Hello everybody. I apologize for the neglect of this issue; it's gotten cold. If you'd be willing to push this back into working memory, I'd appreciate your help getting these issues addressed. I'll pick this back up by addressing Sarah's pass at the XML that I put out there. It was helpful, thank you. I think I understand better how you're approaching the problem. You had some questions inline there that I'd like to address:
You're saying that it's a derivative already by referencing it as a source. That's what it means to reference a source.
Exactly the question this issue originally intended to address--how to distinguish "extracted conclusions" from "working conclusions". My proposal is twofold:
Here's kind of what I mean:
<sourceDescription id="S2" ...>
...
<extractedConclusion resource="P2"/>
</sourceDescription>
<person id="P2">
<identifier>P2</identifier>
<name>...Joshua Amis...</name>
<source resource="S2"/>
</person>
<person id="P3">
<!--P3 has an identifier "P2", perhaps of type "component" or something,
to specify that P2 and P3 are conclusions about the same person-->
<identifier type="...Component...">P2</identifier>
<name>...Joshua Amis...</name>
<source resource="S2"/>
</person> |
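For what it's worth, here is a sketch of how a consumer might tell the two roles apart if the proposal above were adopted. It is an illustration only: `extractedConclusion` and the "component" identifier type are proposals from this thread rather than part of the spec, the `<gedcomx>` wrapper and the function name `classify_persons` are assumptions, and the `...` placeholders are copied from the example as-is.

```python
import xml.etree.ElementTree as ET

# Completed from the example above; the <gedcomx> root is an assumption.
DOC = """
<gedcomx>
  <sourceDescription id="S2">
    <extractedConclusion resource="P2"/>
  </sourceDescription>
  <person id="P2">
    <identifier>P2</identifier>
    <source resource="S2"/>
  </person>
  <person id="P3">
    <identifier type="...Component...">P2</identifier>
    <source resource="S2"/>
  </person>
</gedcomx>
"""


def classify_persons(xml_text):
    """Label each person as extracted or working, per the proposed property."""
    root = ET.fromstring(xml_text)
    # Map each extracted conclusion to the source description that lists it.
    extracted_from = {}
    for sd in root.iter("sourceDescription"):
        for ec in sd.findall("extractedConclusion"):
            extracted_from[ec.get("resource")] = sd.get("id")
    result = {}
    for person in root.iter("person"):
        pid = person.get("id")
        # Identifiers with a "component" type name the persons being combined.
        components = [
            ident.text
            for ident in person.findall("identifier")
            if "Component" in (ident.get("type") or "")
        ]
        result[pid] = {
            "role": "extracted" if pid in extracted_from else "working",
            "source": extracted_from.get(pid),
            "components": components,
        }
    return result


for pid, info in classify_persons(DOC).items():
    print(pid, info)
```

The role is decided by the containing source description rather than by a flag on the person, which is the "reference model, not encapsulation model" idea from earlier in the thread.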
So extractedConclusion is, in the terms of EE, information extracted from a source for use as evidence. This evidence evaluated on its own is the atomized extracted-conclusion. A conclusion is dependent on the analysis of all relevant extracted-conclusions. Let's not oversimplify, but be sure all the elements needed for GPS are there. I added some I thought should be there... I believe the name element should have its own set of evidence analysis fields as well, but left them out to keep it cleaner... Do Conclusions and ExtractedConclusions have the same attributes, behaviors, and rules? If not then I think we may need separate classes for each.
<sourceDescription id="S2" ...>
...
<!-- EE inside front cover: source form -->
<sourceForm value="original | derivative" />
<extractedConclusion resource="P2"/>
</sourceDescription>
<person id="P2">
<identifier>P2</identifier>
<name>...Joshua Amis...</name>
<!-- EE inside front cover: informant's degree of knowledge -->
<information value="primary | secondary" />
<!-- EE inside front cover: evidence adequacy to answer question. An extractedConclusion can not be negative evidence by itself. -->
<evidence value="direct | indirect" />
<source resource="S2"/>
</person>
<person id="P3">
<!--P3 has an identifier "P2", perhaps of type "component" or something,
to specify that P2 and P3 are conclusions about the same person-->
<identifier type="...Component...">P2</identifier>
<name>...Joshua Amis...</name>
<confidence value="..." />
<proofStatement>...</proofStatement>
<!-- EE inside front cover: evidence adequacy to answer question.
Evidence can only be "negative" when compared against other evidence. So
there must be an evidence type in the conclusion -->
<source resource="S2" evidence="direct | indirect | negative" />
</person> |
I think you misunderstand my meaning ... say I come across a source (S1) which is a transcription of an original. I first log S1 but then follow it up and get an image copy of the original (S2). I want to be able to say that S1 is derived from S2. This will then allow me to preserve both copies and potentially explain any anomalies, problems etc with the transcription. |
Thanks! My comments:
|
Umm... but it's not derived from S2. You said it was transcribed from an original. And S2 is also derived from that same original. So you need another description of the original (S0) and both S1 and S2 would reference S0 as a source. If you wanted to create a transcription of the image copy of the original (i.e. a transcription of S2), then your description of that transcription would reference S2 as a source. What am I missing? |
I've attached the discussed changes to this thread and I'm awaiting your comments. In summary, the changes include:
|
S1 says it's a transcription of the original (ie it is derived from S2 which at this point is not yet seen/assessed/evaluated). Although theoretically you are correct that S1 may have been made up or transcribed/translated from another source etc, until I see the original for myself I have to believe the details provided by the author/provider of S1. So I get the original and take a scan ... that's my S2. Why would I create yet another source? It is the one that S1 said it was transcribed (i.e. derived) from. If I create an S0 then I am assuming that S1 was transcribed from some other source than the one it said it was! |
Wait, that statement is in conflict to me. Is it derived from the original, or is it derived from a copy of the original? I'm good either way, just choose so I can tell you how to model it. If it's derived from the original, then you need to describe the original (i.e. create an S0) and reference the original as a source. If it's derived from a copy of the original, then you need to describe the copy (i.e. S2) and reference S2 as a source. What I'm really trying to do is identify how you think we're not fully accounting for the notion of "derived from". |
As a researcher finding the source, how would I know? I would trust that it came from the original but the transcriber may easily have been copying from fiche or whatever. However, all this is somewhat irrelevant ... what I want as a researcher is to be able to make an explicit and unambiguous relationship between S1 and S2 which is the equivalent of saying "S1 is a derivative of S2". As far as I can tell the only means I have of getting close to that in GEDCOMX is by using the generic "sources" list to cross-reference them. Since this is a generic list it is ambiguous. Similarly, in other situations, I want to be able to make an explicit and unambiguous statement that is the equivalent of saying "S1 is a component part of a larger source (collection) S3". Again in GEDCOMX the only way to do it is to use the generic sources list ... and hence again this leads to ambiguity. If I look at the sources list for any particular source then it is impossible to deduce any meaning from it except that they are somehow related. In my opinion this renders it useless. |
Oh, so you're identifying a third case: you don't know where it came from. That's fine. So why do you want to say that you do know where it came from? Just leave the sources list empty, for now, until you know where it did come from, at which point you model its source. I'll tell the story. A researcher finds a transcription and describes it with S1. She doesn't know where it's derived from, so she leaves the source list empty. Later, she finds an image that she describes with S2 and she decides either (a) the transcription was derived from the image or (b) the transcription was derived from the original just like the image was derived from the original. If (a), she modifies S1 to reference S2 as a source. If (b), she describes the original with S0 and references it from the source list of both S1 and S2.
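The three cases in that story can be sketched with nothing more than a mapping from each source description to its source list. This is a hedged illustration: the dictionaries stand in for source descriptions, `origins` is a hypothetical helper, and the source lists are assumed to contain no cycles.

```python
def origins(source_id, sources):
    """Follow source lists back to the descriptions that cite nothing further."""
    cited = sources[source_id]
    if not cited:
        # End of the known chain: nothing recorded about where this came from.
        return {source_id}
    found = set()
    for parent in cited:
        found |= origins(parent, sources)
    return found


# Unknown provenance: S1's source list is left empty for now.
unknown = {"S1": []}
# Case (a): the transcription S1 was derived from the image S2.
case_a = {"S1": ["S2"], "S2": []}
# Case (b): S1 and S2 were each derived from the original, described by S0.
case_b = {"S0": [], "S1": ["S0"], "S2": ["S0"]}

print(origins("S1", unknown))
print(origins("S1", case_a))
print(origins("S1", case_b), origins("S2", case_b))
```

In case (b) the shared S0 is what records that the transcription and the image are siblings rather than parent and child.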
I disagree. It's not ambiguous at all. You make that clean and unambiguous statement by stating that the source of the transcription described by S1 is the source described by S2.
Thank you for articulating the other case that you've got in mind.
This is incorrect. When a researcher wants to describe a "component of" relationship, the source description provides a |
Oh jeez this is really getting a bit too pedantic. Look, whenever I get a source I read it to see where the supplier/publisher said it came from. But I don't know it for a fact unless I do the leg work myself. In the same way I can't be sure they did a good job of transcribing it until I see the original. But in spite of this I want to be able to document it so that I can follow it up.
That's great - I hadn't realised that was in there. |
In issue #144, we discussed being able to model "extracted conclusions" (Conclusion instances representing persons, relationships, etc. extracted from a record) and "working conclusions" (the Conclusion instances that represent the current state of the researcher's work). But as both types of conclusions are modeled with Conclusion instances, we need a mechanism to distinguish Conclusions in one role or the other (extracted vs. working).
@jralls has suggested adding an "extracted" flag to the Conclusion class as a potential solution to this issue. @stoicflame has voiced (to me) that he feels that this concept belongs in metadata about the conclusion -- that it does not belong as part of the conclusion itself.
How is this concept best expressed in the model? What other options might we consider?