In need of a Source object #144
Comments
There already is a But I think what you want is the requirement for all source references to resolve to an instance of that thing? |
Just a different facet of #146 |
Forgive me for being dim but the link just shows the "Description" with all the DC metadata .... I get that the DC metadata is effectively attributes/properties of the "Description" but where in the model is the "Description" object (which you say equates to a Source)?
Er, yes. How can they not? Surely all references to the same source should be links to the same Source object? |
Yeah, those were my thoughts, too. I'm trying to figure out the difference.
Umm... sorry... I don't understand the question. Where in the model? It's part of the model...
You could refer to an image as a source. Or a multimedia file. Or a web page. Or anything else that can be identified with a URI.
Sure. But not everything cited as a source needs to be an instance of that same type. |
And that is the problem. As the model is presently expressed every reference reduces to a URI. I could enter a standard place, or some persona, or some random Slashdot article as a source. OK, a Slashdot article might be a valid source. Unlikely but possible. The real problem would be an internal object, an easy error to create, and which might cause actual trouble (think a is a source of b is a source of c is a source of a = crash). |
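The loop described above (a is a source of b is a source of c is a source of a) is something an application could check for before it walks the references. A minimal sketch, assuming the source references have already been reduced to a plain mapping from each object's URI to the URIs it cites; the function name `find_source_cycle` and the mapping shape are hypothetical and not part of the GEDCOM X model:

```python
from typing import Dict, List, Optional


def find_source_cycle(sources: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of URIs (first element repeated at the end), or None.

    `sources` maps an object's URI to the URIs it cites as sources.
    This mapping is an assumption made for illustration only.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {uri: WHITE for uri in sources}
    path: List[str] = []

    def visit(uri: str) -> Optional[List[str]]:
        colour[uri] = GREY
        path.append(uri)
        for target in sources.get(uri, []):
            # URIs that are not keys are external resources: treat them as leaves.
            state = colour.get(target, BLACK)
            if state == GREY:
                # Target is already on the current path, so we have closed a loop.
                return path[path.index(target):] + [target]
            if state == WHITE:
                found = visit(target)
                if found:
                    return found
        path.pop()
        colour[uri] = BLACK
        return None

    for uri in sources:
        if colour[uri] == WHITE:
            found = visit(uri)
            if found:
                return found
    return None
```

Called on `{"a": ["c"], "b": ["a"], "c": ["b"]}` this reports the loop; on an acyclic mapping it returns `None`. The check is recursive, so a very deep reference chain would need an iterative version.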
I asked where and you gave me a link to a "Description" definition (http://www.gedcomx.org/model/dcterms_Description.html). Maybe it's just that the .NET version is broken or something but there RDFDescription is never referenced anywhere so I can only assume it's just the metadata for the whole GEDCOMX file ... ergo the file itself is the one and only source and all sources must be in separate GEDCOMX files referenced by their URIs. If this is the case then as @jralls says any URI can be used as a Source and if that is so then there is no guarantee that there is any useful data whatsoever in the related URI since there is no guarantee it actually is a GEDCOMX file (and not just a jpg, a PC application, a link to a virus, etc). |
Okay. How else would you suggest the serialization format (de)reference other objects? simple string?
So that's an error in the data. Agreed. How is that any different from any other data error cases that will need to be intelligently handled by the application? I can write an "I'm my own grandpa" loop, too. |
No... that's not the intent of that at all. The intent of that object was to be akin to a |
See #146. But: The serialization isn't the model. The model describes the internal data structures that are serialized and that result when the stream is deserialized. So in the model references to other objects in the serial stream should be references to the class of those other objects, not the URI that is the proxy for the reference in the stream. Deserialization will have to be two passes: The first to construct all of the objects in the stream and the second to resolve the URIs into references and validate them. |
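The two-pass deserialization described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (dicts with `type`, `id`, `sources`), the `#id` URI form, and the class and function names are all hypothetical, not taken from the GEDCOM X specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceDescription:
    id: str
    citation: str


@dataclass
class Person:
    id: str
    name: str
    source_uris: List[str] = field(default_factory=list)            # URI proxies from the stream
    sources: List[SourceDescription] = field(default_factory=list)  # filled in by pass 2


def deserialize(records: List[dict]) -> Dict[str, object]:
    """Build objects from a list of dict records (an assumed wire shape)."""
    objects: Dict[str, object] = {}

    # Pass 1: construct every object in the stream, keyed by its local URI.
    for rec in records:
        uri = "#" + rec["id"]
        if rec["type"] == "SourceDescription":
            objects[uri] = SourceDescription(rec["id"], rec["citation"])
        elif rec["type"] == "Person":
            objects[uri] = Person(rec["id"], rec["name"], list(rec.get("sources", [])))
        else:
            raise ValueError(f"unknown record type: {rec['type']}")

    # Pass 2: resolve each URI into an object reference and validate its class.
    for obj in objects.values():
        if isinstance(obj, Person):
            for uri in obj.source_uris:
                target = objects.get(uri)
                if target is None:
                    raise ValueError(f"unresolved source reference: {uri}")
                if not isinstance(target, SourceDescription):
                    raise TypeError(f"{uri} is not a SourceDescription")
                obj.sources.append(target)
    return objects
```

Because resolution happens only after everything is constructed, a Person may appear in the stream before the SourceDescription it cites, and a reference to a Person or a dangling URI is rejected rather than silently accepted.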
+++++++++1 |
If the file format includes an index of the objects with their types as @nealmcb is asking for in #140 the first pass wouldn't be necessary. (Yeah, I'm talking to myself. ;-) ) |
LOL! |
(I know I'm attempting to revive a stale thread here.) Given the new set of specifications and recipes that attempt to clarify how to model the "source object" you seek, what still needs to be addressed here? |
Just catching up ... may take a while .... bear with me! |
I'm still wading through the amended spec having been AWOL for some months but as far as I can see there is still no source object - just a "SourceReference" with an id, type, description and attribution. If the same source is referenced multiple times then I presume these will also be replicated throughout the file creating a fragmented but inadequate source "record". |
Over in #134 Sarah and I got going on Source analysis and its importance to good genealogical work. In #156, Ryan expressed the mission of GedcomX:
Which also recognizes the importance of source analysis. How then should GedcomX record the source analysis? The logical answer to me is to have a proper Source object with a citation property (what's currently called a "Description") and an analysis property (which can be just a long string). |
Can you clarify what you mean by "source analysis"? Is this what I call an "interpretation" ie working out what is explicitly and implicitly detailed in a single source? |
Source analysis has three phases:
It's important to not try to make connections to evidence from other sources while you're doing this analysis so that you don't read in something that isn't there -- or miss something that is -- because "pieces of the puzzle" seem to fit. Yes, you could call it interpretation if you like, but "working out what is explicitly and implicitly detailed" leaves out the first part. |
Yes that's what I call interpretation :) |
OK. How would you structure the Source class and how would you tie the elements into the conclusional Persons, Relationships, and Events? |
Briefly - simplistic syntax: Source is top level entity with properties:
Quick example:
Source 2: Birth Cert for Fred Bloggs, GRO ref 1234XYZ etc etc
Source 3: Birth Cert for Fred Bloggs, GRO ref 1234ABC etc etc
Source 4: My Family Tree, author: Sarah Green other attributes of source etc etc
NB: Persons are contained within each source, not pointers to somewhere else |
Hmm actually that's not quite right ... The proof of Source 4 should actually be the proof for Fred (or if you like for Fred's birth event) not the proof of the whole tree! Apologies. |
OK. Isn't "Source 4" really a set of conclusion-model objects (Two Events, the marriage and the birth, 3 Persons, Fred, Freda, and Frederick, and 3 Relationships), each of which has the appropriate SourceReferences pointing back to Sources 1 - 3, along with the appropriate proof statements? (Here I'm assuming that you're not using your "My Family Tree" database as a source for some other database.) Where would you put "Facts" (for example, the ages of the bride and groom from the marriage certificate)? |
If you're happy for the definition of a Source to be "a set of conclusion-model objects" then yes
Why would you assume that? That's sort of my point that it is a source (albeit one being changed as the research progresses)
|
I am not suggesting that. I do feel that the relationship between a source that is a "component of" another source is different from the relationship between a source that is "derived from" another source. It seems that the use cases proffered for the "component of" relationship are really about describing multiple elements from a single source -- a mechanism for defining a citation in pieces so that one or more of those pieces can be reused in citing additional elements in that source. On one hand, it seems the source metadata model is about supporting the need to describe a source and its provenance. This description (of the source and its provenance) is generally treated as a single logical item ("the source"). Regardless of the number of elements used to describe "the source" (and the type of relationships that relate those elements), the user is still just describing "the source" and associating "the source" with his "conclusion". I think the model is sufficient for this purpose. On the other hand, if we were to state that the sources metadata model MUST support the automated rendering of entire provenance chains that include the citation of sources that are split into pieces (via the "component of" mechanism), then the model may need some help. For example, it may be that if we added a type to |
I don't agree with your use case for "component of". My intent (I believe I was the one to request it originally - see #123) was to enable the ability to represent the hierarchical nature of archival source collections and to cite from various parts of said collections. For example, a census book cover might have information about the households in statistical form and details about the enumerator. A page within that book might have details about a particular household. I may be interested in both and the fact that one is within the scope of the other may be relevant (e.g. the fact that the enumerator may have been recording his own family details; the fact that there should only be one representation of one person within a census etc etc). If you are not prepared to provide a means of using these references then I think it is better and less confusing to simply omit them. After all, things may move on by the time you get to them (witness the old FONE and ROMN tags). Don't build in a partial "something" just in case it "might come in useful one day" - it will be ignored, misinterpreted and misused and we'll all have a nightmare trying to unravel the mess. |
No, I did.
That's essentially the same use-case at a smaller scale:
(And book SDA is one of ten contained in box SDX, part of record group SDY).
+1 |
OK, then the spec should say so, not just that the sources are "related to" the source being described.
OK again, but recognize that that will force a large amount of redundant data into the file and require extra parsing overhead on the part of using applications which do provide multi-level citations (and most provide for at least two levels). |
??? I am not suggesting we eliminate the ability to create a hierarchy of source descriptions to represent "the source". And I am not suggesting that parts of that hierarchy cannot be reused to describe another source. I am just saying that the whole chain of source descriptions (that starts with the reference in the conclusion) is logically one "source". If we used five source descriptions linked together to describe that source and its provenance, it is still logically one source. |
What exactly is missing? How is it that there is no means of using these references? |
Agreed.
Then what do you mean by
? You earlier gave an example where you used the |
A description of what each one of them is for. "related to" isn't sufficient, and "best possible description of a source's provenance" is pretty vague, too. SourceReferenceType described how each SourceReference in
You can't seem to keep your story straight. ISTM you guys need to pull this and go think about it some more. |
So what if So then: A. Family XX, page YY, accessed DD Month YYYY
Is modeled like:
Conclusion.sources = [ SrcRefA ]
SrcRefA.sourceDescription = SrcDescA
SrcDescA.citation = { …(Family XX, page YY, accessed DD Month YYYY)… }
SrcRefB.sourceDescription = SrcDescB
SrcDescB.citation = { …(Town, County, State, Roll)… }
SrcRefC.sourceDescription = SrcDescC
SrcDescC.citation = { …(1900 U.S. Census, digital image, FamilySearch)… }
SrcRefD.sourceDescription = SrcDescD
SrcDescD.citation = { …(NARA Microfilm T623)… } |
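To make the layered citation above concrete, here is a rough sketch of how an application might render the whole chain from the most specific description outward. The comment above does not show how one description links to the next, so the `component_of` field used here is purely an assumption for illustration, as are the class and function names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceDescription:
    citation: str
    # Hypothetical link to the enclosing description; the thread does not
    # show the actual property that connects one level to the next.
    component_of: Optional["SourceDescription"] = None


def render_citation(desc: Optional[SourceDescription]) -> str:
    """Join the citation fragments from the cited item out to the outermost level."""
    parts: List[str] = []
    seen = set()
    while desc is not None:
        if id(desc) in seen:
            raise ValueError("cycle in source description chain")
        seen.add(id(desc))
        parts.append(desc.citation)
        desc = desc.component_of
    return "; ".join(parts)


# The four levels from the example, innermost last so each can point outward.
src_desc_d = SourceDescription("NARA Microfilm T623")
src_desc_c = SourceDescription("1900 U.S. Census, digital image, FamilySearch", src_desc_d)
src_desc_b = SourceDescription("Town, County, State, Roll", src_desc_c)
src_desc_a = SourceDescription("Family XX, page YY, accessed DD Month YYYY", src_desc_b)
```

With this shape, SrcDescB through SrcDescD can be shared by any number of page-level descriptions, which is the reuse the multi-level citation is meant to allow.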
The problem is that we couldn't get very far down that road without confusing people. The question they kept asking went something like this: If all source references to an extracted conclusion (or analysis, or transcription, or image, etc) are of type Since we were unable to give a good answer, we decided to propose the consolidation of the type onto the So what would your answer to that question be? |
I started to propose exactly that, but got distracted by Thad's "logically one source" tangent. Yes, that will work. |
I think I already did:
So if SourceDescA describes a birth certificate, and EventA extracts the particulars of that birth event without any inferences, then the SourceReference in EventA pointing to SourceDescA has type "ExtractedConclusion". SourceDescB, which describes EventA, doesn't need a type, it has a pointer to the original. If AnalysisDocumentG uses SourceDescB, its SourceReference will be of type Analysis. There is no repetition. |
And if AnalysisDocumentG uses all of SourceDescB, SourceDescC, SourceDescD, SourceDescE, and SourceDescF, under what circumstances will one of those |
Roger. So the only non-redundant usage is ExtractedConclusion vs. WorkingConclusion. So rip it all out. SourceReference should be a SourceDescription Reference in the model, and the URI of a SourceDescription in the implementations. No need for typing, no need for a separate ID, no need for an Attribution. SourceDescription doesn't need a SourceDerivationType, that's covered in the Citation, perhaps with additional detail provided in Notes. SourceDescription does need a Conclusion needs to have an "Extracted" flag. |
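One possible reading of the simplification proposed above, sketched as data classes: in the in-memory model a Conclusion holds direct references to SourceDescription objects, and only the serialized form reduces them to URIs. The field names (`uri`, `notes`, `extracted`) and the `to_wire` helper are assumptions for illustration, not the agreed model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceDescription:
    uri: str
    citation: str                                    # derivation detail lives here
    notes: List[str] = field(default_factory=list)   # optional extra detail


@dataclass
class Conclusion:
    uri: str
    # In the model: direct object references, with no per-reference type, id or attribution.
    sources: List[SourceDescription] = field(default_factory=list)
    # True when the conclusion only restates what a single source says.
    extracted: bool = False


def to_wire(conclusion: Conclusion) -> dict:
    """Serialize a Conclusion; each source reference becomes just a URI."""
    return {
        "id": conclusion.uri,
        "extracted": conclusion.extracted,
        "sources": [s.uri for s in conclusion.sources],
    }
```

Under this sketch an extracted birth event would carry exactly one source and `extracted=True`, while a working conclusion could list several sources with the flag left `False`.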
Isn't that what |
I don't know. Sometimes you say it is, sometimes not. If it is, SourceDescription should have only one -- only some Conclusion subclasses (AnalysisDocument, Event, Person, Relationship, maybe others, but only when ExtractedConclusion is False) need multiple But maybe, in the case of SourceDescription, it should be handled as part of Citation. |
I certainly agree that it could be modeled that way. For now, let's get some feedback on the |
So, to summarize, I will update the model as follows: SourceReference
SourceDescription
NOTE: The "Extracted" flag will need to be discussed separately. We recognize the issue this flag attempts to address. We will open an issue after we merge this pull request. NOTE: I am also making a change to the type of the |
I think it would flow better if SourceReference were pulled up to 3.2 so that all of the CitationFoo paragraphs are together. |
I think this is confusing .... most people in genealogy would think of a citation as what we are now calling proof/evidence. You are using it to indicate a derivative which I think is a subtle but significant difference. For example, I "cite" a source in a narrative about someone; but a transcription of a baptism record "is derived from" the original. Also I don't see the point of trying to list all the sources which cite/refer to this source since it will necessarily be impossible to do so in the real world. Instead each source should refer to the single source it is derived from (rather than trying to keep track of the infinite number of things which might reference it). |
I still don't see why we have to have an attribution in the SourceReference |
I have merged the pull request (#182) related to this issue. I am going to close this issue. I know that at least two questions are outstanding (about attribution - #192; and about ids - #198). If anything is unresolved and needs further discussion, would you please open a new issue(s) and summarize it there. Thank you for all of the input given here! We appreciate the help in improving our model! :-) |
I forgot, there was one other issue that I promised to open as a result of conversation here -- relative to being able to distinguish "working conclusions" from "extracted conclusions". The issue can be found in #202. |
I realise the infinite flexibility of inheriting everything from a Resource and hence allowing it to be considered a source but this also makes for infinite complexity and infinite nonsense!
To allow, for example, a DatePart to be used as evidence is nonsensical. Theoretically we could cite millions of documents with a DatePart of "Day=1" but it means absolutely nothing without the wider context of the Date, which means nothing without the wider context of the Fact.
I think we need a definite Source object (probably equating, or similar, to the Record) and it is this (not the Resource) which should be referenced in Citations and used in Evidence.