Add an intermediate class for the top-level data types #244

Merged
merged 10 commits into from
Apr 26, 2013

Conversation

stoicflame
Member

The 'top level' (section 2) data types all have or need extracted and media fields. Create an intermediate class to hold those two.

@thomast73
Contributor

Actually, not all specializations of Conclusion include the extracted property. Only the top-level specializations in Section 2 have that property — i.e., Person, Relationship, Event, Document, PlaceDescription. The specializations in Section 3 do not — i.e., Gender, Name, Fact, EventRole.

The specializations that do not contain the extracted property cannot be exchanged outside the context of the specializations that do—that is, they are always contained within a specialization that has the extracted property. For data integrity and data duplication reasons, we felt the property should be applied at the top (outside) level and that any "contained" conclusions should assume the value of the outside container.
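As an aside, the "contained conclusions assume the value of the outside container" rule can be sketched in Python. This is only an illustration: `Node`, `container` and `effective_extracted` are hypothetical names, not part of GEDCOM X, and the sketch assumes each contained conclusion can reach its container.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a conclusion in a containment tree."""
    # Only meaningful on the outermost (top-level) conclusion.
    extracted: bool = False
    # The conclusion that contains this one; None for a top-level conclusion.
    container: Optional["Node"] = None


def effective_extracted(node: Node) -> bool:
    """Walk out to the outermost container and report its extracted flag."""
    while node.container is not None:
        node = node.container
    return node.extracted


person = Node(extracted=True)      # top-level conclusion, e.g. a Person
birth_fact = Node(container=person)  # contained conclusion, e.g. a Fact
print(effective_extracted(birth_fact))
```

The contained `Node` carries no flag of its own; whatever it holds locally is ignored in favour of the container's value, which is the data-duplication argument made above.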

@jralls
Contributor Author

jralls commented Apr 18, 2013

Hmm. I'd forgotten that some of the section 3 items were also Conclusion subclasses. Revised the issue to reflect the following:

Actually Document lacks extracted as well. One could say that having type Transcript, Extract, or Abstract implies it, but each source should also have an Analysis which discusses things like provenance, condition, informant, etc., and it should be tagged with extracted to indicate that it's an analysis of one source.

Both Document and Relationship need a media field: The former to attach facsimiles, the latter for those wedding videos.

Rather than repeating those properties in each subclass, collect them in an intermediate subclass of Conclusion.

@jralls
Contributor Author

jralls commented Apr 18, 2013

The specializations that do not contain the extracted property cannot be exchanged outside the context of the specializations that do—that is, they are always contained within a specialization that has the extracted property. For data integrity and data duplication reasons, we felt the property should be applied at the top (outside) level and that any "contained" conclusions should assume the value of the outside container.

That should be explicit in the Extracted Conclusion Constraints: It requires that applications recursively check all source references of all contained conclusions, something that implementers might miss.

@stoicflame
Member

I could go with this proposal. One of the biggest hurdles is coming up with a name. @thomast73 has been discussing the notion of a "Subject" with another contributor and defined it like this:

A “subject” is something with a unique and intrinsic identity—e.g., a person, a location on the surface of the earth. We identify that “subject” in time and space using various “identifying features”—for a person: things like name, birth date, age, address, etc. We aggregate these “identifying features” to form an apparently-unique identity by which we can distinguish our “subject” from all other possible “subjects”.

What do you think?

@thomast73
Contributor

Actually Document lacks extracted as well. One could say that having type Transcript, Extract, or Abstract implies it, but each source should also have an Analysis which discusses things like provenance, condition, informant, etc., and it should be tagged with extracted to indicate that it's an analysis of one source.

I understand your desire to clearly associate (call out) analysis with a source. +1

I agree that extracted belongs in Document, but not for the reason you have suggested. +1 with a caveat.

In my mind, extracted is meant to say that the given data that is said to be extracted is data coming from a single source.

What you term Analysis is not data from the source, but rather data about the source.

So I think it appropriate in the cases of transcriptions, translations, and abstracts that the document be marked as extracted. But analysis is fundamentally different.

For data integrity and data duplication reasons, we felt the property should be applied at the top (outside) level and that any "contained" conclusions should assume the value of the outside container.

That should be explicit in the Extracted Conclusion Constraints: It requires that applications recursively check all source references of all contained conclusions, something that implementers might miss.

It is in the constraints. In fact, I thought that you helped with the wording? Perhaps you can suggest further clarification?

@jralls
Contributor Author

jralls commented Apr 18, 2013

Better than my first thought, which was "TopLevel".

"Subject" gets a bit tangled up with its use defining part of a sentence: OK, Person A is the "subject" of Gender Male, or of Fact X. Gets a little hairier with Inter-"subject" references, like the Persons in a Relationship or EventRoles, which belong to one "subject" and point to another.

"Entity" is a bit more neutral in that respect, but (in English) is a Relationship or an Event an "entity"?

I think it's the right line of reasoning, we just need to look through the Thesaurus to find a word we like. "Thing" pops up, but might make folks think of a disembodied hand. ;-) Worse, it's more generic than "subject".

Hmm. I'll let that simmer for a bit.

@jralls
Contributor Author

jralls commented Apr 18, 2013

In my mind, extracted is meant to say that the given data that is said to be extracted is data coming from a single source.

What you term Analysis is not data from the source, but rather data about the source.

Fair enough. How then do we set apart an analysis about a single source?

@thomast73
Contributor

Person certainly is a subject—with identifying features like names, dates, places, gender, relationships, etc.

Relationship certainly is a subject—with identifying features like marriages, divorces, adoptions, etc.

Event can be a subject—described with dates, places, participants, etc.

A place can be a subject—possibly identified with PlaceDescription instances that might include names, coordinates, boundaries, etc.

I'm not sure that a Document qualifies as a subject.

In all cases, it seems that a “subject” can also be used as an “identifying feature” for another “subject”. So a “subject” can be a “subject” in one context and an “identifying feature” in another.

On the other hand, it seems to me that not all "identifying features" are subjects. We may research and debate a Fact (where it took place, when it took place, the spelling of the name, etc.), but it is always in the context of a "subject" (where was Bob born, when was Bob's birth, how did Bob spell his name).

@thomast73
Contributor

@jralls: How then do we set apart an analysis about a single source?

I would propose something like the analysis member we are considering in the proposal for EvidenceReference.

@jralls
Contributor Author

jralls commented Apr 18, 2013

That should be explicit in the Extracted Conclusion Constraints: It requires that applications recursively check all source references of all contained conclusions, something that implementers might miss.

It is in the constraints. In fact, I thought that you helped with the wording? Perhaps you can suggest further clarification?

Perhaps you mean this:

The conclusion (including any data it contains) MUST NOT refer to more than one source description.

"Any data it contains" doesn't translate into "all sub-conclusions", though when I'm hit over the head with it I recognize that it might.

I'd lose that line altogether and change

All source references used by the conclusion MUST resolve to the same source description, although each reference MAY contain distinct qualifying information such as attribution.

to:

All source references listed in the conclusion or any contained conclusion (e.g., Fact, EventRole) must resolve to a single SourceDescription...

What about referenced conclusions, though? That EventRole will point to a Person, and if we're working with pure single-source extraction, the Person's SourceReferences should also resolve to the single SourceDescription.
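For illustration, a minimal Python sketch of the recursive check that the reworded constraint would ask implementers to perform. The names (`Conclusion`, `sources`, `contained`, `is_single_source`) are hypothetical, and source references are reduced to the URI of the SourceDescription they resolve to. It walks contained conclusions only; it does not follow referenced conclusions, which is the open question raised just above. Treating a tree with no source references as passing is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Conclusion:
    """Hypothetical stand-in for a GEDCOM X conclusion."""
    # URIs of the SourceDescriptions this conclusion's source references resolve to.
    sources: List[str] = field(default_factory=list)
    # Conclusions nested inside this one (e.g. the Facts of a Person).
    contained: List["Conclusion"] = field(default_factory=list)


def all_source_uris(conclusion: Conclusion) -> Iterator[str]:
    """Yield source URIs from the conclusion and, recursively, everything it contains."""
    yield from conclusion.sources
    for child in conclusion.contained:
        yield from all_source_uris(child)


def is_single_source(conclusion: Conclusion) -> bool:
    """True when the whole tree refers to at most one distinct source description."""
    return len(set(all_source_uris(conclusion))) <= 1


birth = Conclusion(sources=["#census-1900"])
name = Conclusion(sources=["#parish-register"])
person = Conclusion(sources=["#census-1900"], contained=[birth, name])
print(is_single_source(person))  # False: the contained name cites a second source
```

The example shows the case an implementer might miss: the top-level conclusion cites one source, but a contained conclusion cites another.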

@jralls
Contributor Author

jralls commented Apr 18, 2013

How then do we set apart an analysis about a single source?

I would propose something like the analysis member we are considering in the proposal for EvidenceReference.

So a new class? Or a mandatory Note attached to the SourceDescription?

@thomast73
Contributor

So a new class? Or a mandatory Note attached to the SourceDescription?

analysis

  • Description: A reference to a document containing analysis about this source.
  • Data type: URI
  • Constraints: OPTIONAL. If provided, MUST resolve to an instance of http://gedcomx.org/v1/Document of type http://gedcomx.org/Analysis.
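A rough Python sketch of how a consumer might check this constraint. The `resources` lookup table and the `analysis_is_valid` function are hypothetical; only the two URIs come from the property definition above. How an application actually resolves a URI is left out.

```python
from typing import Mapping, Optional

# Taken from the constraint text above.
DOCUMENT_CLASS = "http://gedcomx.org/v1/Document"
ANALYSIS_TYPE = "http://gedcomx.org/Analysis"


def analysis_is_valid(analysis_uri: Optional[str],
                      resources: Mapping[str, Mapping[str, str]]) -> bool:
    """Check the proposed constraint on the analysis member.

    resources is a hypothetical lookup table mapping a URI to a record of the
    form {"class": ..., "type": ...} describing what the URI resolves to.
    """
    if analysis_uri is None:
        return True  # OPTIONAL: leaving it out is always acceptable.
    target = resources.get(analysis_uri)
    if target is None:
        return False  # Provided, but it does not resolve to anything.
    return (target.get("class") == DOCUMENT_CLASS
            and target.get("type") == ANALYSIS_TYPE)
```

For example, with `resources = {"#d1": {"class": DOCUMENT_CLASS, "type": ANALYSIS_TYPE}}`, both `analysis_is_valid(None, resources)` and `analysis_is_valid("#d1", resources)` pass, while a URI missing from the table fails.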

@jralls
Contributor Author

jralls commented Apr 19, 2013

analysis

A new property of SourceDescription, I take it. On the one hand, no constraint that it be referenced by only a single SourceDescription. On the other, one can get carried away with constraints, and using a single Document to analyze more than one source isn't necessarily unreasonable: The documents in a single court docket, for example, are likely all to be in the same condition, and all will have been entered into evidence. Further analysis of credibility would have to take into account the minutes and decision, and so wouldn't be single-source.

OK. It's fine.

@thomast73
Contributor

So returning to the modification to the model initially proposed by this issue...here is how we view the proposed refactor (in a UML-style diagram). I felt that this would be the easiest way to evaluate the proposed change. If we can agree this is the right direction, I will work to update the specification documents to reflect these changes.

In this diagram, I have added the proposed Subject class (from this issue), the proposed EvidenceReference class (from #242) and the proposed analysis member (see #246). Several other items not called out by @jralls were, like extracted and media, only being applied to top-level specializations of Conclusion; those have also been moved into the proposed Subject class. I did not classify Document (a top-level specialization of Conclusion) as a Subject, but I have updated it to include the extracted flag.

Here is the proposed model:
[Figure: conceptual-model-graph]
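Since the diagram itself is not reproduced here, the following Python sketch approximates the class hierarchy as described in the text. It is an assumption-laden outline, not the specification: the field types and default values are invented for illustration, and members the thread does not name are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conclusion:
    """Abstract base; Name, Gender, Fact and EventRole extend it directly."""


@dataclass
class Subject(Conclusion):
    """Proposed intermediate class for the top-level (Section 2) data types."""
    extracted: bool = False                               # default is assumed
    media: List[str] = field(default_factory=list)        # simplified to URIs
    identifiers: List[str] = field(default_factory=list)  # simplified to strings


@dataclass
class Person(Subject):
    pass


@dataclass
class Relationship(Subject):
    pass


@dataclass
class Event(Subject):
    pass


@dataclass
class PlaceDescription(Subject):
    pass


@dataclass
class Document(Conclusion):
    """Top-level, but not classified as a Subject; it gains extracted directly."""
    extracted: bool = False
```

The point of the refactor shows up in the last class: Document repeats `extracted` because it sits outside Subject, while the four Subject specializations inherit it along with `media` and `identifiers`.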

@jralls
Contributor Author

jralls commented Apr 19, 2013

OK, looks pretty good.

My only concern is pulling identifier up into Subject because the way it is used is different for each subclass. It does make sense for code (e.g. the XML Schema) but it will require a different format in the spec in the form of a note on each subclass description to explain how to use the identifier.

@stoicflame
Member

My only concern is pulling identifier up into Subject because the way it is used is different for each subclass. It does make sense for code (e.g. the XML Schema) but it will require a different format in the spec in the form of a note on each subclass description to explain how to use the identifier.

+1

@thomast73
Contributor

I am sympathetic to the concern. However, I also feel that the concept is a generic concept. While we have identified some important uses for Identifiers and have called them out specifically in the documentation for the applicable data types, we are not placing limits on what type of identifiers can be stored there, or restricting the purposes for which they might be used. We are just calling attention to the uses we have already identified and that need to be recognized as mechanisms for addressing specific, recognized needs.

I would like the identifiers container to remain generic (part of Subject).

I also think it is appropriate to give specific information about important, supported identifier types and their use in the documentation for the applicable specialization of Subject.

So I would like to pursue the path of documenting important identifiers relative to a given specialization of Subject by ensuring we have verbiage associated with those specializations, but leave identifiers as a generic concept in the Subject class.

@jralls
Contributor Author

jralls commented Apr 19, 2013

we are not placing limits on what type of identifiers can be stored there, or restricting the purposes for which they might be used.

So an open invitation for torpedoing portability. :-(

So I would like to pursue the path of documenting important identifiers relative to a given specialization of Subject by ensuring we have verbiage associated with those specializations, but leave identifiers as a generic concept in the Subject class.

-1

@thomast73
Contributor

You're right. It is not supposed to be open ended. I misspoke.

Identifiers have a type URI that identifies the type of the identifier. The type is OPTIONAL. If it is provided, it MUST resolve to an identifier type, and use of a known identifier type is RECOMMENDED.
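To make the rule concrete, here is a small Python sketch of how an importer might classify an identifier's type. The function name, its return labels and the example URIs are all hypothetical, and the set of "known" types is passed in rather than hard-coded because the thread does not list them. The sketch does not attempt to decide what "resolve to an identifier type" means; it only separates untyped, known and unknown.

```python
from typing import Optional, Set


def classify_identifier_type(type_uri: Optional[str], known_types: Set[str]) -> str:
    """Classify an identifier's type URI against a vocabulary of known types."""
    if type_uri is None:
        return "untyped"   # The type is OPTIONAL, so this is permitted.
    if type_uri in known_types:
        return "known"     # The RECOMMENDED case.
    return "unknown"       # Permitted, but other applications may not understand it.


# Placeholder vocabulary for the example; not the real GEDCOM X list.
known = {"http://example.org/types/Primary", "http://example.org/types/Evidence"}
print(classify_identifier_type("http://example.org/types/Primary", known))
print(classify_identifier_type("http://vendor.example/CustomId", known))
```

An importer could log or flag the "unknown" case, since those are the identifiers whose meaning a receiving application cannot rely on.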

Any notes in the various specializations of Subject seem to really be commentary on the semantic meaning assigned to the use of identifiers of a "known" type in the context of that specialization.

@jralls
Contributor Author

jralls commented Apr 22, 2013

You're right. It is not supposed to be open ended. I misspoke.

Well, I'm right that it's a danger to portability. I don't think that you misspoke.

Identifiers have a type URI that identifies the type of the identifier. The type is OPTIONAL. If it is provided, it MUST resolve to an identifier type, and use of a known identifier type is RECOMMENDED.

So the Identifier provides a single level of indirection to an unrestricted URI. If the exporter likes, he can assign a type, and it's recommended that the type be one of the four listed as "known". That sounds pretty wide-open to me.

Aside: In this case, what does "resolve to an identifier type" mean? Surely you're not expecting it to exist on some reachable web server: After all, none of the GedcomX identifiers do. ISTM type URIs in GedcomX are just XML namespace-value pairs. So are we expecting enumerators to provide Schema or DTDs that declare all of their identifiers if they don't use the "known" ones? Will there at some point be a GedcomX Schema or DTD to validate against?

Any notes in the various specializations of Subject seem to really be commentary on the semantic meaning assigned to the use of identifiers of a "known" type in the context of that specialization.

Well, the only such note that I see is on PlaceDescription. It seems to say that an Identifier might point to a "Place Authority" (something like GNIS, I suppose) or to some other PlaceDescription with a type http://gedcomx.com/Primary. I guess the place authority usage might have a type of http://gedcomx.com/Evidence, though the note implies none at all.

All the other uses have been mentioned in passing in Issues, mostly regarding the n-tier implementation. If you leave it up to implementers to do what they want, no two implementations will inter-operate.

@thomast73
Contributor

I believe your concerns are general to all type vocabularies in the GEDCOM X model. They are not overly germane to the original issue raised here. I think it would be best to discuss your concerns in a separate issue. Can I suggest we open a new issue?

For now, I am proceeding toward the resolution outlined above.

@thomast73
Contributor

I have posted the first batch of changes to reflect the desired changes in 52a85a5.

@jralls
Contributor Author

jralls commented Apr 22, 2013

I believe your concerns are general to all type vocabularies in the GEDCOM X model. They are not overly germane to the original issue raised here. I think it would be best to discuss your concerns in a separate issue. Can I suggest we open a new issue?

OK. It looked like two separate issues to me, so I opened #247 and #248.

@jralls
Contributor Author

jralls commented Apr 22, 2013

Looks pretty good. I think in the 3.11 preamble the "it seems to me" is a bit casual for a spec.

Out of curiosity, why did you change all of the 'id' attributes to 'name', and why on this issue? Also, I think that it would be easier to read if Conclusion and Subject were at the top of the spec before all of their subclasses rather than near the bottom.

@ed4becky

Point of clarification:

Why does a subject have an analysis? If a subject is a person/place or thing being researched, analysis is performed in a context - for example a subject/event (we research the "birth of James", not "James")

Where is the meat in Conclusion, that is, its, well, "conclusion"? I would think that in addition to a collection of conclusions, a conclusion would have a top-level analysis?

Thank you for your time.

@thomast73
Contributor

I think in the 3.11 preamble the "it seems to me" is a bit casual for a spec.

Agreed. Updated accordingly here.

Out of curiosity, why did you change all of the 'id' attributes to 'name', ...

It fixes all of the fragment links within the page. They should all work now.

... and why on this issue?

I figured out the fix while working on this issue. I was being lazy and did not want to open a new issue with a new branch, etc. Just figured it was a bug fix and could ride along just as well with this issue... :-)

There are some other bug fixes, additions to cover for omissions, deletions, etc. that will come along for the ride as well. This refactor has forced a general review of the documentation and I have been finding lots of little mistakes.

Also, I think that it would be easier to read if Conclusion and Subject were at the top of the spec before all of their subclasses rather than near the bottom.

I don't disagree. It might. But I am not going to address this in this issue. At some point, we expect to give the documents a more comprehensive review, and I think this sort of thing will come up at that point.

@jralls
Contributor Author

jralls commented Apr 23, 2013

Why does a subject have an analysis? If a subject is a person/place or thing being researched, analysis is performed in a context - for example a subject/event (we research the "birth of James", not "James")

You still need an explanation of why you think that the James in the birth event is this James and not somebody else. Ideally the analysis would connect the Event to the Person, but that doesn't fit with the current design, so it goes on the Person (inherited from Subject, but that's a design rather than a conceptual detail).

@jralls
Contributor Author

jralls commented Apr 23, 2013

Apologies to any non-developers reading this thread. The following comment is about working with Git and has nothing to do with GedcomX, so you can safely ignore it.

It fixes all of the fragment links within the page. They should all work now.

I figured out the fix while working on this issue. I was being lazy and did not want to open a new issue with a new branch, etc. Just figured it was a bug fix and could ride along just as well with this issue... :-)

Oh, OK, GitHub magic. They must be using Id, which is supposed to do the link-making magic, for something else. Why attach it to any issue, then? Just do it in master as its own change so it doesn't have to wait for this branch to get merged. You should probably cherry-pick those 3 changes into master now unless you're ready to merge this branch anyway.

There are some other bug fixes, additions to cover for omissions, deletions, etc. that will come along for the ride as well. This refactor has forced a general review of the documentation and I have been finding lots of little mistakes.

Likewise. Do them in master then rebase subject-refactor onto master:

git rebase master subject-refactor

Which will make sure that you don't get merge conflicts when you merge subject-refactor later.

@jralls
Contributor Author

jralls commented Apr 23, 2013

I don't disagree. It might. But I am not going to address this in this issue. At some point, we expect to give the documents a more comprehensive review, and I think this sort of thing will come up at that point.

OK.

@ed4becky

"Ideally the analysis would connect the Event to the Person,"

I totally agree, which is what I was recommending

"but that doesn't fit with the current design"

I guess I'm not clear on the scope of the discussion, vis-a-vis what's already carved in stone.

My concern is that if you tie the analysis to the Subject rather than to the entity tying the Subject to the Event, as we both seem to agree is the proper strategy, vendors will have trouble separating each analysis into its correct context. Where I see this as an issue is when multiple analyses exist for a subject/event context - I may want to report on ALL the James/Birth analysis, but I don't want to pull in all the James/Occupation analysis, or the James/Death analysis...

@thomast73
Contributor

@ed4becky: Where is the meat in Conclusion, that is, its, well, "conclusion"? I would think that in addition to a collection of conclusions, a conclusion would have a top-level analysis?

For better or for worse, "conclusion" has become a generic term in GEDCOM X. It stems from the fact that the very process of recording information (about a source, about people we know, etc.) takes us a step away from the original and therefore is "concluded" about the original—a valid definition, but not always the first definition in the GPS-trained genealogist's mind.

As Conclusion (the GEDCOM X model object) is abstract, the "meat" is defined in part by the specializations of Conclusion—currently those are Person, Relationship, Event, PlaceDescription, Name, Gender, Fact, EventRole and Document.

The "top level conclusions" in GEDCOM X are the ones we have identified in this issue as specializations of Subject (Person, Relationship, Event and PlaceDescription) and, as such, they have the option to be associated with analysis.

@ed4becky: ...I may want to report on ALL the James/Birth analysis, but I don't want to pull in all the James/Occupation analysis, or the James/Death analysis...

But, given this statement, I think I am not yet talking to your real question/concern?

Perhaps you are saying that the model is not well suited for accumulating birth analysis separate from death analysis and that it ought to be possible to do so?

I think we have been thinking of analysis in terms of the Subject as a whole—that any analysis of the Subject would include the analysis of any "identifying features" (e.g., birth, death) used to form the "apparent identity" of the subject. I know a home for this sort of analysis is needed. Thus, the association of analysis with a Subject.

But perhaps, to address your concerns, we should consider pushing analysis down into Conclusion?

Then you could associate birth analysis with the birth Fact (the birth "identifying feature"). The "identifying features" would still continue to be accumulated within a Subject, and we could still analyze the "apparent identity" as an analysis document associated with the Subject. But by moving analysis into Conclusion, we could also accumulate analysis specific to the birth with the birth Fact.

For my part: +1 to pushing analysis to Conclusion.
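A small Python sketch of what pushing analysis down into Conclusion would allow, using the James/Birth versus James/Occupation example from earlier in the thread. The class layout is flattened (Person extends Conclusion directly rather than via Subject), and the field names, short type labels and `analyses_for` helper are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Conclusion:
    # URI of an analysis Document; with this change every conclusion may carry one.
    analysis: Optional[str] = None


@dataclass
class Fact(Conclusion):
    type: str = ""  # e.g. "Birth"; a real model would use a type URI


@dataclass
class Person(Conclusion):
    facts: List[Fact] = field(default_factory=list)


def analyses_for(person: Person, fact_type: str) -> List[str]:
    """Collect analysis references attached to the person's facts of one type."""
    return [fact.analysis for fact in person.facts
            if fact.type == fact_type and fact.analysis is not None]


james = Person(
    analysis="#james-identity",  # analysis of the apparent identity as a whole
    facts=[
        Fact(analysis="#james-birth", type="Birth"),
        Fact(analysis="#james-occupation", type="Occupation"),
    ],
)
print(analyses_for(james, "Birth"))  # ['#james-birth']
```

The Person keeps its own analysis of the overall identity, while a report on the birth can pull only the analysis attached to the birth Fact.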

@thomast73
Contributor

I have updated the serialization format specifications in feb16f7.

@jralls
Contributor Author

jralls commented Apr 23, 2013

Then you could associate birth analysis with the birth Fact (the birth "identifying feature"). The "identifying features" would still continue to be accumulated within a Subject, and we could still analyze the "apparent identity" as an analysis document associated with the Subject. But by moving analysis into Conclusion, we could also accumulate analysis specific to the birth with the birth Fact.

And on the EventRole tying Events to Persons. Yeah, that will work. +1

@thomast73
Contributor

Here are the updates with the analysis now a member of Conclusion.

@stoicflame
Member

+1

@ed4becky

Does this address both an analysis tying an Event to a Subject and an analysis tying a Fact to a Subject?

@jralls
Contributor Author

jralls commented Apr 25, 2013

Does this address both an analysis tying an Event to a Subject and an analysis tying a Fact to a Subject?

Well, an Event is a Subject. I think you mean tying an Event to its participant Persons. That's done via EventRole, which is a Conclusion and therefore carries an analysis in which one would explain the evidence supporting (and conflicting with) the connection.

Facts are contained in the Person, so the Fact's analysis will have the evidence discussion.

The other way to get there is via EvidenceAnalysis Documents linked via SourceReference as discussed in excruciating detail in #144.

Which way is used will depend on the architecture of the exporting application, but importers are going to have to be able to handle both. Most current applications don't support either model and will at best convert everything into loosely connected plain-text notes and at worst dispatch them to /dev/null.

@thomast73
Contributor

Does this address both an analysis tying an Event to a Subject and an analysis tying a Fact to a Subject?

The analysis document is not an association class, but an analysis document can certainly include a discussion of applicable associations.

It is my understanding, from a GPS methodology point of view, that we need to be able to associate analysis with the following GPS concepts:

  • sources — in GEDCOM X, described with SourceDescription instances
  • information — in GEDCOM X, "extracted" conclusions (Subject instances or their contained Conclusions)
  • evidence — in GEDCOM X, EvidenceReference instances
  • hypotheses — in GEDCOM X, Subject instances that have not been "accepted"
  • conclusions — in GEDCOM X, Subject instances that have been "accepted"

Other than the fact that there is no clear distinction in the model between a hypothesis and a conclusion (I cannot make it clear that a Subject is "accepted" or not—a separate issue), I can in all cases associate some analysis with the objects in the model that are intended to represent the identified GPS concepts.

So, I feel like the basic research analysis needs can be met with the model (as it is being updated by this pull request).

thomast73 added a commit that referenced this pull request Apr 26, 2013
Completes initial refactor to add `Subject` class
@thomast73 thomast73 merged commit 79e4412 into master Apr 26, 2013
@thomast73
Contributor

I think most of us are comfortable with moving forward on this, so I have merged these changes. I am not sure that @ed4becky is comfortable yet, so I would invite him (or any of you) to open up new issues for any remaining concerns.

We appreciate the feedback we received here. It has been very helpful. I am excited about the resulting improvements to the model.

@ed4becky

I'm OK with what I understand. The true test will be when I try to map my data into the model. I'll let things percolate a little longer for that...


4 participants