Modeling Data
Data modeling is the process of mapping your data to an ontology so that Karma can integrate it with data from other sources and publish it in a new format. To model your data, you first need to import your data and the ontologies you want to use to model it (see Importing Data).
The first step to model data is to ask Karma to show the panel where you can specify the model for your data: click on the worksheet menu and select Show Model:
If you have not defined a model for this source yet, Karma will show an empty model as indicated by the small red circles above each column heading:
The modeling process consists of the following activities:
- Specifying semantic types
- Specifying relationships among classes
- Automatically defining ontologies to model sources
- Managing the models for a source
Specifying semantic types
A semantic type defines the relationship between a column of data and a property and a class in your ontology.
For example, in the artworks-list.xml data shown above, we want to specify that the artist column contains the names of people. We can do this by mapping the artist column to the foaf:name of a foaf:Person (foaf is a popular ontology for modeling data about people, http://xmlns.com/foaf/spec/).
Karma is ontology agnostic, so you can use whatever ontology you want to model your data.
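To make the idea concrete, here is a minimal sketch, assuming a simple Python representation that is not Karma's internal API: the `SemanticType` class and the `apply_semantic_type` helper are hypothetical. A semantic type pairs a column with a class and a data property; applied to one row, it yields a statement that the subject is an instance of the class and a statement giving the cell value for the property.

```python
from dataclasses import dataclass

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class SemanticType:
    """Hypothetical record: `column` holds values of `property_uri` for instances of `class_uri`."""
    column: str
    class_uri: str
    property_uri: str


def apply_semantic_type(st: SemanticType, row: dict, subject: str) -> list:
    """Return the (subject, predicate, object) statements one row contributes."""
    return [
        (subject, RDF_TYPE, st.class_uri),           # the subject is an instance of the class
        (subject, st.property_uri, row[st.column]),  # and the cell value is its property value
    ]


artist_type = SemanticType(column="artist", class_uri=FOAF + "Person", property_uri=FOAF + "name")
row = {"artist": "Archipenko, Alexander"}  # illustrative row
for triple in apply_semantic_type(artist_type, row, "_:person1"):
    print(triple)
```

The subject `_:person1` is a placeholder; how Karma actually identifies each person is covered under URIs for classes.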
To specify a semantic type, click on the small red circles above the column headings (the circles turn black once you define a semantic type, but you can still click on them to change the semantic type). Clicking on the red circle for artist shows a panel such as the following.
Semantic types can be specified in several different ways and can combine multiple pieces of information. We explain each of these in turn:
- Basic semantic types
- Editing suggestions
- Multiple semantic types for a column
- Multiple columns with the same semantic type
- URIs for classes
- Per-row semantic types
- Literal types
Basic semantic types
The most common way of constructing a semantic type is to define it based on a property and a class in your ontology. When you use Karma on multiple data sources related to the same domain (e.g., several data sources with data about museums), Karma will learn the semantic types you assign to data, and offer them as suggestions.
In this example, we had been using Karma to model several museum sources, so Karma shows the top-ranked suggestions for the semantic type for the artist column. The first one is correct, so you can click on the check box to select it and click Submit to assign it. Karma updates the model to show your semantic type assignment:
Editing suggestions
Sometimes the suggested semantic types are not exactly what you want. In these cases, you can click on the small Edit buttons on the right to edit any of the semantic types. For example, you can click on the first Edit button to edit the first foaf:name suggestion:
You can type in the name of a data property or a class in the appropriate box. Karma offers type-in completion, so keep typing until the menu shows the choice you want. If the menu does not show a completion, Karma will reject your input.
You can also click on the Browse button to call up a window where you can browse the full contents of your ontologies. The browser for the Property box allows you to browse the data properties in the ontology, and the browser for the Class box allows you to browse the classes. The browsers have a search box that makes it easy to find relevant properties and classes. For example, if you are unsure which property to use to model the name of a person, you can search for all the data properties that match the keyword name in your loaded ontologies:
When you find the correct one, select it and click Submit. Karma will enter it in the appropriate box and you can continue editing other aspects of your semantic type.
Multiple semantic types for a column
Often you may want to assign multiple semantic types to a column. For example, you may want to model the artist column as the foaf:name of a foaf:Person and also as the rdfs:label, since the name is an appropriate label for a person (it is good practice to define an rdfs:label for all classes in your models).
The semantic types dialogue box allows you to define multiple semantic types for a column by selecting multiple checkboxes in the list of semantic type suggestions. You need to make one of the semantic types the primary one by selecting the appropriate radio button in the Primary column. When you add more semantic types and relationships to your model, Karma will connect the primary semantic types to the other parts of your model. The non-primary semantic types will be used in the publishing phase.
The following example shows how you can define foaf:name as the primary semantic type for artist and rdfs:label as a synonym, or non-primary, semantic type. Karma uses the foaf:name semantic type for connecting the various pieces of the model into a coherent whole, but your final RDF will also include an rdfs:label.
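The effect on the published data can be sketched as follows, assuming N-Triples output and a hypothetical `column_triples` helper (not part of Karma): every semantic type assigned to the column, primary or synonym, contributes one statement with the same cell value. The primary/synonym distinction matters for how Karma assembles the model, not for which statements end up in the RDF.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def ntriples_literal(value: str) -> str:
    # Escape characters that N-Triples does not allow raw inside a string literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def column_triples(subject: str, value: str, primary: str, synonyms: list) -> list:
    """Emit one N-Triples line per semantic type assigned to a column.

    `primary` is the property used to connect the model; `synonyms` are the
    non-primary properties that only add statements to the published RDF.
    """
    return [f"{subject} <{prop}> {ntriples_literal(value)} ." for prop in [primary, *synonyms]]


lines = column_triples("_:person1", "Archipenko, Alexander", FOAF_NAME, [RDFS_LABEL])
for line in lines:
    print(line)
```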
You can click on the Add synonym semantic type button to add a new row of semantic types to the table of suggested semantic types. This is useful in the unlikely event that you want to define more semantic type synonyms than there are rows in the list of suggestions.
Multiple columns with the same semantic type
Sometimes a source contains two columns with the same type of data, but the data in one column refers to different individuals than the data in the other column. For example, consider the following dataset containing data about artworks. The Artist column contains the name of the creator of the artwork, so we modeled it as the name of a foaf:Person (Karma does not show the prefixes).
The Sitter column also contains the names of people, but these are the names of the people depicted in the artwork. It is important for the model to specify that the artist and the sitter are different people. This is what the indices after the class names are for (e.g., the 1 in Person1). If the artist is Person1, we want the sitter to be a different person, say Person2.
Here is the semantic types dialogue that Karma shows when you click on the red circle above Sitter:
The last two entries are pertinent here. The first of these, foaf:name of arc-ont:Person1, suggests that the Sitter column contains a foaf:name for Person1, the same person that we used to represent the artist. This is not what we want. The last entry, foaf:name of arc-ont:Person2 (add), suggests that you can add a new person to your model, Person2, to represent the sitter. This is precisely what we want.
The following screen shows the resulting model when you select Person2 for the sitter:
The model contains two person bubbles: Person1 for the artist and Person2 for the sitter. Both columns contain names, so we used the foaf:name property in both semantic types. The other bubble, CulturalHeritageObject1, represents the artwork, and the creator and sitter links represent the relationships between the artwork and the people (see the section on Specifying relationships among classes for an explanation of relationships among classes).
URIs for classes
Every bubble in your model represents a class of entities in the world. For example, Person1 in the following model represents the class of artists. Different records in your dataset typically represent different specific entities. For example, in the first row, Frishmuth, Harriet Whitney is the name of one artist, and Archipenko, Alexander is the name of a different artist:
When we model datasets, we typically want to assign a unique identifier to each entity mentioned in a dataset so that we can refer to it later when we publish the data. This is especially important when multiple datasets refer to the same individual: we want the data to be cross-referenced appropriately so that we can merge the data about the same individuals when they are mentioned in different datasets. For example, an "artworks" dataset may tell us that Alexander Archipenko created the bronze statue titled Concave of Standing Woman, and an "artists" dataset may tell us that he was born in Kiev. We want our models to define a unique identifier for Alexander Archipenko so that all the data about him is tied to the same identifier.
In Karma you can assign unique identifiers (URIs) to entities in two ways:
- Marking a combination of columns as defining keys that identify the entities in a class, or
- Explicitly assigning a URI to each entity.
You can mark a column as a key for a class by selecting the Mark as key for the class option in the semantic type dialogue box. In the model, Karma adds an asterisk to the semantic types where the Mark as key for the class option is selected. You can mark more than one column as key for the same bubble; when you do, Karma will use the combination of all attributes marked as key to construct the URI for the entities in the class.
When you publish the data in RDF, Karma will generate URIs by concatenating the prefix for your dataset, the class name, and the values of each of the cells that you marked as key for the class.
For example, suppose that the prefix for your dataset is http://ima.org. If you mark artist as key for Person1, the URI for Alexander Archipenko will be http://ima.org/Person_Archipenko_Alexander.
Note that the numeric suffix of the class (e.g., the 1 in Person1) is used only to label the bubbles in the model of a source; it has no relationship to the URI and has no meaning across datasets (e.g., Person1 in the "artworks" dataset is not the same person as Person1 in the "artists" dataset).
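The key-based construction can be sketched as below. This is an approximation, not Karma's code: the hypothetical `key_uri` helper splits each key value on runs of non-word characters and joins the pieces with underscores, a normalization inferred only from the single Archipenko example above. Note that it takes the class name Person, not the bubble label Person1, because the index never reaches the URI.

```python
import re


def key_uri(prefix: str, class_name: str, key_values: list) -> str:
    """Build prefix/ClassName_value1_value2... from the cells marked as key.

    Assumption: values are split on non-word characters (commas, spaces)
    and rejoined with underscores; Karma's exact normalization may differ.
    """
    parts = [class_name]
    for value in key_values:
        parts.extend(piece for piece in re.split(r"\W+", value) if piece)
    return prefix.rstrip("/") + "/" + "_".join(parts)


# One key column (artist) for the class Person, prefix http://ima.org.
print(key_uri("http://ima.org", "Person", ["Archipenko, Alexander"]))
```

With several key columns, their values are simply appended in order, which is why the combination of keys, rather than any single column, identifies the entity.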
If you want absolute control over the URIs for entities, you can store them in a column in your dataset and tell Karma to use those URIs for the entities in your class. The following screenshot shows our "artworks" dataset transformed to add a column that explicitly provides the URI that we should use for each artist:
You can tell Karma to use the URIs in column artist_URI as the URIs for Person1 by using the Advanced Options in the semantic types dialogue box. First, click on Advanced Options to ask Karma to show you the advanced options. Then, select the contains URI for node option and enter the name of the bubble for which you want to specify the URI:
When you click Submit, Karma updates the model. The green, dashed arrow labeled classLink shows that a column is being used to specify the URI for a class.
When you publish the data in RDF, Karma will use the URIs you specify with the contains URI for node option to define the URIs for your entities. For example, the URI for Alexander Archipenko will be http://ima.org/people/Alexander_Archipenko instead of the URI that Karma generates automatically when you use the Mark as key for the class option.
Note: if your model has a class for which you don't specify a URI, then Karma will generate a blank node for it. All the data within a single dataset will be self-consistent, but there will be no way to cross-reference the entities in that class with entities in the same class in other datasets. It makes sense to use blank nodes for classes that you use to group information together but that don't correspond to an entity in the world.
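The contrast between an explicit URI column and a blank node can be sketched with a hypothetical `subject_for` helper (not a Karma function). If a URI column is given and has a value, that URI is used verbatim; otherwise a fresh blank-node label is minted from a counter. The counter is local to one run, so the same artist described in another dataset would receive an unrelated label, which is why blank nodes cannot be cross-referenced.

```python
import itertools
from typing import Optional

_fresh = itertools.count()  # blank-node counter, meaningful only within one output


def subject_for(row: dict, uri_column: Optional[str] = None) -> str:
    """Return the N-Triples subject for the entity described by `row`."""
    if uri_column and row.get(uri_column):
        return f"<{row[uri_column]}>"  # explicit URI taken from the column
    return f"_:b{next(_fresh)}"        # no URI: mint a new blank node


row = {"artist": "Archipenko, Alexander",
       "artist_URI": "http://ima.org/people/Alexander_Archipenko"}  # illustrative row
print(subject_for(row, "artist_URI"))
print(subject_for(row))
print(subject_for(row))
```

The last two calls describe the same row yet yield `_:b0` and `_:b1`: without a URI, nothing ties the two nodes together.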
Per-row semantic types
Note: this advanced option will be documented in a future release of the documentation.
Literal types
You can explicitly define the types of literals using the Literal Type option in the panel for specifying semantic types. Karma offers the standard XSD types in a menu and also allows you to enter your own URIs if that is appropriate for your application.