Shmdoc is a tool that combines automated and human-supplied analysis and annotation of scientific datasets. It was developed as a proof-of-concept application for VLIZ during Open Summer of Code 2020.
- Automated analysis of `.xml`, `.json` and `.csv` files.
- Human-supplied annotation of the unit of certain fields through an easy-to-use web interface.
- Data stored as triples for linked-data/semantic analysis capabilities.
- Extensible microservices software stack based on the mu.semte.ch framework.
Shmdoc is a microservices-based application, meaning that the different parts of the application live in separate entities that each have their own independent implementation and dependencies. These contained entities are called microservices, and they communicate with each other via HTTP.
Shmdoc is built with the mu.semte.ch framework, which means each microservice runs in its own Docker container. The different microservices are managed with `docker-compose`.
Shmdoc consists of a few microservices. Some were already available in the mu.semte.ch framework. Others were built specifically for shmdoc.
Available from the mu.semte.ch framework:

- `db`: the "triplestore" (linked-data database) at the heart of this application. The `db` service is the central location for all data storage.
- `migrations`: service to import existing data into the `db` service. It can, for example, be used to import existing vocabularies into the database (as is done for the units vocabulary in this proof of concept).
- `resources`: allows the database to be accessed through a JSON:API, as is done by the `frontend` service. It is configured in `/config/resources/domain.lisp`.
- `file-service`: allows files to be uploaded to and downloaded from the `db` service. It is used to upload files for analysis in this proof of concept.
- `identifier`: an HTTP proxy that identifies the user-agent and makes sure that all user-agents behave as expected. It basically makes sure that browsers play nice with the services.
- `dispatcher`: dispatches HTTP requests that are sent to the application to the right microservice. The service is configured in `config/dispatcher/dispatcher.ex`.
Built specifically for shmdoc:
- `shmdoc-analyzer`: fetches files from the `db` service for automatic analysis.
- `frontend`: provides the front-end of the application. It accesses all services in a user-friendly way.
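Since all traffic enters the stack over HTTP through the `identifier` and `dispatcher`, you can talk to the services with any HTTP client. Below is a minimal sketch assuming a running stack on `localhost` and a dispatcher rule that forwards `/files` to the `resources` service; the actual routes live in `config/dispatcher/dispatcher.ex`.

```bash
# List the file objects known to the triplestore, going through
# identifier -> dispatcher -> resources. The /files path is an
# assumption; check config/dispatcher/dispatcher.ex for the real routes.
curl http://localhost/files -H "Accept: application/vnd.api+json"
```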
Below is a schematic depiction of the `domain.lisp` file of the `resources` microservice. This serves as a data model of how the data is stored in the `db` service of our system.
We will now go through all of the objects in this data model in the order of our flow (detailed in the user documentation):
- `sources`: we begin by selecting an existing data source or creating a new one, e.g. "measurement of seals in Ostend". The actual data of a source is linked to it through `schema-analysis-job` objects.
- `files`: a file object is created by the `file-service` upon the upload of an `xml`, `json` or `csv` file, e.g. "seal-measurement-ostend.csv".
- `schema-analysis-job`: in the current workflow, every file is analyzed immediately by the `shmdoc-analyzer` service. This generates a `schema-analysis-job` object in the database that is linked to the file.
- `column`: for every column in the data of the file that belongs to this `schema-analysis-job` object, a `column` object is generated that contains all of the statistical properties found during analysis.
- `unit`: every `column` object in the data model can be assigned a `unit`, e.g. "Kilograms".
To run this application, you must install:
This repository provides a main `docker-compose.yml` file, as well as some environment-specific compose files. You'll want to create a `.env` file that contains the following line:
```
COMPOSE_FILE=docker-compose.yml:docker-compose.prod.yml
```
After the requirements have been installed and the environment has been set up, start the stack with:
```bash
docker-compose up
```
With the standard production settings, you can now open up a browser and browse to `localhost`, where you will be presented with the user interface.
See docker-compose's CLI reference for more information on managing this kind of stack.
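For day-to-day management, the usual `docker-compose` subcommands apply; for example:

```bash
docker-compose logs -f shmdoc-analyzer   # follow the logs of one service
docker-compose restart dispatcher        # restart a service after a config change
docker-compose down                      # stop and remove the whole stack
```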
When running the app for development, you might want to alter your environment configuration. It is recommended to use an environment file (a `.env` file).
For development, the `.env` file can look like this:
```
COMPOSE_FILE=docker-compose.yml:docker-compose.dev.yml
```
The configuration in the `docker-compose.dev.yml` file will override some standard settings for the specified microservices.
The example `docker-compose.dev.yml` file provided in this repository does the following:
- Disable the `frontend` service altogether.
- Route port `80` to the identifier to skip the disabled `frontend`.
- Expose port `8890` to access the database directly via the SPARQL endpoint at `localhost:8890/sparql`.
- Expose the `shmdoc-analyzer` service for easy debugging at port `8891`.
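With port `8890` exposed, you can query the triplestore directly with any SPARQL-over-HTTP client. As a quick sanity check:

```bash
# Ask the triplestore for ten arbitrary triples as JSON.
curl "http://localhost:8890/sparql" \
  --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 10" \
  -H "Accept: application/sparql-results+json"
```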
After your environment is configured, you can start the stack just like with the production settings:
```bash
docker-compose up
```
The OpenAPI resources generator can be used to generate an OpenAPI specification for the API exposed by `resources`.
The following command will generate an `openapi.json` file in `./doc`. Make sure to delete the old one should a file by this name already exist.
```bash
docker-compose -f docker-compose.yml -f docker-compose.doc.yml up
```
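One way to inspect the generated specification is to serve it with the official Swagger UI Docker image (the image name and `SWAGGER_JSON` variable belong to that image, not to this repository):

```bash
# Serve ./doc/openapi.json in a local Swagger UI at http://localhost:8080.
docker run --rm -p 8080:8080 \
  -e SWAGGER_JSON=/doc/openapi.json \
  -v "$(pwd)/doc:/doc" \
  swaggerapi/swagger-ui
```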
Here you can find an introduction to getting started with shmdoc. We're assuming your IT administrator has already set up shmdoc somewhere, e.g. at `localhost/` or `shmdoc.osoc.be`.
Let's take a look at what you can do with shmdoc. In this tutorial you'll learn how to create a source, upload a file, and view/edit the information about the columns in the file.
Let's start by going to the web page where shmdoc is hosted. The shmdoc home page lets you create a new analysis, view running analyses, view historic analyses and get an overview of your sources.
Let's start by creating a new analysis.
Every analysis is attached to a source. A source is used to group similar analyses. Hit "Create new source" to create a new one. Enter a name and a small description and press "Create source".
Now we can upload a file to this source: hit "Choose a file" and select the file you want. Make sure the name of the source you want to upload to is in the input field next to the "Upload job" button. You can also choose multiple files and upload them all at once to the same source. Once this is done, upload your job.
Once the file has been uploaded, it will show up on the running analyses overview page. It might take some time before the file appears or the analysis completes, depending on how large your file is.
Once your job is done, you can go to the shmdoc homepage (by clicking "shmdoc" in the upper left corner). You can find your analysis result under "Historic analyses", but it's recommended to navigate to your results by clicking "Sources" on the homepage and then clicking the name of the source you added your analysis job to (the "Historic analyses" list can get quite long over time).
Click on the source you've just added your analysis job to. You will see an overview of all analysis jobs connected to that source. Click on the one you've uploaded to see an overview of the results from the analysis. Fun fact: you can also add a new analysis job to a source from this overview page.
The analysis results are presented column by column. You can change the column you're viewing by clicking "Columns" in the upper left corner and selecting the column you want to see.
Shmdoc has done its job. Now it's time for a human to complete the results. You can do this by clicking "edit" and adding the extra info you'd like. Let's say we want to add "Years" as the unit of this column. Once entered, hit "save" and you'll see that the unit property has been updated.
Finally, you can take a look at columns from all analyses that share the same unit by clicking "Show related" in the row of the unit property. Clicking on one of those brings you to the page with that column's information.