sdbs-uni-p/tagger-edbt2023

Tagger

Tagger extends Josch (see Josch below) by integrating tagged unions and additional schema extraction approaches. To use Tagger in Josch, first follow the installation steps in the Josch section below, then the steps in this section.

Config

First, navigate to tools/Tagger/Tagger-main/default-config.json. Change the value of out-dir to a directory that you have permission to write to.
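The edit can also be scripted. The following is a minimal sketch, assuming that default-config.json is a JSON object with out-dir as a top-level key (the source only names the key, not the file layout). The helper name set_out_dir is hypothetical.

```python
import json
from pathlib import Path


def set_out_dir(config_path, out_dir):
    """Set the "out-dir" entry of a Tagger config file and return the new config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: "out-dir" lives at the top level of the JSON object.
    config["out-dir"] = str(out_dir)
    # Rewrites the file with 2-space indentation; all other keys are kept.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, set_out_dir("tools/Tagger/Tagger-main/default-config.json", "/tmp/tagger-out") would point Tagger at /tmp/tagger-out, where that directory is only a placeholder for any writable location.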

Then start Josch as described in the Josch section below and navigate to the Tool Settings pane. Click Select tools folder and use the directory navigator to point to the tools directory in Josch. To use the approach of Spoth et al., you need a valid Java 8 installation. Select the corresponding folder as well and make sure that the path to Java 8 ends with /bin/java.

To use the approach of Frozza et al., you have to provide a path to a working version of this tool. We provide one here, but you may use your own if you wish. Follow the instructions in the README of that project to build the tool. In Josch, provide a valid path and make sure that it ends with approaches/frozza.
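Both path requirements are about how the path ends, so they can be checked before entering the values in Josch. This is a small sketch with hypothetical helper names. It only inspects the shape of the path, not whether the Java installation is really version 8 or whether the Frozza tool has been built.

```python
from pathlib import PureWindowsPath


def ends_with(path, *suffix_parts):
    """Return True if the last components of `path` equal `suffix_parts`."""
    # PureWindowsPath accepts "/" and "\\" as separators, so both
    # POSIX-style and Windows-style input are split the same way.
    parts = PureWindowsPath(path).parts
    if len(parts) < len(suffix_parts):
        return False
    return tuple(parts[-len(suffix_parts):]) == suffix_parts


def is_java_path(path):
    """Check the rule from the text: the Java 8 path ends with /bin/java."""
    return ends_with(path, "bin", "java")


def is_frozza_path(path):
    """Check the rule from the text: the path ends with approaches/frozza."""
    return ends_with(path, "approaches", "frozza")
```

A trailing separator is ignored, so approaches/frozza/ passes the check as well.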

Afterwards, connect to the database in Josch and use Tagged Union Extraction by navigating to the Tagger pane.

Implemented by Valentin Gittinger and Stefan Klessinger.

Citing this work

This work was published as a demo at EDBT 2023. To cite this work, please use the following BibTeX entry:

@inproceedings{DBLP:conf/edbt/KlessingerFGKSS23,
  author       = {Stefan Klessinger and
                  Michael Fruth and
                  Valentin Gittinger and
                  Meike Klettke and
                  Uta St{\"{o}}rl and
                  Stefanie Scherzinger},
  editor       = {Julia Stoyanovich and
                  Jens Teubner and
                  Nikos Mamoulis and
                  Evaggelia Pitoura and
                  Jan M{\"{u}}hlig and
                  Katja Hose and
                  Sourav S. Bhowmick and
                  Matteo Lissandrini},
  title        = {Tagger: {A} Tool for the Discovery of Tagged Unions in {JSON} Schema
                  Extraction},
  booktitle    = {Proceedings 26th International Conference on Extending Database Technology,
                  {EDBT} 2023, Ioannina, Greece, March 28-31, 2023},
  pages        = {827--830},
  publisher    = {OpenProceedings.org},
  year         = {2023},
  url          = {https://doi.org/10.48786/edbt.2023.75},
  doi          = {10.48786/EDBT.2023.75},
  timestamp    = {Sat, 29 Apr 2023 13:06:22 +0200},
  biburl       = {https://dblp.org/rec/conf/edbt/KlessingerFGKSS23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Josch

Josch is a cockpit application that combines schema extraction and JSON Schema containment checking to exploit their interactions. It can be used for schema-less NoSQL document stores, but is currently geared towards MongoDB. Josch does not implement schema extraction or containment checking itself. Instead, it uses third-party tools for these tasks and lets the user switch between them easily in the user interface.

Schema Extraction: Josch analyzes a MongoDB collection and extracts a JSON Schema or a MongoDB validator that describes the structure of the stored data.

JSON Schema Containment: Josch compares two JSON Schema documents to check whether the language defined by one schema is a subset of, a superset of, equivalent to, or incomparable with the language defined by the other JSON Schema document.
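The four possible outcomes can be illustrated with a toy model. The sketch below is not how the containment tools work: real tools reason about the schemas themselves, because the set of documents a schema accepts is generally infinite. Here each schema is stood in for by a Python predicate over a small, fixed universe of values, and all names are hypothetical.

```python
# Toy universe of JSON-like values; real schema languages are infinite.
UNIVERSE = [1, 2.5, "a", None, True, [1, 2], {"k": "v"}]


def language(accepts):
    """Indices of the universe values that the predicate accepts."""
    return frozenset(i for i, value in enumerate(UNIVERSE) if accepts(value))


def compare_languages(left, right):
    """Classify how the language `left` relates to the language `right`."""
    if left == right:
        return "equivalent"
    if left < right:  # proper subset
        return "subset"
    if left > right:  # proper superset
        return "superset"
    return "incomparable"


def is_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_string(value):
    return isinstance(value, str)
```

In this model, compare_languages(language(is_integer), language(is_number)) yields "subset", the reversed call yields "superset", and integers versus strings are "incomparable" because neither language contains the other.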

Josch uses Maven to preserve a modular architecture, which makes it easy to extend Josch with new tools for schema extraction or JSON Schema containment checking. Other document stores can be added as well.

Supported Third-Party-Tools

JSON Schema Containment Tools

Schema Extraction Tools

Features

Key features

  • Extract a JSON Schema using different extraction tools.

    • Use relative or absolute sampling to extract.
    • Switch between the extraction tools within the application.
  • Compare two JSON Schemas semantically using different containment tools.

    • Switch between the tools within the application.
  • Compare two JSON Schemas syntactically and highlight the differences.

  • Store and browse historic schema versions.

    • Add a personal note when storing.
    • Filter JSON Schemas by the date of storing.
  • Validate all or individual documents against a JSON Schema.

    • Find the documents that do not validate.
    • Get the number of valid documents.
    • Find out why a single document fails validation.
  • Load, modify and create a JSON Schema

Other features

  • Show the available databases and collections of the database server.

  • Show random document samples for a given collection.

    • Show all documents if the collection is not too big.
  • Insert a document into the collection.

MongoDB specific features

  • Extract a MongoDB validator.
  • Generate a MongoDB validator from a given JSON Schema.
  • Register a new MongoDB validator at the database with specific validation action and level.
  • Validate all or individual documents against a MongoDB validator.
    • Find the documents that do not validate.
    • Get the number of valid documents.
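For orientation, registering a validator with a validation action and level corresponds to MongoDB's collMod command. The following sketch only builds the command document. It is not taken from Josch's code, the function name is hypothetical, and the accepted values listed are the commonly documented ones and may differ between MongoDB versions.

```python
VALIDATION_LEVELS = {"off", "strict", "moderate"}
VALIDATION_ACTIONS = {"error", "warn"}


def build_validator_command(collection, json_schema, level="strict", action="error"):
    """Build a collMod command that attaches a $jsonSchema validator."""
    if level not in VALIDATION_LEVELS:
        raise ValueError(f"unknown validation level: {level!r}")
    if action not in VALIDATION_ACTIONS:
        raise ValueError(f"unknown validation action: {action!r}")
    # The command name must be the first key of the document.
    return {
        "collMod": collection,
        "validator": {"$jsonSchema": json_schema},
        "validationLevel": level,
        "validationAction": action,
    }


# With a driver such as PyMongo (an assumption, not part of Josch), the
# document could be sent with: client["mydb"].command(command)
```

With level "moderate" and action "warn", existing documents that already violate the schema are not checked on update, and violations are logged instead of rejected.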

Installation

Josch is implemented in Java, but the third-party tools used by Josch require other runtimes. These have to be installed and accessible.

Some aspects of Josch require environment variables (short: variables) to be set. How to set them depends on your operating system (OS). Please refer to the manual of your OS to find out how to set and modify them.

Whenever a command is given, execute it in your OS's shell/terminal, i.e. the command line interface of your operating system. Note that the shell has to be restarted after an environment variable has been set.

Josch

Josch uses Java 14 or higher. You can use OpenJDK or Oracle JDK.

jsonsubschema (containment)

To use the JSON Schema Containment tool jsonsubschema, the following needs to be installed:

Python 3.8

The schema containment checking tool jsonsubschema requires Python 3.8 or higher.

Pipenv

Pipenv creates and manages virtual environments for Python projects. There are two ways to install it: isolated or pragmatic. For further information, see the Pipenv documentation. We generally suggest an isolated installation, which includes adding Pipenv to the PATH variable.

Josch requires that the location of Pipenv is part of your PATH variable, so please verify that pipenv is accessible from your shell by running the command pipenv --version.
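The same PATH lookup that the shell performs can be reproduced with Python's standard library, which is handy for checking several prerequisites at once. This is a sketch with hypothetical helper names. The executable names java, node, yarn and mvn are the usual ones for the tools mentioned in this guide, but they are an assumption about your installation.

```python
import shutil


def locate_tools(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the sorted names of executables that are not on PATH."""
    return sorted(name for name, path in locate_tools(names).items() if path is None)
```

Calling missing_tools(["pipenv", "java", "node", "yarn", "mvn"]) returns an empty list when everything is reachable from the current environment. Remember that a shell opened before a variable was changed still sees the old PATH.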

Setup

Open the cloned directory of this repository and navigate (via your shell) to tools\JsonSubSchema and execute the command pipenv install in order to install all required Python modules.

You can also move this directory to another place, but please make sure to specify the correct path in Josch (settings can be applied in the user interface).

is-json-schema-subset (containment)

To use the JSON Schema Containment tool is-json-schema-subset, the following needs to be installed:

  1. Node.js JavaScript runtime.
  2. Yarn package manager.

Setup

Open the cloned directory of this repository and navigate (via your shell) to tools/IsJsonSchemaSubset and execute the command yarn install in order to install all required Node modules there.

You can also move this directory to another place, but please make sure to specify the correct path in Josch (settings can be applied in the user interface).
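The two setup steps (pipenv install for jsonsubschema and yarn install for is-json-schema-subset) can be scripted together. The sketch below assumes the tools still live in their default locations inside the cloned repository. The function names are hypothetical, and nothing is executed until run_setup is called.

```python
import shutil
import subprocess
from pathlib import Path

# Directory (relative to the repository root) and install command per tool.
SETUP_STEPS = [
    (Path("tools") / "JsonSubSchema", ["pipenv", "install"]),
    (Path("tools") / "IsJsonSchemaSubset", ["yarn", "install"]),
]


def plan_setup(repo_root):
    """Return (working directory, command) pairs without running anything."""
    root = Path(repo_root)
    return [(root / rel_dir, list(cmd)) for rel_dir, cmd in SETUP_STEPS]


def run_setup(repo_root):
    """Run each install command in its tool directory; stop on first failure."""
    for cwd, cmd in plan_setup(repo_root):
        # Resolve via PATH first so a missing tool gives a clear error.
        executable = shutil.which(cmd[0])
        if executable is None:
            raise FileNotFoundError(f"{cmd[0]} is not on PATH")
        subprocess.run([executable, *cmd[1:]], cwd=cwd, check=True)
```

If you moved a tool directory elsewhere, adjust SETUP_STEPS accordingly and set the same path in Josch.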

Hackolade (extraction)

Hackolade is a commercial tool that extracts a JSON Schema and a MongoDB validator from the MongoDB database server. To use Hackolade with Josch, a Professional Edition licence is required. Before starting Josch and using Hackolade, install Hackolade and set it up with the following steps:

  1. Start the application and click on common tasks. Then click on Reverse-Engineer target.
  2. Choose MongoDB with the according target version of your database. Now click the Create button and finally the Add button.
  3. Configure the connection to your database and enter the name that you want. Note that you have to remember the name and pass it to Josch later on. Confirm the settings by hitting the save button.
  4. After saving the connection, your database should show up in the list. Hackolade isn't required anymore and can be closed.
  5. Add the installation path of Hackolade to an environment variable named hackolade, and add it to the PATH variable as well.
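Step 5 can be verified with a short check. This sketch assumes the variable is literally named hackolade, as the wording of the step suggests. The function name is hypothetical, and the environment is passed in as a mapping so that the check can be tried without changing real settings.

```python
import os


def hackolade_env_problems(env, var_name="hackolade"):
    """Return a list of problems with the Hackolade variables (empty if fine)."""
    install_dir = env.get(var_name)
    if not install_dir:
        return [f"environment variable {var_name!r} is not set"]

    def normalize(path):
        # Make comparison robust against trailing separators and, on
        # Windows, against differences in letter case.
        return os.path.normcase(os.path.normpath(path))

    path_entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if normalize(install_dir) not in {normalize(p) for p in path_entries}:
        return [f"{install_dir} is not listed in PATH"]
    return []
```

Run it against the live environment with hackolade_env_problems(os.environ) from a freshly started shell.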

json-schema-inferrer (extraction)

As this is a Java library, it is contained in Josch.

Execute Josch

Josch is developed as a multi-module Maven Project. You can either use your Java IDE to execute it or you can use Maven directly.

Build with Maven
  1. Navigate to the josch directory of the repository via your shell. It holds a pom.xml and the submodules.
  2. Execute the command mvn clean install.
  3. Navigate to the subdirectory josch.presentation\josch.presentation.gui\josch.presentation.gui.controller\target.
  4. Execute the command java -jar josch-1.0-jar-with-dependencies.jar. For this command to work, the java executable has to be on your PATH variable.
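The four steps above can be written down as two commands with their working directories. The sketch below only computes them from the repository root, using the directory and JAR names given in the steps. The function name is hypothetical.

```python
from pathlib import Path

JAR_NAME = "josch-1.0-jar-with-dependencies.jar"

# Location of the build output, relative to the josch directory (step 3).
TARGET_DIR = Path(
    "josch.presentation",
    "josch.presentation.gui",
    "josch.presentation.gui.controller",
    "target",
)


def build_and_run_plan(repo_root):
    """Return (working directory, command) pairs for building and starting Josch."""
    josch_dir = Path(repo_root) / "josch"
    return [
        (josch_dir, ["mvn", "clean", "install"]),            # steps 1 and 2
        (josch_dir / TARGET_DIR, ["java", "-jar", JAR_NAME]),  # steps 3 and 4
    ]
```

Each pair can be passed to subprocess.run(command, cwd=directory, check=True) if you prefer to run the build from a script rather than by hand.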

Build with IDE

Import Josch as a Maven project in your IDE and build the project accordingly. The main class and method to launch the application is josch.presentation.gui.controller.App.main().

Expandability

  • Easy integration of new NoSQL document stores that are based on JSON data.
  • Easy integration of new schema extraction and containment tools.
  • Different color themes that can be extended.

As Josch is a multi-module Maven project, it can be extended easily. Extensions can be made in any layer. They are implemented in a similar way in all layers, except for the presentation layer, because it has no layer above it.

Extend Josch

To extend Josch, create a new Maven submodule in the respective component (josch.services.<COMPONENT>). To make your implementation fit into Josch, use the interfaces and abstract classes in the corresponding layer (josch.<LAYER>.interfaces). Each submodule is required to have a module-info.java in order to avoid transitive dependencies, and it needs to be registered in the parent pom.xml. Examples can be found in every leaf module, e.g. josch.services.comparison.jsonsubschema.

After you have implemented the new module, you have to register it within Josch:

  • To make the module selectable in the user interface, register the module as a new value in the respective enum. The enums can be found at josch.model.enums. E.g. to register a new module for checking containment, add it to josch.model.EContainmentTools.java.
  • To make the module work in Josch internally, register it in the respective factory. Each layer has its own factory (josch.<LAYER>.factory). In order to register it, add it to the respective switch statement.
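The two registration steps follow a common enum-plus-factory pattern. Josch itself is written in Java, so the following Python sketch only illustrates the shape of the pattern. Every class, value and function name in it is hypothetical and does not mirror Josch's actual interfaces.

```python
from enum import Enum


class ContainmentTool(Enum):
    """Selectable tools; stands in for an enum such as EContainmentTools."""
    JSONSUBSCHEMA = "jsonsubschema"
    IS_JSON_SCHEMA_SUBSET = "is-json-schema-subset"
    MY_NEW_TOOL = "my-new-tool"  # step 1: register the new enum value


class ContainmentChecker:
    """Stand-in for the interface that every containment module implements."""
    def name(self):
        raise NotImplementedError


class JsonSubSchemaChecker(ContainmentChecker):
    def name(self):
        return "jsonsubschema"


class IsJsonSchemaSubsetChecker(ContainmentChecker):
    def name(self):
        return "is-json-schema-subset"


class MyNewToolChecker(ContainmentChecker):
    def name(self):
        return "my-new-tool"


def create_checker(tool):
    """Factory; the branches play the role of the Java switch statement."""
    if tool is ContainmentTool.JSONSUBSCHEMA:
        return JsonSubSchemaChecker()
    if tool is ContainmentTool.IS_JSON_SCHEMA_SUBSET:
        return IsJsonSchemaSubsetChecker()
    if tool is ContainmentTool.MY_NEW_TOOL:  # step 2: register in the factory
        return MyNewToolChecker()
    raise ValueError(f"no checker registered for {tool}")
```

The enum value makes the tool appear as a choice, and the factory branch connects that choice to an implementation. Forgetting the second step leaves a selectable tool that fails when it is used.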

Implemented by @daubersc