This is a basic instance of the D-Net software toolkit, a software framework for the realization of aggregative data infrastructures.

dnet-team/dnet-basic-aggregator

D-Net Software Toolkit v.1.3.0

The D-NET Software Toolkit is a system that offers functionalities for the collection ("harvesting"), transformation, aggregation, and indexing of metadata records collected from an arbitrary number of data sources, complying with different protocols and data exchange formats. D-NET provides a workflow language, which developers can use to combine a variety of D-NET data management services, configure them to handle data according to given data models, and pipeline them into autonomic data processing workflows.

This software package is a simplified version of the D-Net toolkit and consists of a web application with a minimal set of services for:

  • Collection of metadata records in oai_dc format via OAI-PMH, FTP, local file system, HTTP.

  • Transformation of the collected metadata records into an internal format named DMF (Driver Metadata Format)

  • Indexing of DMF records in a Solr full-text index

  • OAI-PMH export of aggregated metadata records in DMF and oai_dc formats. More formats can be added at runtime by providing a dedicated XSLT from DMF to the desired target format.

Official Web Site: http://www.d-net.research-infrastructures.eu/

Source code available at: http://svn-public.driver.research-infrastructures.eu/driver/dnet45/modules/

Need support? Contact us via email at: dnet-team[at]isti.cnr.it

Installation requirements

This minimal instance can be run on a single machine as a web application deployed in a Tomcat container.

Hardware requirements

Suggested minimal hardware requirements:

  • Operating system: most operating systems other than Windows (for example Linux).
  • Hard disk space: depends mostly on the number and size of the records you are going to collect. A couple of GB should be enough for a small repository (fewer than 10K metadata records). See the note on MongoDB storage below.

Software requirements

Software required:

  • Apache Tomcat 7: the web app container. Consider increasing the default maximum heap size; we suggest -Xmx2048m.
  • MongoDB >= 2.4: used to store the collected and transformed metadata records. Each collected record is stored in three separate "versions" (original, transformed, pmh-ready), so make sure enough disk space is available for MongoDB.
  • Solr 7.5.0: used to make the documents searchable. The Solr server should run in SolrCloud mode; more information can be found at https://lucene.apache.org/solr/guide/7_5/solr-tutorial.html

Note that Tomcat, Solr and MongoDB can be installed on the same machine or on dedicated nodes, although the latter requires changing some default system properties.
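
As a hedged sketch of the heap suggestion above: Tomcat's startup scripts source an optional bin/setenv.sh, so one way to apply -Xmx2048m is to append it to CATALINA_OPTS there. The helper name write_setenv and the target directory are assumptions made for illustration; packaged Tomcat installations may configure JVM options through a different, distribution-specific file.

```shell
#!/usr/bin/env bash
# Sketch: write a setenv.sh that raises the Tomcat heap to the suggested 2048 MB.
# write_setenv is a hypothetical helper name, not part of D-Net or Tomcat.
write_setenv() {
  local bin_dir="$1"            # for example "$CATALINA_BASE/bin"
  mkdir -p "$bin_dir" || return 1
  # Quoted heredoc: $CATALINA_OPTS is written literally and expanded by Tomcat.
  cat > "$bin_dir/setenv.sh" <<'EOF'
# Sourced by catalina.sh at startup, if present.
CATALINA_OPTS="$CATALINA_OPTS -Xmx2048m"
export CATALINA_OPTS
EOF
}

# Usage (assumes CATALINA_BASE points at your Tomcat installation):
#   write_setenv "$CATALINA_BASE/bin"
# When running through the Maven Tomcat plugin, the JVM is Maven's own:
#   export MAVEN_OPTS="-Xmx2048m"
```

Appending to CATALINA_OPTS rather than overwriting it keeps any options that are already set elsewhere.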

Running the D-Net web app with Maven

Maven settings

Whether you want to run the D-Net web app with the Tomcat7 plugin for Maven or build the .war file to deploy on a running Tomcat, you need Maven 3 and you must add the following repository to your settings.xml:

<repository>
    <id>dnet45-bootstrap-release</id>
    <name>D-Net 45 Bootstrap Release</name>
    <url>https://maven.d4science.org/nexus/content/repositories/dnet45-bootstrap-release/</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
    <layout>default</layout>
</repository>

We also suggest adding the Tomcat plugin to the plugin groups at the bottom of the same file:

<pluginGroups>
    <pluginGroup>org.apache.tomcat.maven</pluginGroup>
</pluginGroups>
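
For orientation, the two fragments above can be placed in a complete settings file as sketched below. In Maven's settings.xml a repository has to live inside a profile, and that profile has to be activated. The helper name write_dnet_settings, the profile id "dnet" and the output file name are illustrative assumptions; if you already have a settings.xml, merge the fragments into it instead.

```shell
#!/usr/bin/env bash
# Sketch: generate a standalone Maven settings file with the D-Net repository
# and the Tomcat plugin group. write_dnet_settings and the profile id "dnet"
# are hypothetical names chosen for this example.
write_dnet_settings() {
  local out="$1"
  cat > "$out" <<'EOF'
<settings>
  <profiles>
    <profile>
      <id>dnet</id>
      <repositories>
        <repository>
          <id>dnet45-bootstrap-release</id>
          <name>D-Net 45 Bootstrap Release</name>
          <url>https://maven.d4science.org/nexus/content/repositories/dnet45-bootstrap-release/</url>
          <releases><enabled>true</enabled></releases>
          <snapshots><enabled>false</enabled></snapshots>
          <layout>default</layout>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>dnet</activeProfile>
  </activeProfiles>
  <pluginGroups>
    <pluginGroup>org.apache.tomcat.maven</pluginGroup>
  </pluginGroups>
</settings>
EOF
}

# Usage: write the file, then point Maven at it with -s (--settings):
#   write_dnet_settings ./dnet-settings.xml
#   mvn -s ./dnet-settings.xml tomcat7:run
```

Using a separate file with -s leaves your personal ~/.m2/settings.xml untouched while you try the toolkit out.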

Testing on local machine:

The D-Net Software is developed in Java using Maven. You can try out the D-Net web app on your local machine with the tomcat7 plugin, provided you are also running a MongoDB server and a Solr server on localhost, each listening on its default port.

Please note that the Solr client used in D-Net needs to interact with the ZooKeeper server. For simplicity we suggest using the embedded ZooKeeper instance provided with the Solr distribution. By default Solr listens on port 8983 and its embedded ZooKeeper server on port 9983.

To override properties, you can modify dnet-basic-aggregator/src/main/resources/eu/dnetlib/cnr-site.properties. Please check the D-Net configuration section and the PROPERTIES.md file for more information about D-Net properties.
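
Before starting the web app it can help to confirm that the local services are reachable. The following is a minimal sketch, not part of D-Net: port_open and check_local_services are hypothetical helper names, the check relies on bash's /dev/tcp and the coreutils timeout command, and 27017 is MongoDB's usual default port, which this README does not state.

```shell
#!/usr/bin/env bash
# Sketch: check that the services D-Net expects on localhost accept connections.
# port_open and check_local_services are hypothetical helper names.

# Succeeds if a TCP connection to host:port can be opened within 2 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

check_local_services() {
  local failed=0 entry name port
  # 8983 and 9983 are the Solr and embedded ZooKeeper ports named in the text;
  # 27017 is assumed to be the MongoDB port (its usual default).
  for entry in solr:8983 zookeeper:9983 mongodb:27017; do
    name="${entry%%:*}"
    port="${entry##*:}"
    if port_open localhost "$port"; then
      echo "ok      $name ($port)"
    else
      echo "missing $name ($port)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   bin/solr start -c      # from the Solr directory: SolrCloud mode, embedded ZooKeeper
#   check_local_services
```

The function returns a non-zero status if any service is missing, so it can gate a later mvn tomcat7:run in a script.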

> cd dnet-basic-aggregator

> mvn tomcat7:run

When you see a log like:

52665 [Thread-7] INFO  eu.dnetlib.enabling.is.store.TestContentInitializerJob  - INITIALIZED

The webapp should be ready and running at http://localhost:8280/app , where 'app' is the value of the property container.context ('app' is the default) and 8280 is the default value of container.port.
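
If you prefer not to watch the log, a small polling loop can tell you when the web app answers. This is a sketch under assumptions: wait_for_webapp is a hypothetical helper, it needs curl, and it treats any HTTP status below 400 as "up", which presumes the polled page responds without authentication.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until it answers or the attempts run out.
# wait_for_webapp is a hypothetical helper name, not part of D-Net.
wait_for_webapp() {
  local url="$1" attempts="${2:-30}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: fail on HTTP status >= 400, -o: discard the body
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not reachable after $attempts attempts: $url" >&2
  return 1
}

# Usage with the default properties (Admin UI address):
#   wait_for_webapp http://localhost:8280/app/mvc/ui/index.do
```

The second and third arguments set the number of attempts and the pause between them in seconds.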

Deployment on a Tomcat instance

In this distribution you will find a ready-to-deploy war package.

Copy the war file into the Tomcat 7 webapps directory, ensure you have overridden the properties as explained in the D-Net configuration section and restart Tomcat.

When you see a log like:

52665 [Thread-7] INFO  eu.dnetlib.enabling.is.store.TestContentInitializerJob  - INITIALIZED

The webapp should be ready and running at

http://${container.hostname}:${container.port}/${container.context}

If you want to build the web app yourself, then keep reading...

Building the D-Net web app

The D-Net Software is developed in Java with Maven.

To build the war to use in a Tomcat 7 web app container:

 > cd dnet-basic-aggregator

 > mvn package

The .war file is then created in the target directory.

D-Net configuration

Before you start the web application, you need to configure at least the following properties. For the full list of available properties and their values, check PROPERTIES.md.

Create a file named cnr-override.properties in $yourTomcatHomeDirectory$/common/classes ($yourTomcatHomeDirectory$ will likely be something similar to /var/lib/tomcat7) and set the following properties in it:

  • container.hostname: the host name where the web app will be running. Default value is localhost. The default value should only be used in local development scenarios.
    Example: container.hostname = dnet-host.dnet.eu
  • container.port: the port where the web app will be running. Default is 8280.
    Example: container.port = 8080
  • container.context: the name of the web app (i.e. the name of the war file). Default is "app". The default value should only be used in local development scenarios.
    Example: container.context = is
  • dnet.data.path: path to the directory where all D-Net related resources will be saved. An embedded existDB will be automatically installed in this directory during the first start-up. The directory must be writable by the user running tomcat. Default value is /tmp/dnet. The default value should only be used in local development scenarios.
    Example: dnet.data.path = /var/lib/dnet
  • services.aggregator.country: your country code. Default is EU (Europe).
    Example: services.aggregator.country = IT
  • services.aggregator.name: the name of your aggregator. Default is "D-NET"
    Example: services.aggregator.name = TEST_Aggregator
  • services.mdstore.mongodb.host: the machine hosting mongodb for the storage of metadata records (M[eta]D[ata]Store). Default is localhost.
    Example: services.mdstore.mongodb.host = mongodb.dnet.eu
  • services.mdstore.mongodb.db: name of the mongodb database to be used for the storage of metadata records. Default is mdstore_minimal.
    Example: services.mdstore.mongodb.db = mdstore_1
  • dnet.logger.mongo.host: the machine hosting mongodb for the storage of workflow logs. Default is localhost.
    Example: dnet.logger.mongo.host = mongo.dnet.eu
  • dnet.logger.mongo.db: name of the mongodb database to be used for the storage of workflow logs. Default is "dnet_logs_minimal".
    Example: dnet.logger.mongo.db = dnet_logs_1
  • services.oai.publisher.repo.name: name of the OAI-PMH Publisher, as it will appear in the OAI Identify response. Default is "D-Net OAI-PMH Publisher".
    Example: services.oai.publisher.repo.name = TEST_Aggregator OAI-PMH Publisher
  • services.oai.publisher.repo.email: email of the OAI-PMH Publisher administrator, as it will appear in the OAI Identify response. Default is "dnet-admin@mock.it". The default is a mock address and must not be used in beta or production systems.
    Example: services.oai.publisher.repo.email = name.surname@valid.mail.com
  • dnet.admin.password: md5sum of the password that will allow the user "admin" to login to the D-Net Admin UI. To generate the new password: echo -n "thePassword" | md5sum. Default is "dnet-minimal" (without double quotes). The default value should always be overridden.
    Example: dnet.admin.password = 9003d1df22eb4d3820015070385194c8, where 9003d1df22eb4d3820015070385194c8 is the md5 for the string "pwd" obtained via the command echo -n "pwd" | md5sum.
  • service.solr.index.jsonConfiguration: information about the Solr instance to be used to create full-text indices on the aggregated metadata records. Default value assumes a local Solr instance. Specifically:
    {"id":"solr",\
    "address":"[solr zookeeper host]:[zookeeper port, default 2181]",\
    "port":"8983",\
    "webContext":"solr",\
    "numShards":"4",\
    "replicationFactor":"2",\
    "maxShardsPerNode":"20",\
    "host":"[solr zookeeper host]",\
    "feedingShutdownTolerance":"30000",\
    "feedingBufferFlushThreshold":"1000",\
    "feedingSimulationMode":"false",\
    "luceneMatchVersion":"7.5.0"}

If you are not running the Solr service on the same machine where Tomcat runs, then you need to override the above configuration according to your Solr server installation. Typically, changing address and host is enough if your Solr server is not configured for sharding and replication. For more details refer to the Solr documentation.
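
The properties listed above can be collected into a cnr-override.properties file as in the sketch below. All values are the example values from this section and should be replaced with your own; dnet_password_hash and write_override_properties are hypothetical helper names, and md5sum is assumed to be available (as on most Linux systems). The Solr JSON configuration is left out on purpose, so the default is kept.

```shell
#!/usr/bin/env bash
# Sketch: generate cnr-override.properties from the example values in this section.
# dnet_password_hash and write_override_properties are hypothetical helper names.

# MD5 hex digest of a password, equivalent to: echo -n "thePassword" | md5sum
dnet_password_hash() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

write_override_properties() {
  local dir="$1" password="$2" hash
  hash="$(dnet_password_hash "$password")" || return 1
  mkdir -p "$dir" || return 1
  cat > "$dir/cnr-override.properties" <<EOF
container.hostname = dnet-host.dnet.eu
container.port = 8080
container.context = is
dnet.data.path = /var/lib/dnet
services.aggregator.country = IT
services.aggregator.name = TEST_Aggregator
services.mdstore.mongodb.host = mongodb.dnet.eu
services.mdstore.mongodb.db = mdstore_1
dnet.logger.mongo.host = mongo.dnet.eu
dnet.logger.mongo.db = dnet_logs_1
services.oai.publisher.repo.name = TEST_Aggregator OAI-PMH Publisher
services.oai.publisher.repo.email = name.surname@valid.mail.com
dnet.admin.password = ${hash}
EOF
}

# Usage (the directory is the one described above; adjust it to your Tomcat):
#   write_override_properties /var/lib/tomcat7/common/classes "thePassword"
```

Only the hash ends up in the file, but the plain password still passes through the command line, so avoid leaving it in your shell history.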

Using D-Net

Under the root folder of the project you can find the folder mock-repository-content. It contains 150 oai_dc metadata records you can use to test the functionality of the D-Net software with a Mock Datasource.

  • Place the folder in a location that is readable by the user running Tomcat
  • Start the container
  • Access the Admin UI (http://${container.hostname}:${container.port}/${container.context}/mvc/ui/index.do)
    • If you are running via the maven tomcat plugin with the default properties the URL is: http://localhost:8280/app/mvc/ui/index.do
  • Go to Datasource Management --> Overview and search for "mock"
  • Click on "Add metaworkflow" and select the "Collection and Transformation" meta-workflow. This action will associate a meta-workflow (i.e., a workflow of workflows) to the datasource and will create all needed metadata stores.
  • Click on the "access params" button on the top right and change the base url to the location where you saved the sample folder (e.g. file:///dnet/test/mock-repository-content)
  • Click on the meta-workflow "Collection and Transformation" and configure its workflows with the missing parameter for the transformation rule
    • Click on the yellow "parameters" button of the transformation workflow and select the rule dc2dmf_DRIVER
  • Ensure the launch mode is set to "Auto" for each workflow
  • Click on the Launch button of the first workflow ("collect")
  • Wait for all the workflows to complete: collect, transform, index, oai, and oaiPostFeed
  • Verify that the records get transformed and indexed: click on MD Inspectors --> D-Net content checker and perform some queries
  • Verify that the aggregated records are correctly exposed via the built-in OAI-PMH publisher at:
    • http://${container.hostname}:${container.port}/${container.context}/mvc/oai/oai.do?verb=ListRecords&metadataPrefix=dmf for the DMF metadata format
    • http://${container.hostname}:${container.port}/${container.context}/mvc/oai/oai.do?verb=ListRecords&metadataPrefix=oai_dc for the OAI_DC metadata format
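
The two publisher URLs above can also be checked from the command line. A minimal sketch, assuming curl is installed; oai_list_records_url is a hypothetical helper that only assembles the address from the container properties and a metadata prefix.

```shell
#!/usr/bin/env bash
# Sketch: build the OAI-PMH ListRecords URL of the built-in publisher.
# oai_list_records_url is a hypothetical helper name, not part of D-Net.
oai_list_records_url() {
  local host="$1" port="$2" context="$3" prefix="$4"
  printf 'http://%s:%s/%s/mvc/oai/oai.do?verb=ListRecords&metadataPrefix=%s\n' \
    "$host" "$port" "$context" "$prefix"
}

# Usage with the default properties:
#   curl -s "$(oai_list_records_url localhost 8280 app dmf)" | head -n 20
#   curl -s "$(oai_list_records_url localhost 8280 app oai_dc)" | head -n 20
```

Quote the URL when passing it to curl, since the unquoted "&" would otherwise be interpreted by the shell.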

To create a new data source, please read CREATING_NEW_DATASOURCE.md.

Need support?

Do not hesitate to contact dnet-team@isti.cnr.it
