The D-NET Software Toolkit is a system that offers functionalities for the collection (“harvesting”), transformation, aggregation, and indexing of metadata records collected from an arbitrary number of data sources, complying with different protocols and data exchange formats. D-NET defines a workflow language, which developers can use to combine a variety of D-NET data management services, configure them to handle data according to given data models, and pipeline them into autonomic data processing workflows.
This software package is a simplified version of the D-Net toolkit and consists of a web application with a minimal set of services for:
- Collection of metadata records in oai_dc format via OAI-PMH, FTP, local file system, HTTP.
- Transformation of the collected metadata records into an internal format named DMF (Driver Metadata Format)
- Indexing of DMF records in a Solr full-text index
- OAI-PMH export of aggregated metadata records in DMF and oai_dc formats. More formats can be added at runtime by providing a dedicated XSLT from DMF to the desired target format.
Official Web Site: http://www.d-net.research-infrastructures.eu/
Source code available at: http://svn-public.driver.research-infrastructures.eu/driver/dnet45/modules/
Need support? Contact us via email at: dnet-team[at]isti.cnr.it
This minimal instance can run on a single machine as a web application deployed in a Tomcat container.
Suggested minimal hardware requirements:
- Operating system: almost anything but Windows.
- Hard disk space: mostly depends on the number and size of the records you are going to collect. A couple of GB should be fine for a small repository (<10K metadata records). See the suggestions on installing MongoDB below.
Software required:
- Apache Tomcat 7: the webapp container. Consider increasing the default maximum heap size. We suggest -Xmx2048m.
- MongoDB >= 2.4: used to store the collected and transformed metadata records. Each collected record is stored in three separate "versions" (original, transformed, pmh-ready), so enough disk space must be available for MongoDB.
- Solr 7.5.0: used to make the documents searchable. The Solr server should run in SolrCloud mode; more information can be found at https://lucene.apache.org/solr/guide/7_5/solr-tutorial.html
Note that Tomcat, Solr and MongoDB can be installed on the same machine or on dedicated nodes; the latter requires changing some default system properties.
Whether you want to run the D-Net web app with the Tomcat7 plugin for Maven or build the .war file to deploy on a running Tomcat, you need Maven 3 and you must add the following repository to your settings.xml:
<repository>
  <id>dnet45-bootstrap-release</id>
  <name>D-Net 45 Bootstrap Release</name>
  <url>https://maven.d4science.org/nexus/content/repositories/dnet45-bootstrap-release/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
  <layout>default</layout>
</repository>
We also suggest adding the Tomcat plugin to the plugin groups at the bottom of the same file:
<pluginGroups>
  <pluginGroup>org.apache.tomcat.maven</pluginGroup>
</pluginGroups>
The D-Net Software is developed in Java using Maven. You can try out the D-Net web app on your local machine with the tomcat7 plugin, provided you are also running a MongoDB and a Solr server on localhost, listening on their standard ports.
Please note that the Solr client used in D-Net needs to interact with the ZooKeeper server. For simplicity we suggest using the embedded ZooKeeper instance provided with the Solr distribution. By default Solr listens on port 8983 and its embedded ZooKeeper server on port 9983.
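Before starting the web app it can help to confirm that the three local services are reachable. The following is a minimal sketch, not part of D-Net: it assumes MongoDB listens on its standard port 27017 and uses the Solr and embedded ZooKeeper ports quoted above. The function names are illustrative.

```python
import socket

# Assumed ports: 27017 is MongoDB's standard port; 8983 (Solr) and 9983
# (embedded ZooKeeper) are the defaults mentioned in this README.
DEFAULT_SERVICES = {"mongodb": 27017, "solr": 8983, "zookeeper": 9983}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_services(host: str = "localhost", services: dict = DEFAULT_SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'listening' if up else 'NOT reachable'}")
```

An open port only shows that something is listening there; it does not prove the service is correctly configured (for example, that Solr is actually running in SolrCloud mode).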
To override properties, you can modify dnet-basic-aggregator/src/main/resources/eu/dnetlib/cnr-site.properties. Please check the D-Net Configuration section and the PROPERTIES.md file for more information about D-Net properties.
> cd dnet-basic-aggregator
> mvn tomcat7:run
When you see a log like:
52665 [Thread-7] INFO eu.dnetlib.enabling.is.store.TestContentInitializerJob - INITIALIZED
The webapp should be ready and running at http://localhost:8280/app, where 'app' is the value of the property container.context ('app' is the default).
In this distribution you will find a ready-to-deploy war package.
Copy the war file into the Tomcat 7 webapps directory, ensure you have overridden the properties as explained in the D-Net configuration section, and restart Tomcat.
When you see a log like:
52665 [Thread-7] INFO eu.dnetlib.enabling.is.store.TestContentInitializerJob - INITIALIZED
The webapp should be ready and running at http://${container.hostname}:${container.port}/${container.context}
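The URL pattern above can be resolved from the three container properties. The sketch below is a hypothetical helper, not part of D-Net: it falls back to the defaults documented in this README (localhost, 8280, app) and includes a simple HTTP probe. Treating any status below 500 as "up" is an assumption of this sketch.

```python
import urllib.error
import urllib.request
from typing import Optional

# Defaults as documented in this README.
DEFAULTS = {
    "container.hostname": "localhost",
    "container.port": "8280",
    "container.context": "app",
}


def base_url(props: Optional[dict] = None) -> str:
    """Expand http://${container.hostname}:${container.port}/${container.context}."""
    merged = {**DEFAULTS, **(props or {})}
    return "http://{}:{}/{}".format(
        merged["container.hostname"],
        merged["container.port"],
        merged["container.context"],
    )


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, ...
        return False
```

For example, `base_url({"container.port": "8080", "container.context": "is"})` yields `http://localhost:8080/is`, which can then be passed to `is_up`.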
If you want to build the web app yourself, then keep reading...
The D-Net Software is developed in Java with Maven.
To build the war to use in a Tomcat 7 web app container:
> cd dnet-basic-aggregator
> mvn package
The .war file is then created in the target directory.
Before you start the web application, you need to configure at least the following properties. For the full list of available properties and their values, check PROPERTIES.md.
Create a file named cnr-override.properties in $yourTomcatHomeDirectory$/common/classes ($yourTomcatHomeDirectory$ will likely be something similar to /var/lib/tomcat7).
- container.hostname: the host name where the web app will be running. Default value is localhost. The default value should only be used in local development scenarios.
  Example: container.hostname = dnet-host.dnet.eu
- container.port: the port where the web app will be running. Default is 8280.
  Example: container.port = 8080
- container.context: the name of the web app (i.e. the name of the war file). Default is "app". The default value should only be used in local development scenarios.
  Example: container.context = is
- dnet.data.path: path to the directory where all D-Net related resources will be saved. An embedded existDB will be automatically installed in this directory during the first start-up. The directory must be writable by the user running Tomcat. Default value is /tmp/dnet. The default value should only be used in local development scenarios.
  Example: dnet.data.path = /var/lib/dnet
- services.aggregator.country: your country code. Default is EU (Europe).
  Example: services.aggregator.country = IT
- services.aggregator.name: the name of your aggregator. Default is "D-NET".
  Example: services.aggregator.name = TEST_Aggregator
- services.mdstore.mongodb.host: the machine hosting MongoDB for the storage of metadata records (M[eta]D[ata]Store). Default is localhost.
  Example: services.mdstore.mongodb.host = mongodb.dnet.eu
- services.mdstore.mongodb.db: name of the MongoDB database to be used for the storage of metadata records. Default is mdstore_minimal.
  Example: services.mdstore.mongodb.db = mdstore_1
- dnet.logger.mongo.host: the machine hosting MongoDB for the storage of workflow logs. Default is localhost.
  Example: dnet.logger.mongo.host = mongo.dnet.eu
- dnet.logger.mongo.db: name of the MongoDB database to be used for the storage of workflow logs. Default is "dnet_logs_minimal".
  Example: dnet.logger.mongo.db = dnet_logs_1
- services.oai.publisher.repo.name: name of the OAI-PMH Publisher, as it will appear in the OAI Identify response. Default is "D-Net OAI-PMH Publisher".
  Example: services.oai.publisher.repo.name = TEST_Aggregator OAI-PMH Publisher
- services.oai.publisher.repo.email: email of the OAI-PMH Publisher administrator, as it will appear in the OAI Identify response. Default is "dnet-admin@mock.it". The default must not be used in beta or production systems because it is a mock email.
  Example: name.surname@valid.mail.com
- dnet.admin.password: md5sum of the password that will allow the user "admin" to log in to the D-Net Admin UI. To generate the new password: echo -n "thePassword" | md5sum. Default is "dnet-minimal" (without double quotes). The default value should always be overridden.
  Example: dnet.admin.password = 9003d1df22eb4d3820015070385194c8, where 9003d1df22eb4d3820015070385194c8 is the md5 for the string "pwd" obtained via the command echo -n "pwd" | md5sum.
- service.solr.index.jsonConfiguration: information about the Solr instance to be used to create full-text indices on the aggregated metadata records. Default value assumes a local Solr instance. Specifically:
{"id":"solr",\
"address":"[solr zookeeper host]:[zookeeper port, default 2181]",\
"port":"8983",\
"webContext":"solr",\
"numShards":"4",\
"replicationFactor":"2",\
"maxShardsPerNode":"20",\
"host":"[solr zookeeper host]",\
"feedingShutdownTolerance":"30000",\
"feedingBufferFlushThreshold":"1000",\
"feedingSimulationMode":"false",\
"luceneMatchVersion":"7.5.0"}
If you are not running the Solr service on the same machine where Tomcat runs, then you need to override the above configuration according to your Solr server installation.
Typically, changing address and host is enough if your Solr server is not configured for sharding and replication.
For more details refer to the Solr documentation.
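The overrides described in this section can also be generated with a small script. The following is a minimal sketch, not a D-Net tool: the helper names and the host solr.dnet.eu are hypothetical, the other example values are the ones used above, and the Solr settings are copied from the default JSON shown in this section. It assumes the file is read with standard Java properties rules, under which the JSON value may sit on a single line without the backslash continuations.

```python
import hashlib
import json


def md5_hex(password: str) -> str:
    """Hex MD5 digest, equivalent to: echo -n "<password>" | md5sum."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def solr_json(zk_host: str, zk_port: int) -> str:
    """Single-line JSON value for service.solr.index.jsonConfiguration.

    Only "address" and "host" are changed; every other value is copied
    from the default configuration listed in this README.
    """
    config = {
        "id": "solr",
        "address": f"{zk_host}:{zk_port}",
        "port": "8983",
        "webContext": "solr",
        "numShards": "4",
        "replicationFactor": "2",
        "maxShardsPerNode": "20",
        "host": zk_host,
        "feedingShutdownTolerance": "30000",
        "feedingBufferFlushThreshold": "1000",
        "feedingSimulationMode": "false",
        "luceneMatchVersion": "7.5.0",
    }
    return json.dumps(config, separators=(",", ":"))


def render_properties(props: dict) -> str:
    """Render key/value pairs as 'key = value' lines."""
    return "".join(f"{key} = {value}\n" for key, value in props.items())


if __name__ == "__main__":
    overrides = {
        "container.hostname": "dnet-host.dnet.eu",
        "container.port": "8080",
        "container.context": "is",
        "dnet.data.path": "/var/lib/dnet",
        "dnet.admin.password": md5_hex("thePassword"),
        # Hypothetical Solr/ZooKeeper host; 9983 is the embedded ZooKeeper port.
        "service.solr.index.jsonConfiguration": solr_json("solr.dnet.eu", 9983),
    }
    # Redirect the output to cnr-override.properties in the Tomcat classes folder.
    print(render_properties(overrides), end="")
```

Printing to standard output rather than writing the file directly keeps the script independent of the Tomcat directory layout, which differs between installations.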
Under the root folder of the project you can find the folder mock-repository-content.
It contains 150 oai_dc metadata records you can use to test the functionality of the D-Net software with a Mock Datasource.
- Place the folder in a location that is readable by Tomcat
- Start the container
- Access the Admin UI (http://${container.hostname}:${container.port}/${container.context}/mvc/ui/index.do)
  - If you are running via the maven tomcat plugin with the default properties the URL is: http://localhost:8280/app/mvc/ui/index.do
- Go to Datasource Management --> Overview and search for "mock"
- Click on "Add metaworkflow" and select the "Collection and Transformation" meta-workflow. This action will associate a meta-workflow (i.e., a workflow of workflows) to the datasource and will create all needed metadata stores.
- Click on the "access params" button on the top right and change the base url to the location where you saved the sample folder (e.g. file:///dnet/test/mock-repository-content)
- Click on the meta-workflow "Collection and Transformation" and configure its workflows with the missing parameter for the transformation rule
  - Click on the yellow "parameters" button of the transformation workflow and select the rule dc2dmf_DRIVER
- Ensure the launch mode is set to "Auto" for each workflow
- Click on the Launch button of the first workflow ("collect")
- Wait for all the workflows to complete: collect, transform, index, oai, and oaiPostFeed
- Verify that the records get transformed and indexed: click on MD Inspectors --> D-Net content checker and perform some queries
- Verify that the aggregated records are correctly exposed via the built-in OAI-PMH publisher at:
  - http://${container.hostname}:${container.port}/${container.context}/mvc/oai/oai.do?verb=ListRecords&metadataPrefix=dmf for the DMF metadata format
  - http://${container.hostname}:${container.port}/${container.context}/mvc/oai/oai.do?verb=ListRecords&metadataPrefix=oai_dc for the OAI_DC metadata format
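The final verification step can also be scripted. The sketch below is illustrative rather than part of D-Net: it builds the ListRecords URL following the pattern above and counts the record elements in one response page, using the standard OAI-PMH 2.0 namespace. The function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base: str, metadata_prefix: str) -> str:
    """Build the ListRecords URL of the built-in OAI-PMH publisher."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    )
    return f"{base.rstrip('/')}/mvc/oai/oai.do?{query}"


def count_records(xml_payload) -> int:
    """Count <record> elements in one ListRecords response page.

    Accepts str or bytes. Raises RuntimeError if the response carries an
    OAI-PMH <error> element (for example noRecordsMatch).
    """
    root = ET.fromstring(xml_payload)
    error = root.find(f"{OAI_NS}error")
    if error is not None:
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    return len(root.findall(f"{OAI_NS}ListRecords/{OAI_NS}record"))


def fetch_first_page(base: str, metadata_prefix: str, timeout: float = 30.0) -> int:
    """Download the first ListRecords page and return its record count."""
    url = list_records_url(base, metadata_prefix)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return count_records(resp.read())
```

With the default properties, `fetch_first_page("http://localhost:8280/app", "dmf")` queries the first URL above. The count refers to a single page: if the publisher splits the result with a resumption token, the first page can hold fewer than the 150 mock records.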
To create a new data source, please read CREATING_NEW_DATASOURCE.md.
Do not hesitate to contact dnet-team@isti.cnr.it