A common and overarching problem in Semantic Web development is the mapping between ontologies and vocabularies. Many ontologies, schemas, and vocabularies exist for classification, data exchange, and machine learning, but transformations from one ontology to another are often lossy or inexact. The Kendra Signpost system, developed as part of P2P-Next, was used to manage large data sets of media metadata; it defines an architecture for importing data in various formats and providing mappings between them.
The mappings defined in Kendra Signpost allow data from various sources and formats to be integrated for reasoning and inference, so that entities from one domain can be used in queries from another. For example, the metadata fields describing a video on YouTube may be mapped to equivalent or similar fields in MPEG-7/MPEG-21 metadata, or in metadata from another video-related social networking site (SNS) such as Vimeo or Blip.tv. This feature is relevant to the complementary Kendra Match component, because mappings between two networks or profiles can be extended to other networks, improving the quality and richness of user and content recommendations.
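As a deliberately simplified illustration of how a mapping between two networks can be extended to a third, the sketch below chains field mappings. It assumes mappings are plain one-to-one field renames held in Python dictionaries; the field names are hypothetical, and the code is not taken from the Kendra Signpost sources.

```python
from typing import Dict

# A field mapping renames metadata fields from one source vocabulary to another.
FieldMapping = Dict[str, str]


def compose(a_to_b: FieldMapping, b_to_c: FieldMapping) -> FieldMapping:
    """Chain two mappings: an A field maps to C only if its B equivalent has a C equivalent."""
    return {src: b_to_c[mid] for src, mid in a_to_b.items() if mid in b_to_c}


def translate(record: Dict[str, str], mapping: FieldMapping) -> Dict[str, str]:
    """Rename a record's fields; fields with no mapping are dropped (the lossy case)."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}


# Hypothetical, illustrative field names; not the real YouTube, Vimeo or MPEG-7 schemas.
youtube_to_mpeg7 = {"title": "Title", "description": "FreeTextAnnotation", "uploader": "Creator"}
vimeo_to_youtube = {"name": "title", "description": "description",
                    "user_name": "uploader", "plays": "view_count"}

# Extend the Vimeo -> YouTube mapping to MPEG-7 without writing it by hand.
vimeo_to_mpeg7 = compose(vimeo_to_youtube, youtube_to_mpeg7)

if __name__ == "__main__":
    clip = {"name": "Example clip", "user_name": "example_user", "plays": "1200"}
    print(translate(clip, vimeo_to_mpeg7))
    # {'Title': 'Example clip', 'Creator': 'example_user'}
```

The dropped "plays" field shows why such transformations are lossy: a field survives the chain only if every intermediate vocabulary has an equivalent for it.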
In addition to a public-facing web server for accessing Kendra Signpost (Apache with mod_php for the purposes of the prototype), a relational database for storing content and configuration (MySQL), an RDF triple store for storing faceted metadata (OpenLink Virtuoso), and a search and retrieval system for indexing the metadata (Apache Solr), Kendra Signpost consists of several modules relevant to SARACEN, implemented as Drupal Features:
- Kendra Signpost Mapper is a generalised tool with which content owners can map their metadata, including to the P2P-Next Rich Metadata schema. It is generalised in order to drive adoption: content owners can use the Kendra Signpost mapping tools to link their content to other popular metadata schemas, increasing content visibility and discovery. The server-side mapping and inference technology in the other Kendra Signpost components is useful when combining data from multiple sources, formats, and vocabularies, such as user profile data from Facebook, Twitter, Google, and OpenSocial.
- Kendra Signpost Inference Engine is a Python tool that acts as a proxy between the portal hosting the uploaded catalogues, the RDF triple store containing the metadata extracted from those catalogues, and the client-side tools that run queries against the inferred relations generated by the mapping tool.
- Kendra Signpost Search is a public-facing query builder that lets end users quickly build queries to find content that fits their exact criteria, based on the faceted metadata stored in the system. Because queries are built through a flexible interface and expressed in JSON-LD, complex queries can be assembled with minimal knowledge of the underlying data structures or vocabularies.
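To make the idea of a JSON-LD facet query concrete, the sketch below translates a flat JSON-LD object into a SPARQL SELECT of the kind the inference proxy could forward to the triple store. The query shape (one compact IRI per facet, prefixes declared in @context) is an assumption for illustration; the actual format used by Kendra Signpost Search is not documented here.

```python
import json


def facets_to_sparql(query_json: str) -> str:
    """Translate a flat JSON-LD facet query into a SPARQL SELECT (hypothetical format)."""
    query = json.loads(query_json)
    context = query.get("@context", {})
    lines = [f"PREFIX {prefix}: <{iri}>" for prefix, iri in context.items()]
    lines.append("SELECT DISTINCT ?item WHERE {")
    for key, value in query.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context
        prefix, sep, _local = key.partition(":")
        if not sep or prefix not in context:
            raise ValueError(f"{key!r} is not a compact IRI with a prefix from @context")
        if not isinstance(value, (str, int)):
            raise ValueError(f"unsupported value for {key!r}: {value!r}")
        # json.dumps yields a double-quoted, escaped string or a bare integer,
        # both valid SPARQL literals for these simple cases.
        lines.append(f"  ?item {key} {json.dumps(value)} .")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    q = json.dumps({"@context": {"dc": "http://purl.org/dc/elements/1.1/"},
                    "dc:creator": "Real World Records", "dc:language": "en"})
    print(facets_to_sparql(q))
```

Each facet becomes one triple pattern on the shared ?item variable, so adding facets narrows the result set; richer operators (ranges, OR) would need a more expressive query format.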
Kendra Signpost is free software developed by Kendra Foundation in collaboration with P2P-Next and co-funded by the European Union under the Seventh Framework Programme.
This is the full Drupal install for the Kendra Signpost Trial website, including Drupal core, required modules, and Kendra Signpost specific modules and configuration.
To run a full Kendra Signpost stack, the following components are required:
- LAMP (Linux/Apache/MySQL/PHP) server for running the Kendra Signpost site
- Python for running the inference proxy layer
- Solr for powering the search services
- Virtuoso for storage of RDF data
- A Git client for checking out the code from GitHub
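A quick way to confirm these components are installed is to look for their executables on the PATH. The sketch below assumes Debian/Ubuntu-style command names (for example apache2 and virtuoso-t); they differ between distributions, and sbin directories may not be on a non-root user's PATH.

```python
import shutil
from typing import Dict

# Assumed executable names (Debian/Ubuntu style); adjust for your distribution.
REQUIRED_COMMANDS: Dict[str, str] = {
    "apache2": "Apache web server",
    "mysql": "MySQL client",
    "php": "PHP",
    "python3": "Python, for the inference proxy",
    "java": "Java runtime, for Solr",
    "virtuoso-t": "OpenLink Virtuoso server",
    "git": "Git client",
}


def missing_prerequisites(required: Dict[str, str] = REQUIRED_COMMANDS) -> Dict[str, str]:
    """Return the entries of `required` whose command cannot be found on PATH."""
    return {cmd: desc for cmd, desc in required.items() if shutil.which(cmd) is None}


if __name__ == "__main__":
    missing = missing_prerequisites()
    for cmd, desc in missing.items():
        print(f"missing: {cmd} ({desc})")
    if not missing:
        print("all prerequisites found")
```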
These are the steps to get your own Kendra Signpost server up and running:
- Download the Kendra Signpost Trial code from the GitHub repository. If you want a working copy (so that you can push changes back to GitHub), use
git clone git@github.com:kendrainitiative/kendra_signpost_trial.git
or, if you just want to download the code to run tests, use
wget http://github.com/kendrainitiative/kendra_signpost_trial/tarball/master
- Create an empty MySQL database.
- In the sites/default folder, prepare for installation by copying default.settings.php to settings.php and making it writable by the web server, then create the folder sites/default/files and make it writable by the web server too.
- Run the Drupal installer (install.php) and select the Kendra Signpost Trial install profile. You will need to enter database connection settings and create an admin account.
- Configure the connection settings for Virtuoso.
- Configure Apache to execute the Python scripts when they are called. Rules in the .htaccess file rewrite Solr requests so that they go through the proxy.
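The file preparation step can be scripted. The sketch below copies default.settings.php and creates sites/default/files, assuming the web server user belongs to the files' group so that group write access is enough; if it does not, use chown instead of relying on group permissions.

```python
import shutil
import stat
from pathlib import Path


def prepare_drupal_site(drupal_root: str) -> None:
    """Copy default.settings.php to settings.php and create sites/default/files.

    Assumption: the web server user shares the files' group, so group write
    access is sufficient.
    """
    site = Path(drupal_root) / "sites" / "default"
    settings = site / "settings.php"
    if not settings.exists():
        shutil.copyfile(site / "default.settings.php", settings)
    settings.chmod(settings.stat().st_mode | stat.S_IWGRP)

    files_dir = site / "files"
    files_dir.mkdir(exist_ok=True)
    # Directories also need group read/execute so the web server can enter them.
    files_dir.chmod(files_dir.stat().st_mode | stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP)
```

For example, prepare_drupal_site("/var/www/html") would prepare the Drupal root used in the Solr steps of this README.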
The instructions below offer a step-by-step procedure for installing Apache Solr for use with the Kendra Signpost Trial.
apt-get install openjdk-6-jdk
cd /usr/local/src
wget http://apache.dataphone.se//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
tar xzvf apache-solr-1.4.1.tgz
cd apache-solr-1.4.1/example/solr/conf/
mv schema.xml schema.bak
mv solrconfig.xml solrconfig.bak
cd /var/www/html/
drush dl apachesolr
cd /var/www/html/sites/all/modules/apachesolr
svn checkout -r22 http://solr-php-client.googlecode.com/svn/trunk/ SolrPhpClient
To prepare Solr for use with Drupal, copy over the schema.xml and solrconfig.xml files that come with the Drupal apachesolr module:
cp schema.xml solrconfig.xml /usr/local/src/apache-solr-1.4.1/example/solr/conf/
drush en apachesolr # The following modules will be enabled: apachesolr, search
Visit http://dev.kendra.org.uk/admin/settings/apachesolr in a browser.
cd /usr/local/src/apache-solr-1.4.1/example
java -jar start.jar
cp /etc/init.d/skeleton /etc/init.d/solr
vim /etc/init.d/solr # see attached configuration script
chmod 700 /etc/init.d/solr
rcconf # look for the new solr service, select to start at boot time
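Once the service is running, a small check can confirm that Solr is answering before the Drupal apachesolr module tries to use it. The sketch assumes the Jetty example server on its default port 8983 and a ping handler at /solr/admin/ping; if the solrconfig.xml you copied configures it differently, adjust the URL.

```python
import http.client
import urllib.request


def solr_is_up(base_url: str = "http://localhost:8983/solr", timeout: float = 5.0) -> bool:
    """Return True if Solr answers its ping handler with HTTP 200."""
    url = base_url.rstrip("/") + "/admin/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and socket timeouts are all
        # OSError subclasses; HTTPException covers malformed responses.
        return False
```

Calling solr_is_up() right after java -jar start.jar may return False for a few seconds while Jetty starts, so a retry loop is sensible in boot scripts.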
The Kendra Signpost Trial includes a content type for uploading CSV data. An example file for testing is included with the source code in the examples folder.
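To show in outline what importing a CSV catalogue into an RDF store involves, the sketch below turns each row into N-Triples with one subject per row and one triple per non-empty cell. The base URI, the URI patterns, and the column names are hypothetical; this is not how the kendra_uploads feature actually converts files, and the columns of thirdear.csv are not assumed.

```python
import csv
import io
from typing import List
from urllib.parse import quote

# Hypothetical namespace for illustration only.
BASE_URI = "http://example.org/catalogue/"


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, id_column: str, base: str = BASE_URI) -> List[str]:
    """Turn each CSV row into N-Triples: one subject per row, one triple per non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}item/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the ID column, surplus cells (key None) and empty cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{base}field/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples
```

For the input "id,title,artist\n1,Example track,Example artist\n", csv_to_ntriples(text, "id") yields two triples about <http://example.org/catalogue/item/1>, one per non-ID column.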
After installation the front page prompts you to do the following:
- Configure the endpoint (this is already done if you are using the default Kendra RDF repository).
- Upload a catalogue: click the link and upload thirdear.csv from the examples folder.
- Run cron: RDF imports are queued to run on cron for better performance.
- Browse to the catalogue page to see the result of a basic SPARQL query against the data.
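The same kind of basic SPARQL query can also be sent to the store directly, which helps when checking that a cron import has completed. The sketch follows the SPARQL 1.1 Protocol (a GET request with a query parameter and an Accept header for JSON results). It assumes a local Virtuoso instance on its default port 8890 with the endpoint at /sparql; the Kendra RDF repository endpoint may differ.

```python
import json
import urllib.request
from typing import Dict, List
from urllib.parse import urlencode

# Assumption: a local Virtuoso instance on its default HTTP port.
DEFAULT_ENDPOINT = "http://localhost:8890/sparql"


def sparql_request(query: str, endpoint: str = DEFAULT_ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results: Dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in results["results"]["bindings"]]


def run_query(query: str, endpoint: str = DEFAULT_ENDPOINT,
              timeout: float = 30.0) -> List[Dict[str, str]]:
    """Send the query and return the flattened result rows."""
    with urllib.request.urlopen(sparql_request(query, endpoint), timeout=timeout) as response:
        return bindings_to_rows(json.load(response))
```

For example, run_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10") returns up to ten rows. Unbound optional variables are simply absent from a row's dict, matching the JSON results format.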
Changes to Drupal features should be exported to code using the features module and committed back to github. (t.b.c. - insert commands for doing so).
- kendra_rdf: the commands for connecting to the RDF repository and sending queries
- kendra_uploads: the upload feature (CCK content type and views), plus nodeapi hook code to process the uploaded files
Other modules/features should be added (for example, for the search, query builder, and mapping tools). These should be placed in separate modules and then activated by the install profile by adding them to _kendra_signpost_features() in kendra_signpost.profile.
Code: Kendra Signpost is licensed under GPL, as per the Drupal licensing terms. Please see LICENSE.txt for more information.
Example metadata: Real World is pleased to be part of the Kendra Trials and has supplied metadata as part of these trials. The Real World Catalogue metadata is copyright Real World Records Ltd and the non-exclusive license for its use during these trials is limited to experimental use within the Kendra environment with all other rights reserved to Real World Records Ltd. By accessing this data you accept those terms.