Working with JSON LD
- Download and install jq from http://stedolan.github.io/jq/
- To convert a JSON array into one JSON object per line, run the following from the command line:
jq ".[]" -c <filepath> > <outputfilepath>
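If jq is not available, the same conversion can be approximated in Python. This is a minimal sketch, not part of the original tooling; the function names `array_to_json_lines` and `convert_file` are hypothetical, and it assumes the whole input file fits in memory.

```python
import json


def array_to_json_lines(array_text):
    """Turn the text of a JSON array into one compact JSON value per line."""
    items = json.loads(array_text)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators drop optional whitespace, similar to jq's -c flag.
    return "".join(
        json.dumps(item, separators=(",", ":"), ensure_ascii=False) + "\n"
        for item in items
    )


def convert_file(in_path, out_path):
    """Read a JSON array from in_path and write JSON lines to out_path."""
    with open(in_path, encoding="utf-8") as src:
        text = src.read()
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(array_to_json_lines(text))
```

Calling `convert_file("<filepath>", "<outputfilepath>")` plays the role of the jq command above.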
Sometimes you model multiple sources and produce multiple JSON-LD files, and then you want to merge the JSON-LD files into a single file. There are two cases:
- Reducing: combining top-level JSON-LD objects that share a URI. The reducer first combines objects at the top level and then recursively combines objects at all levels of the tree.
- Joining: Need to provide an example.
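To make the idea of reducing concrete, here is a rough Python sketch of merging objects by URI. It is not the actual reducer. It assumes each object carries its identifier under a key named "uri" (a hypothetical default; pass "@id" or another key if your files differ), and it only recurses into values that are being merged.

```python
def merge_values(a, b, key="uri"):
    """Recursively combine two JSON values."""
    if a == b:
        return a
    # Two objects with the same identifier (or with none) are merged key by key.
    if isinstance(a, dict) and isinstance(b, dict) and a.get(key) == b.get(key):
        merged = dict(a)
        for k, v in b.items():
            merged[k] = merge_values(merged[k], v, key) if k in merged else v
        return merged
    # Anything else is collected into a list, which is then reduced itself.
    items = (a if isinstance(a, list) else [a]) + (b if isinstance(b, list) else [b])
    return reduce_objects(items, key)


def reduce_objects(items, key="uri"):
    """Combine the objects in a list that share the same identifier."""
    position = {}  # identifier -> index in result
    result = []
    for item in items:
        if isinstance(item, dict) and isinstance(item.get(key), str):
            ident = item[key]
            if ident in position:
                idx = position[ident]
                result[idx] = merge_values(result[idx], item, key)
            else:
                position[ident] = len(result)
                result.append(item)
        elif item not in result:
            # Values without an identifier are kept once.
            result.append(item)
    return result
```

For example, two top-level objects with the same "uri" end up as one object, and nested objects that share a "uri" are merged in the same way.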
To do joins of JSON-LD files you need to set up Hadoop and Hive on your machine, and then you run a script to join your files.
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
http://www.apache.org/dyn/closer.cgi/hive/
After you download the files, unpack them and copy them to a safe place. My recommendation on a Mac is that you put them in /usr/local.
My setup looks as follows:
szeke (2):local szekely> pwd
/usr/local
szeke (2):local szekely> ls -1
apache-hive-1.1.0-bin
hadoop-2.6.0
# and many other files
I put the following in my ~/.profile, which makes it convenient to run the hive command later:
# For HIVE
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HIVE_HOME=/usr/local/apache-hive-1.1.0-bin
export PATH=${HIVE_HOME}/bin:$PATH
export HADOOP_HEAPSIZE=4096
export HADOOP_CLIENT_OPTS=-Xmx4196m
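A quick way to confirm that the profile settings took effect is to check the variables from a new shell. The sketch below is only an illustration; the function name `missing_or_invalid` is hypothetical, and it checks only the two home directories set above.

```python
import os

REQUIRED_VARS = ("HADOOP_HOME", "HIVE_HOME")


def missing_or_invalid(env=None):
    """Return a list of problems found with the Hadoop/Hive variables."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        path = env.get(name)
        if not path:
            problems.append(name + " is not set")
        elif not os.path.isdir(path):
            problems.append(name + " points to a missing directory: " + path)
    return problems


if __name__ == "__main__":
    for problem in missing_or_invalid():
        print(problem)
```

If the script prints nothing, both variables are set and point to existing directories.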
Download hive-join-example.zip and unpack the file, which will create a folder with the following files:
szeke (2):hive-join-example szekely> ls -1
derby.log
lib
merged
metastore_db
scripts
source
target
szeke (2):hive-join-example szekely>
- git clone https://github.com/usc-isi-i2/dig-elasticsearch.git
- Change directory to types/webpage/scripts
- Type python loadDataElasticSearch.py -h. This prints the help text for the script, as shown below:
usage: loadDataElasticSearch.py [-h] [-hostname HOSTNAME] [-port PORT]
                                [-mappingFilePath MAPPINGFILEPATH] dataFileType
                                filepath indexname doctype

positional arguments:
  filepath       json file to be loaded in ElasticSearch
  indexname      desired name of the index in ElasticSearch
  doctype        type of the document to be indexed
  dataFileType   Specify '0' if every line in the data file is
                 different json object or '1' otherwise

optional arguments:
  -h, --help                          show this help message and exit
  -hostname HOSTNAME                  Elastic Search Server hostname, defaults to 'localhost'
  -port PORT                          Elastic Search Server port,defaults to 9200
  -mappingFilePath MAPPINGFILEPATH    mapping/setting file for the index
- Execute:
python loadDataElasticSearch.py <filepath> <index-name> WebPage
If you don't have Elasticsearch, download it from https://www.elastic.co/products/elasticsearch and follow the installation instructions.
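To illustrate what loading a data file involves, here is a small Python sketch that reads documents according to the dataFileType convention and builds a request body in the newline-delimited format of the Elasticsearch bulk API. It is not the implementation of loadDataElasticSearch.py. The function names are hypothetical, the reading of '1' as "the file holds a single JSON document or array" is an assumption, and the `_type` field applies only to older Elasticsearch versions that still support document types.

```python
import json


def read_documents(text, data_file_type):
    """Parse file contents into a list of documents.

    '0' means every non-empty line is a separate JSON object;
    anything else is assumed to mean one JSON document or array.
    """
    if data_file_type == "0":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def build_bulk_body(docs, index_name, doc_type):
    """Build a bulk request body: one action line, then one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting body could then be sent in a POST request to the `_bulk` endpoint of the server, for example on localhost port 9200, which are the script's stated defaults.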