Skip to content

ISWC2013Paper1BenchmarkDetails

mislusnys edited this page Nov 29, 2013 · 4 revisions

Table of Contents

ISWC 2013 Benchmark Detail Report (Paper 1)

This page presents the test configurations and the detailed results mentioned in the paper for ISWC 2013 entitled: "Efficient SPARQL-to-SQL translation using R2RML Mappings" (PDF) (Tech Report)

There are 5 systems used by the test and reported in the paper:

All the systems were test in the April-May 2013 experiment.

1. Benchmark Machine

The tests were run on the same machine with the following specification:

  • Hardware
   * Processors: 24 units Intel Xeon 6 cores @3.47GHz
   * Memory: 106GB DDR3
   * Hard Disks: 1TB (15,000 rpm) SCSI and 200GB (15,000 rpm) SCSI
  • Software
   * Operating System: Ubuntu 12.04 64-bit, Kernel 3.2.0
      * Filesystem: ext3
      * Separate partitions for application (on 200GB HDD) and databases (on 1TB HDD)
   * Java Version and JVM: Version 1.6.0_27, OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

2. BSBM Benchmark

The test uses three different dataset sizes: 25 million, 100 million and 200 million triples. The datasets were generated using the BSBM data generator.

Details about the benchmark datasets are summarised in the following table:

BSBM 25 BSBM 100 BSBM 200
Scale Factor 71431 288115 576495
Exact number 25,000,172 100,000,165 200,000,153
File Size Turtle (.ttl) 695MB 2.8GB 5.6GB
File Size SQL* (.csv) 1.3GB 5.2GB 11GB
  • Note: We modified the generated SQL files into comma separated values (CSV) files.

2.1 onTop-MySQL

2.1.1 Configuration

  • MySQL: Version 5.6
  • Configuration file changes:
# CACHES AND LIMITS #
tmp_table_size                 = 32M
max_heap_table_size            = 32M
query_cache_type               = 0
query_cache_size               = 0
max_connections                = 500
thread_cache_size              = 50
open_files_limit               = 65535
table_definition_cache         = 1024
table_open_cache               = 2048

# INNODB #
innodb_flush_method            = O_DIRECT
innodb_log_file_size           = 512M
innodb_flush_log_at_trx_commit = 0
innodb_buffer_pool_size        = 80G

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
innodb_thread_concurrency = 0
innodb_read_io_threads = 32
  • Indexes:
* Indexes on primary key
* Indexes on columns given by bsbmtools data generator
  • Other notes:
* After loading the dataset, we performed index generation and did "analyse table" on all tables in the database.
* MySQL Connector Version 5.0.8 was used for JDBC access.

2.1.2 Load Time

The table below summarises the load time of the CSV dump files (in hh:mm:ss). It includes the indexing time.

BSBM 25 BSBM 100 BSBM 200
Total time 00:01:57 00:11:33 00:25:13

2.1.3 Query Performance Results

The summary of the benchmark can be found in the Google Doc public spreadsheet.

2.1.4 Execution Details

Number of clients
1 4 8 16 25 Return Triples
BSBM 25 csv csv csv csv csv csv
BSBM 100 csv csv csv csv csv csv
BSBM 200 csv csv csv csv csv csv

2.2 onTop-DB2

2.2.1 Configuration

  • MySQL: Version 5.6
  • Indexes:
* Indexes on primary key
* Indexes on columns given by bsbmtools data generator
  • Other notes:
* Set buffer pool page size 32K
* Set automatic memory, i.e., DB2 will use any available system memory
* After loading the dataset, we performed index generation and did "RUNSTATS" on all tables in the database.

2.2.2 Load Time

The table below summarises the load time of the CSV dump files (in hh:mm:ss). It includes the indexing time.

BSBM 25 BSBM 100 BSBM 200
00:02:20 00:15:20 00:24:38

2.2.3 Query Performance Results

The summary of the benchmark can be found in the Google Doc public spreadsheet.

2.2.4 Execution Details

Number of clients
1 4 8 16 25 Return Triples
BSBM 25 csv csv csv csv csv csv
BSBM 100 csv csv csv csv csv csv
BSBM 200 csv csv csv csv csv csv

2.3 Stardog

2.3.1 Configuration

  • Stardog Version 1.2
  • Server configuration in stardog.properties:
strict.parsing=false
  • Indexing:
Create less index by adding "--index-triples-only" in the loading command.
  • Other notes:
* Use multiple input files (bulk load)
* Each input file was compressed using gzip.

2.3.2 Load Time

BSBM 25 BSBM 100 BSBM 200
00:02:54 00:11:47 00:22:59

2.3.3 Query Performance Results

The summary of the benchmark can be found in the Google Doc public spreadsheet.

2.3.4 Execution Details

Number of clients
1 4 8 16 25 Return Triples
BSBM 25 csv csv csv csv csv csv
BSBM 100 csv csv csv csv csv csv
BSBM 200 csv csv csv csv csv csv

2.4 OWLIM

2.4.1 Configuration

  • OWLIM-SE Version 5.3.5849
  • Repository configuration template for BSBM 25:
#
# Sesame configuration template for a owlim repository
#
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix owlim: <http://www.ontotext.com/trree/owlim#>.

[] a rep:Repository ;
   rep:repositoryID "BSBM_25" ;
   rdfs:label "BSBM 25 Million Triples" ;
   rep:repositoryImpl [
     rep:repositoryType "openrdf:SailRepository" ;
     sr:sailImpl [
       sail:sailType "owlim:Sail" ; 

       owlim:base-URL "http://www4.wiwiss.fu-berlin.de/ibzer/bsbm/v01/instances/" ;
       owlim:entity-index-size "50000000" ;
       owlim:repository-type "file-repository" ;
       owlim:ruleset "empty" ;

       # OWLIM-SE parameters
       owlim:cache-memory "15G" ; 
       
       # Other OWLIM-SE parameters
       owlim:transaction-mode "fast" ;       
      ]
   ].
BSBM 25 BSBM 100 BSBM 200
owlim-bsbm25.ttl owlim-bsbm100.ttl owlim-bsbm200.ttl

2.4.2 Load Time

The table below summarises the load time of using a single Turtle TTL file (in hh:mm:ss).

BSBM 25 BSBM 100 BSBM 200
00:12:21 00:49:35 01:43:10

2.4.3 Query Performance Results

The summary of the benchmark can be found in the Google Doc public spreadsheet.

2.4.4 Execution Details

Number of clients
1 4 8 16 25 Return Triples
BSBM 25 csv csv csv csv csv csv
BSBM 100 csv csv csv csv csv csv
BSBM 200 csv csv csv csv csv csv

2.5 Virtuoso OpenSource RDF

2.5.1 Configuration

  • Virtuoso OpenSource Version 6.1
  • Changes in the configuration file virtuoso.ini

2.5.2 Load Time

BSBM 25 BSBM 100 BSBM 200
00:16:02 01:13:04 02:21:46

2.5.3 Query Performance Results

The summary of the benchmark can be found in the Google Doc public spreadsheet.

2.5.4 Execution Details

Number of clients
1 4 8 16 25 Return Triples
BSBM 25 csv csv csv csv csv csv
BSBM 100 csv csv csv csv csv csv
BSBM 200 csv csv csv csv csv csv

2.6 Virtuoso OpenSource View

2.6.1 Configuration

  • Virtuoso OpenSource Version 6.1
  • Changes in the configuration file virtuoso.ini

2.6.2 Load Time

Due to limited time, the loading time was not measured

BSBM 25 BSBM 100 BSBM 200
N/A N/A N/A

2.6.3 Query Performance Results

The summary of the benchmark can be found in the Google Doc public spreadsheet.

2.6.4 Execution Details

Number of clients
1 4 8 16 25 Return Triples
BSBM 25 csv csv csv csv csv csv
BSBM 100 csv csv csv csv csv csv
BSBM 200 csv csv csv csv csv csv

3. FishMark Benchmark

3.1 onTop-MySQL

3.1.1 Configuration

  • MySQL: Version 5.6
  • Configuration file changes:
# CACHES AND LIMITS #
tmp_table_size                 = 32M
max_heap_table_size            = 32M
query_cache_type               = 0
query_cache_size               = 0
max_connections                = 500
thread_cache_size              = 50
open_files_limit               = 65535
table_definition_cache         = 1024
table_open_cache               = 2048

# INNODB #
innodb_flush_method            = O_DIRECT
innodb_log_file_size           = 512M
innodb_flush_log_at_trx_commit = 0
innodb_buffer_pool_size        = 80G

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
innodb_thread_concurrency = 0
innodb_read_io_threads = 32
  • Indexes:
* Indexes on primary key
* comnames (SpecCode)
* comnames (Script)
* comnames (ComNamesRefNo)
* Species (FBname)
* Species (FamCode)
* Species (Subfamily)
* Species (DemersPelag)
* Species (Aquarium)
* countref (PAESE)
* countref (Note)
* refrens (Author)
* refrens (Year)
* refrens (FirstAuthor)
* refrens (Keywords)
* morphdat (autoctr)
* morphdat (Speccode)
* picturesmain (SpecCode)
* picturesmain (C_Code)
* families (Order)
* families (Ordnum)
* country (autoctr)
* country (SpecCode)
  • Other notes:
* After loading the dataset, we performed index generation and did "analyse table" on all tables in the database.
* MySQL Connector Version 5.0.8 was used for JDBC access.

3.1.2 Load Time

N/A

3.1.3 Execution Details

Number of clients
1 4 8 16 25 Return Triples
csv csv csv csv csv csv

3.2 Stardog

3.2.1 Configuration

  • Stardog Version 1.2
  • Server configuration in stardog.properties:
strict.parsing=false
  • Indexing:
Create less index by adding "--index-triples-only" in the loading command.
  • Other notes:

3.2.2 Load Time

The load time of a single N-TRIPLES dump files (in hh:mm:ss): 00:02:34

3.2.3 Execution Details

Number of clients
1 4 8 16 25 Return Triples
csv csv csv csv csv csv

3.3 OWLIM

3.3.1 Configuration

  • OWLIM-SE Version 5.3.5849
  • Repository configuration template (i.e., owlim-fish.ttl):
#
# Sesame configuration template for a owlim repository
#
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix owlim: <http://www.ontotext.com/trree/owlim#>.

[] a rep:Repository ;
   rep:repositoryID "fish" ;
   rdfs:label "Fishmark" ;
   rep:repositoryImpl [
     rep:repositoryType "openrdf:SailRepository" ;
     sr:sailImpl [
       sail:sailType "owlim:Sail" ;

       owlim:base-URL "http://fishdelish.cs.man.ac.uk/rdf/vocab/resource/" ;
#       owlim:defaultNS "hhttp://www4.wiwiss.fu-berlin.de/ibzer/bsbm/v01/instances/" ;
       owlim:entity-index-size "50000000" ;
       owlim:repository-type "file-repository" ;
       owlim:ruleset "empty" ;
#       owlim:storage-folder "storage" ;

       # OWLIM-SE parameters
       owlim:cache-memory "15G" ;

       # Other OWLIM-SE parameters
       owlim:transaction-mode "fast" ;
      ]
   ].

3.3.2 Load Time

The load time of a single N-TRIPLES dump files (in hh:mm:ss): 00:06:32

3.3.3 Execution Details

Number of clients
1 4 8 16 25 Return Triples
csv csv csv csv csv csv

Note: OWLIM produced more results for Query 2 compare to the other systems.

3.4 Virtuoso OpenSource RDF

3.4.1 Configuration

  • Virtuoso OpenSource Version 6.1
  • Changes in the configuration file virtuoso.ini

3.4.2 Load Time

The load time of a single N-TRIPLES dump files (in hh:mm:ss): 00:12:21

3.4.3 Execution Details

Number of clients
1 4 8 16 25 Return Triples
csv csv csv csv csv csv
Clone this wiki locally