Halyard benchmarking -- how to improve? #32

Open
earthquakesan opened this issue Oct 23, 2017 · 5 comments
Comments

@earthquakesan

earthquakesan commented Oct 23, 2017

Hi Adam @asotona!

I benchmarked Halyard on a single-node setup (i7-3770 3.4 GHz, 32 GB RAM, regular HDD) running HDFS + YARN + HBase + Halyard. The queries were sent to the rdf4j-server SPARQL endpoint, e.g.:

wget -O - "http://halyard/rdf4j-server/repositories/benchmark50?query=select%20%2A%20%7B%3Fs%20%3Fp%20%3Fo%7D%20limit%2010"
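
For reference, the same request can be built programmatically. The following is only a minimal Python sketch: the host `halyard` and the repository `benchmark50` are taken from the wget line above, while the helper names `build_query_url` and `run_query` are hypothetical and not part of Halyard or RDF4J.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Values copied from the wget example; adjust them for your own deployment.
SERVER = "http://halyard/rdf4j-server"
REPOSITORY = "benchmark50"


def build_query_url(sparql, server=SERVER, repository=REPOSITORY):
    """Return the GET URL for a SPARQL query against an rdf4j-server repository."""
    # safe="" percent-encodes every reserved character, including '*', '?', '{' and '}'.
    return f"{server}/repositories/{repository}?query={quote(sparql, safe='')}"


def run_query(sparql, timeout=60):
    """Send the query and return the raw response body (needs a reachable server)."""
    with urlopen(build_query_url(sparql), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call run_query() when the endpoint is reachable.
    print(build_query_url("select * {?s ?p ?o} limit 10"))
```

Building the URL separately from sending it makes it easy to check the encoding without a running server.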

I used the FEASIBLE [1] benchmark queries and IGUANA [2]. The benchmarking configuration is available in the Halyard Docker repository [3] (iguana-config.tar.bz2).
As the benchmarking results show, for the smallest size Halyard answered only 6 queries; for the larger sizes (50 and 100) it answered none.

From preliminary discussions I understand that Halyard can also be queried through its Java interface, which should improve performance. Is there an example of how to do that?

[1] http://aksw.org/Projects/FEASIBLE.html
[2] http://aksw.org/Projects/IGUANA.html
[3] https://github.com/earthquakesan/docker-halyard

@earthquakesan
Author

Update:

I did not add the benchmarking results to the GitHub repository; they are available here: https://www.dropbox.com/s/st5sz0hu7eoxj8l/benchmark_results.tar.bz2?dl=0

@asotona
Collaborator

asotona commented Oct 24, 2017

I'll take a look at it. There may be many configuration-related reasons why HBase does not perform well on a single-node cluster. The cause may also lie in Halyard's query evaluation or in the benchmarking queries themselves.

@peterjohnlawrence

I have found better performance when the 'Push' option is not enabled. There are probably issues with some queries (such as path queries) with that option. Have you tested without the Push option enabled?

@asotona
Collaborator

asotona commented Jan 8, 2018 via email

@yyz1989

yyz1989 commented Jan 22, 2018

@earthquakesan Hi Ivan, have you gotten it to work on a multi-node cluster as well? Thanks
