Working on making the conversion to hdt faster #578
base: dev
This looks interesting. I wrote this part years ago when I was still an intern, so there is probably even more to patch. But did you measure the time spent parsing the RDF file compared to the indexing time? As far as I remember, parsing was a small part.
private final ExceptionIterator<T, E> in1;
private final ExceptionIterator<T, E> in2;
private final Comparator<T> comp;
private final int chunkSize = 1024 * 4;
// Could be a ForkJoinPool.commonPool(), or a custom pool
private final Executor executor = Executors.newVirtualThreadPerTaskExecutor();

private final Deque<T> chunk1 = new ArrayDeque<>();
private final Deque<T> chunk2 = new ArrayDeque<>();

// Local buffer to store merged chunks
private final Deque<T> buffer = new ArrayDeque<>();
I've been experimenting with making MergeExceptionIterator parallel. The code reads up to chunkSize (4096) elements from both child iterators concurrently and stores them in the two deques chunk1 and chunk2.
For some reason I'm getting an exception when reading from the two child iterators, even when I don't read them concurrently:
[KWayMerger#2Worker#0] .. [# ] 10.00 reading triples part2 175300000
[KWayMerger#2Worker#1] .. [# ] 10.00 reading triples part2 178200000
[KWayMerger#2Worker#10] . [# ] 10.00 reading triples part2 177800000
[KWayMerger#2Worker#11] . [# ] 10.00 reading triples part2 178100000
[KWayMerger#2Worker#12] . [# ] 10.00 reading triples part2 178400000
[KWayMerger#2Worker#13] . [# ] 10.00 reading triples part2 177200000
[KWayMerger#2Worker#14] . [# ] 10.00 reading triples part2 178300000
[KWayMerger#2Worker#2] .. [# ] 10.00 reading triples part2 175200000
[KWayMerger#2Worker#3] .. [# ] 10.00 reading triples part2 176700000
[KWayMerger#2Worker#4] .. [# ] 10.00 reading triples part2 178000000
[KWayMerger#2Worker#5] .. [# ] 10.00 reading triples part2 177300000
[KWayMerger#2Worker#6] .. [# ] 10.00 reading triples part2 175100000
[KWayMerger#2Worker#7] .. [# ] 10.00 reading triples part2 177100000
[KWayMerger#2Worker#8] .. [# ] 10.00 reading triples part2 174900000
[KWayMerger#2Worker#9] .. [# ] 10.00 reading triples part2 177900000
[main] .................. [#### ] 40.00 Create mapped and sort triple file
Exception in thread "main" com.the_qa_company.qendpoint.core.exceptions.ParserException: com.the_qa_company.qendpoint.core.util.concurrent.KWayMerger$KWayMergerException: java.io.IOException: Triple got null node, but not all the nodes are 0! 2 0 17
at com.the_qa_company.qendpoint.core.hdt.impl.HDTDiskImporter.compressTriples(HDTDiskImporter.java:253)
at com.the_qa_company.qendpoint.core.hdt.impl.HDTDiskImporter.runAllSteps(HDTDiskImporter.java:357)
at com.the_qa_company.qendpoint.core.hdt.HDTManagerImpl.doGenerateHDTDisk0(HDTManagerImpl.java:475)
at com.the_qa_company.qendpoint.core.hdt.HDTManagerImpl.doGenerateHDTDisk(HDTManagerImpl.java:436)
at com.the_qa_company.qendpoint.core.hdt.HDTManagerImpl.doGenerateHDTDisk(HDTManagerImpl.java:421)
at com.the_qa_company.qendpoint.core.hdt.HDTManager.generateHDTDisk(HDTManager.java:818)
at com.the_qa_company.qendpoint.core.tools.RDF2HDT.execute(RDF2HDT.java:205)
at com.the_qa_company.qendpoint.core.tools.RDF2HDT.main(RDF2HDT.java:326)
Caused by: com.the_qa_company.qendpoint.core.util.concurrent.KWayMerger$KWayMergerException: java.io.IOException: Triple got null node, but not all the nodes are 0! 2 0 17
at com.the_qa_company.qendpoint.core.util.io.compress.MapCompressTripleMerger.mergeChunks(MapCompressTripleMerger.java:232)
at com.the_qa_company.qendpoint.core.util.concurrent.KWayMerger$MergeTask.run(KWayMerger.java:220)
at com.the_qa_company.qendpoint.core.util.concurrent.KWayMerger$Worker.runException(KWayMerger.java:285)
at com.the_qa_company.qendpoint.core.util.concurrent.ExceptionThread.run(ExceptionThread.java:125)
Caused by: java.io.IOException: Triple got null node, but not all the nodes are 0! 2 0 17
at com.the_qa_company.qendpoint.core.util.io.compress.CompressTripleReader.setAllOrEnd(CompressTripleReader.java:76)
at com.the_qa_company.qendpoint.core.util.io.compress.CompressTripleReader.hasNext(CompressTripleReader.java:64)
at com.the_qa_company.qendpoint.core.iterator.utils.MergeExceptionIterator.fillBuffer(MergeExceptionIterator.java:222)
at com.the_qa_company.qendpoint.core.iterator.utils.MergeExceptionIterator.hasNext(MergeExceptionIterator.java:185)
at com.the_qa_company.qendpoint.core.iterator.utils.MergeExceptionIterator.fillBuffer(MergeExceptionIterator.java:222)
at com.the_qa_company.qendpoint.core.iterator.utils.MergeExceptionIterator.hasNext(MergeExceptionIterator.java:185)
at com.the_qa_company.qendpoint.core.iterator.utils.MergeExceptionIterator.fillBuffer(MergeExceptionIterator.java:222)
at com.the_qa_company.qendpoint.core.iterator.utils.MergeExceptionIterator.hasNext(MergeExceptionIterator.java:185)
at com.the_qa_company.qendpoint.core.iterator.utils.MergeExceptionIterator.fillBuffer(MergeExceptionIterator.java:222)
at com.the_qa_company.qendpoint.core.iterator.utils.MergeExceptionIterator.hasNext(MergeExceptionIterator.java:185)
at com.the_qa_company.qendpoint.core.util.io.compress.MapCompressTripleMerger.mergeChunks(MapCompressTripleMerger.java:221)
... 3 more
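For readers following along, the concurrent chunk read described in this comment can be sketched roughly as below. This is a minimal illustration and not the code from the PR: it uses plain java.util.Iterator instead of the project's ExceptionIterator, a two-thread fixed pool instead of the virtual-thread executor from the diff, and the names ConcurrentChunkReader, readChunk and readBoth are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, for illustration only.
final class ConcurrentChunkReader {

    // Drains up to chunkSize elements from one iterator into a new deque.
    // Note: ArrayDeque rejects null elements, and if an iterator reuses one
    // mutable object for every next() call, each element would need to be
    // copied before it is buffered.
    static <T> Deque<T> readChunk(Iterator<T> in, int chunkSize) {
        Deque<T> chunk = new ArrayDeque<>(chunkSize);
        while (chunk.size() < chunkSize && in.hasNext()) {
            chunk.add(in.next());
        }
        return chunk;
    }

    // Reads one chunk from each input at the same time and waits for both.
    // Each iterator is only ever touched by a single task.
    static <T> List<Deque<T>> readBoth(Iterator<T> in1, Iterator<T> in2, int chunkSize) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Deque<T>> f1 =
                    CompletableFuture.supplyAsync(() -> readChunk(in1, chunkSize), executor);
            CompletableFuture<Deque<T>> f2 =
                    CompletableFuture.supplyAsync(() -> readChunk(in2, chunkSize), executor);
            return List.of(f1.join(), f2.join());
        } finally {
            executor.shutdown();
        }
    }
}
```

Reading ahead like this is only safe if the consumer still emits elements in globally sorted order afterwards; the sketch covers the read step alone.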
@ate47, would you have any insight into why I'm getting this exception?
Never mind, I figured it out. My approach is wrong: the sorting can't be batched the way I first thought.
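As a general illustration of why batching a merge is tricky (not a diagnosis of the exception above, whose actual cause is not stated here), the sketch below compares a classic streaming two-way merge with a naive variant that reads a fixed-size chunk from each input, merges the two chunks, and emits the whole result. The class and method names are hypothetical, and plain lists of integers stand in for the triple iterators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, for illustration only.
final class ChunkedMergeDemo {

    // Classic two-way merge of two sorted lists: an element is emitted only
    // once it is known to be the smallest one remaining in either input.
    static List<Integer> streamingMerge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    // Naive batching: take up to chunkSize elements from each input, merge
    // the two chunks, emit everything, then repeat. The tail of one chunk can
    // be larger than elements that have not been read yet from the other input.
    static List<Integer> naiveChunkedMerge(List<Integer> a, List<Integer> b, int chunkSize) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() || j < b.size()) {
            int iEnd = Math.min(a.size(), i + chunkSize);
            int jEnd = Math.min(b.size(), j + chunkSize);
            out.addAll(streamingMerge(a.subList(i, iEnd), b.subList(j, jEnd)));
            i = iEnd;
            j = jEnd;
        }
        return out;
    }

    // Returns true if the list is in non-decreasing order.
    static boolean isSorted(List<Integer> xs) {
        for (int k = 1; k < xs.size(); k++) {
            if (xs.get(k - 1) > xs.get(k)) {
                return false;
            }
        }
        return true;
    }
}
```

With inputs [1, 2, 3, 4] and [10, 11, 12, 13] and a chunk size of 2, the naive variant yields 1, 2, 10, 11, 3, 4, 12, 13, which is no longer sorted. A batched merge would have to hold back every merged element that is larger than the last element read from either input until more data arrives.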
292e0df to 6c534fe
This is all work in progress.