Investigating 191520-2024 for Performance Optimization #61
-
Thanks for reporting. This notice took ~7 minutes with RMLMapper 6.2.2 (Java 17) on my Core i7-8565U laptop, which is six generations behind current hardware. I took the liberty of gathering some statistics from our internal staging set of test data (a superset of this project, since not everything there ends up here). The median file size is 22KB and the average is 54KB. In terms of distribution, files of 0-255KB make up the majority, nearly 98%. Command used to list the sizes:
find -type f -printf '%s %p\n' | numfmt --to=iec | sort -h
This implies that anything > 255KB can be considered relatively large for our purposes, and this notice is an outlier (aside from the *100_lots ones, which are ~1.8MB). While the remote call for hashing a URI part is an inefficient network operation, it usually only affects runtime on the first run for the same set of rules, as something (likely RMLMapper) appears to be doing some sort of caching, which is good. The performance impact seen here comes solely from the local transformation. Therefore, we are bottlenecked by the tooling (RMLMapper) and can do very little about performance in this regard.
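For anyone who wants to reproduce figures like the median, average, and 0-255KB share on their own copy of the test data, here is a minimal Python sketch. It is not the method used for the numbers above; the function name `size_stats`, the directory argument, and the 255KB default threshold are illustrative assumptions.

```python
import statistics
from pathlib import Path


def size_stats(root, threshold_kb=255):
    """Summarise file sizes under `root` (hypothetical helper, not project code).

    Returns the file count, median and mean size in KB, and the share of
    files whose size is at or below `threshold_kb`.
    """
    # Walk the tree recursively and collect the size of every regular file.
    sizes_kb = [p.stat().st_size / 1024 for p in Path(root).rglob("*") if p.is_file()]
    if not sizes_kb:
        raise ValueError(f"no files found under {root}")
    small = sum(1 for s in sizes_kb if s <= threshold_kb)
    return {
        "count": len(sizes_kb),
        "median_kb": statistics.median(sizes_kb),
        "mean_kb": statistics.fmean(sizes_kb),
        "share_small": small / len(sizes_kb),
    }
```

Calling `size_stats("path/to/test_data")` returns a dictionary; a `share_small` close to 0.98 would match the distribution described above.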
-
Thank you very much for the numbers! This information is valuable for configuring the pipeline to handle such outliers. On my machine, CPU usage was low, but there was not enough memory. I believe a pipeline could split the XML into chunks with fewer lots each. I'll move this issue to discussions.
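The chunking idea could look roughly like the Python sketch below. It is only an illustration under strong assumptions: that the lots are direct children of the root element, that `child_tag` is the name of the lot element (hypothetical here; with namespaced XML, ElementTree expects the `{namespace}local` form), and that mapping each chunk separately still gives a usable result, which may not hold if other parts of the notice refer to lots by identifier.

```python
import copy
import xml.etree.ElementTree as ET


def split_by_children(root, child_tag, chunk_size):
    """Yield copies of `root`, each keeping at most `chunk_size` children
    named `child_tag`; all other children are kept in every chunk.

    Hypothetical helper: `child_tag` and the flat layout are assumptions.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    total = sum(1 for child in root if child.tag == child_tag)
    if total == 0:
        # Nothing to split on: return a single unchanged copy.
        yield copy.deepcopy(root)
        return
    for start in range(0, total, chunk_size):
        chunk = copy.deepcopy(root)
        index = 0
        # Iterate over a snapshot so removing children is safe.
        for child in list(chunk):
            if child.tag == child_tag:
                if not (start <= index < start + chunk_size):
                    chunk.remove(child)
                index += 1
        yield chunk
```

Each yielded element can be written out with `ET.ElementTree(chunk).write(path)` and passed to the mapper on its own, trading a single memory-heavy run for several smaller ones.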
-
I'll leave a link here to a benchmark of different RML engines, which might be relevant.
-
The time required for transformations is generally acceptable. However, there are occasional instances where some transformations may time out.
An example is notice 191520-2024 with mappings/package_cn_v1.6, which times out on my machine.
This particular case provides an opportunity to investigate and identify any performance bottlenecks.