Current I/O benchmark (`spark.readfits.load.count`) on 110 GB (first iteration), with 128 MB partitions:

| Configuration | Median task duration (s) [GC] | Comments |
|---|---|---|
| no reading / no decoding | 0.5 [0.02] | Spark/Hadoop overhead |
| reading / no decoding | 8 [0.04] | Overhead + I/O |
| reading / decoding | 10 [1] | Overhead + I/O + Scala FITSIO |
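Numbers like these can be gathered with a simple wall-clock wrapper around the benchmarked call. A minimal sketch (the `time` helper is hypothetical, not part of spark-fits; the commented call is the one from the benchmark):

```scala
// Hypothetical timing harness: runs a block, prints elapsed wall-clock time,
// and returns the block's result unchanged.
def time[A](label: String)(block: => A): A = {
  val t0 = System.nanoTime()
  val result = block
  val dt = (System.nanoTime() - t0) / 1e9
  println(f"$label: $dt%.2f s")
  result
}

// Usage against the benchmarked call (requires a SparkSession):
// time("load+count") { spark.readfits.load(path).count() }
```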
Most of the time is spent reading from disk (>60%).
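The >60% figure can be cross-checked from the table: subtracting the pure Spark/Hadoop overhead from the reading-only time gives the share of a full task spent on I/O (a rough estimate that lumps GC and other effects into the remainder):

```scala
// Values taken from the benchmark table above.
val overhead = 0.5  // s, no reading / no decoding
val readOnly = 8.0  // s, reading / no decoding
val full     = 10.0 // s, reading / decoding

// Fraction of a full task spent on I/O alone: (8.0 - 0.5) / 10.0 = 0.75
val ioShare = (readOnly - overhead) / full
```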
This is the time spent in `f.readFully()`. Could it be better?
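For reference, a minimal local-file sketch of the `readFully` pattern (illustrative only, not the plugin's actual reader; HDFS streams expose the same `DataInput` interface, so the call shape is the same there):

```scala
import java.io.{DataInputStream, FileInputStream}

// Read `size` bytes from a local file with readFully, mirroring the
// per-partition read pattern. readFully blocks until `size` bytes have
// been read, or throws EOFException if the stream ends first.
def readBlock(path: String, size: Int): Array[Byte] = {
  val in = new DataInputStream(new FileInputStream(path))
  try {
    val buf = new Array[Byte](size)
    in.readFully(buf)
    buf
  } finally in.close()
}
```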
Note that the I/O figure also includes the effect of data locality -- it covers not only reading the file from the local DataNode, but also transferring data from remote DataNodes.
Decoding accounts for 30% of the total (with large GC time).
The current throughput is around 5-10 MB/s to load FITS data and convert it to a DataFrame.
The decoding library needs to be improved...