In this project I am trying to build an out-of-core analytics command-line application, and then gradually extend it into a web-based application.
Goals
- Convert a CSV file to a Parquet file (https://findingcoefficients.blogspot.com/2021/03/convert-csv-file-parquet-file.html).
- Use UMap for user-space memory mapping of data that does not fit in RAM.
- Use thrust::host_vector as the in-memory data structure.
- Use Thrust parallel algorithms to do the analytics.
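The memory-mapping and reduction goals above can be sketched with standard tools before the real libraries are wired in. This is only a minimal sketch, not the project's implementation: it uses POSIX mmap() as a stand-in for UMap and std::accumulate as a stand-in for a Thrust reduction, and it assumes the input is a raw binary file of doubles rather than Parquet. The helper names write_doubles and mapped_sum are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: write `values` to `path` as raw binary doubles.
void write_doubles(const std::string& path, const std::vector<double>& values) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) throw std::runtime_error("cannot open " + path);
    std::size_t written =
        std::fwrite(values.data(), sizeof(double), values.size(), f);
    std::fclose(f);
    if (written != values.size()) throw std::runtime_error("short write");
}

// Hypothetical helper: map the file read-only and sum the doubles in it.
// The kernel pages data in on demand, so the file may be larger than RAM.
double mapped_sum(const std::string& path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        throw std::runtime_error("fstat failed");
    }
    std::size_t bytes = static_cast<std::size_t>(st.st_size);
    assert(bytes % sizeof(double) == 0);  // assumed layout: packed doubles
    if (bytes == 0) {
        close(fd);
        return 0.0;
    }

    // Stand-in for UMap: plain POSIX mmap.
    void* addr = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) throw std::runtime_error("mmap failed");

    const double* first = static_cast<const double*>(addr);
    const double* last = first + bytes / sizeof(double);

    // Stand-in for a Thrust reduction: sequential std::accumulate.
    double total = std::accumulate(first, last, 0.0);

    munmap(addr, bytes);
    return total;
}
```

Calling write_doubles on a small vector and then mapped_sum on the same path should return the sum of that vector. In the real application the mmap/munmap pair would presumably be replaced by UMap's umap()/uunmap(), which are designed to resemble the POSIX calls, and the accumulate step by thrust::reduce over a thrust::host_vector filled from the mapped region.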
Development Stack
- Apache Arrow C++ (https://github.com/apache/arrow).
- Apache parquet-cpp (https://github.com/apache/parquet-cpp).
- LLNL/umap (https://github.com/LLNL/umap).
- NVIDIA/thrust (https://github.com/NVIDIA/thrust).