Cannot use my own datasets with ALP Benchmark #8
Comments
Thanks for your question! Could you please provide a CSV file containing a portion of your dataset, or give more details about the error you're encountering? I'd be happy to take a closer look. Also, to clarify: null, NaN, inf, or -inf values should not cause any issues, since ALP treats them as exceptions.
Thanks for the prompt response and sorry for not elaborating on the issue!
I assume my datasets are failing the first sampling step, and an empty array is being returned for the exponent and factor pairs.
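For readers following the thread, the sampling step referred to here can be sketched roughly as follows. This is a toy model, not the ALP codebase: the integer cap, the search bounds, and the helper names are assumptions. The idea is that ALP encodes a double `v` as the integer `round(v * 10^e / 10^f)` and keeps a value only if decoding reproduces it exactly; the sampling phase searches for an `(e, f)` pair that works across the sampled values, and for fully precise doubles no pair qualifies, so the search comes back empty.

```python
import math

# Toy sketch of ALP-style (exponent, factor) sampling -- NOT the real
# implementation. MAX_ENC is an illustrative cap on the encoded integer
# (the real scheme bounds it so encodings stay compact).
MAX_ENC = 1 << 48

def round_trips(v: float, e: int, f: int) -> bool:
    """True if v encodes to a bounded integer and decodes back bit-exactly."""
    enc = round(v * 10.0 ** e / 10.0 ** f)
    if abs(enc) >= MAX_ENC:
        return False
    return enc * 10.0 ** f / 10.0 ** e == v

def find_pair(sample, max_exp=18):
    """Return the first (e, f) that round-trips every sampled value, else None."""
    for e in range(max_exp + 1):
        for f in range(e + 1):
            if all(round_trips(v, e, f) for v in sample):
                return e, f
    return None

print(find_pair([1.25, 3.5, 0.75]))  # decimal-like data: a pair exists -> (2, 0)
print(find_pair([math.pi]))          # full-precision data: no pair -> None
```

When every sampled value is full-precision, every candidate pair fails and the result is empty, which matches the symptom described above.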
No worries at all! It would be ideal if you could replicate the issue with an open dataset. I'll be happy to look into it further once you have the dataset ready. Looking forward to hearing back from you!
Thanks! You can find the sample CSV dataset attached. I converted it to text format to be able to attach it here; in the experiments, I feed it in as a binary file. So far I am running the benchmark only for compression ratio, so I am not specifying the other parameters like exponent, factor, and exception count in the data/include/double_columns.hpp file. Perhaps I have to compute them to be able to do so.
Thank you for providing the dataset. I have submitted a PR (#9) that includes your dataset in our testing framework, so there is no issue with ALP. ALP consists of two schemes: standard ALP for decimal-like doubles and ALP_RD for high-precision doubles; datasets like yours that are not decimal-like are handled by ALP_RD.
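A rough sketch of the ALP_RD idea follows. The 49-bit cut point and the helper names are assumptions for illustration, not the library's API: each 64-bit double is split into a left part of high bits, which tends to repeat across a vector and so dictionary-encodes well, and a right part of low bits, which is bit-packed verbatim. The split itself is lossless.

```python
import struct

# Toy sketch of the ALP_RD splitting idea -- NOT the real implementation.
# RIGHT_BITS = 49 is an illustrative cut; the real scheme picks it per vector.
RIGHT_BITS = 49

def split_double(value: float) -> tuple[int, int]:
    """Split a double's raw bits into (left, right) parts."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits >> RIGHT_BITS, bits & ((1 << RIGHT_BITS) - 1)

def join_double(left: int, right: int) -> float:
    """Reassemble the original double from its two parts."""
    bits = (left << RIGHT_BITS) | right
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

for v in [3.141592653589793, -0.1, 1e300]:
    assert join_double(*split_double(v)) == v  # lossless round trip
```

Because the left parts of neighboring values are often identical, a small dictionary suffices for them, which is where the compression comes from.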
Thanks a lot for your significant effort! I can see the rest of my datasets are also suited for ALP_RD.
Thanks for your questions!
Feel free to reach out if you have more questions!
Thanks a lot for your detailed response and hope to see ALP for floats soon.
Wish you best of luck in the future!
Cheers.

Note that if your doubles (or floats) have deep precision but you only care about limited precision, one trick is to cast the doubles to integer and then back. For instance, if you consider three digits of precision enough and your floats are between 0.0 and 1.0, you could cast to integer and then back to double, e.g. 0.001 * (int) (value * 1000). The resulting doubles will compress to 10 bits per value (6.4x) in ALP.
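The precision-trimming trick described above can be sketched in a few lines; the helper name here is ours, and the truncating integer cast mirrors the C-style expression `0.001 * (int)(value * 1000)` from the comment.

```python
# Sketch of the precision-trimming trick: truncate to three decimal digits
# via an integer cast, as in the C expression 0.001 * (int)(value * 1000).
def quantize3(value: float) -> float:
    return 0.001 * int(value * 1000)

# Values in [0.0, 1.0) collapse to at most 1000 distinct doubles, which is
# why ALP can then encode them in about 10 bits per value (~6.4x for doubles).
print(quantize3(0.123456789))  # ~0.123
```

Note that the `int` cast truncates rather than rounds; if round-to-nearest is preferred, `round(value * 1000)` can be used instead.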
Hi, thanks for the idea! We also had the idea of trimming unnecessary decimals for error-bounded lossy compression and then compressing with the latest lightweight encoding algorithms. ALP seems to fit this scenario very well.
Hi, we have added all the primitives for the float implementation of ALP. Please check here. Moreover, you can use the FastLanes double encoder to encode the double data type with ALP; FastLanes will handle the low-level details of ALP and provide a better API. Please check an example of the FastLanes double encoder here.
Hi, this is very helpful indeed. I am closing the issue, and many thanks for your help!
Hi,
I am trying to use the existing benchmark in the ALP repository with my own datasets. I followed the documentation to plug in my datasets, and all the methods work except ALP. The new datasets were preprocessed using the same method as in datasets_transformer.ipynb, and I made sure my data contains no null, NaN, inf, or -inf values. The data is also larger than some of the existing datasets in the benchmark.
Thank you!