FEATURE: shift the data range #111

Open
aaronspring opened this issue Jun 7, 2022 · 1 comment
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@aaronspring
Collaborator

@milankl mentioned very important points in milankl/BitInformation.jl#38 (comment) which could/should be implemented in our xbitinfo pipeline:

Premises have to be checked / questions have to be answered before doing the bitinformation+round business:

  1. is the data rather linear or logarithmically distributed?
  2. which binary encoding is most appropriate given 1. (integer/fixed-point/linear quantization vs floats)
  3. Analyse the bitinformation within that appropriate encoding
  4. Bitround in that encoding too (and you'll either have rigid absolute error bounds for linear or relative for log)

Problem is obviously that most people want to use floats regardless of what data they are handling. Sure that makes sense. So for linearly distributed data (where you want absolute errors to have rigid bounds) we have to come up with a workaround to better adhere to 1.-4. while using floats. Let's take your sea surface temperature example and say you have the high precision data in ˚C and definitely want to store the data as ˚C, so what can one do?

  1. shift the data into a range where floats are linear, i.e. all data lies between two consecutive powers of 2 so that every value has the same exponent. For ˚C you could convert to Kelvin (although I'd advise not to* unless you store in K) but you can also just add 256 (if there are no negative temperatures) or 512; a power of 2 is a good choice. Now your encoding matches your data distribution.
  2. analyse the bitinformation now. While this gives you mantissa bits, this actually suggests an absolute error and not a relative one for your data as all exponent bits are identical anyway.
  3. now subtract your initial offset again and you'll get more mantissa bits of precision for higher temperatures and fewer for lower temperatures so that your max absolute error is globally constant.
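
Step 1 of this workaround can be checked with a minimal NumPy sketch. It assumes float32 storage, and the SST values and the offset of 256 are made up for illustration. After the shift every value has the same binary exponent, so the gap to the next representable float is the same everywhere:

```python
import numpy as np

# Made-up sea surface temperatures in degC (illustrative only).
sst_celsius = np.array([0.5, 2.0, 10.0, 30.0], dtype=np.float32)
offset = np.float32(256.0)  # a power of 2; assumes no negative temperatures

shifted = sst_celsius + offset  # all values now lie in [256, 512)

# Gap to the next representable float32 at each value.
spacing_raw = np.spacing(sst_celsius)  # grows with magnitude
spacing_shifted = np.spacing(shifted)  # identical for every value

# np.frexp returns (mantissa, exponent); only the exponent is needed here.
_, exp_raw = np.frexp(sst_celsius)
_, exp_shifted = np.frexp(shifted)

print("raw exponents:    ", exp_raw, "spacing:", spacing_raw)
print("shifted exponents:", exp_shifted, "spacing:", spacing_shifted)
```

Because the exponent bits are identical after the shift, keeping a fixed number of mantissa bits corresponds to a fixed absolute step rather than a relative one.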

@aaronspring added the enhancement and documentation labels Jun 7, 2022

@milankl
Collaborator

milankl commented Jul 25, 2022

I realised that step 2 ("analyse the bitinformation...") is missing that you should also apply the rounding there, i.e. before subtracting the initial offset again!
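
Putting the corrected order together (shift, round, then subtract the offset), a rough NumPy-only sketch could look like the following. Everything here is an assumption for illustration: the `bitround` helper is a hand-written float32 round-to-nearest and not the xbitinfo or BitInformation.jl implementation, the data is random, and `keepbits = 10` is hard-coded, whereas in practice it would come from the bitinformation analysis of the shifted data.

```python
import numpy as np

def bitround(x, keepbits):
    """Round float32 values to `keepbits` mantissa bits (nearest, ties to even).

    Hypothetical helper for illustration only.
    """
    x = np.ascontiguousarray(x, dtype=np.float32)
    drop = 23 - keepbits  # float32 has 23 explicit mantissa bits
    if drop <= 0:
        return x.copy()
    bits = x.view(np.uint32)
    half_minus_one = np.uint32((1 << (drop - 1)) - 1)
    keep_lsb = (bits >> np.uint32(drop)) & np.uint32(1)  # last kept bit, for ties
    mask = np.uint32(0xFFFFFFFF ^ ((1 << drop) - 1))     # zero the dropped bits
    rounded = (bits + half_minus_one + keep_lsb) & mask
    return rounded.view(np.float32)

rng = np.random.default_rng(0)
sst_celsius = rng.uniform(0.0, 30.0, 1000).astype(np.float32)  # made-up data
offset = np.float32(256.0)  # power of 2; assumes no negative temperatures
keepbits = 10               # hypothetical value, not derived from real data

# Shift, round in the shifted range, and only then subtract the offset.
shifted = sst_celsius + offset
restored = bitround(shifted, keepbits) - offset
err_shifted = np.abs(restored.astype(np.float64) - sst_celsius)

# For contrast: rounding the unshifted floats gives magnitude-dependent errors.
direct = bitround(sst_celsius, keepbits)
err_direct = np.abs(direct.astype(np.float64) - sst_celsius)

print("shifted: max abs error", err_shifted.max())
print("direct:  max abs error below 1 degC ", err_direct[sst_celsius < 1].max())
print("direct:  max abs error above 16 degC", err_direct[sst_celsius > 16].max())
```

With this offset and `keepbits = 10`, the restored values sit on a regular 0.25 ˚C grid, so the float rounding behaves like a linear quantization with a globally constant absolute error bound. Rounding the unshifted values with the same `keepbits` instead bounds the relative error, so the absolute error grows with temperature.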
