NSNumber: Rewrite hashing method #535
Even if not optimal, this looks like a clear improvement.
My understanding of the purpose of the hash method is that it's primarily to support the use of the object as a dictionary/map key, which is why performance is best when the hash values are evenly scattered.
In my experience, we use NSString objects as dictionary keys probably a hundred times more often than we use NSNumber objects, so if we want to optimise hash functions, NSString is a better candidate, but having an efficient hash for NSNumber is still a good thing.
If the result of the hash method is really used as a key to a dictionary, then this is probably not the right implementation. I agree that this is better than the old one, since at least it is no longer broken, but it still does not fulfill the basic properties of a non-cryptographic hash function.
Are you sure that the result of -hash is directly used? If that is the case, then using NSNumber objects as keys into a dictionary is completely broken.
Actually, on re-reading this, I no longer think this is an improvement on the original version.
Originally I was simply thinking about the ability of the returned hash to distribute keys evenly among hash buckets, which gives optimal performance for hash table (dictionary/map) lookups: a dictionary consists of an array of 'buckets' (efficiently selected by a hash % number-of-buckets calculation), with each bucket containing a linked list of nodes. An even hash distribution minimises the length of the linked lists, so performance drops if we have an uneven hash.
However, that's not the only consideration. If we hash two keys with the same value and get two different hash values, then those two keys can select different buckets, so that a value stored in the dictionary with one key cannot be retrieved using the other. My concern is that the latest code uses the raw data of float and double numbers rather than the actual numeric value of the key, and I suspect that means two keys with the same numeric value will give different hashes depending on whether they are stored as int/float/double. That really breaks the -hash method, by making the map/dictionary fail (rather than just being inefficient/slow).
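To make the failure mode concrete, here is a minimal sketch in C of a chained hash table of the kind described above. ToyMap, toy_put, toy_get and BUCKET_COUNT are hypothetical names invented for illustration, not GNUstep code; the caller passes the hash explicitly so that the effect of two equal keys producing different hashes can be seen directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BUCKET_COUNT 8 /* hypothetical table size */

typedef struct Node {
    double key;
    int value;
    struct Node *next;
} Node;

typedef struct {
    Node *buckets[BUCKET_COUNT]; /* each bucket is a linked list of nodes */
} ToyMap;

/* Store a value. The caller supplies the hash so that different
   hashing rules can be compared against the same table. */
static void toy_put(ToyMap *m, uint64_t hash, double key, int value) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        return; /* out of memory: silently skipped in this sketch */
    }
    size_t b = (size_t)(hash % BUCKET_COUNT); /* bucket selection */
    n->key = key;
    n->value = value;
    n->next = m->buckets[b];
    m->buckets[b] = n;
}

/* Look up a key. Only the bucket selected by the hash is searched,
   so an equal key with a different hash may never be found. */
static int toy_get(const ToyMap *m, uint64_t hash, double key, int *out) {
    const Node *n = m->buckets[hash % BUCKET_COUNT];
    for (; n != NULL; n = n->next) {
        if (n->key == key) {
            *out = n->value;
            return 1;
        }
    }
    return 0;
}
```

Storing the key 42.0 under hash 2 and then looking it up under hash 3 fails even though the keys compare equal, because only bucket 3 is searched.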
Are you concerned about floating point values that are within machine epsilon distance and thus need to be treated as being equivalent?
I'm not really familiar with the underlying representations, but my worry is that the binary representations of the same number (say 42) differ depending on the type of the number.
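The worry about differing representations can be checked with a small C sketch. It assumes an IEEE 754 platform with a 32-bit float and a 64-bit double; raw_bits_float and raw_bits_double are hypothetical helper names, not part of any library.

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw bytes of a float into an integer of the same size. */
static uint32_t raw_bits_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Copy the raw bytes of a double into an integer of the same size. */
static uint64_t raw_bits_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

Under those assumptions, 42 stored as an integer is 0x2a, as a float it is 0x42280000, and as a double it is 0x4045000000000000, so a hash computed over the raw bytes sees three different inputs for the same numeric value.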
In short, for two NSNumbers A and B, the implementation needs to be such that
I will investigate the effect on scattering when the integer and fractional parts are hashed separately and then added together. This avoids working with the raw floating point representation.
That sounds like a good approach.
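One way the proposal could look is sketched below in C. This is not the code from the pull request: hash_integer and hash_real are hypothetical names, the per-part hash is simply the identity to keep the example short, and non-finite or out-of-range values get a placeholder fallback.

```c
#include <stdint.h>

#define TWO_POW_53 9007199254740992.0    /* 2^53 */
#define TWO_POW_63 9223372036854775808.0 /* 2^63 */

/* Hash for integer-typed numbers: the value itself (no mixing here). */
static uint64_t hash_integer(int64_t n) {
    return (uint64_t)n;
}

/* Hash for float/double numbers, based on the numeric value
   rather than on the raw bit pattern. */
static uint64_t hash_real(double v) {
    /* NaN, infinities and values outside the int64_t range get a fixed
       fallback in this sketch; real code would need a better answer. */
    if (!(v > -TWO_POW_63 && v < TWO_POW_63)) {
        return 0;
    }
    int64_t whole = (int64_t)v;      /* truncates toward zero */
    double frac = v - (double)whole; /* fractional part, magnitude below 1 */
    int64_t frac_scaled = (int64_t)(frac * TWO_POW_53);
    /* Add the two parts; unsigned arithmetic wraps on overflow. */
    return hash_integer(whole) + (uint64_t)frac_scaled;
}
```

With this scheme a whole number gives the same hash whether it arrives as an integer, a float widened to double, or a double, while 1.25 and 1.75 no longer collide. A stronger mixing step could be applied to each part before the addition if better scattering is needed.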
-[NSNumber hash] currently converts the number to a double value and casts it to an NSUInteger. The problem with this approach is that fractional numbers with the same integral part but different fractional parts receive the same hash value.

I looked into using a proper, still non-cryptographic hash function, such as the ones proposed here (https://nullprogram.com/blog/2018/07/31/), but this is overkill for the hash method.
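The collision produced by the cast-based approach can be reproduced with a C analogue. This is an illustration only, not the GNUstep source; truncating_hash is a hypothetical name and uint64_t stands in for NSUInteger.

```c
#include <stdint.h>

/* C analogue of "convert to double, then cast to an unsigned integer".
   Only meaningful for non-negative values that fit in 64 bits. */
static uint64_t truncating_hash(double v) {
    return (uint64_t)v; /* the fractional part is discarded */
}
```

Here 1.0, 1.25, 1.5 and 1.75 all hash to 1, so every key between two consecutive integers ends up in the same bucket.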
Apple's implementation uses the magic constant 0x9e3779b1 and does an integer multiplication with the number's unsigned integer value. 0x9e3779b1 is the closest prime to 0x9e3779b9, which is the fractional part of the golden ratio scaled by 2^32 and truncated to an integer. https://lkml.org/lkml/2016/4/29/838 features a great discussion of why this is not a great idea.

If there is a need (I still do not understand the purpose of -[NSNumber hash]) for a hash function with good scattering properties, I'd recommend we instead use SplitMix64.
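For comparison, both options can be sketched in C. The first function is only a guess at the general shape of a multiplicative hash using the quoted constant, not Apple's actual code; the second applies the SplitMix64 increment and output mixing steps to a single input. Both function names are hypothetical.

```c
#include <stdint.h>

/* Multiplicative hash with the 32-bit constant quoted above. */
static uint32_t golden_multiply_hash(uint32_t x) {
    return x * UINT32_C(0x9e3779b1); /* wraps modulo 2^32 */
}

/* SplitMix64 step applied to x, i.e. the first value a SplitMix64
   generator seeded with x would produce. */
static uint64_t splitmix64_mix(uint64_t x) {
    uint64_t z = x + UINT64_C(0x9e3779b97f4a7c15);
    z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
    return z ^ (z >> 31);
}
```

Because the multiplier is odd, the lowest bit of golden_multiply_hash(x) always equals the lowest bit of x, which matters when the bucket index is taken from the low bits. The SplitMix64 step adds xor-shifts between the multiplications so that high input bits also influence the low output bits.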