NSNumber: Rewrite hashing method #535

hmelder · 2025-09-04T10:40:33Z

-[NSNumber hash] currently converts the number to a double value and casts it to an NSUInteger. The problem with this approach is that fractional numbers with the same integral part, but different fractional part are treated as being equal.

I looked into using a proper, still non-cryptographic hash function, such as the ones proposed here (https://nullprogram.com/blog/2018/07/31/), but this is overkill for the hash method.

Apple's implementation uses the magic 0x9e3779b1 and does an integer multiplication with the numbers unsigned integer value. 0x9e3779b1 is the closest prime to 0x9e3779b9 which is the integral part of the Golden Ratio's fractional part. https://lkml.org/lkml/2016/4/29/838 features a great discussion to why this is not a great idea.

If there is a need (I still do not understand the purpose of -[NSNumber hash]) for a hash function with good scattering properties, I'd recommend we instead use SplitMix64.

rfm

Even if not optimal, this looks like a clear improvement.

My understanding of the purpose of the hash method is that it's primarily to support the use of the object as a dictionary/map key, which is why a best performance is if the has values are evenly scattered.

In my experience, we use NSString objects as dictionary keys probably a hundred times more often than we use NSNumber objects, so if we want to optimise hash functions, NSString is a better candidate, but having an efficientd hash for NSNumber is still a good thing.

hmelder · 2025-09-07T20:49:09Z

My understanding of the purpose of the hash method is that it's primarily to support the use of the object as a dictionary/map key, which is why a best performance is if the has values are evenly scattered.

If the result of the hash method is really used as a key to a dictionary, then this is probably not the right implementation. I agree that this is better than the old one, at least it is now not broken anymore, but it still does not fullfill the basic properties of a non-cryptographic hash function.

hmelder · 2025-09-07T20:50:07Z

Are you sure that the result of -hash is directly used? If that is the case, then using NSNumber objects as keys into a dictionary is completely broken.

rfm

Actually, on re-reading this, I no longer think this is an improvement on the original version.
Originally I was simply thinking about the ability of the returned hash to be used to evenly distribute keys among hash buckets to provide optimal performance of hash table (dictionary/map) lookups: a dictionary consists of an array of 'buckets' (efficiently selected by a hash % number-of-buckets calculation) with each bucket containing a linked list of nodes ... an even hash distribution minimises the length of the linked lists, so performance drops if we have an uneven hash.
However, that's not the only consideration ... if we hash two keys with the same value and get two different different hash values then those two keys can select different buckets so that a value stored in the dictionary with one key cannot be retrieved using the other. My concern is that the latest code uses the raw data of float and double numbers rather than the actual numeric value of the key, and I suspect that means that if we have two keys with the same numeric value they will give different hashes depending on whether they are stored as int/float/double. That really breaks the -hash method, by making the map/dictionary fail (rather than just being inefficient/slow).

hmelder · 2025-09-08T08:35:43Z

My concern is that the latest code uses the raw data of float and double numbers rather than the actual numeric value of the key, and I suspect that means that if we have two keys with the same numeric value they will give different hashes depending on whether they are stored as int/float/double

Are you concerned about floating point values that are within machine epsilon distance and thus needs to be treated as being equivalent?

rfm · 2025-09-08T08:57:44Z

I'm not really familiar with the underlying representations, but my worry is that the binary representations of the same number (say 42) differ depending on the type of the number.
So if 42 is stored as a double, the statement 'return c.u;' would return a different value from if 42 is stored as a double, and both would be different to 42 stored as an int.
Any hash code for NSNumber ideally should ensure that it returns the same hash for the same numeric value, irrespective of the binary representation in use to store that numeric value.
Now, we could argue that if you store values using different types then they should/do have different hashes, but I think that breaks expectations: If two numbers are equal then their hashes must be the same.
[[NSNumber numberWithFloat: 42] hash] == [[NSNumber numberWithInt: 42] hash]
This should apply for all cases where there is no loss of information converting between types (clearly not all numbers can be represented in all numeric types).

rfm · 2025-09-08T09:02:47Z

In short, for two NSNumbers A and B, the implementation needs to be such that
[A isEqual: B] => [A hash] == [B hash]

hmelder · 2025-09-08T10:16:38Z

I will investigate the effect on scattering when integer and fractional part are hashed seperately and then added togetter. This avoids working with the raw floating point represenation.

rfm · 2025-09-08T10:24:23Z

That sounds like a good approach.

hmelder added 2 commits September 4, 2025 12:28

NSNumber: Add hashing tests

c2dc623

NSNumber: Rewrite hashing method

3d357b9

hmelder requested a review from rfm as a code owner September 4, 2025 10:40

rfm approved these changes Sep 5, 2025

View reviewed changes

rfm requested changes Sep 8, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

NSNumber: Rewrite hashing method #535

NSNumber: Rewrite hashing method #535

Uh oh!

hmelder commented Sep 4, 2025

Uh oh!

rfm left a comment

Uh oh!

hmelder commented Sep 7, 2025

Uh oh!

hmelder commented Sep 7, 2025 •

edited

Loading

Uh oh!

rfm left a comment

Uh oh!

hmelder commented Sep 8, 2025

Uh oh!

rfm commented Sep 8, 2025

Uh oh!

rfm commented Sep 8, 2025

Uh oh!

hmelder commented Sep 8, 2025

Uh oh!

rfm commented Sep 8, 2025

Uh oh!

Uh oh!

NSNumber: Rewrite hashing method #535

Are you sure you want to change the base?

NSNumber: Rewrite hashing method #535

Uh oh!

Conversation

hmelder commented Sep 4, 2025

Uh oh!

rfm left a comment

Choose a reason for hiding this comment

Uh oh!

hmelder commented Sep 7, 2025

Uh oh!

hmelder commented Sep 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rfm left a comment

Choose a reason for hiding this comment

Uh oh!

hmelder commented Sep 8, 2025

Uh oh!

rfm commented Sep 8, 2025

Uh oh!

rfm commented Sep 8, 2025

Uh oh!

hmelder commented Sep 8, 2025

Uh oh!

rfm commented Sep 8, 2025

Uh oh!

Uh oh!

hmelder commented Sep 7, 2025 •

edited

Loading