
Conversation

@swfsql swfsql commented Nov 27, 2025

  • Closes #1550 (Add f16 support)
  • This is a minor implementation, given that it is easy to add new float types to ndarray (see the usage sketch after this list).
  • On a best-effort basis, this adds tests/benches wherever both f32 and f64 were already tested/benchmarked.
  • Adds a mention of the feature to the README.
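
For context, a minimal usage sketch of what the added element types allow. The array values are made up, and the operations shown rely on the half crate's trait impls together with ndarray's generic arithmetic rather than on any API specific to this PR:

```rust
use half::f16;
use ndarray::{array, Array1};

fn main() {
    // Build f16 arrays from f32 literals; half provides the conversions.
    let a: Array1<f16> = array![1.0f32, 2.0, 3.0].mapv(f16::from_f32);
    let b: Array1<f16> = array![0.5f32, 0.25, 0.125].mapv(f16::from_f32);

    // Elementwise arithmetic, analogous to the existing f32/f64 usage.
    let sum = &a + &b;
    println!("{}", sum);
}
```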

@swfsql swfsql marked this pull request as ready for review November 28, 2025 06:23
Comment on lines +826 to +829
(m127, 127, 127, 127) // ~128x slower than f32
(mix16x4, 32, 4, 32)
(mix32x2, 32, 2, 32)
// (mix10000, 128, 10000, 128) // too slow
Collaborator

You mention that f16 is slower in the issue and several times here. Is it faster on some operations? If not, why do you (and others) want to use it? Only to save space?

Author

@swfsql swfsql Dec 1, 2025

Yes, on my architecture (x86_64) it is indeed quite slow. The half docs mention that aarch64 has support for all half::f16 operations, so it is possibly viable in performance on that architecture, but I haven't tested it. The crate also has specialized methods for storing/loading half::{f16, bf16} data to/from other types (u16, f32, f64), which could also improve performance, but I didn't leverage those operations when adding the types to ndarray (I don't really know if/how they could be leveraged).
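
For reference, a minimal sketch of the kind of conversion methods referred to above, using only the half crate's public scalar f16/bf16 API (half also offers slice-level variants of these conversions; nothing here is specific to this PR):

```rust
use half::{bf16, f16};

fn main() {
    // Scalar round-trips between the half types and f32/f64.
    let a = f16::from_f32(1.5_f32);
    let b = bf16::from_f64(2.25_f64);
    println!("{} {}", a.to_f32(), b.to_f64());

    // Raw u16 bit patterns, useful when storing/loading half data.
    let bits: u16 = a.to_bits();
    assert_eq!(a, f16::from_bits(bits));
}
```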

Although it is a bit disappointing (at least to me) that it is slow, I find it useful for debugging fp16 machine-learning models, given that some architectures behave poorly on fp16 and it is easier to debug them when everything runs on the CPU.
For wasm targets, some builds may not enable CPU features, so the performance gap between f32 and half::f16 should be narrower; in that case I believe the memory savings could be meaningful, but that is a niche target.
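
As a concrete example of that debugging use case, here is a minimal sketch (the array and its values are hypothetical) that casts f32 data down to f16 and back to inspect the precision loss on the CPU:

```rust
use half::f16;
use ndarray::Array1;

fn main() {
    // Hypothetical f32 "weights"; in practice these would come from a model.
    let weights: Array1<f32> = Array1::linspace(0.0, 1.0, 5);

    // Cast down to f16 and back up to f32 to see where precision is lost.
    let half_weights: Array1<f16> = weights.mapv(f16::from_f32);
    let round_trip: Array1<f32> = half_weights.mapv(f16::to_f32);

    // Largest elementwise deviation introduced by the f16 round-trip.
    let max_abs_err = (&weights - &round_trip)
        .mapv(f32::abs)
        .fold(0.0f32, |acc, &x| acc.max(x));
    println!("max abs error = {}", max_abs_err);
}
```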

I don't know whether proper SIMD support is possible for fp16; it appears that some work towards this is active (for the primitive f16 type).

With that said, I still ask for half::{f16, bf16} support in ndarray. It makes development and debugging smoother even if fp16 doesn't have proper SIMD support, given that the bulk of the training happens on GPUs. In the future, SIMD improvements could come either from the underlying half crate or from the addition of (or replacement by) a primitive f16 type.
I also understand that ndarray holds performance in high regard, and might therefore opt to delay or decline inclusion of the fp16 types.

