@AntonOresten commented Jan 14, 2026

Uses DLFP8Types.jl, as it provides exactly the types needed.

It could be nice to support multiple narrow-precision type implementations in the future (see Reactant/DLFP8TypesExt.jl), such as Float8s.jl and Microfloats.jl (which currently has an MX_E4M3 equivalent to Float8_E4M3FN), but there's no immediate need, since Float8s.jl has neither Float8_5 nor a finite-only, single-NaN-encoding type. A rough sketch of how the FP8 inputs might be built with DLFP8Types.jl follows below.
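For illustration only: a minimal sketch of quantizing FP32 data to the DLFP8Types.jl types and moving it to the GPU, assuming the package exposes Float8_E4M3FN with the usual Float32 conversion. The `data8`, `run`, and tile-size keywords in the benchmark below come from this repo's harness and aren't reproduced here; this is not the kernel itself.

```julia
# Sketch only: assumes DLFP8Types.jl defines Float8_E4M3FN with a
# Float32 constructor/convert method; not part of the benchmark harness.
using CUDA, DLFP8Types

# Quantize FP32 host data to the finite-only, single-NaN E4M3 encoding,
# then move it to the device.
A32 = rand(Float32, 128, 128)
B32 = rand(Float32, 128, 128)
A8 = CuArray(Float8_E4M3FN.(A32))
B8 = CuArray(Float8_E4M3FN.(B32))

# For correctness checks, widen back to Float32 on the host and compare
# against a reference FP32 GEMM.
C_ref = Float32.(Array(A8)) * Float32.(Array(B8))
```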

julia> @be CUDA.@sync run(data8; tm=128, tn=128, tk=128)
Benchmark: 380 samples with 1 evaluation
 min    247.029 μs (442 allocs: 10.141 KiB)
 median 252.549 μs (442 allocs: 10.141 KiB)
 mean   252.715 μs (442 allocs: 10.141 KiB)
 max    378.948 μs (442 allocs: 10.141 KiB)

julia> run_others(data16, nruns=5, warmup=1)
Dict{String, Vector{Float64}} with 1 entry:
  "cuBLAS" => [0.42032, 0.408192, 0.409696, 0.409504, 0.411424]

I can resolve any potential merge conflicts caused by #34.
