I noticed that division is used instead of bit shifts to compute the quotient and remainder when the divisor is a power of 2:
util_cluster_entry_to_block performs a 32-bit unsigned division by 128 (2^7).
util_ceil_byte_size_to_block_size performs a 32-bit unsigned division and remainder by 512 (2^9).
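I think it would be worth rewriting these functions to use shifts and masks instead (x >> 7 for x / 128; x >> 9 and x & 511 for x / 512 and x % 512). For unsigned operands the results are identical, and integer division is much slower than a shift on most CPUs, especially on targets without a hardware divide instruction.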
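A minimal sketch of what the shift-based versions could look like. The real signatures aren't shown here, so the helper names below (the _shift suffix), the uint32_t parameter and return types, and the assumption that the "ceil" function returns the number of 512-byte blocks needed to hold a byte count are all hypothetical. A small cross-check against the division form is included.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift-based variants; the uint32_t signatures are assumed. */
#define ENTRIES_PER_BLOCK_SHIFT 7u                               /* 128 == 1u << 7 */
#define BLOCK_SIZE_SHIFT        9u                               /* 512 == 1u << 9 */
#define BLOCK_SIZE_MASK         ((1u << BLOCK_SIZE_SHIFT) - 1u)  /* 511 */

/* Same result as cluster_entry / 128 for every uint32_t value. */
static inline uint32_t cluster_entry_to_block_shift(uint32_t cluster_entry)
{
    return cluster_entry >> ENTRIES_PER_BLOCK_SHIFT;
}

/* Same result as byte_size / 512 + (byte_size % 512 != 0), i.e. the number
 * of 512-byte blocks needed to hold byte_size bytes, rounded up.
 * Written this way rather than as (byte_size + 511) >> 9 so it cannot
 * overflow when byte_size is close to UINT32_MAX. */
static inline uint32_t ceil_byte_size_to_block_size_shift(uint32_t byte_size)
{
    return (byte_size >> BLOCK_SIZE_SHIFT) + ((byte_size & BLOCK_SIZE_MASK) != 0u);
}

/* Asserts that both shift versions agree with the division versions for x. */
static void check_against_division(uint32_t x)
{
    assert(cluster_entry_to_block_shift(x) == x / 128u);
    assert(ceil_byte_size_to_block_size_shift(x) == x / 512u + (x % 512u != 0u));
}
```

One caveat worth checking before changing anything: when the divisor is an unsigned compile-time constant, GCC and Clang usually emit a shift already at -O1 and above, so the gain is most likely at -O0 or if the divisor is actually a runtime value. Looking at the generated assembly for these two functions would confirm which case applies.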