There seems to be a large overhead associated with oomph memory allocation. Ideally, performance should be the same in the following two scenarios:
- halo exchange with static, pre-allocated messages
- halo exchange with messages allocated each time and moved into the oomph send/recv routines
This does not seem to be the case at the moment: ghexbench shows much lower performance in the second scenario. We should find out where the allocation overhead comes from and reduce it.
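
To make the comparison concrete, here is a minimal sketch of the two access patterns. It does not use the oomph API: `message`, `fake_send_ref` and `fake_send_owned` are hypothetical stand-ins, and the buffer is a plain `std::vector` on the default heap. The real cost in oomph may differ, for example if allocation involves memory registration. It is only meant as a shape for a reduced reproducer.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a message buffer (NOT an oomph type).
using message = std::vector<double>;

struct result {
    double seconds;   // wall-clock time spent in the loop
    double checksum;  // keeps the compiler from optimizing the loop away
};

// Hypothetical stand-in for a send that borrows a persistent buffer.
inline double fake_send_ref(const message& m) { return m.front() + m.back(); }

// Hypothetical stand-in for a send that takes ownership of the message.
inline double fake_send_owned(message&& m) {
    message owned = std::move(m);  // buffer is released when `owned` goes out of scope
    return owned.front() + owned.back();
}

// Scenario 1: static, pre-allocated message reused in every iteration.
// Assumes n >= 2.
result run_preallocated(std::size_t iterations, std::size_t n) {
    message buf(n, 1.0);  // allocated once, outside the timed loop
    double checksum = 0.0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        buf[0] = static_cast<double>(i);
        checksum += fake_send_ref(buf);
    }
    auto t1 = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(t1 - t0).count(), checksum};
}

// Scenario 2: message allocated each iteration and moved into the send.
// Assumes n >= 2.
result run_allocate_each_time(std::size_t iterations, std::size_t n) {
    double checksum = 0.0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        message msg(n, 1.0);  // allocation inside the timed loop
        msg[0] = static_cast<double>(i);
        checksum += fake_send_owned(std::move(msg));
    }
    auto t1 = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(t1 - t0).count(), checksum};
}
```

Calling both functions with the same `iterations` and `n` and comparing the `seconds` fields gives a baseline for how much of the gap is plain allocation cost. Anything beyond that in the ghexbench numbers would point at oomph-specific work done per allocation.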