I tried running a workload on a machine with multiple GPUs, and saw only a single GPU being used.
The current docs mention that only a single GPU is used when running through the CLI, but looking through the crates I didn't find any other option for using multiple GPUs either (though I may have missed something).