Description
It might be worth looking into how much time is spent transferring data to and from the GPU. In general, sending one larger chunk to the GPU is preferable to sending 10-100 smaller chunks, so it is worth verifying that the latter isn't happening here.
This line could be problematic as it copies data to GPU memory from CPU memory. I'm not sure exactly how long each call takes, but for a 256x256 spatial-position dataset there are 256 x 256 x 2 = 131072 transfers back and forth from the GPU. What matters is whether each of these transfers takes 1 second, 0.1 seconds, or 0.01 seconds. I'd hope it is more like the last one, but it is good to check.
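To make the scale concrete, here is a rough back-of-the-envelope budget for those transfers. The per-transfer latencies are the illustrative values from above, not measurements:

```python
# Rough budget for per-pixel host<->device transfers on a 256x256 scan.
# The latencies below are the illustrative guesses from the text above,
# not measured values.
n_positions = 256 * 256          # spatial positions in the dataset
transfers = n_positions * 2      # one copy to the GPU and one back per position

for latency_s in (1.0, 0.1, 0.01):
    total_hours = transfers * latency_s / 3600
    print(f"{latency_s:>5} s/transfer -> {total_hours:,.2f} hours total")
```

Even at 0.01 s per transfer this is on the order of twenty minutes of pure copying, which is why it is worth measuring.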
An easy way to measure this is to time predict_sequence for an image that is already on the GPU, and again for one where the data has to be transferred to/from the GPU, and compare the difference.
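A minimal timing harness for that comparison might look like the following. The two functions here are stand-ins (no real GPU is used): in practice `on_gpu` would call predict_sequence on an array already resident in device memory, and `with_transfer` would include the host-to-device copy as well.

```python
import time

def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

# Hypothetical stand-ins for the real calls; replace with predict_sequence
# on device-resident vs host-resident data.
def on_gpu(x):
    return sum(x)

def with_transfer(x):
    y = list(x)        # stand-in for the host->device copy
    return sum(y)

data = list(range(100_000))
t_compute = time_call(on_gpu, data)
t_total = time_call(with_transfer, data)
print(f"transfer overhead ~ {t_total - t_compute:.4f} s")
```

Taking the best of several repeats reduces noise from other processes; the difference between the two timings is an estimate of the transfer cost per call.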
I know we had this problem in pyxem when using the GPU: things were slowing down from transfer times. The solution is to use something like dask and the map_blocks function to transfer multiple images to the GPU at one time. Maybe you are already doing this and I just didn't see it.
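The idea behind map_blocks-style chunking can be sketched without dask or a GPU. Here `to_device` is a hypothetical stand-in that only counts how many host-to-device copies are issued; chunking many images per copy slashes that count:

```python
# Sketch of the chunking idea behind dask's map_blocks: move many images
# per host->device copy instead of one. `to_device` is a hypothetical
# stand-in that just counts transfers; no real GPU is involved.
transfer_count = 0

def to_device(batch):
    global transfer_count
    transfer_count += 1        # one host->device copy per call
    return batch

def process(images, chunk_size):
    results = []
    for i in range(0, len(images), chunk_size):
        chunk = to_device(images[i:i + chunk_size])
        results.extend(x * 2 for x in chunk)   # stand-in for the model
    return results

images = list(range(1024))
process(images, chunk_size=1)     # one image per transfer
per_image = transfer_count
transfer_count = 0
process(images, chunk_size=256)   # 256 images per transfer
print(per_image, transfer_count)  # 1024 transfers vs 4
```

With real arrays, dask's map_blocks applies a function to each chunk, so choosing larger chunks amortizes the copy overhead the same way.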
This might be relevant:
The other thing you could look into is that applying your model in a for-loop to a small part of your data probably has a large impact on your processing time. If you think of a GPU as a bunch of small CPUs, then you are leaving most of your GPU sitting idle while you process the small patch.
You should look into this. Basically it uses numba and JIT compilation to accelerate batches of images, so you should be able to apply the model to multiple patches much more quickly than with a for loop.
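The per-patch-loop vs batched-call difference can be sketched as follows. `model` here is a hypothetical stand-in that records how many times it is launched; on a real GPU each launch has fixed overhead and a tiny batch leaves most of the device idle:

```python
# Sketch of batching patches: one call over a stacked batch keeps the
# device busy, while a per-patch loop pays the launch overhead per patch.
# `model` is a hypothetical stand-in that counts its invocations.
calls = 0

def model(batch):
    global calls
    calls += 1                       # one "kernel launch" per call
    return [sum(patch) for patch in batch]

patches = [[i, i + 1, i + 2] for i in range(64)]

# Per-patch loop: one launch per patch.
loop_out = [model([p])[0] for p in patches]
loop_calls = calls

# Batched: a single launch over all patches.
calls = 0
batch_out = model(patches)
print(loop_calls, calls)          # 64 launches vs 1
```

The outputs are identical; only the number of launches changes, which is where the speedup comes from.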
We can talk about this more in person if you would like. The second suggestion is probably much easier to implement and potentially offers much larger gains, so I would start there.