GPU idle between iteration #5

yhgon · 2019-05-10T02:03:12Z

Awesome work.

When I checked GPU utilization during training, I found a lot of GPU idle time between iterations.

Did you see similar behavior? I'm using NFS storage, so one possible cause is that loading and feeding the mel spectrograms is the bottleneck.

Text2Mel log, batch size 64, on V100 16GB (PyTorch 1.0)

nvidia-smi -l 1 | grep MiB
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   45C    P0   162W / 300W |   5484MiB / 16130MiB |     31%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   46C    P0   209W / 300W |   5484MiB / 16130MiB |     95%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   46C    P0   208W / 300W |   5484MiB / 16130MiB |     24%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    77W / 300W |   5484MiB / 16130MiB |     94%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    71W / 300W |   5484MiB / 16130MiB |      0%      Default |

SSRN log, batch size 32, on V100 16GB (PyTorch 1.0)

nvidia-smi -l 1 | grep MiB
| N/A   43C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   47C    P0   207W / 300W |  10386MiB / 16130MiB |     29%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   49C    P0   263W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   47C    P0    65W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    63W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   47C    P0   283W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    63W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    63W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   46C    P0    61W / 300W |  10386MiB / 16130MiB |     34%      Default |
| N/A   44C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   43C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    67W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    66W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   46C    P0    62W / 300W |  10386MiB / 16130MiB |     28%      Default |
| N/A   44C    P0    62W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    61W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   50C    P0   179W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    62W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0   227W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   50C    P0   293W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    67W / 300W |  10386MiB / 16130MiB |      0%      Default |
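
Both logs show short bursts of utilization separated by seconds at 0%, which fits the guess that the GPU is waiting for input. One way to check this is to time how long each iteration waits for a batch versus how long the training step takes. The sketch below is only an illustration: `profile_loop`, `batches`, and `train_step` are hypothetical names, not part of this repository, and on a GPU the step function should end with `torch.cuda.synchronize()` so that asynchronous CUDA work is counted in the step time.

```python
import time


def profile_loop(batches, train_step, max_iters=20):
    """Return (seconds spent waiting for data, seconds spent in train_step).

    `batches` is any iterable that yields training batches and `train_step`
    is any callable that consumes one batch. Both are placeholders here.
    """
    load_s = 0.0
    step_s = 0.0
    it = iter(batches)
    for _ in range(max_iters):
        t0 = time.perf_counter()
        try:
            batch = next(it)          # time spent here is data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)             # time spent here is compute
        t2 = time.perf_counter()
        load_s += t1 - t0
        step_s += t2 - t1
    return load_s, step_s
```

If the first number is much larger than the second, the input pipeline rather than the model is the likely cause of the idle time.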


chaiyujin · 2019-05-10T08:53:24Z

Yes, this version is not good in that respect. I haven't had time to rewrite load_data using the DataLoader provided by PyTorch. With DataLoader, we can set num_workers, which prepares batches in multiple worker processes in parallel.
You could give DataLoader a try.
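
For anyone who wants to try this, here is a minimal sketch of what a DataLoader-based loader could look like. It is not the repository's code: `MelDataset`, `pad_collate`, `make_loader`, and the assumption that each mel spectrogram is stored as its own `.npy` file of shape (frames, n_mels) are all hypothetical and would need adapting to what load_data actually reads.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset
from torch.nn.utils.rnn import pad_sequence


class MelDataset(Dataset):
    """Hypothetical dataset: loads one (text, mel) pair per item, lazily."""

    def __init__(self, mel_paths, texts):
        self.mel_paths = mel_paths    # assumed: one .npy file per utterance
        self.texts = texts            # assumed: list of integer id sequences

    def __len__(self):
        return len(self.mel_paths)

    def __getitem__(self, idx):
        # The disk read happens here, inside a worker process.
        mel = torch.from_numpy(np.load(self.mel_paths[idx])).float()
        text = torch.tensor(self.texts[idx], dtype=torch.long)
        return text, mel


def pad_collate(batch):
    """Zero-pad variable-length texts and mels to the longest in the batch."""
    texts, mels = zip(*batch)
    return (pad_sequence(list(texts), batch_first=True),
            pad_sequence(list(mels), batch_first=True))


def make_loader(dataset, batch_size=64, num_workers=4):
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,      # worker processes loading in parallel
        collate_fn=pad_collate,
        pin_memory=torch.cuda.is_available(),
    )
```

With `num_workers` greater than 0, batches are prepared in background processes while the GPU runs the current step, which is what should shrink the idle gaps. Setting `num_workers=0` loads in the main process and is handy for debugging. On Windows, the code that iterates the loader needs to sit under an `if __name__ == "__main__":` guard.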

frytry · 2020-02-22T10:50:32Z

What kind of error is this?

  File "main.py", line 62, in <module>
    main()
  File "main.py", line 47, in main
    train(args.module, args.load)
  File "D:\deepFake_audio\advance\dctts-pytorch-master\pkg\train.py", line 56, in train
    train_superres(load_trained)
  File "D:\deepFake_audio\advance\dctts-pytorch-master\pkg\train.py", line 295, in train_superres
    plot_spectrum(mag_pred[0].cpu().data, "pred", gs, dir=logdir)
  File "D:\deepFake_audio\advance\dctts-pytorch-master\pkg\utils.py", line 110, in plot_spectrum
    im = ax.imshow(np.flip(spectrum, 0), cmap="jet", aspect=0.2 * spectrum.shape[1] / spectrum.shape[0])
  File "<__array_function__ internals>", line 6, in flip
  File "C:\Users\N451M\Miniconda3\lib\site-packages\numpy\lib\function_base.py", line 254, in flip
    return m[indexer]
ValueError: negative step not yet supported
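
A likely reading of this traceback, offered as a guess rather than a confirmed diagnosis: `plot_spectrum` receives a PyTorch tensor (`mag_pred[0].cpu().data`), and `np.flip` then indexes it with a negative-step slice, which PyTorch tensors do not support. Converting to a NumPy array before flipping should avoid that. The helper name `to_plottable` below is hypothetical and not part of the repository.

```python
import numpy as np
import torch


def to_plottable(spectrum):
    """Return `spectrum` as a NumPy array flipped along axis 0.

    Accepts either a torch.Tensor or anything np.asarray understands.
    """
    if isinstance(spectrum, torch.Tensor):
        # Detach from autograd and move to host memory before converting.
        spectrum = spectrum.detach().cpu().numpy()
    return np.flip(np.asarray(spectrum), 0)
```

Inside `plot_spectrum`, the flipped array could then be passed to `ax.imshow` in place of `np.flip(spectrum, 0)`.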
