GPU idle between iteration #5

Open
yhgon opened this issue May 10, 2019 · 2 comments

Comments

yhgon commented May 10, 2019

Awesome work.

When I check GPU utilization during training, I see a lot of GPU idle time between iterations.

Did you see similar behavior? I'm using NFS storage, so one possible cause is that loading and feeding the mel data is the bottleneck.

  • Text2Mel log, batch size 64, on V100 16GB (PyTorch 1.0)
nvidia-smi -l 1 | grep MiB
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   45C    P0   162W / 300W |   5484MiB / 16130MiB |     31%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   46C    P0   209W / 300W |   5484MiB / 16130MiB |     95%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   46C    P0   208W / 300W |   5484MiB / 16130MiB |     24%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    77W / 300W |   5484MiB / 16130MiB |     94%      Default |
| N/A   44C    P0    72W / 300W |   5484MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    71W / 300W |   5484MiB / 16130MiB |      0%      Default |


  • SSRN log, batch size 32, on V100 16GB (PyTorch 1.0)
nvidia-smi -l 1 | grep MiB
| N/A   43C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   47C    P0   207W / 300W |  10386MiB / 16130MiB |     29%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   49C    P0   263W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   47C    P0    65W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    63W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   47C    P0   283W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    68W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    63W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    63W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   46C    P0    61W / 300W |  10386MiB / 16130MiB |     34%      Default |
| N/A   44C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   43C    P0    60W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    67W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    66W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   46C    P0    62W / 300W |  10386MiB / 16130MiB |     28%      Default |
| N/A   44C    P0    62W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    61W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    69W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   50C    P0   179W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    62W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0   227W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   45C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   44C    P0    70W / 300W |  10386MiB / 16130MiB |      0%      Default |
| N/A   50C    P0   293W / 300W |  10386MiB / 16130MiB |    100%      Default |
| N/A   44C    P0    67W / 300W |  10386MiB / 16130MiB |      0%      Default |
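
One way to check whether data loading really is the bottleneck is to time the wait for each batch separately from the training step. The sketch below is only an illustration: `profile_loop`, `batches` and `step` are hypothetical names, not part of this repository.

```python
import time


def profile_loop(batches, step, max_iters=20):
    """Split wall time into 'waiting for data' and 'running the step'.

    `batches` is any iterable that yields training batches and `step` is a
    callable that runs one training iteration (both are hypothetical here).
    """
    load_s, step_s = 0.0, 0.0
    it = iter(batches)
    for _ in range(max_iters):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # forward, backward and optimizer update
        t2 = time.perf_counter()
        load_s += t1 - t0
        step_s += t2 - t1
    return load_s, step_s
```

CUDA kernels run asynchronously, so for a GPU model `step` should end with `torch.cuda.synchronize()`; otherwise part of the compute time is attributed to the next batch fetch. If `load_s` is much larger than `step_s`, the idle gaps in the logs above most likely come from the input pipeline.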
@chaiyujin (Owner)

Yes, this version is not good. I haven't had time to rewrite load_data with the DataLoader provided by PyTorch. With DataLoader, we can set num_workers, which prepares data in multiple worker processes.
You can try DataLoader yourself.
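
A minimal sketch of what that could look like, assuming each utterance's mel spectrogram is stored as its own `.npy` file of shape `(T, n_mels)`. That layout, and the names `MelDataset`, `pad_collate` and `make_loader`, are assumptions for illustration and not the repository's actual code.

```python
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class MelDataset(Dataset):
    """Hypothetical dataset: one .npy mel file per utterance."""

    def __init__(self, paths):
        self.paths = list(paths)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # With num_workers > 0 this (possibly slow, NFS-backed) read runs
        # in a worker process instead of the training process.
        return torch.from_numpy(np.load(self.paths[idx])).float()


def pad_collate(batch):
    """Zero-pad (T, n_mels) tensors to the longest T in the batch."""
    lengths = torch.tensor([x.shape[0] for x in batch])
    padded = pad_sequence(batch, batch_first=True)
    return padded, lengths


def make_loader(paths, batch_size=64, num_workers=4):
    return DataLoader(
        MelDataset(paths),
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # parallel workers prefetch batches
        collate_fn=pad_collate,
        pin_memory=torch.cuda.is_available(),
    )
```

The training loop would then iterate with `for mels, lengths in make_loader(paths):`. On Windows, worker processes are spawned rather than forked, so the loop needs to sit under `if __name__ == "__main__":`. A good value for `num_workers` depends on the storage and CPU, so it is worth trying a few.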

frytry commented Feb 22, 2020

What kind of error is this?

File "main.py", line 62, in <module>
  main()
File "main.py", line 47, in main
  train(args.module, args.load)
File "D:\deepFake_audio\advance\dctts-pytorch-master\pkg\train.py", line 56, in train
  train_superres(load_trained)
File "D:\deepFake_audio\advance\dctts-pytorch-master\pkg\train.py", line 295, in train_superres
  plot_spectrum(mag_pred[0].cpu().data, "pred", gs, dir=logdir)
File "D:\deepFake_audio\advance\dctts-pytorch-master\pkg\utils.py", line 110, in plot_spectrum
  im = ax.imshow(np.flip(spectrum, 0), cmap="jet", aspect=0.2 * spectrum.shape[1] / spectrum.shape[0])
File "<__array_function__ internals>", line 6, in flip
File "C:\Users\N451M\Miniconda3\lib\site-packages\numpy\lib\function_base.py", line 254, in flip
  return m[indexer]
ValueError: negative step not yet supported
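
The traceback suggests that `np.flip` is being handed a `torch.Tensor` (`mag_pred[0].cpu().data`) rather than a NumPy array. `np.flip` indexes its argument with a `-1` step, and tensor indexing does not support negative steps. A likely workaround, sketched here with a hypothetical helper `flip_rows` that is not part of the repository, is to convert to NumPy before flipping:

```python
import numpy as np
import torch


def flip_rows(spectrum):
    """Flip along axis 0, accepting a torch.Tensor or a NumPy array."""
    if isinstance(spectrum, torch.Tensor):
        # Convert first: slicing a tensor with a negative step raises an error.
        spectrum = spectrum.detach().cpu().numpy()
    return np.flip(np.asarray(spectrum), 0)
```

In `plot_spectrum` this would replace `np.flip(spectrum, 0)`. Passing `mag_pred[0].cpu().data.numpy()` at the call site should have the same effect.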
