torch.multinomial on GPU #144

Open: 1 commit to merge into base: master

rubencart

Results from training an FC model with self-critical RL on a single GPU with batch_size 32.
Output from the training script, before the change:

iter 50 (epoch 0), avg_reward = 0.001, time/batch = 1.137
iter 100 (epoch 0), avg_reward = 0.002, time/batch = 1.145
iter 150 (epoch 0), avg_reward = 0.001, time/batch = 1.143
iter 200 (epoch 0), avg_reward = 0.002, time/batch = 1.130
iter 250 (epoch 0), avg_reward = 0.000, time/batch = 1.143
iter 300 (epoch 0), avg_reward = 0.000, time/batch = 1.123
iter 350 (epoch 0), avg_reward = 0.001, time/batch = 1.123
iter 400 (epoch 0), avg_reward = -0.000, time/batch = 1.131
iter 450 (epoch 0), avg_reward = 0.001, time/batch = 1.115
iter 500 (epoch 0), avg_reward = 0.000, time/batch = 1.155
total time: 589.1319808959961

And cProfile output:

         173950435 function calls (173760536 primitive calls) in 591.966 seconds
   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     8500  315.550    0.037  315.550    0.037 {built-in method multinomial}
   960610   90.125    0.000  117.889    0.000 cider/pyciderevalcap/ciderD/ciderD_scorer.py:128(counts2vec)
   800610   35.399    0.000   44.638    0.000 cider/pyciderevalcap/ciderD/ciderD_scorer.py:154(sim)
   960610   30.668    0.000   31.411    0.000 cider/pyciderevalcap/ciderD/ciderD_scorer.py:17(precook)
 37222323   11.875    0.000   11.875    0.000 {built-in method builtins.pow}
      511   11.348    0.022   11.348    0.022 {method 'item' of 'torch._C._TensorBase' objects}
     9500   11.165    0.001   11.165    0.001 {method 'cpu' of 'torch._C._TensorBase' objects}
     1000    9.129    0.009  344.628    0.345 /export/home1/NoCsBack/hci/rubenc/selfcritical/models/FCModel.py:150(_sample)
      500    9.106    0.018    9.106    0.018 {method 'run_backward' of 'torch._C._EngineBase' objects}
 32826245    6.941    0.000    6.941    0.000 {built-in method builtins.min}
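
For anyone who wants to reproduce a table in this format: a minimal sketch using the standard-library `cProfile` and `pstats` modules. `train_step` and `profile_top` are hypothetical names for illustration, not part of this repository; the real run would wrap the training loop instead. Sorting by `tottime` is what produces the "Ordered by: internal time" header.

```python
import cProfile
import io
import pstats


def train_step(n=10000):
    # Hypothetical stand-in for one training iteration.
    return sum(pow(i, 2) for i in range(n))


def profile_top(fn, iterations=5, top=10):
    """Profile `iterations` calls of fn and return the top rows by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(iterations):
        fn()
    profiler.disable()

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # "tottime" is time spent inside a function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(top)
    return buf.getvalue()


report = profile_top(train_step)
print(report)
```

The `tottime` column excludes time spent in callees, while `cumtime` includes it, which is why `_sample` has a small `tottime` but a large `cumtime` in the table above.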

After the change, with the exact same options and the same number of iterations:

iter 50 (epoch 0), avg_reward = 0.000, time/batch = 0.519
iter 100 (epoch 0), avg_reward = 0.000, time/batch = 0.523
iter 150 (epoch 0), avg_reward = 0.001, time/batch = 0.534
iter 200 (epoch 0), avg_reward = 0.000, time/batch = 0.522
iter 250 (epoch 0), avg_reward = 0.001, time/batch = 0.529
iter 300 (epoch 0), avg_reward = 0.002, time/batch = 0.532
iter 350 (epoch 0), avg_reward = 0.001, time/batch = 0.711
iter 400 (epoch 0), avg_reward = -0.000, time/batch = 0.528
iter 450 (epoch 0), avg_reward = 0.001, time/batch = 0.517
iter 500 (epoch 0), avg_reward = 0.001, time/batch = 0.512
total time: 283.7362642288208

And cProfile output:

         184722279 function calls (184532377 primitive calls) in 296.112 seconds
   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   960610   99.424    0.000  131.812    0.000 cider/pyciderevalcap/ciderD/ciderD_scorer.py:128(counts2vec)
   800610   42.364    0.000   53.293    0.000 cider/pyciderevalcap/ciderD/ciderD_scorer.py:154(sim)
   960610   31.212    0.000   32.016    0.000 cider/pyciderevalcap/ciderD/ciderD_scorer.py:17(precook)
 38569360   15.590    0.000   15.590    0.000 {built-in method builtins.pow}
     1000   14.660    0.015   23.485    0.023 /export/home1/NoCsBack/hci/rubenc/selfcritical/models/FCModel.py:150(_sample)
      511   10.595    0.021   10.595    0.021 {method 'item' of 'torch._C._TensorBase' objects}
      500    9.524    0.019    9.524    0.019 {method 'run_backward' of 'torch._C._EngineBase' objects}
 39566383    8.326    0.000    8.326    0.000 {built-in method builtins.min}
 38570206    6.785    0.000    6.785    0.000 {built-in method builtins.max}
      500    6.722    0.013  195.449    0.391 cider/pyciderevalcap/ciderD/ciderD_scorer.py:127(compute_cider)

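So basically, a big improvement in speed 🙂: total time drops from about 589 s to about 284 s (roughly 2x), and `{built-in method multinomial}`, which accounted for 315.6 s of internal time before the change, no longer appears among the top entries of the profile.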
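
The diff itself is not reproduced in this description. Going by the title and the two profiles (the `cpu` and `multinomial` entries drop out of the top rows, and the cumulative time of `_sample` falls from 344.6 s to 23.5 s), the change presumably amounts to sampling on the device where the probabilities already live instead of copying them to the CPU first. A minimal sketch under that assumption; `sample_on_cpu` and `sample_on_device` are hypothetical names, not functions from `FCModel.py`:

```python
import torch


def sample_on_cpu(probs):
    # Assumed "before" pattern: copy to the CPU, sample there,
    # then move the sampled indices back to the original device.
    return torch.multinomial(probs.cpu(), 1).to(probs.device)


def sample_on_device(probs):
    # Assumed "after" pattern: sample directly on the tensor's own device.
    return torch.multinomial(probs, 1)


# Falls back to the CPU when no GPU is present, so the sketch still runs.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
logits = torch.randn(32, 20000, device=device)
probs = torch.softmax(logits, dim=1)

idx_cpu = sample_on_cpu(probs)
idx_dev = sample_on_device(probs)
print(idx_cpu.shape, idx_dev.shape)
```

Both functions return one sampled index per row; the samples themselves differ because they are drawn independently.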

A comparison in IPython (`cweights` lives on the GPU, `weights` on the CPU):

In [28]: device = torch.device('cuda:0')

In [29]: weights = torch.randn((32, 20000), dtype=torch.float32).clamp(0.01, 1)

In [30]: cweights = weights.clone().detach().to(device)

In [31]: avg_timeit(lambda: torch.multinomial(cweights, 1), 100)
Out[31]: 7.232666015625e-05

In [32]: avg_timeit(lambda: torch.multinomial(weights, 1), 100)
Out[32]: 0.015503778457641601
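
`avg_timeit` is not defined anywhere in this description. A minimal sketch of what such a helper might look like, assuming it returns the mean wall-clock time per call in seconds. The `sync` parameter is an addition for illustration: CUDA kernels are launched asynchronously, so without a call such as `torch.cuda.synchronize` the GPU figure may mostly reflect launch overhead and understate the real cost.

```python
import time


def avg_timeit(fn, n, sync=None):
    """Return the mean wall-clock seconds per call of fn over n calls.

    sync is an optional zero-argument callable (for example
    torch.cuda.synchronize) that blocks until queued work has finished.
    """
    if sync is not None:
        sync()  # drain pending asynchronous work before starting the clock
    start = time.perf_counter()
    for _ in range(n):
        fn()
    if sync is not None:
        sync()  # wait for queued work so it counts toward the elapsed time
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # the helper itself does not depend on torch

    if torch is not None:
        weights = torch.randn((32, 20000), dtype=torch.float32).clamp(0.01, 1)
        print("cpu:", avg_timeit(lambda: torch.multinomial(weights, 1), 100))
        if torch.cuda.is_available():
            cweights = weights.to(torch.device("cuda:0"))
            print("gpu:", avg_timeit(lambda: torch.multinomial(cweights, 1), 100,
                                     sync=torch.cuda.synchronize))
```

With synchronization the GPU number may come out higher than the 7.2e-05 s shown above; the training-run totals are the more reliable evidence of the speedup.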
