
Hardware used for training #1

Open
kaotika opened this issue Aug 29, 2024 · 1 comment
kaotika commented Aug 29, 2024

Hi, I'm very excited about your work.
What kind of GPU are you using for testing?
I tried dqn_transform on an Nvidia A100 with 40 GB of RAM, and with the default parameters I ran into OOM errors instantly.
Setting --replay_buffer_max_size to 125 runs properly for a while (I killed it after ~20-40 episodes, so I don't know whether it hits OOM errors in later episodes).
Running reinforce.py also runs into OOM errors, usually after 1-2 episodes.

DevSlem (Owner) commented Nov 1, 2024

Thank you for your interest, and sorry for the late reply. I ran the experiments on an A6000 GPU (48 GB). CUDA OOM errors can occur because of the $O(n^2)$ memory and computational complexity of the Transformer architecture, where $n$ is the sequence length. To reduce the sequence length, modify the following code in src/knapsack_env_transformer.py:

self.num_items = np.random.randint(2, 201)  # number of items: 2 to 200 (upper bound exclusive)
self.num_bags = np.random.randint(2, 21)    # number of bags: 2 to 20 (upper bound exclusive)

In this case, the maximum sequence length is $200 \times 20 = 4000$. Lowering these bounds shrinks the sequence length and therefore the attention memory. If you run into another problem, feel free to open an issue. Thank you.
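To get a feel for why the sequence length dominates memory use, here is a rough back-of-the-envelope sketch of the size of a single self-attention score matrix. The batch size, head count, and dtype below are assumptions for illustration, not values taken from this repository:

```python
def attention_score_bytes(seq_len: int, num_heads: int = 8,
                          batch_size: int = 32, bytes_per_elem: int = 4) -> int:
    """Bytes held by one (batch, heads, n, n) attention-score tensor.

    Hypothetical parameters: 8 heads, batch of 32, float32 elements.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem

# Default env bounds: up to 200 items x 20 bags -> n = 4000.
full = attention_score_bytes(4000)
# Halving both bounds (100 items x 10 bags -> n = 1000) cuts this 16x,
# because the score matrix scales with n^2.
reduced = attention_score_bytes(1000)

print(f"n=4000: {full / 2**30:.2f} GiB per attention layer")
print(f"n=1000: {reduced / 2**30:.2f} GiB per attention layer")
```

Under these assumed settings a single layer's score matrix at $n = 4000$ is already in the tens of gigabytes once gradients and activations are counted, which matches the OOM behavior reported above on a 40 GB card.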
