Fatal Python error: Segmentation fault #172

Open
joelongLin opened this issue Nov 1, 2021 · 0 comments

joelongLin commented Nov 1, 2021

I want to train on a dataset in unsupervised mode with TensorFlow 1.15 (I only need 1.15 so that I can train with multiple GPUs). The dataset has about 42,000,000+ random-walk edge pairs and 7,000,000 nodes.
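For a sense of scale, here is a rough size estimate for the adjacency table at this node count. It is only a sketch under assumptions: the table is taken to be an int32 array with one row per node plus one padding row and `max_degree` columns, and `max_degree = 100` is an assumed value that the post does not state. The helper names are hypothetical.

```python
# Rough size estimate for a dense adjacency table (assumptions noted inline).
PROTO_LIMIT_BYTES = 2 ** 31  # the 2GB tensor proto limit


def adjacency_bytes(num_nodes, max_degree, bytes_per_entry=4):
    """Bytes used by a (num_nodes + 1, max_degree) int32 table.

    The extra row is an assumed padding row; 4 bytes per entry matches int32.
    """
    return (num_nodes + 1) * max_degree * bytes_per_entry


def exceeds_proto_limit(num_nodes, max_degree):
    """True if the table would not fit into a single tensor proto."""
    return adjacency_bytes(num_nodes, max_degree) > PROTO_LIMIT_BYTES


# 7,000,000 nodes from the post; max_degree=100 is an assumption.
size = adjacency_bytes(7_000_000, 100)
print(size, exceeds_proto_limit(7_000_000, 100))
```

Under these assumptions the table alone is larger than 2GB, so it cannot be embedded in the graph as a constant.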
Training then failed with: ValueError: Tried to convert 'value' to a tensor and failed. Error: Cannot create a tensor proto whose content is larger than 2GB. So I changed the code as follows:

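import tensorflow as tf  # TensorFlow 1.15

# Placeholders for the adjacency tables, so the large arrays are fed at run time
# instead of being embedded in the graph as constants.
adj_info_ph = tf.placeholder(tf.int32, shape=minibatch.adj.shape, name="adj_info_ph")
test_adj_info_ph = tf.placeholder(tf.int32, shape=minibatch.test_adj.shape, name="test_adj_info_ph")

# Non-trainable variable holding the adjacency table; its initial value comes from the placeholder.
adj_info = tf.Variable(adj_info_ph, trainable=False, name="adj_info")

# Assign from the placeholders instead of from minibatch.adj or minibatch.test_adj directly.
train_adj_info = tf.assign(adj_info, adj_info_ph)
val_adj_info = tf.assign(adj_info, test_adj_info_ph)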

It works. That is the only change I made to the original code.
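The placeholder approach only works if the placeholder is fed when the variable is initialized and again whenever an assign op runs. The following is a minimal, self-contained sketch of that pattern, not the actual training script: the tiny arrays stand in for `minibatch.adj` and `minibatch.test_adj`, the helper `build_adj_variable` is hypothetical, and `tensorflow.compat.v1` is used on the assumption that it behaves like the TF 1.15 API here. It says nothing about the cause of the segmentation fault.

```python
import numpy as np
import tensorflow.compat.v1 as tf


def build_adj_variable(adj_shape):
    """Create a placeholder and a non-trainable variable initialized from it."""
    adj_ph = tf.placeholder(tf.int32, shape=adj_shape, name="adj_info_ph")
    adj_var = tf.Variable(adj_ph, trainable=False, name="adj_info")
    return adj_ph, adj_var


# Small stand-ins for the real adjacency tables (hypothetical data).
train_adj = np.arange(12, dtype=np.int32).reshape(4, 3)
test_adj = train_adj[::-1].copy()

graph = tf.Graph()
with graph.as_default():
    adj_ph, adj_var = build_adj_variable(train_adj.shape)
    assign_op = tf.assign(adj_var, adj_ph)
    init_op = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    # The initializer reads the placeholder, so it must be fed here.
    sess.run(init_op, feed_dict={adj_ph: train_adj})
    after_init = sess.run(adj_var)

    # Swapping in the test table also goes through a fed assign op.
    sess.run(assign_op, feed_dict={adj_ph: test_adj})
    after_assign = sess.run(adj_var)
```

Because the data travels through `feed_dict` rather than through a constant in the graph definition, the array never has to be serialized into a tensor proto.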

But then I ran into another problem. The error log is:

Fatal Python error: Segmentation fault
Thread 0x00007f96d37fe700 (most recent call first):
  File "/usr/lib64/python3.6/threading.py", line 295 in wait
  File "/usr/lib64/python3.6/queue.py", line 164 in get
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
  File "/usr/lib64/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib64/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007fa5e4e6e740 (most recent call first):
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443 in _call_tf_sessionrun
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350 in _run_fn
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365 in _do_call
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359 in _do_run
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180 in _run
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956 in run
  File "${MY_LOCAL_PATH}/GraphSAGE/graphsage/unsupervised_train_tf1.15.py", line 302 in train
  File "${MY_LOCAL_PATH}/GraphSAGE/graphsage/unsupervised_train_tf1.15.py", line 410 in main
  File "/usr/local/lib/python3.6/site-packages/absl/app.py", line 250 in _run_main
  File "/usr/local/lib/python3.6/site-packages/absl/app.py", line 299 in run
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40 in run
  File "${MY_LOCAL_PATH}/GraphSAGE/graphsage/unsupervised_train_tf1.15.py", line 415 in <module>

I could find very few related solutions online. Do you have any advice? Thanks a lot!
My actual goal is to use multi-GPU parallel computation to speed up GraphSAGE training.
