Encounter error when training: Check failed: err == cudaSuccess (77 vs. 0) #14

deepmo24 · 2020-08-13T08:00:48Z

During training, I encounter this error:

08-13 15:55:05 - x - INFO: - batch 460 / 6988 (epoch 0 / 50):
08-13 15:55:05 - x - INFO: -  all_loss: 3.348e+00, pca_loss: 8.353e-01, gcn_loss: 8.326e-01, proj_loss: 1.160e-01, refine_loss: 6.524e-02, perc_loss: 6.035e-01, var_loss: 4.013e-02, sym_loss: 2.571e-01
08-13 15:55:08 - x - INFO: - batch 470 / 6988 (epoch 0 / 50):
08-13 15:55:08 - x - INFO: -  all_loss: 3.262e+00, pca_loss: 8.777e-01, gcn_loss: 8.643e-01, proj_loss: 1.088e-01, refine_loss: 6.327e-02, perc_loss: 6.063e-01, var_loss: 3.102e-02, sym_loss: 2.503e-01
08-13 15:55:10 - x - INFO: - batch 480 / 6988 (epoch 0 / 50):
08-13 15:55:10 - x - INFO: -  all_loss: 3.181e+00, pca_loss: 9.175e-01, gcn_loss: 9.221e-01, proj_loss: 1.402e-01, refine_loss: 6.417e-02, perc_loss: 4.947e-01, var_loss: 5.884e-02, sym_loss: 2.770e-01
08-13 15:55:12 - x - INFO: - batch 490 / 6988 (epoch 0 / 50):
08-13 15:55:12 - x - INFO: -  all_loss: 3.095e+00, pca_loss: 6.863e-01, gcn_loss: 6.734e-01, proj_loss: 1.191e-01, refine_loss: 5.824e-02, perc_loss: 4.795e-01, var_loss: 4.229e-02, sym_loss: 2.548e-01
2020-08-13 15:55:14.619465: F tensorflow/core/common_runtime/gpu/gpu_device.cc:143] Check failed: err == cudaSuccess (77 vs. 0)
Aborted (core dumped)

I don't know whether this error is caused by the TensorFlow version. I tried tensorflow==1.8.0 and 1.10.0 and hit the same error with both. I hope you can give me some suggestions. Thanks.
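
A note on narrowing this down: as far as I know, in the CUDA 9.x runtime that the prebuilt TensorFlow 1.8 to 1.10 wheels target, error code 77 is cudaErrorIllegalAddress (an illegal memory access), which is reported asynchronously and so rarely points at the op that caused it. The sketch below is one way to get a more local error. The helper name `configure_cuda_debug_env` is made up for illustration; `CUDA_LAUNCH_BLOCKING` and `CUDA_VISIBLE_DEVICES` are standard CUDA environment variables.

```python
import os


def configure_cuda_debug_env(force_cpu=False):
    """Set environment variables that make CUDA failures easier to localize.

    Hypothetical helper. Call it before importing TensorFlow, because the
    CUDA runtime reads these variables when it initializes.
    """
    # Run kernels synchronously so a failure surfaces at the launch that caused it.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    if force_cpu:
        # Hide every GPU so that all ops fall back to their CPU kernels.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    keys = ("CUDA_LAUNCH_BLOCKING", "CUDA_VISIBLE_DEVICES")
    return {key: os.environ.get(key) for key in keys}


settings = configure_cuda_debug_env(force_cpu=True)
print(settings)
# import tensorflow as tf  # import only after the variables are set
```

Running a few batches on the CPU this way is slow, but CPU kernels tend to raise a Python exception with a readable message (for example an out-of-range index) instead of aborting the process.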

levanpon98 · 2020-08-14T14:58:09Z

Could you share the segmentation data you use for training?

deepmo24 · 2020-08-14T15:01:46Z

@levanpon98 Do you mean the face segmentation? I generated the segmentation results for CelebA using face-parsing.

levanpon98 · 2020-08-14T15:04:23Z

Thank you. I just started training on my own data, but I ran into an error, and I think the problem is in my data. How do you generate the face segmentation using this repo?

deepmo24 · 2020-08-14T15:08:15Z

This is the core code I wrote to get the segmentation. I hope it is helpful to you.

import cv2
import numpy as np

def save_RGBA_face(im, parsing_anno, img_size, save_path='vis_results/parsing_map_on_im.png'):
    # Parsing labels that count as face region; these pixels get alpha = 255.
    face_labels = [1, 2, 3, 4, 5, 10, 12, 13]

    vis_parsing_anno = parsing_anno.copy().astype(np.uint8)
    face_mask = np.where(np.isin(vis_parsing_anno, face_labels), 255, 0).astype(np.uint8)

    # Convert the RGB input to BGR, the channel order OpenCV expects when writing.
    img = np.array(im).astype(np.uint8)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

    img = cv2.resize(img, (img_size, img_size))
    seg_img = cv2.resize(face_mask, (img_size, img_size))
    seg_img = seg_img[:, :, None]

    # Append the mask as the alpha channel. Save to a format that can store
    # alpha (such as PNG); JPEG cannot store a fourth channel.
    BGRA_img = np.concatenate((img, seg_img), axis=2)

    cv2.imwrite(save_path, BGRA_img)
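
To show which pixels end up opaque, here is a NumPy-only sketch of the mask step on a toy label map. The helper name `face_alpha_mask` and the toy values are made up for illustration; the label list is the one from the function above.

```python
import numpy as np

# Label list taken from the snippet above.
FACE_LABELS = [1, 2, 3, 4, 5, 10, 12, 13]


def face_alpha_mask(parsing_anno, face_labels=FACE_LABELS):
    """Return a uint8 mask that is 255 where the label is a face label, else 0."""
    parsing_anno = np.asarray(parsing_anno)
    return np.where(np.isin(parsing_anno, face_labels), 255, 0).astype(np.uint8)


# Toy 2x3 parsing map (hypothetical values, not real face-parsing output).
toy = np.array([[0, 1, 17],
                [10, 11, 13]])
mask = face_alpha_mask(toy)
print(mask)
```

Labels 1, 10 and 13 are in the list, so those three pixels come out as 255; labels 0, 11 and 17 are not, so they come out as 0.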

levanpon98 · 2020-08-14T15:12:52Z

Thank you.
Have you solved your problem? Which TensorFlow version are you using now?

deepmo24 · 2020-08-15T03:54:35Z

I partially solved the issue by reducing the batch size. I am using tensorflow==1.10.0.

levanpon98 · 2020-08-16T23:33:03Z

@deepmo24
Thank you.
But I still get an error while training:

08-16 23:29:49 - x - INFO: - Namespace(adv_lambda=0.001, batch_size=1, buffer_size=10, drop_rate=0.2, epoch=50, eval=0, gan=False, img_size=224, input='data/test/raw', lr=0.0001, mode='train', model='normal', name='bfm09_face', nz=512, output='result/raw', restore=False, root_dir='/content/3D-Face-GCNs', seed=2, stage='all', suffix=None, wide=False, workers=4)
08-16 23:29:49 - x - INFO: - Loading data from /content/3D-Face-GCNs
2020-08-16 23:29:49.887578: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
08-16 23:29:49 - x - INFO: - Transform Matrices and Graph Laplacians Generated.
08-16 23:29:50 - x - INFO: - Number of train data: 21808
08-16 23:29:50 - x - INFO: - Evaluation frequency: 21808
08-16 23:30:03 - x - INFO: - Successfully Inferenced
08-16 23:30:04 - x - INFO: - Successfully Computed Losses
08-16 23:30:07 - x - INFO: - Successfully Inferenced
08-16 23:30:08 - x - INFO: - Successfully Computed Losses
08-16 23:30:16 - x - INFO: - Successfully Build Training Optimizer
08-16 23:30:16 - x - INFO: - Successfully Build Graph
08-16 23:30:16 - x - INFO: - Using Normal Model...
08-16 23:30:16 - x - INFO: - Start Fitting Model
tcmalloc: large alloc 3288334336 bytes == 0x116e12000 @  0x7f517cf411e7 0x7f517aa814ee 0x7f517aad1c2b 0x7f517aad4f73 0x7f517aad516b 0x7f517ab766e1 0x50a7f5 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50cfd6 0x509918 0x50a64d 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24 0x50b053 0x634dd2 0x634e87 0x63863f 0x6391e1 0x4b0dc0 0x7f517cb3eb97 0x5b26fa
08-16 23:30:41 - x - INFO: - render_lambda: 0.000000, refine_lambda: 1.000000
08-16 23:31:01 - x - INFO: - Error Occured in Sess Run.
Traceback (most recent call last):
  File "main.py", line 134, in <module>
    main()
  File "main.py", line 94, in main
    model.fit()
  File "/content/3D-Face-GCNs/base_model.py", line 624, in fit
    string, results = self.evaluate(val_image)
  File "/content/3D-Face-GCNs/base_model.py", line 724, in evaluate
    result = self.predict(batch_image)
  File "/content/3D-Face-GCNs/base_model.py", line 796, in predict
    proj_loss, refine_loss, perc_loss = self.sess.run(fetches, feed_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 877, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1272, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[23499] = 50134 is not in [0, 35709)
	 [[Node: loss_1/data_loss/GatherV2_1 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT64, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mesh_generator_1/Tanh, loss_1/data_loss/GatherV2_1/indices, GatherV2_9/axis)]]

Caused by op 'loss_1/data_loss/GatherV2_1', defined at:
  File "main.py", line 134, in <module>
    main()
  File "main.py", line 86, in main
    model = NormalModel(args, sess, graph, refer_mesh, image_paths, img_file)
  File "/content/3D-Face-GCNs/model_normal.py", line 16, in __init__
    super(Model, self).__init__(*args, **kwargs)
  File "/content/3D-Face-GCNs/base_model.py", line 112, in __init__
    self.build_graph()
  File "/content/3D-Face-GCNs/base_model.py", line 241, in build_graph
    image_feat_test, gcn_image_feat_test, self.regularization, True)
  File "/content/3D-Face-GCNs/base_model.py", line 434, in compute_loss
    sym_diff = tf.gather(gcn_texture, self.bfm.left_index, axis=1) - tf.gather(
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 2653, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3142, in gather_v2
    "GatherV2", params=params, indices=indices, axis=axis, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[23499] = 50134 is not in [0, 35709)
	 [[Node: loss_1/data_loss/GatherV2_1 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT64, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mesh_generator_1/Tanh, loss_1/data_loss/GatherV2_1/indices, GatherV2_9/axis)]]

Have you run into this error as well?
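
Reading the traceback: the failing op is the `tf.gather(gcn_texture, self.bfm.left_index, axis=1)` call at base_model.py line 434, and the message says that entry 23499 of the index array is 50134 while the gathered axis only has 35709 entries. That suggests the index array and the mesh do not share a vertex count, for example because the index file was generated for a mesh with more vertices than the one loaded. A minimal sketch for checking an index array before building the graph follows; the helper name `find_out_of_range` and the sample values are made up for illustration.

```python
import numpy as np


def find_out_of_range(indices, axis_size):
    """Return (position, value) pairs for indices outside [0, axis_size)."""
    indices = np.asarray(indices).ravel()
    bad_positions = np.flatnonzero((indices < 0) | (indices >= axis_size))
    return [(int(pos), int(indices[pos])) for pos in bad_positions]


# Made-up index array; in the real code this would be the array passed to tf.gather.
left_index = np.array([0, 12, 35708, 50134])
bad = find_out_of_range(left_index, axis_size=35709)
print(bad)  # [(3, 50134)]
```

An empty list means every index fits the axis. Any entries returned point at the data file to inspect first.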

ZechCal · 2020-10-20T07:48:21Z

10-20 12:02:08 - x - INFO: - Input to reshape is a tensor with 401408 values, but the requested shape has 802816
[[node BatchGather/Reshape_2 (defined at /home/zxy/3dtest/3D-Face-GCNs/utils.py:832) ]]
[[node render_1/strided_slice_34 (defined at /home/zxy/anaconda3/envs/tensorglue/lib/python3.6/site-packages/rasterize_triangles.py:121) ]]

Caused by op 'BatchGather/Reshape_2', defined at:
  File "main.py", line 142, in <module>
    main()
  File "main.py", line 95, in main
    model = NormalModel(args, sess, graph, refer_mesh, image_paths, img_file)
  File "/home/zxy/3dtest/3D-Face-GCNs/model_normal.py", line 16, in __init__
    super(Model, self).__init__(*args, **kwargs)
  File "/home/zxy/3dtest/3D-Face-GCNs/base_model.py", line 115, in __init__
    self.build_graph()
  File "/home/zxy/3dtest/3D-Face-GCNs/base_model.py", line 200, in build_graph
    pred_results = self.inference(self.train_rgbas, self.coeff, self.image_emb)
  File "/home/zxy/3dtest/3D-Face-GCNs/base_model.py", line 354, in inference
    proj_color = self.project_color(proj_vert, eros_image)
  File "/home/zxy/3dtest/3D-Face-GCNs/base_model.py", line 1029, in project_color
    proj_color = utils.batch_gather(flatten_image, coords)
  File "/home/zxy/3dtest/3D-Face-GCNs/utils.py", line 718, in batch_gather
    return _batch_gather(params, indices, batch_dims=indices.shape.ndims - 1)
  File "/home/zxy/3dtest/3D-Face-GCNs/utils.py", line 832, in _batch_gather
    flat_params = tf.reshape(params, tf.concat([[flat_inner_shape], outer_shape], axis=0))
  File "/home/zxy/anaconda3/envs/tensorglue/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 7179, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/home/zxy/anaconda3/envs/tensorglue/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/zxy/anaconda3/envs/tensorglue/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/zxy/anaconda3/envs/tensorglue/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/zxy/anaconda3/envs/tensorglue/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 401408 values, but the requested shape has 802816
[[node BatchGather/Reshape_2 (defined at /home/zxy/3dtest/3D-Face-GCNs/utils.py:832) ]]
[[node render_1/strided_slice_34 (defined at /home/zxy/anaconda3/envs/tensorglue/lib/python3.6/site-packages/rasterize_triangles.py:121) ]]

This is the error I get. Interesting.
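
One observation on the numbers: the requested shape (802816) is exactly twice the size of the tensor that arrived (401408). The sketch below does the arithmetic, assuming a 224 x 224 input (the img_size=224 seen in the earlier log in this thread; the configuration for this run is not shown). The helper name `describe_reshape_mismatch` is made up for illustration.

```python
def describe_reshape_mismatch(actual, requested, img_size=224):
    """Express both element counts as multiples of one img_size x img_size image."""
    pixels = img_size * img_size  # 224 * 224 = 50176 under the stated assumption
    return {
        "ratio": requested / actual,
        "actual_image_areas": actual / pixels,
        "requested_image_areas": requested / pixels,
    }


info = describe_reshape_mismatch(401408, 802816)
print(info)
# {'ratio': 2.0, 'actual_image_areas': 8.0, 'requested_image_areas': 16.0}
```

Under that assumption the tensor holds 8 image areas while the graph expects 16. One plausible reading is that the data fed at run time has half the batch size (or half the channels) that the graph was built for, so comparing the batch size used at graph construction with the batch actually fed may be a reasonable first check.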

djx99 · 2021-07-06T00:33:56Z

Hi, I have the same problem. Have you solved it? If so, how?
