Segmentation fault while running MAMnet with slurm #6

Open
LYC-vio opened this issue Feb 10, 2023 · 1 comment

LYC-vio commented Feb 10, 2023

When I run MAMnet under the Slurm job manager, it sometimes fails with a segmentation fault, although it runs fine on other BAM files. (All of the BAM files can be called correctly with other SV callers.)

The MAMnet command I used is:

python ${MAMnetPath}/MAMnet.py -bamfilepath ${bam} -threads 20 -step 50 -INTERVAL 1e7 -genotype True -workdir ${work_dir} -SV_weightspath ${MAMnetPath}/type -genotype_weightspath ${MAMnetPath}/geno -outputpath ${output}

The BAM file was generated from human genome ONT reads, and the coverage is around 10x.

Would you please give me some suggestions about this issue?

The complete error log is:

2023-02-10 14:22:01.737917: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2023-02-10 14:26:18.452917: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-02-10 14:26:18.455260: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-02-10 14:26:18.491379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:3b:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2023-02-10 14:26:18.491468: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2023-02-10 14:26:20.455815: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2023-02-10 14:26:20.455988: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2023-02-10 14:26:28.632020: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-02-10 14:26:37.928679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-02-10 14:26:49.623876: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2023-02-10 14:26:49.826945: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2023-02-10 14:26:57.359726: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2023-02-10 14:26:57.373221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2023-02-10 14:26:57.433696: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-10 14:26:57.435454: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-02-10 14:26:57.438896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:3b:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2023-02-10 14:26:57.439012: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2023-02-10 14:26:57.439100: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2023-02-10 14:26:57.439161: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2023-02-10 14:26:57.439220: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-02-10 14:26:57.439279: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-02-10 14:26:57.439335: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2023-02-10 14:26:57.439393: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2023-02-10 14:26:57.439451: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2023-02-10 14:26:57.444975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2023-02-10 14:26:57.454605: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2023-02-10 14:30:41.554191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-02-10 14:30:41.555194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2023-02-10 14:30:41.555358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2023-02-10 14:30:47.180432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10074 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:3b:00.0, compute capability: 7.5)
2023-02-10 14:53:46.612038: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2023-02-10 14:53:48.612236: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2300000000 Hz
2023-02-10 14:53:57.108128: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2023-02-10 14:54:35.100335: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
/var/spool/slurmd.gpu0038/job49507205/slurm_script: line 43: 54825 Segmentation fault      python ${MAMnetPath}/MAMnet.py -bamfilepath ${bam} -threads 20 -step 50 -INTERVAL 1e7 -genotype True -workdir ${work_dir} -SV_weightspath ${MAMnetPath}/type -genotype_weightspath ${MAMnetPath}/geno -outputpath ${output}

My environment is:

_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_tflow_select             2.1.0                       gpu  
abseil-cpp                20211102.0           h27087fc_1    conda-forge
absl-py                   1.1.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.1            py38h0a891b7_1    conda-forge
aiosignal                 1.2.0              pyhd8ed1ab_0    conda-forge
alsa-lib                  1.2.6.1              h7f98852_0    conda-forge
astor                     0.8.1              pyh9f0ad1d_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     21.4.0             pyhd8ed1ab_0    conda-forge
blinker                   1.4                        py_1    conda-forge
brotli                    1.0.9                h166bdaf_7    conda-forge
brotli-bin                1.0.9                h166bdaf_7    conda-forge
brotlipy                  0.7.0           py38h0a891b7_1004    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.6.15            ha878542_0    conda-forge
cachetools                4.2.4              pyhd8ed1ab_0    conda-forge
certifi                   2022.6.15        py38h578d9bd_0    conda-forge
cffi                      1.15.0           py38h3931269_0    conda-forge
charset-normalizer        2.0.12             pyhd8ed1ab_0    conda-forge
click                     8.1.3            py38h578d9bd_0    conda-forge
cryptography              37.0.1           py38h9ce1e76_0  
cudatoolkit               10.1.243            h8cb64d8_10    conda-forge
cudatoolkit-dev           10.1.243             h516909a_3    conda-forge
cudnn                     7.6.5.32             hc0a50b0_1    conda-forge
cupti                     10.1.168                      0  
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
expat                     2.4.8                h27087fc_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.0               h8e229c2_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.33.3           py38h0a891b7_0    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
frozenlist                1.3.0            py38h0a891b7_1    conda-forge
gast                      0.4.0              pyh9f0ad1d_0    conda-forge
gettext                   0.19.8.1          h73d1719_1008    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
glib                      2.70.2               h780b84a_4    conda-forge
glib-tools                2.70.2               h780b84a_4    conda-forge
google-auth               1.35.0             pyh6c4a22f_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
grpc-cpp                  1.46.3               hc275302_1    conda-forge
grpcio                    1.46.3           py38hb6c94e9_1    conda-forge
gst-plugins-base          1.20.3               hf6a322e_0    conda-forge
gstreamer                 1.20.3               hd4edc92_0    conda-forge
h5py                      2.10.0          nompi_py38h513d04c_102    conda-forge
hdf5                      1.10.5          nompi_h5b725eb_1114    conda-forge
icu                       69.1                 h9c3ff4c_0    conda-forge
idna                      3.3                pyhd8ed1ab_0    conda-forge
importlib-metadata        4.11.4           py38h578d9bd_0    conda-forge
jpeg                      9e                   h166bdaf_1    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.3            py38h43d8883_0    conda-forge
krb5                      1.19.3               h3790be6_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      3.0                  h9c3ff4c_0    conda-forge
libblas                   3.9.0           15_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
libbrotlidec              1.0.9                h166bdaf_7    conda-forge
libbrotlienc              1.0.9                h166bdaf_7    conda-forge
libcblas                  3.9.0           15_linux64_openblas    conda-forge
libclang                  13.0.1          default_hc23dcda_0    conda-forge
libcurl                   7.83.1               h7bff187_0    conda-forge
libdeflate                1.10                 h7f98852_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h9b69904_4    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgfortran-ng            12.1.0              h69a702a_16    conda-forge
libgfortran5              12.1.0              hdcd56e2_16    conda-forge
libglib                   2.70.2               h174f98d_4    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0           15_linux64_openblas    conda-forge
libllvm13                 13.0.1               hf817b99_2    conda-forge
libnghttp2                1.47.0               h727a467_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libogg                    1.3.4                h7f98852_1    conda-forge
libopenblas               0.3.20          pthreads_h78a6416_0    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libpq                     14.4                 hd77ab85_0    conda-forge
libprotobuf               3.20.1               h6239696_0    conda-forge
libssh2                   1.10.0               ha56f1ee_2    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libtiff                   4.4.0                h0fcbabc_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp                   1.2.2                h3452ae3_0    conda-forge
libwebp-base              1.2.2                h7f98852_1    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.9.12               h885dcf4_1    conda-forge
libzlib                   1.2.12               h166bdaf_1    conda-forge
llvmlite                  0.38.1                    <pip>
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
markdown                  3.3.7              pyhd8ed1ab_0    conda-forge
matplotlib                3.5.1            py38h578d9bd_0    conda-forge
matplotlib-base           3.5.1            py38hf4fb855_0    conda-forge
multidict                 6.0.2            py38h0a891b7_1    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              8.0.29               haf5c9bc_1    conda-forge
mysql-libs                8.0.29               h28c427c_1    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
nspr                      4.32                 h9c3ff4c_1    conda-forge
nss                       3.78                 h2350873_0    conda-forge
numba                     0.55.2                    <pip>
numpy                     1.19.5           py38h8246c76_3    conda-forge
oauthlib                  3.2.0              pyhd8ed1ab_0    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1o               h166bdaf_0    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.4.2            py38h47df419_2    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pillow                    9.1.1            py38h0ee0e06_1    conda-forge
pip                       22.1.2             pyhd8ed1ab_0    conda-forge
protobuf                  3.20.1           py38hfa26641_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.4.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 22.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyqt                      5.12.3           py38ha8c2ead_4    conda-forge
pysam                     0.19.0           py38h8bf8b8d_0    bioconda
pysocks                   1.7.1            py38h578d9bd_5    conda-forge
python                    3.8.13          h582c2e5_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-flatbuffers        2.0                pyhd8ed1ab_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytz                      2022.1             pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
qt                        5.12.9               h1304e3e_6    conda-forge
re2                       2022.04.01           h27087fc_0    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.0             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.8                pyhd8ed1ab_0    conda-forge
scipy                     1.8.1            py38h1ee437e_0    conda-forge
setuptools                62.6.0           py38h578d9bd_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlite                    3.38.5               h4ff8645_0    conda-forge
tensorboard               2.4.1              pyhd8ed1ab_1    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.4.1           gpu_py38h8a7d6ce_0  
tensorflow-base           2.4.1           gpu_py38h29c2da4_0  
tensorflow-estimator      2.6.0            py38h709712a_0    conda-forge
tensorflow-gpu            2.4.1                h30adc30_0  
termcolor                 1.1.0                      py_2    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tornado                   6.1              py38h0a891b7_3    conda-forge
typing-extensions         4.2.0                hd8ed1ab_1    conda-forge
typing_extensions         4.2.0              pyha770c72_1    conda-forge
unicodedata2              14.0.0           py38h0a891b7_1    conda-forge
urllib3                   1.26.9             pyhd8ed1ab_0    conda-forge
werkzeug                  2.1.2              pyhd8ed1ab_1    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
wrapt                     1.14.1           py38h0a891b7_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yarl                      1.7.2            py38h0a891b7_2    conda-forge
zipp                      3.8.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.12               h166bdaf_1    conda-forge
zstd                      1.5.2                h8a70e8d_1    conda-forge

@micahvista (Owner)

Dear LYC-vio,
I have encountered this error before, and a simple rerun often works for me. I think the problem is caused by multithreading: some data may conflict when it is exchanged between different processes.
Thanks.
Hongyu Ding
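
The rerun workaround described above could be automated inside the Slurm job. The following is a minimal sketch, not part of MAMnet: the helper names `is_segfault` and `run_with_retries` are hypothetical, and it assumes a POSIX system, where Python's `subprocess` reports a child killed by a signal as a negative return code (a shell wrapper reports 128 plus the signal number instead, i.e. 139 for SIGSEGV). It only reruns after a segmentation fault; any other exit code is returned immediately.

```python
import signal
import subprocess
import sys
import time


def is_segfault(returncode):
    """Return True if the exit status indicates death by SIGSEGV.

    subprocess gives -SIGSEGV (-11) for a direct child; a command run
    through a shell gives 128 + SIGSEGV (139) instead.
    """
    return returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV)


def run_with_retries(cmd, max_attempts=3, delay_seconds=0, run=subprocess.run):
    """Run cmd, rerunning it only when it ends with a segmentation fault.

    cmd is an argument list as accepted by subprocess.run.
    `run` is injectable so the retry logic can be tested without
    launching a real process. Returns (returncode, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        returncode = run(cmd).returncode
        if not is_segfault(returncode):
            # Success or an unrelated failure: do not retry.
            return returncode, attempt
        print(f"attempt {attempt}/{max_attempts} ended with a segmentation fault",
              file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    # Every attempt segfaulted; report the last status.
    return returncode, max_attempts
```

To use it, pass the MAMnet command from the original post as an argument list (starting with `"python"` and the path to `MAMnet.py`) and exit the job script with the returned code. If the crash is an intermittent race, as suspected above, a rerun may succeed; if it is tied to a particular BAM file, every attempt will fail and the final status is still a segmentation fault.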
