BUG: g++: fatal error: Killed signal terminated program cc1plus #6833

ThomasHoppe · 2023-07-18T17:28:27Z

Describe the issue:

During compilation of the models, the compiler receives a kill signal (reason unknown).
Can be reproduced with two different models.

Reproducible code example:

Code example is longer, see attached notebook and data file below

Error message:

CompileError: Compilation failed (return status=1):
/usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/usr/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/usr/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.29-x86_64-3.8.10-64/tmpqc37cguk/m80e56c88364a8e9d8553f659577acc090143a8d54f9ef83c7c12ac7eb91aecfa.so /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.29-x86_64-3.8.10-64/tmpqc37cguk/mod.cpp -lpython3.8
/home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.29-x86_64-3.8.10-64/tmpqc37cguk/mod.cpp: In member function ‘int {anonymous}::__struct_compiled_op_m80e56c88364a8e9d8553f659577acc090143a8d54f9ef83c7c12ac7eb91aecfa::run()’:
/home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.29-x86_64-3.8.10-64/tmpqc37cguk/mod.cpp:5249:13: note: variable tracking size limit exceeded with ‘-fvar-tracking-assignments’, retrying without
 5249 |         int run(void) {
      |             ^~~
g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.

PyMC version information:

Occurred in 5.5.0 and 5.6.1

Detailed watermark:

Last updated: Tue Jul 18 2023

Python implementation: CPython
Python version : 3.8.10
IPython version : 8.0.1

arviz : 0.15.1
pandas : 2.0.2
daft : 0.1.2
pymc : 5.6.1
matplotlib: 3.7.1
numpy : 1.22.1
scipy : 1.7.3
pytensor: 2.12.3

Watermark: 2.3.0

Operating System: Ubuntu 20.04.6 LTS Subsystem under Windows 10 WSL-2
PyMC installation via pip

Context for the issue:

Blocks further evaluation of the model with sample_posterior_predictive

D1.csv
compiler-bug.zip

twiecki · 2023-07-18T18:59:18Z

Installation with pip is not supported (because the compiler situation is too difficult), you need to use mamba or conda.

ThomasHoppe · 2023-07-25T13:50:00Z

@twiecki:

I have now reinstalled PyMC under conda, but the problem remains :-(

Operating System: Ubuntu 20.04.6 LTS Subsystem under Windows 10 WSL-2
PyMC installation via conda (miniconda)

Last updated: Tue Jul 25 2023

Python implementation: CPython
Python version : 3.8.17
IPython version : 8.0.1

arviz : 0.15.1
numpy : 1.22.1
matplotlib: 3.7.1
scipy : 1.7.3
pandas : 2.0.2
pymc : 5.6.1

Watermark: 2.3.0

CompileError: Compilation failed (return status=1):
/usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/home/thomas/miniconda3/envs/pymc/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/home/thomas/miniconda3/envs/pymc/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmph2y66i87/m80e56c88364a8e9d8553f659577acc090143a8d54f9ef83c7c12ac7eb91aecfa.so /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmph2y66i87/mod.cpp -lpython3.8

g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.

twiecki · 2023-07-25T14:11:24Z

Hm, it seems it's still using the system compiler (/usr/bin/g++), whereas it should use the compilers from the environment. Are you sure you activated the environment correctly? Also, can you post the output of mamba list and which g++?

ThomasHoppe · 2023-07-25T14:26:11Z

I am definitely sure that the environment was activated correctly. This Python version is only used for PyMC.

Here is the module list and the output of g++ -v:

conda_list.txt
g++-version.txt

twiecki · 2023-07-26T22:31:40Z

That's not the output of which g++.

ThomasHoppe · 2023-07-27T05:48:12Z

which g++ gives /usr/bin/g++

twiecki · 2023-07-27T09:49:42Z

This is what it shows for me:

>>which clang
clang is /Users/twiecki/micromamba/envs/pymc5/bin/clang
clang is /usr/bin/clang

You can see it has a compiler installed in my env which you lack, not sure why. But you can try to install it manually.

ThomasHoppe · 2023-07-29T10:13:47Z

I installed clang outside an environment; which clang shows /usr/bin/clang.
Even if I install clang inside an env, which clang still shows /usr/bin/clang.

But I still got

/home/thomas/.local/lib/python3.8/site-packages/pytensor/tensor/rewriting/elemwise.py:1019: UserWarning: Loop fusion failed because the resulting node would exceed the kernel argument limit.
warn(
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...

CompileError: Compilation failed (return status=1):
/usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/home/thomas/miniconda3/envs/pymc5/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/home/thomas/miniconda3/envs/pymc5/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmpv9hkx7wj/m68ff8b8c4d606c4dd1f8fe6d6ebc5e974ecbc23ad2e3ca82f4d826e6f743dc44.so /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmpv9hkx7wj/mod.cpp -lpython3.8
g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.

So /usr/bin/g++ is still called. Is any additional configuration needed to switch to clang?

twiecki · 2023-07-31T06:41:32Z

What I meant is that you need to install g++ from mamba into your environment. clang is the compiler I'm using on OSX instead of g++. Something went wrong with your installation; you can also retry in a fresh env. Or try mamba install -c conda-forge gcc.
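
Whether the compiler found on PATH belongs to the active environment can be checked with a few lines of standard-library Python. This is only a sketch: the helper name compiler_in_env is hypothetical (it is not part of PyMC or PyTensor), and it assumes the environment is activated so that sys.prefix points at it.

```python
import os
import shutil
import sys


def compiler_in_env(compiler_path, env_prefix):
    """Return True if compiler_path lives inside env_prefix (hypothetical helper)."""
    if compiler_path is None:
        return False
    compiler = os.path.realpath(compiler_path)
    prefix = os.path.realpath(env_prefix)
    # commonpath equals the prefix only when the compiler sits below it
    return os.path.commonpath([compiler, prefix]) == prefix


if __name__ == "__main__":
    found = shutil.which("g++")  # first g++ on PATH, or None if there is none
    print("g++ resolved to:", found)
    print("inside active environment:", compiler_in_env(found, sys.prefix))
```

If this reports a path such as /usr/bin/g++, the environment has no compiler of its own on PATH. The compiler PyTensor itself selected should also be visible as pytensor.config.cxx.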

ThomasHoppe · 2023-08-02T13:03:57Z

Well, I made a clean install.

I installed mamba under Linux as described on https://mamba.readthedocs.io/en/latest/installation.html from mambaforge.
Then:
mamba create -n pymc
mamba activate pymc
mamba install gcc
mamba install pymc (which also downgraded gcc from 13.1.0 to 12.3.0 and four other packages)
which gcc gives /home/thomas/mambaforge/envs/pymc/bin/gcc
which g++ gives /home/thomas/mambaforge/envs/pymc/bin/g++
followed by the installation of Jupyter Notebook and supporting libraries.

Watermark now gives:
Last updated: Wed Aug 02 2023

Python implementation: CPython
Python version : 3.11.4
IPython version : 8.14.0

arviz : 0.16.1
pandas : 2.0.3
scipy : 1.11.1
matplotlib: 3.7.2
numpy : 1.25.1
pymc : 5.7.0

Watermark: 2.4.3

Running the compiler-bug notebook again gives, after

/home/thomas/mambaforge/envs/pymc/lib/python3.11/site-packages/pytensor/tensor/rewriting/elemwise.py:1028: UserWarning: Loop fusion failed because the resulting node would exceed the kernel argument limit.
warn(
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...

the by now familiar compiler error, but this time with gcc from the env:

CompileError: Compilation failed (return status=1):
/home/thomas/mambaforge/envs/pymc/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mno-avx512f -mbmi -mbmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mno-xsavec -mxsaveopt -mno-xsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/mambaforge/envs/pymc/lib/python3.11/site-packages/numpy/core/include -I/home/thomas/mambaforge/envs/pymc/include/python3.11 -I/home/thomas/mambaforge/envs/pymc/lib/python3.11/site-packages/pytensor/link/c/c_code -L/home/thomas/mambaforge/envs/pymc/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.31-x86_64-3.11.4-64/tmpyelpazdx/m68ff8b8c4d606c4dd1f8fe6d6ebc5e974ecbc23ad2e3ca82f4d826e6f743dc44.so /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.31-x86_64-3.11.4-64/tmpyelpazdx/mod.cpp 
-lpython3.11
g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.

Since this used Python 3.11 and PyMC 5.7, I made a second attempt by downgrading to Python 3.8 and PyMC 3.6.1.

The paths to gcc and g++ are the same as above, and so is the error.

So I think it is not an issue with my installation.

Did you run the compiler-bug.ipynb yourself? Could you reproduce the behaviour?

Since the warning "Loop fusion failed because the resulting node would exceed the kernel argument limit." always appears, couldn't it be that the translation from the tensor network to the C code (at least as far as I understand it from the outside) produces some kind of "loop" for the compiler, so that the compiler runs out of space?

ricardoV94 · 2023-08-02T13:05:32Z

Did you try the conda-forge channel specifically? mamba install -c conda-forge pymc in a new environment.

ricardoV94 · 2023-08-02T13:07:21Z

Since the warning "Loop fusion failed because the resulting node would exceed the kernel argument limit." always appears, couldn't it be that the translation from the tensor network to the C code (at least as far as I understand it from the outside) produces some kind of "loop" for the compiler, so that the compiler runs out of space?

Can you try with a very simple model?

import pymc as pm

with pm.Model() as m:
  x = pm.Normal("x")
  pm.sample()

It is not clear to me whether you see the problem with specific models or in general.

ThomasHoppe · 2023-08-02T13:09:39Z

mamba install -c conda-forge pymc gives as output

Looking for: ['pymc']

conda-forge/noarch 13.5MB @ 4.0MB/s 3.7s
conda-forge/linux-64 33.4MB @ 4.7MB/s 7.7s

Pinned packages:

python 3.8.*

Transaction

Prefix: /home/thomas/mambaforge/envs/pymc

All requested packages already installed

ricardoV94 · 2023-08-02T13:10:37Z

You should install from a fresh environment

ThomasHoppe · 2023-08-02T13:15:34Z

It is the specific model of the notebook. As I explained at the beginning, a colleague of mine who authored this model has no problem at all.

All of my other models worked under PyMC 5 (after some adaptations) without problems.
Even the simple model:

import pymc as pm

with pm.Model() as m:
  x = pm.Normal("test")
  pm.sample()

Runs as expected:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [test]

100.00% [4000/4000 00:02<00:00 Sampling 2 chains, 0 divergences]

Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 2 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics

ricardoV94 · 2023-08-02T13:18:06Z

So back to your case. After you install with conda-forge, can you try running a single chain? Just trying to narrow down the issue space

ThomasHoppe · 2023-08-02T14:48:40Z

So back to your case. After you install with conda-forge, can you try running a single chain? Just trying to narrow down the issue space

Well, I ran mamba install -c conda-forge pymc in a fresh env named test
and sampled with chains=1 as suggested:

with model_toto: trace_ = pm.sample(draws=nb_samples, chains=1, tune=tune)

I still got the same behavior:

CompileError: Compilation failed (return status=1):
/home/thomas/mambaforge/envs/test/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mno-avx512f -mbmi -mbmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mno-xsavec -mxsaveopt -mno-xsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/home/thomas/mambaforge/envs/test/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/home/thomas/mambaforge/envs/test/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.10-x86_64-3.8.17-64/tmpi_iiqq0k/m68ff8b8c4d606c4dd1f8fe6d6ebc5e974ecbc23ad2e3ca82f4d826e6f743dc44.so /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.10-x86_64-3.8.17-64/tmpi_iiqq0k/mod.cpp -lpython3.8
g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.

Did you run the supplied notebook? How did it behave in your environment?

twiecki · 2023-08-02T20:38:00Z

I think you might not have enough resources (RAM) so g++ is getting killed. E.g. soedinglab/hh-suite#280

ThomasHoppe · 2023-08-14T15:18:57Z

I increased the limit for the main storage to 10 GB and still the same error occurred. Actually, I can't believe that compiling roughly 8 MB of C code (compare the attached generated code file) cannot be done within 10 GB.
pytensor_compilation_error_1pmxatij.zip

twiecki · 2023-08-14T22:26:48Z

@maresb could this be an arch issue?

maresb · 2023-08-15T07:38:06Z

No, this should be pure linux-64. This feels to me like a memory issue. Maybe the 10GB is not being made available somehow. I would check the output of free, and then look in /var/log/syslog for messages from the kernel's OOM-killer.
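
The two checks suggested here (available memory and kernel OOM-killer messages) can also be scripted. The following is a minimal sketch for Linux only, assuming the standard /proc/meminfo layout and the usual kernel wording ("Out of memory", "oom-kill"); the function names are illustrative and not part of any library.

```python
import os
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values in kB."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)\s*kB", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def find_oom_lines(log_text):
    """Return the log lines that look like kernel OOM-killer messages."""
    pattern = re.compile(r"out of memory|oom-kill", re.IGNORECASE)
    return [line for line in log_text.splitlines() if pattern.search(line)]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    # Values are reported in kB; convert to GB for readability
    print("MemTotal     (GB):", round(info["MemTotal"] / 1024**2, 1))
    print("MemAvailable (GB):", round(info["MemAvailable"] / 1024**2, 1))
```

find_oom_lines can be applied to the contents of /var/log/syslog or to the output of dmesg, depending on where the kernel log ends up on the system.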

ThomasHoppe · 2023-09-12T15:13:35Z

I increased the limit for the main storage to 10 GB and still the same error occurred. Actually, I can't believe that compiling roughly 8 MB of C code (compare the attached generated code file) cannot be done within 10 GB.

twiecki · 2023-09-12T15:14:29Z

@ThomasHoppe Not disk space but RAM.

ThomasHoppe · 2023-09-12T15:24:19Z

When I say main memory, I am not talking about disk space. I'm talking about 10 GB of RAM!
The 10 GB are available. Take a look at the excerpt of the syslog.

I also enclose a video showing the last 6 of the 31 minutes of the call to pymc.sample, where you can see from htop and pmap that the memory usage of cc1plus increases within these 6 minutes from roughly 2 GB to more than 10 GB.

syslog-htop-pmap-video.zip

twiecki · 2023-09-12T15:40:59Z

@ThomasHoppe I misunderstood. Then it's definitely not the RAM. I'm a bit stumped, because it's not a compiler error but the compiler getting killed.

ThomasHoppe · 2023-09-13T16:15:43Z

State of the bug isolation:

Clean install in a separate mamba environment (hence dependencies should be correct)
RAM restrictions excluded
Found increasing memory consumption of the compiler during the last 6 minutes of the roughly 30-minute compilation
Remaining candidate causes for the compiler behavior are 1) a so far unnoticed bug in the compiler itself, 2) a configuration issue in the parameters used to call the compiler, 3) the generated C code, which depends on PyTensor

1) Considering how widely GCC is used, it is not very probable that such a compiler bug has gone unnoticed.
2) This is a possibility I cannot exclude. If I inspect the compiler parameters in the error message of Aug 2, I see -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include, which is definitely an include path outside the mamba environment in use. Could this be the reason?
3) As a computer scientist, I would conclude that the trouble is more likely caused by the generated C code, which in one way or another causes the compiler to allocate more and more memory.
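
Regarding point 2): an include path under ~/.local suggests that NumPy and PyTensor are being imported from the per-user site-packages directory rather than from the environment. Below is a small sketch for checking this with the standard library; the helper name is_under is made up for illustration.

```python
import importlib.util
import os
import site


def is_under(path, directory):
    """Return True if path is located inside directory."""
    path = os.path.abspath(path)
    directory = os.path.abspath(directory)
    return os.path.commonpath([path, directory]) == directory


if __name__ == "__main__":
    user_site = site.getusersitepackages()  # e.g. ~/.local/lib/pythonX.Y/site-packages
    for name in ("numpy", "pytensor"):
        spec = importlib.util.find_spec(name)  # locates the package without importing it
        if spec is None or spec.origin is None:
            print(name, "is not installed")
            continue
        where = "user site" if is_under(spec.origin, user_site) else "environment"
        print(f"{name}: {spec.origin} -> {where}")
```

If packages do resolve to the user site, setting the environment variable PYTHONNOUSERSITE=1 (or starting Python with -s) keeps that directory off sys.path, so the environment's own copies are used.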

ricardoV94 · 2023-09-13T16:30:52Z

@ThomasHoppe I didn't have time to look at your model before. I believe the source of the problem is that you have a very inefficient model. You are doing a series of operations per row of data, which builds a very large latent graph. You can probably vectorize your operations using advanced indexing, which will make the computational graph of the model much simpler and shorter to compile.
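
To illustrate the difference between per-row operations and advanced indexing, here is a small NumPy sketch with made-up data (the fixture list and sizes are purely illustrative). In a symbolic library such as PyTensor, every iteration of a Python loop adds new nodes to the graph, whereas an indexing expression adds a fixed number of nodes no matter how many rows there are.

```python
import numpy as np

rng = np.random.default_rng(0)
nb_clubs = 4                        # illustrative size
score = rng.normal(size=nb_clubs)   # stand-in for a per-club parameter

# Each row: (home club index, away club index); made-up fixtures
matches = np.array([[0, 1], [2, 3], [1, 2], [3, 0]])

# Per-row version: one indexing pair and one subtraction per match
diff_loop = np.array([score[h] - score[a] for h, a in matches])

# Vectorized version: two indexing operations and one subtraction in total
home_idx, away_idx = matches[:, 0], matches[:, 1]
diff_vec = score[home_idx] - score[away_idx]

# Both approaches compute the same values
assert np.allclose(diff_loop, diff_vec)
```

The same indexing syntax works on PyMC random variables, which is what the vectorized rewrite of the model relies on.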

ricardoV94 · 2023-09-13T16:40:39Z

Here is how I would write your last model (probably has bugs!!!):

import numpy as np
import pymc as pm

# nb_clubs, home_goals_, away_goals_, home_goals, away_goals and low
# are assumed to be defined earlier in the notebook.
model_toto = pm.Model()

with model_toto:
    score = pm.Normal("score", tau=1., mu=0., shape=nb_clubs)
    advantage_defence_diff = pm.Normal("offence_defence_diff",
                            tau=1., mu=1.5, shape=nb_clubs)

    # number of goals scored more at home than away
    home_advantage = pm.Normal("home_advantage", tau=10., mu=.0)

    # softmax regression weights for winner prediction:
    weights = pm.Normal("weights", mu=(0., .25, -0.25), tau=100., shape=3)

    # each entry is (home club index, away club index, goals)
    heim = np.array([hg[0] for hg in home_goals_])
    gast = np.array([hg[1] for hg in home_goals_])
    h_goals = np.array([hg[2] for hg in home_goals_])

    # heim_ and gast_ are not used below
    heim_ = np.array([ag[0] for ag in away_goals_])
    gast_ = np.array([ag[1] for ag in away_goals_])
    a_goals = np.array([ag[2] for ag in away_goals_])

    # advanced indexing: one entry per match instead of a Python loop
    s_h_, add_h = score[heim], advantage_defence_diff[heim]
    s_g, add_g = score[gast], advantage_defence_diff[gast]

    s_h = s_h_ + home_advantage

    offence_heim = s_h + add_h
    defence_heim = s_h - add_h
    offence_gast = s_g + add_g
    defence_gast = s_g - add_g

    home_value = offence_heim - defence_gast
    away_value = offence_gast - defence_heim

    score_diff = s_h - s_g  # can be negative!

    ### no negative values
    home_value = pm.math.switch(pm.math.lt(home_value, 0.), low, home_value)
    away_value = pm.math.switch(pm.math.lt(away_value, 0.), low, away_value)

    # for prediction of the winner: 0 = draw, 1 = home win, 2 = away win
    toto = np.where(
        h_goals == a_goals,
        0,
        np.where(
            h_goals > a_goals,
            1,
            2
        ),
    )

    mu_home = pm.Deterministic("home_rate", home_value)
    pm.Poisson("home_goals", observed=home_goals, mu=mu_home)

    mu_away = pm.Deterministic("away_rate", away_value)
    pm.Poisson("away_goals", observed=away_goals, mu=mu_away)

    # one column per outcome, so that it lines up with the three weights
    ha_diff = score_diff
    ha_diff = ha_diff.reshape((-1, 1))
    ha_diff = ha_diff.repeat(3, axis=1)

    # softmax over the three outcomes
    pred = pm.math.exp(ha_diff * weights)
    pred = (pred.T / pm.math.sum(pred, axis=1)).T
    pm.Categorical('toto', p=pred, observed=toto)

Those indexing and numerical operations are vectorized just like in NumPy, so the model graph no longer grows with the size of your data.
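
The outcome encoding and the softmax step at the end of the model can be tried out in plain NumPy. This is only a sketch with made-up goal counts and score differences; it uses broadcasting in place of the reshape/repeat pair, which yields the same (matches, 3) array.

```python
import numpy as np

# Made-up goal counts for four matches
h_goals = np.array([2, 1, 0, 3])
a_goals = np.array([2, 0, 1, 3])

# 0 = draw, 1 = home win, 2 = away win
toto = np.where(h_goals == a_goals, 0, np.where(h_goals > a_goals, 1, 2))

# Softmax over three outcome scores per match
score_diff = np.array([0.1, 0.8, -0.5, 0.0])   # illustrative values
weights = np.array([0.0, 0.25, -0.25])
logits = score_diff[:, None] * weights          # shape (4, 3) via broadcasting
pred = np.exp(logits)
pred = pred / pred.sum(axis=1, keepdims=True)   # normalize so each row sums to 1

print(toto)
print(pred.sum(axis=1))
```

Each row of pred is a probability vector over (draw, home win, away win), which is the form a categorical likelihood expects.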

ThomasHoppe · 2023-09-14T07:46:34Z

@ThomasHoppe I didn't have time to look at your model before. I believe the source of the problem is that you have a very inefficient model. You are doing a series of operations per row of data, which builds a very large latent graph. You can probably vectorize your operations using advanced indexing, which will make the computational graph of the model much simpler and shorter to compile.

@ricardoV94: Thanks for the suggestion. Actually, the model was designed by a colleague, who has no problems running it; he does not encounter the compiler problem. I also thought that the iterative solution wasn't ideal, but didn't have the time to dive deeper into it without a running reference solution. Your version seems quite plausible to me and we will give it a try ...

ricardoV94 · 2023-09-14T09:09:20Z

Let us know if it works. If not, the right place to continue this discussion would be on discourse: https://discourse.pymc.io/

Regarding your colleague, even if he could manage to compile, I am certain the model will be considerably slower the way he wrote it down. I'll close this issue in the meantime, as it's not clear it would be worth the trouble to try and make the compiler more robust to very large graphs.

ThomasHoppe added the bug label Jul 18, 2023

twiecki closed this as completed Jul 18, 2023

ricardoV94 reopened this Jul 25, 2023

ricardoV94 added installation issues about dependencies or installation and removed bug labels Jul 25, 2023

ricardoV94 closed this as completed Sep 14, 2023

ricardoV94 added the pytensor label Sep 14, 2023
