
Installation issues #9

Closed
wd15 opened this issue May 15, 2018 · 37 comments

@wd15
Contributor

wd15 commented May 15, 2018

I tried to install amgx / pyamgx using this Nix recipe. I encountered three problems while testing with demo.py.

  • After installation I need to place import numpy as the first import in demo.py. Otherwise, I get the following error.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "__init__.pxd", line 163, in init pyamgx
  File "/nix/store/mq134cl5nfiy422cjzvjms90az1zxnwh-python2.7-numpy-1.14.0/lib/python2.7/site-packages/numpy/__init__.py", line 142, in <module>
    from . import add_newdocs
...
ImportError: 
Importing the multiarray numpy extension module failed.  Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control).  Otherwise reinstall numpy.
  • The configuration in demo.py on line 6 needs to be changed to cfg = pyamgx.Config().create_from_file(os.environ['AMGX_DIR']+'/lib/configs/core/FGMRES_AGGREGATION.json'), since I'm accessing the JSON file under lib/ of the AMGX install.

  • When running demo.py I now get the following error.

AMGX version 2.0.0.130-opensource
Built on May 14 2018, 23:06:38
Failed while initializing CUDA runtime in cudaRuntimeGetVersion
Variable 'solver' not registered
Converting config string to current config version
Error parsing parameter string: Incorrect config entry (number of equal signs is not 1) :  "config_version": 2

Error parsing parameter string obtained from file: 
Traceback (most recent call last):
  File "demo.py", line 8, in <module>
    cfg = pyamgx.Config().create_from_file(os.environ['AMGX_DIR']+'/lib/configs/core/FGMRES_AGGREGATION.json')
  File "pyamgx/Config.pyx", line 49, in pyamgx.Config.create_from_file
  File "pyamgx/Errors.pyx", line 62, in pyamgx.check_error
pyamgx.AMGXError: Incorrect amgx configuration provided.

I'm using version 6cb23fed266 of amgx and version df32133 of pyamgx. Are those versions compatible?

@shwina
Owner

shwina commented May 15, 2018

Thanks.

  1. I am still trying to play with nix and figure this one out.

  2. AMGX_DIR is supposed to be set to the cloned AMGX repo directory, but yes, it can also be set to the AMGX install directory (whose lib/ subdirectory holds the configs in this case). I will try to improve the wording in the setup instructions about what exactly AMGX_DIR is; see the sketch at the end of this comment.

  3. This error means that pyamgx was not correctly initialized. Within the nix environment, I see the following error when running pyamgx.initialize:

Failed while initializing CUDA runtime in cudaRuntimeGetVersion

This might indicate some issue with the CUDA libraries in nix, or that they are incompatible with our systems. Might have to look into this deeper.
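
To illustrate (2), the two conventions differ roughly as follows (the store path below is a hypothetical placeholder):

# AMGX_DIR pointing at the cloned source tree:
export AMGX_DIR="$HOME/AMGX"
ls "$AMGX_DIR"/core/configs/FGMRES_AGGREGATION.json

# AMGX_DIR pointing at the install prefix (the Nix case), where configs sit under lib/:
export AMGX_DIR=/nix/store/<hash>-AmgX
ls "$AMGX_DIR"/lib/configs/core/FGMRES_AGGREGATION.json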

@shwina
Owner

shwina commented May 15, 2018

This last issue definitely seems to be unrelated to (py)amgx, as I'm unable to run the AMGX example program:

[nix-shell:/nix/store/2mmp5wjg7f829lvgxg7gfl5fcyav9bf5-AmgX/lib/examples]$ pwd
/nix/store/2mmp5wjg7f829lvgxg7gfl5fcyav9bf5-AmgX/lib/examples

[nix-shell:/nix/store/2mmp5wjg7f829lvgxg7gfl5fcyav9bf5-AmgX/lib/examples]$ ./amgx_capi -c ../configs/core/CG_DILU.json 
AMGX version 2.0.0.130-opensource
Built on May 15 2018, 15:33:04
AMGX ERROR: file /tmp/nix-build-AmgX.drv-0/lafn8qxabfn95rh3bh3y0bi113kzwl8w-source/examples/amgx_capi.c line    245
AMGX ERROR: Error initializing amgx core.
Failed while initializing CUDA runtime in cudaRuntimeGetVersion

@wd15
Contributor Author

wd15 commented May 15, 2018

@shwina, thanks for looking into that. I'll try to debug it or switch to Conda.

@wd15
Contributor Author

wd15 commented May 15, 2018

@shwina, I submitted an issue on amgx, NVIDIA/AMGX#27

@shwina
Owner

shwina commented May 15, 2018

Thanks, but I think the problem may be at a lower level than AMGX. I cannot even run the deviceQuery sample from the CUDA toolkit provided by Nix, and the error indicates an incompatibility between the NVIDIA driver and CUDA toolkit version.

[nix-shell:/nix/store/l7xmd5899g9789saqkd9bm7fh2hp3jlq-cudatoolkit-9.1.85.1/samples/bin/x86_64/linux/release]$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

I'm trying to see if installing NVIDIA drivers via Nix fixes anything.

@shwina
Owner

shwina commented May 15, 2018

So far that doesn't seem to be working.

@shwina
Owner

shwina commented May 16, 2018

OK, some progress after going down a small Nix rabbit hole:

  1. First (and sort of unrelated), I added a fix to pyamgx so that it produces the appropriate error message when pyamgx.initialize() fails:
[nix-shell:~/tmp/nixes/amgx]$ python demo.py
AMGX version 2.0.0.130-opensource
Built on May 16 2018, 13:20:54
Failed while initializing CUDA runtime in cudaRuntimeGetVersion
Traceback (most recent call last):
  File "demo.py", line 7, in <module>
    pyamgx.initialize()
  File "pyamgx/pyamgx.pyx", line 14, in pyamgx.initialize
  File "pyamgx/Errors.pyx", line 62, in pyamgx.check_error
pyamgx.AMGXError: Error initializing amgx core.

  2. From the discussion here it looks like the way to get the previous command to work is to ensure that libcuda.so is picked up from the host system. The NVIDIA drivers must be installed on the host system.

So first I tried:

[nix-shell:~/tmp/nixes/amgx]$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libcuda.so python demo.py
python: error while loading shared libraries: libnvidia-fatbinaryloader.so.375.82: cannot open shared object file: No such file or directory

So I also added the path to libnvidia-fatbinaryloader.so.375.82 to LD_PRELOAD, but now I get:

LD_PRELOAD="/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.375.82" python demo.py

AMGX version 2.0.0.130-opensource
Built on May 16 2018, 13:20:54
Compiled with CUDA Runtime 8.0, using CUDA driver 8.0
Traceback (most recent call last):
  File "demo.py", line 20, in <module>
    import scipy.sparse as sparse
  File "/nix/store/097b7q6jlzsl5fa0zcvbkpcf3gnk53yw-python2.7-scipy-1.0.0/lib/python2.7/site-packages/scipy/sparse/__init__.py", line 229, in <module>
    from .csr import *
  File "/nix/store/097b7q6jlzsl5fa0zcvbkpcf3gnk53yw-python2.7-scipy-1.0.0/lib/python2.7/site-packages/scipy/sparse/csr.py", line 15, in <module>
    from ._sparsetools import csr_tocsc, csr_tobsr, csr_count_blocks, \
ImportError: /nix/store/nvdymgkdcp7cmyvh318bzs397sy2hrxp-gcc-4.8.5-lib/lib/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /nix/store/097b7q6jlzsl5fa0zcvbkpcf3gnk53yw-python2.7-scipy-1.0.0/lib/python2.7/site-packages/scipy/sparse/_sparsetools.so)

So as with NumPy, for some reason I had to import scipy before importing pyamgx:

[nix-shell:~/tmp/nixes/amgx]$ head -5 demo.py
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg
import pyamgx
import os

Finally:


[nix-shell:~/tmp/nixes/amgx]$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.375.82" python demo.py
AMGX version 2.0.0.130-opensource
Built on May 16 2018, 13:20:54
Compiled with CUDA Runtime 8.0, using CUDA driver 8.0
AMG Grid:
         Number of Levels: 1
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)            5                25         1       3.69e-07
         --------------------------------------------------------------
         Grid Complexity: 1
         Operator Complexity: 1
         Total Memory Usage: 3.68804e-07 GB
         --------------------------------------------------------------
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.406494   1.248728e+00
              0            0.406494   1.441213e-15         0.0000
         --------------------------------------------------------------
         Total Iterations: 1
         Avg Convergence Rate: 		         0.0000
         Final Residual: 		   1.441213e-15
         Total Reduction in Residual: 	   1.154145e-15
         Maximum Memory Usage: 		          0.406 GB
         --------------------------------------------------------------
Total Time: 0.00239411
    setup: 0.00153699 s
    solve: 0.00085712 s
    solve(per iteration): 0.00085712 s
('pyamgx solution: ', array([  5.10430199, -12.78780803,   1.91116712,  -7.30070306,
        13.65398112]))
('scipy solution: ', array([  5.10430199, -12.78780803,   1.91116712,  -7.30070306,
        13.65398112]))

OK so a more elegant way to do this (also recommended in the above discussion) is to create symlinks to the above libraries in a folder /nix/var/nix/lib and add that folder to LD_LIBRARY_PATH (setup commands are sketched below):

[nix-shell:~/tmp/nixes/amgx]$ ls -l /nix/var/nix/lib/
total 4
lrwxrwxrwx 1 root root 51 May 16 09:50 libcuda.so -> /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so
lrwxrwxrwx 1 root root 61 May 16 09:50 libnvidia-fatbinaryloader.so.375.82 -> /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.375.82

[nix-shell:~/tmp/nixes/amgx]$ $LD_LIBRARY_PATH
bash: /nix/var/nix/lib:/home/ashwin/local/bin:/usr/lib/x86_64-linux-gnu/nvidia/current/: No such file or directory
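
(For reference, the symlinks were created with something like the following, done once as root; the driver paths are the ones on this machine:)

mkdir -p /nix/var/nix/lib
ln -s /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so /nix/var/nix/lib/libcuda.so
ln -s /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.375.82 /nix/var/nix/lib/
export LD_LIBRARY_PATH=/nix/var/nix/lib:$LD_LIBRARY_PATH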

[nix-shell:~/tmp/nixes/amgx]$ python demo.py
AMGX version 2.0.0.130-opensource
Built on May 16 2018, 13:20:54
Compiled with CUDA Runtime 8.0, using CUDA driver 8.0
AMG Grid:
         Number of Levels: 1
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)            5                25         1       3.69e-07
         --------------------------------------------------------------
         Grid Complexity: 1
         Operator Complexity: 1
         Total Memory Usage: 3.68804e-07 GB
         --------------------------------------------------------------
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.406494   1.393182e+00
              0            0.406494   4.370313e-16         0.0000
         --------------------------------------------------------------
         Total Iterations: 1
         Avg Convergence Rate: 		         0.0000
         Final Residual: 		   4.370313e-16
         Total Reduction in Residual: 	   3.136930e-16
         Maximum Memory Usage: 		          0.406 GB
         --------------------------------------------------------------
Total Time: 0.0024313
    setup: 0.00156934 s
    solve: 0.000861952 s
    solve(per iteration): 0.000861952 s
('pyamgx solution: ', array([-0.76851926,  5.01697985, -1.61899648,  2.52336589, -1.42913961]))
('scipy solution: ', array([-0.76851926,  5.01697985, -1.61899648,  2.52336589, -1.42913961]))

[nix-shell:~/tmp/nixes/amgx]$

@shwina
Owner

shwina commented May 16, 2018

I also made a few small changes to the .nix files:


[nix-shell:~/tmp/nixes/amgx]$ cat amgx.nix
{ nixpkgs ? import <nixpkgs> {} }:
let
  stdenv48 = nixpkgs.overrideCC nixpkgs.stdenv nixpkgs.pkgs.gcc48;
in
  stdenv48.mkDerivation rec {
    name = "AmgX";

    src = nixpkgs.fetchFromGitHub {
      owner = "NVIDIA";
      repo = "AMGX";
      rev = "6cb23fed26602e4873d5c1deb694a2c8480feac3";
      sha256 = "1g5zj7wzxc8b2lyn00xp7jqq70bz550q8fmzcb5mzzapa44xjk7q";
    };

    buildInputs = [
      nixpkgs.pkgs.cmake
      nixpkgs.pkgs.cudatoolkit8
    ];

    unpackPhase = ''
      cp --recursive "$src" ./
      chmod --recursive u=rwx ./"$(basename "$src")"
      cd ./"$(basename "$src")"
    '';

    configurePhase = ''
      mkdir -p build
      cd build
      mkdir --parents "$out"
      cmake -DCMAKE_INSTALL_PREFIX:PATH="$out" ../
    '';

    buildPhase = ''
      make -j"$NIX_BUILD_CORES" all
    '';
  }

[nix-shell:~/tmp/nixes/amgx]$ cat pyamgx.nix
{ nixpkgs ? import <nixpkgs> {} }:
let
  amgx = import ./amgx.nix { inherit nixpkgs; };
in
  nixpkgs.python27Packages.buildPythonPackage rec {
    pname = "pyamgx";
    version = "";
    src = nixpkgs.fetchFromGitHub {
      owner = "shwina";
      repo = pname;
      rev = "fac3c841e1527942da64c7d1805d1ffe94f58766";
      sha256 = "1752yhhq82980qhn5i8mngjlybkgvp96qlgnv6y5cdn8921m8h2s";
    };
    doCheck=false;
    buildInputs = [
      nixpkgs.python27Packages.scipy
      nixpkgs.python27Packages.numpy
      amgx
      nixpkgs.python27Packages.cython
    ];
    AMGX_DIR = "/blah";
    # shellHook = ''
    #   export AMGX_DIR = "/blah"
    # '';
  }



@wd15
Contributor Author

wd15 commented May 16, 2018

Awesome work! I'm going to try to follow along. Apologies for sending you down the Nix rabbit hole. I've been trying to get into Nix lately, and I think it's an improvement over Conda. I hope you enjoy it.

Please do add the final nix recipes to this repository if you/we do get things working. I can submit a pull request if you'd like some outside contributions. I haven't tried submitting anything to nixpkgs yet, but maybe that is also an option down the road.

I think LD_LIBRARY_PATH can probably be set during the nix build via a shell hook. I'll look into that, assuming I can reproduce your work above.
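
A minimal sketch of what I have in mind for pyamgx.nix (untested, and it assumes the /nix/var/nix/lib symlink directory from your comment above):

shellHook = ''
  export LD_LIBRARY_PATH=/nix/var/nix/lib:$LD_LIBRARY_PATH
'';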

Also, I wasn't previously using a machine with a GPU or drivers, but I am now. I don't use GPUs much, so I didn't really know what I was doing.

@shwina
Owner

shwina commented May 17, 2018

I've figured out what causes the issues with the numpy/scipy imports: it's the different GCC versions used to compile AMGX (gcc-4.8) versus numpy/scipy (the Nix default, which I think is gcc-7.3.0).
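
One way to confirm which GCC runtime an extension module actually resolves to is ldd (the scipy store path is the one from the tracebacks below):

ldd /nix/store/097b7q6jlzsl5fa0zcvbkpcf3gnk53yw-python2.7-scipy-1.0.0/lib/python2.7/site-packages/scipy/sparse/_sparsetools.so | grep -E 'libstdc|libgomp'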

See below: when pyamgx is imported first, numpy and scipy are not happy about finding the gcc-4.8.5 versions of libgomp.so and libstdc++.so instead of the gcc-7.3.0 versions:

[nix-shell:~/tmp/nixes/amgx]$ python demo.py
Traceback (most recent call last):
  File "demo.py", line 1, in <module>
    import pyamgx
  File "__init__.pxd", line 163, in init pyamgx
  File "/nix/store/mq134cl5nfiy422cjzvjms90az1zxnwh-python2.7-numpy-1.14.0/lib/python2.7/site-packages/numpy/__init__.py", line 142, in <module>
    from . import add_newdocs
  File "/nix/store/mq134cl5nfiy422cjzvjms90az1zxnwh-python2.7-numpy-1.14.0/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/nix/store/mq134cl5nfiy422cjzvjms90az1zxnwh-python2.7-numpy-1.14.0/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/nix/store/mq134cl5nfiy422cjzvjms90az1zxnwh-python2.7-numpy-1.14.0/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/nix/store/mq134cl5nfiy422cjzvjms90az1zxnwh-python2.7-numpy-1.14.0/lib/python2.7/site-packages/numpy/core/__init__.py", line 26, in <module>
    raise ImportError(msg)
ImportError:
Importing the multiarray numpy extension module failed.  Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control).  Otherwise reinstall numpy.

Original error was: /nix/store/nvdymgkdcp7cmyvh318bzs397sy2hrxp-gcc-4.8.5-lib/lib/libgomp.so.1: version `GOMP_4.0' not found (required by /nix/store/lhd411dd1r4nzqpi52l2sxhyfjqiwlph-openblas-0.2.20/lib/libopenblas.so.0)

LD_PRELOAD libgomp.so to make numpy happy:

[nix-shell:~/tmp/nixes/amgx]$ LD_PRELOAD=/nix/store/236pskgkb440l3q0458fbd4gikgplw5w-gcc-7.3.0-lib/lib/libgomp.so.1 python demo.py
Traceback (most recent call last):
  File "demo.py", line 4, in <module>
    import scipy.sparse as sparse
  File "/nix/store/097b7q6jlzsl5fa0zcvbkpcf3gnk53yw-python2.7-scipy-1.0.0/lib/python2.7/site-packages/scipy/sparse/__init__.py", line 229, in <module>
    from .csr import *
  File "/nix/store/097b7q6jlzsl5fa0zcvbkpcf3gnk53yw-python2.7-scipy-1.0.0/lib/python2.7/site-packages/scipy/sparse/csr.py", line 15, in <module>
    from ._sparsetools import csr_tocsc, csr_tobsr, csr_count_blocks, \
ImportError: /nix/store/nvdymgkdcp7cmyvh318bzs397sy2hrxp-gcc-4.8.5-lib/lib/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /nix/store/097b7q6jlzsl5fa0zcvbkpcf3gnk53yw-python2.7-scipy-1.0.0/lib/python2.7/site-packages/scipy/sparse/_sparsetools.so

LD_PRELOAD libstdc++.so to make scipy happy:


[nix-shell:~/tmp/nixes/amgx]$ LD_PRELOAD="/nix/store/236pskgkb440l3q0458fbd4gikgplw5w-gcc-7.3.0-lib/lib/libgomp.so.1 /nix/store/236pskgkb440l3q0458fbd4gikgplw5w-gcc-7.3.0-lib/lib/libstdc++.so" python demo.py
AMGX version 2.0.0.130-opensource
Built on May 16 2018, 14:45:25
Compiled with CUDA Runtime 8.0, using CUDA driver 8.0
AMG Grid:
         Number of Levels: 1
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)            5                25         1       3.69e-07
         --------------------------------------------------------------
         Grid Complexity: 1
         Operator Complexity: 1
         Total Memory Usage: 3.68804e-07 GB
         --------------------------------------------------------------
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.477661   1.113422e+00
              0            0.477661   1.189315e-15         0.0000
         --------------------------------------------------------------
         Total Iterations: 1
         Avg Convergence Rate: 		         0.0000
         Final Residual: 		   1.189315e-15
         Total Reduction in Residual: 	   1.068162e-15
         Maximum Memory Usage: 		          0.478 GB
         --------------------------------------------------------------
Total Time: 0.0023961
    setup: 0.00153821 s
    solve: 0.000857888 s
    solve(per iteration): 0.000857888 s
('pyamgx solution: ', array([-13.10772991,  -8.11496672,  -2.27576125,   6.75924547,
        12.55481743]))
('scipy solution: ', array([-13.10772991,  -8.11496672,  -2.27576125,   6.75924547,
        12.55481743]))

Use LD_LIBRARY_PATH to avoid LD_PRELOAD - this isn't a great solution, but at least it's tidier:

[nix-shell:~/tmp/nixes/amgx]$ export LD_LIBRARY_PATH=/nix/store/236pskgkb440l3q0458fbd4gikgplw5w-gcc-7.3.0-lib/lib/:$LD_LIBRARY_PATH

[nix-shell:~/tmp/nixes/amgx]$ python demo.py
AMGX version 2.0.0.130-opensource
Built on May 16 2018, 14:45:25
Compiled with CUDA Runtime 8.0, using CUDA driver 8.0
AMG Grid:
         Number of Levels: 1
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)            5                25         1       3.69e-07
         --------------------------------------------------------------
         Grid Complexity: 1
         Operator Complexity: 1
         Total Memory Usage: 3.68804e-07 GB
         --------------------------------------------------------------
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.477661   1.584991e+00
              0            0.477661   1.029213e-15         0.0000
         --------------------------------------------------------------
         Total Iterations: 1
         Avg Convergence Rate: 		         0.0000
         Final Residual: 		   1.029213e-15
         Total Reduction in Residual: 	   6.493497e-16
         Maximum Memory Usage: 		          0.478 GB
         --------------------------------------------------------------
Total Time: 0.00246966
    setup: 0.00157434 s
    solve: 0.000895328 s
    solve(per iteration): 0.000895328 s
('pyamgx solution: ', array([ 9.27278245,  6.04026388, -1.18125993, -5.33643819, -3.43642485]))
('scipy solution: ', array([ 9.27278245,  6.04026388, -1.18125993, -5.33643819, -3.43642485]))

@wd15
Contributor Author

wd15 commented May 17, 2018

Currently, the following happens:

$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libcuda.so python demo.py
python: error while loading shared libraries: libnvidia-fatbinaryloader.so.384.111: cannot open shared object file: No such file or directory

but when libnvidia-fatbinaryloader.so.384.111 is included, the following happens:

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111 " python demo.py
AMGX version 2.0.0.130-opensource
Built on May 16 2018, 21:41:17
Failed while initializing CUDA runtime in cudaRuntimeGetVersion
Traceback (most recent call last):
  File "demo.py", line 5, in <module>
    pyamgx.initialize()
  File "pyamgx/pyamgx.pyx", line 14, in pyamgx.initialize
  File "pyamgx/Errors.pyx", line 62, in pyamgx.check_error
pyamgx.AMGXError: Error initializing amgx core.

It's built using cudatoolkit9, not 8, since the machine apparently has the drivers for CUDA 9. Could this be an issue for pyamgx / amgx?

@shwina
Owner

shwina commented May 17, 2018

Could you try with cudatoolkit8? See here for the minimum driver versions required for different CUDA toolkit versions.

It's only the NVIDIA driver (which provides both libcuda.so [confusingly] and libnvidia-fatbinaryloader) that needs to be installed on the host system, so it shouldn't matter what CUDA toolkit is installed on the host system.
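
A quick way to compare the two on a given machine (nvidia-smi reports the driver; nvcc, which ships with the toolkit, reports the toolkit version):

nvidia-smi | head -n 3    # driver version appears in the banner
nvcc --version            # toolkit version inside the nix shell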

@shwina
Owner

shwina commented May 17, 2018

It looks like cudatoolkit9 installs CUDA toolkit 9.1, which needs a minimum driver version of 387.xx.

@wd15
Contributor Author

wd15 commented May 17, 2018

Sorry, I'm confused: does that mean I should try cudatoolkit8 or not? I think I got a different error pointing out the incompatibility with 9, but I can't remember now.

@shwina
Owner

shwina commented May 17, 2018

Yes, try with cudatoolkit8, because it looks like you have NVIDIA driver version 384.111, which is apparently not sufficient to support CUDA toolkit 9.1.

@shwina
Owner

shwina commented May 17, 2018

I can confirm that I get the same error with cudatoolkit9, but not with cudatoolkit8.

@wd15
Contributor Author

wd15 commented May 17, 2018

Different error this time (using cudatoolkit8):

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" python demo.py
AMGX version 2.0.0.130-opensource
Built on May 17 2018, 17:46:34
Compiled with CUDA Runtime 8.0, using CUDA driver 9.0
Thrust failure: function_attributes(): after cudaFuncGetAttributes: invalid device function

[snip]

  File "pyamgx/Solver.pyx", line 28, in pyamgx.Solver.create
  File "pyamgx/Errors.pyx", line 62, in pyamgx.check_error
pyamgx.AMGXError: CUDA kernel launch error.

@shwina
Owner

shwina commented May 17, 2018

Hmm, what GPU is on the system? Can you provide the output of

$ nvidia-smi

on the host system?

@shwina
Owner

shwina commented May 17, 2018

I think we are close. I suspect that the final piece is the CUDA_ARCH CMake variable described in the AMGX README. Depending on the GPU on your system, you may have to set this to a different value, e.g., -DCUDA_ARCH="30" for some older GPUs.

The appropriate value for different GPUs can be found here. For example, a Quadro K2200 GPU supports compute capability 3.0, so -DCUDA_ARCH="30".

I don't think AMGX supports anything lower than 3.0 though.
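
In the amgx.nix above, that would just mean adding the flag to the cmake line in configurePhase, e.g. for compute capability 3.0:

cmake -DCMAKE_INSTALL_PREFIX:PATH="$out" -DCUDA_ARCH="30" ../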

@wd15
Contributor Author

wd15 commented May 17, 2018

$ nvidia-smi
Thu May 17 15:42:54 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla C2075         On   | 00000000:03:00.0 Off |                    0 |
| 30%   52C   P12    30W /  N/A |      1MiB /  5301MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@wd15
Contributor Author

wd15 commented May 17, 2018

Oh dear! It looks like the "Tesla C2075" is compute capability 2.0. Looks like I need to switch to another GPU. Correct?

@shwina
Owner

shwina commented May 17, 2018

Let me try compiling AMGX with only 2.0 compute capability and see what happens.

@shwina
Owner

shwina commented May 17, 2018

Sorry that the process is so drawn out! I forgot how many things there are to keep in mind.

@shwina
Owner

shwina commented May 17, 2018

Unfortunately, that didn't work. I can try to investigate further, but if you have access to a newer GPU for testing, that would probably be the way to go!

@wd15
Contributor Author

wd15 commented May 18, 2018

My current situation:

  • GPU Name: Tesla K40m (compute capability 3.5)

  • Building amgx with -DCUDA_ARCH="35" (updated amgx.nix version)

I cloned the AMGX repo so that I can run:

LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_AGGREGATION.json

This gives:

AMGX version 2.0.0.130-opensource
Built on May 17 2018, 20:34:58
Compiled with CUDA Runtime 8.0, using CUDA driver 9.0
Warning: No mode specified, using dDDI by default.
Caught amgx exception: Could not create the CUDENSE handle
 at: /tmp/nix-build-AmgX.drv-0/lafn8qxabfn95rh3bh3y0bi113kzwl8w-source/core/src/solvers/dense_lu_solver.cu:733
Stack trace:
 /nix/store/6s94g7q56wkc7i3sd3zd9jhihwnwjrrg-AmgX/lib/libamgxsh.so : amgx::dense_lu_solver::DenseLUSolver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::DenseLUSolver(amgx::AMG_Config&, std::string const&, amgx::ThreadManager*)+0x159

...

Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Caught amgx exception: Mode not found.

...

@wd15
Contributor Author

wd15 commented May 21, 2018

As reported here, AMGX does actually seem to work with the nix recipe; however, running demo.py still gives the above error.

@wd15
Contributor Author

wd15 commented May 21, 2018

OK, the following version of demo.py now works for me.

import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg

import pyamgx
import os

pyamgx.initialize()

# Initialize config and resources:
cfg = pyamgx.Config().create_from_file(os.environ['AMGX_DIR']+'/lib/configs/core/FGMRES_NOPREC.json')
...

@shwina
Owner

shwina commented May 21, 2018

OK, so the problem definitely has something to do with the creation of the dense LU solver, although I have no idea what exactly. FGMRES_NOPREC is a purely iterative solver without any preconditioning, so no dense system is created or solved.

At least the Nix pieces are coming together nicely 🎉

@shwina
Owner

shwina commented May 22, 2018

I noticed that a similar (but not identical) error is raised with a singular matrix. This may be a long shot, but @wd15, could you please post the output of the following program:

import pyamgx
import os

pyamgx.initialize()

# Initialize config and resources:
cfg = pyamgx.Config().create_from_file(os.environ['AMGX_DIR']+'/core/configs/FGMRES_AGGREGATION.json')
rsc = pyamgx.Resources().create_simple(cfg)

# Create matrices and vectors:
A = pyamgx.Matrix().create(rsc)
x = pyamgx.Vector().create(rsc)
b = pyamgx.Vector().create(rsc)

# Create solver:
solver = pyamgx.Solver().create(rsc, cfg)

# Upload system:
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg

R = np.random.rand(5, 5)
print(R)
M = sparse.csr_matrix(R)
rhs = np.random.rand(5)
sol = np.zeros(5, dtype=np.float64)

A.upload_CSR(M)
b.upload(rhs)
x.upload(sol)

# Setup and solve system:
solver.setup(A)
solver.solve(b, x)

# Download solution
x.download(sol)
print("pyamgx solution: ", sol)
print("scipy solution: ", splinalg.spsolve(M, rhs))

# Clean up:
A.destroy()
x.destroy()
b.destroy()
solver.destroy()
rsc.destroy()
cfg.destroy()

pyamgx.finalize()

The above just prints the random matrix before uploading it to AMGX.

@shwina
Owner

shwina commented May 22, 2018

Sorry, I think you'll have to rearrange the imports as before.

@wd15
Contributor Author

wd15 commented May 22, 2018

I ran the above and I get the same "Could not create the CUDENSE handle" error:

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" python demo_shwina.py 
AMGX version 2.0.0.130-opensource
Built on May 18 2018, 19:09:36
Compiled with CUDA Runtime 8.0, using CUDA driver 9.0
Caught amgx exception: Could not create the CUDENSE handle
 at: /tmp/nix-build-AmgX.drv-0/lafn8qxabfn95rh3bh3y0bi113kzwl8w-source/core/src/solvers/dense_lu_solver.cu:733
Stack trace:

@shwina
Owner

shwina commented May 22, 2018 via email

@wd15
Contributor Author

wd15 commented May 22, 2018

It doesn't get that far; the Python traceback is:

Traceback (most recent call last):
  File "demo_shwina.py", line 16, in <module>
    solver = pyamgx.Solver().create(rsc, cfg)
  File "pyamgx/Solver.pyx", line 28, in pyamgx.Solver.create
  File "pyamgx/Errors.pyx", line 62, in pyamgx.check_error
pyamgx.AMGXError: CUDA kernel launch error.

@shwina
Owner

shwina commented May 22, 2018 via email

@wd15
Contributor Author

wd15 commented May 23, 2018

The cuSolver wasn't required for testing FiPy with pyamgx, so I'm no longer concerned about this. Please close this if you like.

@shwina
Owner

shwina commented May 23, 2018

OK thanks for trying!

@shwina shwina closed this as completed May 23, 2018
