
Building AMGX using Nix #27

Closed

wd15 opened this issue May 15, 2018 · 15 comments

wd15 commented May 15, 2018

I'm working on a Nix recipe for AMGX; it currently looks like this. It seems to build without any issues, but when I try to run an example I get the following:

$ examples/amgx_capi -m ../examples/matrix.mtx -c ../core/configs/FGMRES_AGGREGATION.json
AMGX version 2.0.0.130-opensource
Built on May 14 2018, 23:06:38
AMGX ERROR: file /tmp/nix-build-AmgX.drv-0/lafn8qxabfn95rh3bh3y0bi113kzwl8w-source/examples/amgx_capi.c line    245
AMGX ERROR: Error initializing amgx core.
Failed while initializing CUDA runtime in cudaRuntimeGetVersion

Any ideas?

marsaev (Collaborator) commented May 15, 2018

Hi there,

What driver is installed on the system? What CUDA toolkit is the library built against?
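
If it helps, a minimal standalone check of what the runtime and driver actually report would narrow this down. This is only a sketch against the CUDA runtime API (not part of AMGX), assuming the toolkit's headers and libcudart are on the compiler's search paths; the file name below is just an example:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int runtime_version = 0, driver_version = 0;

        /* Version of the runtime the program was built against (the toolkit). */
        cudaError_t err = cudaRuntimeGetVersion(&runtime_version);
        if (err != cudaSuccess) {
            printf("cudaRuntimeGetVersion failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        /* Highest CUDA version supported by the installed driver; 0 means no driver. */
        err = cudaDriverGetVersion(&driver_version);
        if (err != cudaSuccess) {
            printf("cudaDriverGetVersion failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        printf("runtime %d, driver %d\n", runtime_version, driver_version);
        return 0;
    }

Compile it with something like nvcc check_versions.cu -o check_versions. If cudaRuntimeGetVersion itself fails here, the problem is in the CUDA setup rather than in the AMGX build.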

wd15 (Author) commented May 15, 2018

I am using the CUDA toolkit outlined here. However, I haven't installed any drivers, at least none that I'm aware of.

marsaev (Collaborator) commented May 15, 2018

A driver and a supported NVIDIA GPU are required to run AMGX.
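
A quick way to confirm that a usable device is visible to the process is a minimal device query. Again, this is just a sketch against the CUDA runtime API, not something AMGX ships:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            /* This is the kind of call that fails when no driver or supported GPU is present. */
            printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("device %d: %s, compute capability %d.%d\n",
                   i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }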

wd15 (Author) commented May 18, 2018

I've since switched to using the correct setup, including drivers and a GPU. The GPU is a Tesla K40m with compute capability 3.5, and I'm using toolkit version 8.0.61_375.26. The build process I'm using is outlined here. Now the following problem occurs when I run an example:

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_AGGREGATION.json

AMGX version 2.0.0.130-opensource
Built on May 17 2018, 20:34:58
Compiled with CUDA Runtime 8.0, using CUDA driver 9.0
Warning: No mode specified, using dDDI by default.
Caught amgx exception: Could not create the CUDENSE handle
 at: /tmp/nix-build-AmgX.drv-0/lafn8qxabfn95rh3bh3y0bi113kzwl8w-source/core/src/solvers/dense_lu_solver.cu:733
Stack trace:
 /nix/store/6s94g7q56wkc7i3sd3zd9jhihwnwjrrg-AmgX/lib/libamgxsh.so : amgx::dense_lu_solver::DenseLUSolver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::DenseLUSolver(amgx::AMG_Config&, std::string const&, amgx::ThreadManager*)+0x159

...

Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Caught amgx exception: Mode not found.

...

shwina commented May 21, 2018

I'm unable to reproduce this, but could you post the output of:

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_NOPREC.json

shwina commented May 21, 2018

@marsaev sorry to keep "hijacking" this thread! But a related issue is shwina/pyamgx#9

wd15 (Author) commented May 21, 2018

@shwina, oh, wow, it seems to have worked.

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_NOPREC.json
AMGX version 2.0.0.130-opensource
Built on May 18 2018, 19:09:36
Compiled with CUDA Runtime 8.0, using CUDA driver 9.0
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini                   0   3.464102e+00
              0                   0   1.845471e+00         0.5327
              1              0.0000   1.541877e+00         0.8355
              2              0.0000   1.374225e+00         0.8913
              3              0.0000   1.366903e+00         0.9947
              4              0.0000   1.040855e+00         0.7615
              5              0.0000   1.026638e+00         0.9863
              6              0.0000   8.614123e-01         0.8391
              7              0.0000   6.599583e-01         0.7661
              8              0.0000   6.596676e-01         0.9996
              9              0.0000   6.593714e-01         0.9996
             10              0.0000   6.331763e-01         0.9603
             11              0.0000   1.284671e-14         0.0000
         --------------------------------------------------------------
         Total Iterations: 12
         Avg Convergence Rate: 		         0.0627
         Final Residual: 		   1.284671e-14
         Total Reduction in Residual: 	   3.708525e-15
         Maximum Memory Usage: 		          0.000 GB
         --------------------------------------------------------------
Total Time: 0.0130909
    setup: 8.2144e-05 s
    solve: 0.0130088 s
    solve(per iteration): 0.00108406 s

shwina commented May 21, 2018

OK so the Nix recipe (while a bit clunky at runtime) at least works, but the problem seems to have something to do with cuDense?

@marsaev any ideas?

marsaev (Collaborator) commented May 22, 2018

Hi guys,

Looking at https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDNcreate, I couldn't tell right away what the problem with cuDense is.
Just a few notes:

  1. The driver is backwards compatible, so the more recent the driver you install, the better (just make sure the driver supports your GPU).
  2. AMGX initialization also initializes the CUDA context (through the CUDA runtime), so it would fail without a supported GPU; a minimal check of just that step is sketched below.

So now, moving on to the issues:
It looks like you are able to run AMGX successfully (in comment #27 (comment)), so the issue happens when you switch to the configuration that contains DENSE_LU_SOLVER, right? And the issue happens with the AMGX example, right?
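
On point 2 above, a standalone check of just the initialization step, roughly what the top of amgx_capi.c does, looks something like the sketch below. It is written from memory against the public C API in amgx_c.h, so treat the exact error handling as an assumption rather than a copy of the example:

    #include <stdio.h>
    #include <amgx_c.h>

    int main(void)
    {
        /* AMGX_initialize() is where the CUDA runtime/context is first touched, */
        /* so this is the call that fails when no usable driver or GPU is visible. */
        AMGX_RC rc = AMGX_initialize();
        if (rc != AMGX_RC_OK) {
            printf("AMGX_initialize failed, rc = %d\n", (int)rc);
            return 1;
        }
        printf("AMGX core initialized\n");
        AMGX_finalize();
        return 0;
    }

Link it against the same libamgxsh.so you built and run it the same way you run amgx_capi.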

wd15 (Author) commented May 22, 2018

Right, so I can run

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_NOPREC.json

but cannot run

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_AGGREGATION.json

for which I get the Caught amgx exception: Could not create the CUDENSE handle error.

marsaev (Collaborator) commented May 22, 2018

Can you set the environment variable CUDA_LAUNCH_BLOCKING=1 and try to run again? Sometimes async launches generate misleading errors.

wd15 (Author) commented May 22, 2018

I tried with export CUDA_LAUNCH_BLOCKING=1, but I get the same Could not create the CUDENSE handle error.

marsaev (Collaborator) commented May 22, 2018

Interesting. Judging from the documentation, there aren't many things that can go wrong during cuDense handle creation:
https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDNcreate
You could probably take a closer look at the return code from the cuDense call to see what the exact error is.
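
Something along these lines, a minimal sketch that only creates and destroys the handle; it assumes the cuSOLVER headers and library from your toolkit are available and is not the exact code AMGX runs:

    #include <stdio.h>
    #include <cusolverDn.h>

    int main(void)
    {
        cusolverDnHandle_t handle;

        /* Same kind of call that fails inside dense_lu_solver.cu; the raw     */
        /* status code narrows down why (e.g. not initialized, alloc failed,   */
        /* architecture mismatch).                                              */
        cusolverStatus_t status = cusolverDnCreate(&handle);
        printf("cusolverDnCreate returned %d\n", (int)status);

        if (status == CUSOLVER_STATUS_SUCCESS) {
            cusolverDnDestroy(handle);
        }
        return 0;
    }

Compile with something like nvcc check_cusolver.cu -o check_cusolver -lcusolver (the file name is just an example) and run it under the same LD_PRELOAD you use for amgx_capi.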

wd15 (Author) commented May 23, 2018

I didn't need the dense solvers for the task at hand, so the installation was adequate. Please close this if you like.

Just to add to this, I've submitted a pull request for a Nix recipe for pyamgx, which also includes an AMGX recipe. Would you want the AMGX part of that in this repository? It works, but it probably needs to be improved for general use.

marsaev (Collaborator) commented Jun 19, 2018

Thanks for your submission!
It's better to keep things separate, but I will add a mention in the README.

marsaev closed this as completed Jun 19, 2018