
Building AMGX using Nix #27

Closed

wd15 opened this issue May 15, 2018 · 15 comments

wd15 commented May 15, 2018

I'm working on a Nix recipe for AMGX; it currently looks like this. It seems to build without any issues, but when I try to run an example I get the following:

$ examples/amgx_capi -m ../examples/matrix.mtx -c ../core/configs/FGMRES_AGGREGATION.json
AMGX version 2.0.0.130-opensource
Built on May 14 2018, 23:06:38
AMGX ERROR: file /tmp/nix-build-AmgX.drv-0/lafn8qxabfn95rh3bh3y0bi113kzwl8w-source/examples/amgx_capi.c line    245
AMGX ERROR: Error initializing amgx core.
Failed while initializing CUDA runtime in cudaRuntimeGetVersion

Any ideas?

marsaev (Collaborator) commented May 15, 2018

Hi there,

What driver is installed on the system? What CUDA toolkit is the library built against?
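
If it helps, a minimal standalone check of what the runtime and driver actually report would narrow this down. This is only a sketch against the CUDA runtime API (not part of AMGX), assuming the toolkit's headers and libcudart are on the compiler's search paths; the file name below is just an example:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int runtime_version = 0, driver_version = 0;

        /* Version of the runtime the program was built against (the toolkit). */
        cudaError_t err = cudaRuntimeGetVersion(&runtime_version);
        if (err != cudaSuccess) {
            printf("cudaRuntimeGetVersion failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        /* Highest CUDA version supported by the installed driver; 0 means no driver. */
        err = cudaDriverGetVersion(&driver_version);
        if (err != cudaSuccess) {
            printf("cudaDriverGetVersion failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        printf("runtime %d, driver %d\n", runtime_version, driver_version);
        return 0;
    }

Compile it with something like nvcc check_versions.cu -o check_versions. If cudaRuntimeGetVersion itself fails here, the problem is in the CUDA setup rather than in the AMGX build.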

wd15 (Author) commented May 15, 2018

I am using the CUDA toolkit outlined here. However, I haven't installed any drivers, at least none that I'm aware of.

marsaev (Collaborator) commented May 15, 2018

A driver and a supported NVIDIA GPU are required to run AMGX.
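
A quick way to confirm that a usable device is visible to the process is a minimal device query. Again, this is just a sketch against the CUDA runtime API, not something AMGX ships:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            /* This is the kind of call that fails when no driver or supported GPU is present. */
            printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("device %d: %s, compute capability %d.%d\n",
                   i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }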

wd15 (Author) commented May 18, 2018

I've since switched to using the correct setup, including drivers and a GPU. The GPU is a Tesla K40m with compute capability 3.5, and I'm using toolkit version 8.0.61_375.26. The build process I'm using is outlined here. Now the following problem occurs when I run an example:

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_AGGREGATION.json

AMGX version 2.0.0.130-opensource
Built on May 17 2018, 20:34:58
Compiled with CUDA Runtime 8.0, using CUDA driver 9.0
Warning: No mode specified, using dDDI by default.
Caught amgx exception: Could not create the CUDENSE handle
 at: /tmp/nix-build-AmgX.drv-0/lafn8qxabfn95rh3bh3y0bi113kzwl8w-source/core/src/solvers/dense_lu_solver.cu:733
Stack trace:
 /nix/store/6s94g7q56wkc7i3sd3zd9jhihwnwjrrg-AmgX/lib/libamgxsh.so : amgx::dense_lu_solver::DenseLUSolver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::DenseLUSolver(amgx::AMG_Config&, std::string const&, amgx::ThreadManager*)+0x159

...

Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Caught amgx exception: Mode not found.

...

shwina commented May 21, 2018

I'm unable to reproduce this, but could you post the output of:

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_NOPREC.json

shwina commented May 21, 2018

@marsaev sorry to keep "hijacking" this thread! But a related issue is shwina/pyamgx#9

wd15 (Author) commented May 21, 2018

@shwina, oh, wow, it seems to have worked.

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_NOPREC.json
AMGX version 2.0.0.130-opensource
Built on May 18 2018, 19:09:36
Compiled with CUDA Runtime 8.0, using CUDA driver 9.0
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini                   0   3.464102e+00
              0                   0   1.845471e+00         0.5327
              1              0.0000   1.541877e+00         0.8355
              2              0.0000   1.374225e+00         0.8913
              3              0.0000   1.366903e+00         0.9947
              4              0.0000   1.040855e+00         0.7615
              5              0.0000   1.026638e+00         0.9863
              6              0.0000   8.614123e-01         0.8391
              7              0.0000   6.599583e-01         0.7661
              8              0.0000   6.596676e-01         0.9996
              9              0.0000   6.593714e-01         0.9996
             10              0.0000   6.331763e-01         0.9603
             11              0.0000   1.284671e-14         0.0000
         --------------------------------------------------------------
         Total Iterations: 12
         Avg Convergence Rate: 		         0.0627
         Final Residual: 		   1.284671e-14
         Total Reduction in Residual: 	   3.708525e-15
         Maximum Memory Usage: 		          0.000 GB
         --------------------------------------------------------------
Total Time: 0.0130909
    setup: 8.2144e-05 s
    solve: 0.0130088 s
    solve(per iteration): 0.00108406 s

shwina commented May 21, 2018

OK so the Nix recipe (while a bit clunky at runtime) at least works, but the problem seems to have something to do with cuDense?

@marsaev any ideas?

marsaev (Collaborator) commented May 22, 2018

Hi guys,

Looking at https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDNcreate, I couldn't tell right away what the problem with cuDense is.
Just a few notes:

  1. The driver is backwards compatible, so the more recent the driver you install, the better (just make sure the driver supports your GPU).
  2. AMGX initialization also initializes the CUDA context (through the CUDA runtime), so it would fail without a supported GPU; a minimal check of just that step is sketched below.

So now, moving on to the issues:
It looks like you are able to run AMGX successfully (in comment #27 (comment)), so the issue happens when you switch to the configuration that contains DENSE_LU_SOLVER, right? And the issue happens with the AMGX example, right?
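
On point 2 above, a standalone check of just the initialization step, roughly what the top of amgx_capi.c does, looks something like the sketch below. It is written from memory against the public C API in amgx_c.h, so treat the exact error handling as an assumption rather than a copy of the example:

    #include <stdio.h>
    #include <amgx_c.h>

    int main(void)
    {
        /* AMGX_initialize() is where the CUDA runtime/context is first touched, */
        /* so this is the call that fails when no usable driver or GPU is visible. */
        AMGX_RC rc = AMGX_initialize();
        if (rc != AMGX_RC_OK) {
            printf("AMGX_initialize failed, rc = %d\n", (int)rc);
            return 1;
        }
        printf("AMGX core initialized\n");
        AMGX_finalize();
        return 0;
    }

Link it against the same libamgxsh.so you built and run it the same way you run amgx_capi.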

wd15 (Author) commented May 22, 2018

Right, so I can run

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_NOPREC.json

but cannot run

$ LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.111" $AMGX_DIR/lib/examples/amgx_capi -m examples/matrix.mtx -c core/configs/FGMRES_AGGREGATION.json

for which I get the Caught amgx exception: Could not create the CUDENSE handle error.

marsaev (Collaborator) commented May 22, 2018

Can you set the environment variable CUDA_LAUNCH_BLOCKING=1 and try to run again? Sometimes async launches generate misleading errors.

wd15 (Author) commented May 22, 2018

I tried with export CUDA_LAUNCH_BLOCKING=1, but I get the same Could not create the CUDENSE handle error.

marsaev (Collaborator) commented May 22, 2018

Interesting. Judging from the documentation, there aren't many things that can go wrong during cuDense handle creation:
https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDNcreate
You could probably take a closer look at the return code from the cuDense call to see what the exact error is.
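
Something along these lines, a minimal sketch that only creates and destroys the handle; it assumes the cuSOLVER headers and library from your toolkit are available and is not the exact code AMGX runs:

    #include <stdio.h>
    #include <cusolverDn.h>

    int main(void)
    {
        cusolverDnHandle_t handle;

        /* Same kind of call that fails inside dense_lu_solver.cu; the raw     */
        /* status code narrows down why (e.g. not initialized, alloc failed,   */
        /* architecture mismatch).                                              */
        cusolverStatus_t status = cusolverDnCreate(&handle);
        printf("cusolverDnCreate returned %d\n", (int)status);

        if (status == CUSOLVER_STATUS_SUCCESS) {
            cusolverDnDestroy(handle);
        }
        return 0;
    }

Compile with something like nvcc check_cusolver.cu -o check_cusolver -lcusolver (the file name is just an example) and run it under the same LD_PRELOAD you use for amgx_capi.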

wd15 (Author) commented May 23, 2018

I didn't need the dense solvers for the task at hand, so the installation was adequate. Please close this if you like.

Just to add to this, I've submitted a pull request for a Nix recipe for pyamgx, which also includes an AMGX recipe. Would you want the AMGX part of that in this repository? It works, but it probably needs to be improved for general use.

marsaev (Collaborator) commented Jun 19, 2018

Thanks for your submission!
It's better to keep things separate, but I will add a mention in the README.

marsaev closed this as completed Jun 19, 2018