OlegKonings/GLwithmex

GLwithmex

A mex interface for a CUDA implementation of Stephen Boyd's ADMM group lasso solver, with the extra feature of testing multiple lambdas in parallel.

http://www.stanford.edu/~boyd/papers/admm/group_lasso/group_lasso.html

This implementation is a mostly literal translation of the solver, with the added ability to test up to 31 lambdas in parallel. It can operate on a dense matrix A of any shape, but A should be at least size (32,32); otherwise the internal MATLAB mex overhead will not make the call worthwhile.
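For orientation, the step that makes this a group lasso solver is the block shrinkage in the z-update of the reference MATLAB script: each partition of the vector is scaled by max(0, 1 - kappa/norm), with kappa = lambda/rho. The sketch below is a plain single-threaded C++ illustration of that step for one lambda. It is not the CUDA kernel from this repository, and the names `shrink_block` and `group_shrinkage` are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Block soft-thresholding: scale v[begin, begin+len) by
// max(0, 1 - kappa / ||block||_2). Hypothetical helper, not the repo's kernel.
void shrink_block(std::vector<float>& v, std::size_t begin, std::size_t len, float kappa) {
    float norm_sq = 0.0f;
    for (std::size_t i = begin; i < begin + len; ++i) norm_sq += v[i] * v[i];
    const float norm = std::sqrt(norm_sq);
    // norm > kappa >= 0 guarantees norm > 0, so the division is safe.
    const float scale = (norm > kappa) ? 1.0f - kappa / norm : 0.0f;
    for (std::size_t i = begin; i < begin + len; ++i) v[i] *= scale;
}

// Apply block shrinkage with kappa = lambda / rho to every partition.
// p holds the partition sizes, which must sum to v.size().
std::vector<float> group_shrinkage(std::vector<float> v, const std::vector<int>& p,
                                   float lambda, float rho) {
    std::size_t begin = 0;
    for (int len : p) {
        assert(len > 0 && begin + static_cast<std::size_t>(len) <= v.size());
        shrink_block(v, begin, static_cast<std::size_t>(len), lambda / rho);
        begin += static_cast<std::size_t>(len);
    }
    assert(begin == v.size());
    return v;
}
```

Testing several lambdas amounts to running this step (and the rest of the iteration) once per lambda with a different kappa, which is the part this project runs in parallel on the GPU.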

inputs from MATLAB are (in order)

0) Matrix A (m,n): single precision floating point numbers (32 bit) in DENSE form. It must be passed into the mex in TRANSPOSE form due to the row-major format (m and n are adjusted internally)
1) vector b (m,1) single precision floating point numbers
2) vector p (length Psize, the K of the reference solver) 32 bit integers (partitions)
3) vector u (n,1) single precision floating point numbers
4) vector z (n,1) single precision floating point numbers
5) float (single) rho
6) float (single) alpha
7) integer max_iter
8) float (single) abstol
9) float (single) reltol
10) lambda array
11) 32 bit integer array which will return the number of iterations until convergence for each lambda (the size of the array is equal to the number of lambdas)
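The transpose requirement on A follows from storage order: MATLAB stores matrices column-major, so the buffer of A' (n,m) read linearly is exactly A (m,n) in row-major order. On the MATLAB side this would be something like passing `single(A')`. The C++ sketch below only demonstrates that layout identity; the helper names are hypothetical and not taken from this repository.

```cpp
#include <cstddef>
#include <vector>

// Linear index of element (i, j) in a column-major buffer with `rows` rows
// (MATLAB's layout).
inline std::size_t col_major_index(std::size_t i, std::size_t j, std::size_t rows) {
    return i + j * rows;
}

// Linear index of element (i, j) in a row-major buffer with `cols` columns.
inline std::size_t row_major_index(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}

// Build the column-major buffer of A^T (n-by-m) from a column-major A (m-by-n).
// This mimics what MATLAB hands over when the caller passes A'.
std::vector<float> transpose_col_major(const std::vector<float>& a,
                                       std::size_t m, std::size_t n) {
    std::vector<float> at(a.size());
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            at[col_major_index(j, i, n)] = a[col_major_index(i, j, m)];
        }
    }
    return at;
}
```

After the transpose, element A(i,j) sits at `row_major_index(i, j, n)` in the buffer, which is why m and n have to be swapped back on the receiving side.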

outputs are (in order)

0) vector u (n,lambdas) single precision floating point numbers
1) vector z (n,lambdas) single precision floating point numbers
2) vector num_iters (num_lambdas,1) 32-bit integer array


NOTE: compile with --use_fast_math. For better parallel performance, set the environment variable CUDA_DEVICE_MAX_CONNECTIONS to 32 if using Tesla line GPUs.

Testing was done with the default CUDA_DEVICE_MAX_CONNECTIONS=8; if testing more lambdas than that, increase it to the number of lambdas.
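A minimal sketch of setting the variable from host code, assuming it is read when the CUDA context is created and therefore has to be set before the first CUDA call in the process. CUDA accepts values from 1 to 32 for this variable. The function names are hypothetical, and when calling from MATLAB it may be simpler to set the variable in the shell before starting MATLAB.

```cpp
#include <stdlib.h>   // setenv (POSIX) / _putenv_s (Windows)
#include <algorithm>
#include <string>

// Pick a value for CUDA_DEVICE_MAX_CONNECTIONS: the number of lambdas,
// but never below the default of 8 and never above the maximum of 32.
int connections_for(int num_lambdas) {
    return std::max(8, std::min(32, num_lambdas));
}

// Set the variable for the current process. Assumption: this runs before
// any CUDA call creates the context. Returns true on success.
bool set_max_connections(int num_lambdas) {
    const std::string value = std::to_string(connections_for(num_lambdas));
#ifdef _WIN32
    return _putenv_s("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str()) == 0;
#else
    return setenv("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str(), 1) == 0;
#endif
}
```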

NOTE: no overclocking of the GPU; it is running at the stock 706 MHz.

CUDA mex vs MATLAB 6-core 3.9 GHz comparison:

| dimensions A | number of lambdas | 6-core 3.9 GHz MATLAB time | CUDA mex time | CUDA speedup |
|---|---|---|---|---|
| 1920x956 | 17 | 463 ms | 26 ms | 17.8x |
| 1133x1545 | 17 | 862 ms | 72 ms | 11.9x |
| 2000x1243 | 24 | 1023 ms | 49 ms | 20.8x |
| 1111x1537 | 24 | 1144 ms | 89 ms | 12.85x |
| 5000x1491 | 30 | 1369 ms | 75 ms | 18.25x |
| 10000x1142 | 30 | 1329 ms | 62 ms | 21.43x |
___

NOTE: The first call of any GPU-related mex interface from MATLAB will be at least 10x slower than subsequent calls, due to initial context setup. In general, MATLAB adds 10-20 ms of running time vs. a clean C++ API library call to the same function.
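Because of that first-call cost, any timing comparison should discard at least one warm-up call. The sketch below shows one way to do that on the C++ side. `time_after_warmup_ms` is a hypothetical helper, not part of this repository, and for GPU work the timed callable is assumed to block until the device has finished (for example by ending with `cudaDeviceSynchronize()`).

```cpp
#include <chrono>
#include <functional>

// Average wall-clock time per call in milliseconds. The first `warmup`
// calls run untimed so one-off costs such as context setup are excluded.
// `repeats` must be positive.
double time_after_warmup_ms(const std::function<void()>& call, int warmup, int repeats) {
    for (int i = 0; i < warmup; ++i) call();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) call();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```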

Will perform better on 'skinny' matrices (where num_rows>=num_cols) due to fewer operations needed for that shape of matrix.
