GLwithmex

Mex interface for a CUDA implementation of Stephen Boyd's ADMM group lasso solver, with the extra feature of testing multiple lambdas in parallel.

http://www.stanford.edu/~boyd/papers/admm/group_lasso/group_lasso.html

This implementation is a mostly literal translation of the solver, with the added ability to test up to 31 lambdas in parallel. It can operate on a dense matrix A of any shape, but A should be at least (32,32) in size; otherwise the infernal MATLAB mex overhead will not make the call worthwhile.
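As an illustration of the group-wise step in the reference solver, here is a minimal CPU sketch of block soft thresholding, the operation Boyd's MATLAB code applies to each group in the z-update with threshold lambda/rho. The function names `shrink_block` and `shrink_groups` are hypothetical and are not taken from GLcuda.cu; the CUDA code may organize this differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Block soft thresholding: scales the block v by max(0, 1 - kappa / ||v||_2).
// A block whose 2-norm is at most kappa is set to zero.
// (Hypothetical helper name, for illustration only.)
void shrink_block(float* v, std::size_t len, float kappa) {
    double sq = 0.0;
    for (std::size_t i = 0; i < len; ++i) sq += double(v[i]) * v[i];
    const double norm = std::sqrt(sq);
    const double scale = (norm > kappa) ? 1.0 - kappa / norm : 0.0;
    for (std::size_t i = 0; i < len; ++i) v[i] = float(scale * v[i]);
}

// Applies the shrinkage to consecutive groups of v.
// p[k] is the size of group k; the sizes are assumed to sum to v.size().
void shrink_groups(std::vector<float>& v, const std::vector<int>& p, float kappa) {
    std::size_t start = 0;
    for (int len : p) {
        shrink_block(v.data() + start, std::size_t(len), kappa);
        start += std::size_t(len);
    }
}
```

Each group is independent of the others, which is one reason this step maps well onto a GPU.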

Inputs from MATLAB are (in order):

0) Matrix A (m,n), single precision floating point numbers (32 bit), in DENSE form. It must be passed into mex in TRANSPOSE form due to the row-major format (m and n are adjusted internally)
1) vector b (m,1), single precision floating point numbers
2) vector p (Psize,1), 32 bit integers, holding the sizes of the K = Psize partitions (groups)
3) vector u (n,1), single precision floating point numbers
4) vector z (n,1), single precision floating point numbers
5) float (single) rho
6) float (single) alpha
7) integer max_iter
8) float (single) abstol
9) float (single) reltol
10) lambda array
11) 32 bit integer array which will return the number of iterations until convergence for each lambda (the size of the array is equal to the number of lambdas)
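The transpose requirement for A follows from storage order: MATLAB stores matrices column-major, and the column-major buffer of A' (n,m) is byte-for-byte the row-major buffer of A (m,n). The sketch below demonstrates that equivalence on the CPU; `transpose_col_major` is a hypothetical helper for illustration and is not part of the mex sources.

```cpp
#include <cstddef>
#include <vector>

// Given column-major A (m x n), returns the column-major storage of A^T (n x m).
// Read as a row-major buffer, the result is A itself: element (i, j) sits at i * n + j.
// (Hypothetical helper name, for illustration only.)
std::vector<float> transpose_col_major(const std::vector<float>& a,
                                       std::size_t m, std::size_t n) {
    std::vector<float> at(a.size());
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < m; ++i)
            at[j + i * n] = a[i + j * m];  // A^T(j, i) = A(i, j)
    return at;
}
```

On the MATLAB side this corresponds to passing `A'` (as single) instead of `A`, so no explicit copy like the one above is needed inside the mex function.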

outputs are (in order)

0) matrix u (n,num_lambdas), single precision floating point numbers
1) matrix z (n,num_lambdas), single precision floating point numbers
2) vector num_iters (num_lambdas,1) 32-bit integer array


NOTE: compile with --use_fast_math. For better parallel performance, set the environment variable CUDA_DEVICE_MAX_CONNECTIONS to 32 if using a Tesla line GPU.

Testing was done with the default CUDA_DEVICE_MAX_CONNECTIONS=8; if testing more lambdas than that, increase it to the number of lambdas.
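One possible way to set the variable from within the host process is sketched below. It assumes the variable is read when the CUDA context is first created, so the call would have to happen before any GPU work in that process; setting it in the shell or with MATLAB's `setenv` before the first mex call should achieve the same thing. `set_max_connections` is a hypothetical helper, not a function from this repository.

```cpp
#include <cstdlib>
#include <string>

// Sets CUDA_DEVICE_MAX_CONNECTIONS for the current process; returns true on success.
// Assumption: this must run before the first CUDA call in the process to take effect.
// (Hypothetical helper name, for illustration only.)
bool set_max_connections(int n) {
    const std::string value = std::to_string(n);
#ifdef _WIN32
    return _putenv_s("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str()) == 0;
#else
    return setenv("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str(), 1) == 0;
#endif
}
```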

NOTE: the GPU was not overclocked; it ran at the stock 706 MHz.

CUDA mex vs MATLAB 6-core 3.9 GHz comparison:

| Dimensions of A | Number of lambdas | MATLAB time (6-core 3.9 GHz) | CUDA mex time | CUDA speedup |
| --- | --- | --- | --- | --- |
| 1920x956 | 17 | 463 ms | 26 ms | 17.8x |
| 1133x1545 | 17 | 862 ms | 72 ms | 11.9x |
| 2000x1243 | 24 | 1023 ms | 49 ms | 20.8x |
| 1111x1537 | 24 | 1144 ms | 89 ms | 12.85x |
| 5000x1491 | 30 | 1369 ms | 75 ms | 18.25x |
| 10000x1142 | 30 | 1329 ms | 62 ms | 21.43x |

___

NOTE: The first call of any GPU-related mex interface from MATLAB will be at least 10x slower than subsequent calls, due to initial context setup. In general, MATLAB adds 10-20 ms of running time compared with a clean C++ API library call to the same function.
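When benchmarking, the slow first call is best kept out of the measurement. The sketch below shows one way to do that in a C++ harness: run the call once untimed, then average over the following runs. `mean_ms_after_warmup` is a hypothetical helper, and it assumes the callable is synchronous, meaning it returns only after the GPU work has finished.

```cpp
#include <chrono>

// Runs `call` once as an untimed warm-up (absorbing one-time setup cost),
// then returns the mean wall-clock time in milliseconds over `runs` timed calls.
// Assumption: `call` blocks until its work, including any GPU work, is complete.
// (Hypothetical helper name, for illustration only.)
template <typename F>
double mean_ms_after_warmup(F&& call, int runs) {
    call();  // warm-up, not timed
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) call();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / runs;
}
```

The same pattern applies in MATLAB: call the mex function once before starting the timer.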

Will perform better on 'skinny' matrices (where num_rows>=num_cols), since that matrix shape requires fewer operations.
