When working with sparse matrices, it is desirable to work with them as
if they were regular numpy arrays. Yet many popular array methods do not exist for
sparse matrices. spartans aims to help by providing many operations for working with sparse matrices.
Full example notebook
- Free software: GNU General Public License v3
- Documentation: https://spartans.readthedocs.io.
- Mathematical Operations
  - A rich set of operations that are not supported on sparse matrices, such as
    variance, cov (covariance matrix) and corrcoef (correlation matrix).
- Easy Indexing
  - Convenient methods to index "extra" sparse features by variance or by quantity.
- Masking
  - Some algorithms treat the zeros in a sparse matrix as missing data, while others treat
    missing data as zeros, depending on the use case.
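To make the first and last items concrete, here is a minimal sketch of a column variance that never densifies the matrix, and of a column mean that treats zeros as missing. It uses only numpy and scipy; the helper names `sparse_column_variance` and `masked_column_mean` are hypothetical and are not part of the spartans API, whose implementation may differ.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_column_variance(x):
    """Population variance of each column, counting zeros as real values.

    Hypothetical helper (not spartans API): uses Var[x] = E[x^2] - E[x]^2,
    so only the stored non-zero entries are ever touched.
    """
    n_rows = x.shape[0]
    mean = np.asarray(x.sum(axis=0)).ravel() / n_rows
    mean_of_squares = np.asarray(x.multiply(x).sum(axis=0)).ravel() / n_rows
    return mean_of_squares - mean ** 2


def masked_column_mean(x):
    """Mean of each column over stored non-zero entries (zeros = missing).

    Hypothetical helper (not spartans API).
    """
    counts = x.getnnz(axis=0)  # number of stored entries per column
    sums = np.asarray(x.sum(axis=0)).ravel()
    # A column with no stored entries has no defined mean, so report nan.
    return np.divide(sums, counts, out=np.full(sums.shape, np.nan), where=counts > 0)


x = csr_matrix(np.array([[1.0, 0.0, 4.0],
                         [3.0, 0.0, 0.0],
                         [0.0, 0.0, 2.0]]))
variances = sparse_column_variance(x)
masked_means = masked_column_mean(x)
```

The two results differ in how they read the same zeros: the variance counts them as observed values, while the masked mean skips them. The `E[x^2] - E[x]^2` form is simple but can lose precision when the mean is large relative to the spread.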
spartans - FeatureMatrix
- FeatureMatrix is a first-class citizen of spartans. It is a wrapper around scipy.sparse.csr_matrix built with data analysis and data science in mind.
>>> import spartans as st
>>> from scipy.sparse import csr_matrix
>>> import numpy as np
>>> m = np.array([[1, -2, 0, 50],
...               [0, 0, 0, 100],
...               [1, 0, 0, 80],
...               [1, 4, 0, 0],
...               [0, 0, 0, 0],
...               [0, 4, 0, 0],
...               [0, 0, 0, -50]])
>>> c = csr_matrix(m)

We can get the correlation matrix of m using numpy.
>>> np.corrcoef(m, rowvar=False)
Out[]: array([[ 1.  , -0.08,   nan,  0.31],
              [-0.08,  1.  ,   nan, -0.35],
              [  nan,   nan,   nan,   nan],
              [ 0.31, -0.35,   nan,  1.  ]])

This won't work with the sparse matrix c:
>>> np.corrcoef(c, rowvar=False)
AttributeError: 'float' object has no attribute 'shape'

But with spartans this can be done.
>>> st.corr(c)
Out[]: array([[ 1.  , -0.08,   nan,  0.31],
              [-0.08,  1.  ,   nan, -0.35],
              [  nan,   nan,   nan,   nan],
              [ 0.31, -0.35,   nan,  1.  ]])

The row and column of nan appear because the original matrix has a column (feature) that is
zero in every row. spartans can handle that using st.non_zero_index(c, axis=0, as_bool=False),
which returns array([0, 1, 3]).
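For readers curious how a correlation matrix can be computed on a sparse input at all, the following is a rough sketch using only numpy and scipy. The function `sparse_corrcoef` is a hypothetical illustration, not the spartans implementation, and the zero-column filter is built on scipy's `getnnz` rather than on `st.non_zero_index`.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_corrcoef(x):
    """Pearson correlation between the columns of a sparse matrix.

    Hypothetical helper (not spartans API). Only the Gram matrix X^T X,
    of shape (n_features, n_features), is turned into a dense array.
    """
    n = x.shape[0]
    col_sum = np.asarray(x.sum(axis=0)).ravel()
    gram = (x.T @ x).toarray()
    # Sample covariance: (X^T X - n * mean * mean^T) / (n - 1)
    cov = (gram - np.outer(col_sum, col_sum) / n) / (n - 1)
    std = np.sqrt(np.diag(cov))
    # Constant columns have zero variance; let 0/0 become nan silently.
    with np.errstate(divide="ignore", invalid="ignore"):
        return cov / np.outer(std, std)


m = np.array([[1, -2, 0, 50],
              [0, 0, 0, 100],
              [1, 0, 0, 80],
              [1, 4, 0, 0],
              [0, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 0, -50]], dtype=float)
c = csr_matrix(m)

corr = sparse_corrcoef(c)

# Keep only the columns that store at least one non-zero entry.
keep = np.flatnonzero(c.getnnz(axis=0))
corr_without_empty = sparse_corrcoef(c[:, keep])
```

Dropping the all-zero column before computing the correlation removes the nan row and column. This approach assumes the number of features is small enough for a dense features-by-features matrix to fit in memory.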
A lot more functionality is in the notebook.
- This open-source project is backed by SentinelOne
- This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.