In this repository you can find my personal implementation of several unsupervised learning algorithms.
See below for library usage instructions.
An entire directory is dedicated to the implementation and testing of the algorithm described in the paper by Vittorio Erba, Marco Gherardi and Pietro Rotondo cited at the end of this README.
This repo contains:
- This README file
- UL_lib/: unsupervised learning library (see below for use)
- studies/FCI_estimator/: directory containing code for testing the FCI estimator by Erba, Gherardi and Rotondo (see below for more)
- environment.yaml: YAML file for building the conda environment
- requirements.txt: requirements file
The library contains the following modules:
- clustering.py, implementing:
  - kmeans: k-means
  - FuzzyCMeans: fuzzy c-means
  - SpectralClustering: spectral clustering
  - DensPeakClustering: density peak clustering
- density_estimation.py, implementing:
  - HistEstimator: histogram density estimator
  - GaussKDEstimator: kernel density estimator with Gaussian kernel
- dimensionality_reduction.py, implementing:
  - PCA: principal component analysis
  - Isomap: isomap (note: this method still has to be revised and is not guaranteed to work properly)
  - KernelPCA: kernel principal component analysis
  - TwoNN: two-nearest-neighbours (TwoNN) intrinsic dimension estimator
  - FCIEstimator: full correlation integral-based intrinsic dimension estimator
- metrics.py, implementing:
  - compute_MI: mutual information calculator
  - compute_NMI: normalized mutual information calculator
  - compute_FRatio: F-ratio score calculator
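To give an idea of what an intrinsic dimension estimator such as TwoNN computes, here is a minimal standalone sketch. It is not the library's implementation: it uses only the Python standard library, the function name `twonn_dimension` is hypothetical, and it assumes the maximum-likelihood form of the estimate (number of points divided by the sum of the log ratios between second and first nearest-neighbour distances), which may differ from the fitting procedure used in `UL_lib`.

```python
import math
import random


def twonn_dimension(points):
    """Maximum-likelihood TwoNN-style estimate of the intrinsic dimension.

    Hypothetical helper for illustration only; assumes no duplicate points.
    """
    log_ratios = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        # Ratio of second to first nearest-neighbour distance (requires r1 > 0).
        log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


random.seed(0)
# 400 points drawn uniformly on a 2D plane embedded in 5D by zero padding.
sample = [(random.random(), random.random(), 0.0, 0.0, 0.0) for _ in range(400)]
estimate = twonn_dimension(sample)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

The brute-force neighbour search costs O(N^2) distance evaluations, which is fine for a small demonstration but not for large datasets.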
To use these methods you can follow these simple steps:
- Clone this repository:
  $ git clone git@github.com:TommasoTarchi/My_UL_library.git
- Prepare the environment:
  - If using conda, replace <your_env_name> in the first line of environment.yaml with your desired environment name, then build the conda environment using:
    $ conda env create -f environment.yaml
  - If using pip, just run:
    $ pip install -r requirements.txt
- Copy the UL_lib directory into your working directory.
- Use the desired classes/functions by importing them into your Python script:
  from UL_lib.<module_name> import <function/class name>
  For instance, if you want to use the k-means algorithm you can use:
  from UL_lib.clustering import kmeans
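Since this README does not list the arguments of each class or function, one way to discover them after copying UL_lib is to inspect the imported object. The sketch below is only a suggestion: `describe_ul_object` is a hypothetical helper (not part of the library), and it returns `None` when UL_lib cannot be imported from the current working directory.

```python
import importlib
import inspect


def describe_ul_object(module_name, object_name):
    """Return a printable signature for UL_lib.<module_name>.<object_name>.

    Hypothetical helper: returns None when UL_lib (or the requested name)
    cannot be found, e.g. because UL_lib was not copied next to this script.
    """
    try:
        module = importlib.import_module(f"UL_lib.{module_name}")
    except ImportError:
        return None
    obj = getattr(module, object_name, None)
    if obj is None:
        return None
    try:
        return f"{object_name}{inspect.signature(obj)}"
    except (TypeError, ValueError):
        # Some callables do not expose an introspectable signature.
        return object_name


summary = describe_ul_object("clustering", "kmeans")
print(summary if summary is not None else "UL_lib.clustering.kmeans not found")
```

Calling `help()` on the imported object is an equivalent interactive alternative.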
The studies/FCI_estimator/ directory contains:
- bash_scripts/: bash scripts for running tests
- datasets/: datasets used to test the algorithm
- src/: code for implementation and testing of the algorithm
- results/: directory containing results of tests
- FCI_estimator-Presentation.pdf: presentation of implementation and results
To reproduce the tests, navigate to the studies/FCI_estimator/bash_scripts/ directory and run the bash scripts. Parameters in the scripts can be adjusted to investigate other parameter configurations.
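If you prefer to launch all the tests in one go, a small Python driver along these lines may help. It is only a sketch under two assumptions that this README does not confirm: that the scripts use the `.sh` extension, and that they are meant to be run from inside their own directory. The helper name `run_bash_scripts` is hypothetical.

```python
import subprocess
from pathlib import Path


def run_bash_scripts(script_dir):
    """Run every *.sh file in script_dir with bash; return {script name: exit code}."""
    exit_codes = {}
    for script in sorted(Path(script_dir).glob("*.sh")):
        # Run from the script's own directory so relative paths inside it resolve.
        completed = subprocess.run(["bash", script.name], cwd=script.parent)
        exit_codes[script.name] = completed.returncode
    return exit_codes


if __name__ == "__main__":
    # Path taken from this README; run from the repository root.
    print(run_bash_scripts("studies/FCI_estimator/bash_scripts"))
```

A non-zero exit code in the returned dictionary indicates a script that failed.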
For details about the implementation and testing of the algorithm (in particular, numerical subtleties in the implementation), see the presentation FCI_estimator-Presentation.pdf.
The intrinsic dimension estimator for undersampled data implemented and tested in studies/FCI_estimator/ was taken from the paper:
Erba, V., Gherardi, M. & Rotondo, P. Intrinsic dimension estimation for locally undersampled data. Sci Rep 9, 17133 (2019).