forked from Prograf-UFF/SCIM

Here you find the official implementation of the Spatial Contextualization for Closed Itemset Mining (SCIM) algorithm.


laffernandes/SCIM

 
 


Spatial Contextualization for Closed Itemset Mining

The Spatial Contextualization for Closed Itemset Mining (SCIM) algorithm is a mining procedure that builds a space for the target database in such a way that relevant closed itemsets can be retrieved according to the relative spatial location of their items.

The SCIM algorithm uses Dual Scaling to map the items of the database to a multidimensional metric space called Solution Space. The representation of the database in the Solution Space assists in the interpretation and definition of overlapping clusters of related items. The distances from the items to the cluster centers are used as the criterion for generating itemsets. Therefore, instead of a minimum support threshold, SCIM uses a distance threshold defined relative to the reference and maximum distances computed per cluster during the mapping procedure.
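To make the idea of a per-cluster distance threshold more concrete, here is a minimal C++ sketch. It is not taken from the SCIM sources: the `ClusterDistances` struct, the function names, and in particular the linear interpolation between the reference and the maximum distances are assumptions made for illustration only. The paper gives the exact definition.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the two distances computed per cluster.
struct ClusterDistances {
    double reference;  // reference distance of the cluster
    double maximum;    // maximum item-to-center distance of the cluster
};

// ASSUMPTION: the threshold is a linear interpolation between the reference
// and the maximum distances, controlled by a ratio dr in [0, 1].
// dr = 0 yields the reference distance, dr = 1 yields the maximum distance.
double distance_threshold(const ClusterDistances& c, double dr) {
    return c.reference + dr * (c.maximum - c.reference);
}

// Returns the indices of the items whose distance to the cluster center
// does not exceed the given threshold.
std::vector<std::size_t> items_within(const std::vector<double>& distances,
                                      double threshold) {
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < distances.size(); ++i) {
        if (distances[i] <= threshold) kept.push_back(i);
    }
    return kept;
}
```

Under this assumption, raising the ratio widens the accepted region around each cluster center, so more items (and hence more candidate itemsets) pass the test.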

The approach was developed by Altobelli B. Mantuan and Leandro A. F. Fernandes. Check out the project's website for details.

This repository includes the C++ implementation of the SCIM algorithm, and a sample application using this implementation.

Please cite our IEEE ICDM 2018 paper if you use this code in your research:

@InProceedings{mantuan_fernandes-icdm-2018,
  author    = {Mantuan, Altobelli B. and Fernandes, Leandro A. F.},
  title     = {Spatial contextualization for closed itemset mining},
  booktitle = {Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM)},
  year      = {2018},
  pages     = {1176--1181},
  doi       = {https://doi.org/10.1109/ICDM.2018.00155},
  url       = {http://www.ic.uff.br/~laffernandes/projects/sodm},
}

Do not hesitate to contact Altobelli B. Mantuan (amantuan@ic.uff.br, altobelli.bm@gmail.com) if you encounter any problems.

Licence

All code is released under the GNU General Public License, version 3, or (at your option) any later version.

Platforms

We have compiled and tested the sample application on Linux and Windows using GCC 4.9.1 and Microsoft Visual C++ 2013.

Requirements

Make sure that you have all the following tools and libraries installed and working before attempting to compile the SCIM implementation.

Required tools:

Required C++ libraries:

  • Boost 1.5.0 or later (header-only libraries)
  • Eigen 3.2.0 or later

Building, Compiling, and Running

Use the git clone command to download the project:

$ git clone https://github.com/Prograf-UFF/SCIM.git SCIM
$ cd SCIM

Make sure you have the environment variables for Eigen (EIGEN3_INCLUDE_DIR) and Boost (BOOST_ROOT) defined in your system.

The basic steps for configuring and building the sample application look like this:

$ mkdir build
$ cd build

$ cmake [-G <generator>] [options] -DCMAKE_BUILD_TYPE=Release ..

Assuming a makefile generator was used:

$ make

To run the sample application, just call:

$ SCIM <database-file-path> <dr-threshold-value> <output-folder-path>

The database file must follow the .num format used by The LUCS-KDD Discretised/normalised ARM and CARM Data Library.
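As an illustration of the expected input, the sketch below reads a database in C++. It assumes that a .num file stores one transaction per line, with items written as whitespace-separated integer identifiers. The `Transaction` alias and the `read_num_database` function are hypothetical names and are not part of the SCIM code.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One transaction is a list of integer item identifiers.
using Transaction = std::vector<int>;

// Reads transactions from a stream: one transaction per line, items
// separated by whitespace. Blank lines are skipped. A token that does
// not parse as an integer raises std::runtime_error.
std::vector<Transaction> read_num_database(std::istream& in) {
    std::vector<Transaction> database;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        Transaction transaction;
        int item = 0;
        while (tokens >> item) transaction.push_back(item);
        // Extraction stopped before the end of the line: malformed token.
        if (!tokens.eof()) {
            throw std::runtime_error("unexpected token in line: " + line);
        }
        if (!transaction.empty()) database.push_back(std::move(transaction));
    }
    return database;
}
```

Passing a `std::ifstream` opened on the database file to `read_num_database` returns the transactions in file order.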

The distance ratio threshold (dr-threshold-value) must be in the [0, 1] range. We believe that 0 (zero) is an excellent initial guess. The user may then increase the value slightly, in an exploratory fashion, to detect more closed itemsets.
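A caller that wraps the sample application may want to check the threshold before launching it. The following C++ sketch parses and validates a command-line value against the [0, 1] range stated above. The function name `parse_dr_threshold` is hypothetical, and this is not how the sample application itself validates its arguments.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Parses a distance ratio threshold from text and checks that it lies
// in [0, 1]. Throws std::invalid_argument if the text is not a number and
// std::out_of_range if the number is outside the range (NaN included).
double parse_dr_threshold(const std::string& text) {
    std::size_t used = 0;
    double value = 0.0;
    try {
        value = std::stod(text, &used);
    } catch (const std::exception&) {
        throw std::invalid_argument("dr-threshold-value is not a number: " + text);
    }
    if (used != text.size()) {
        throw std::invalid_argument("dr-threshold-value has trailing characters: " + text);
    }
    if (!(value >= 0.0 && value <= 1.0)) {
        throw std::out_of_range("dr-threshold-value must be in [0, 1]: " + text);
    }
    return value;
}
```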


Languages

  • C++ 95.0%
  • CMake 5.0%