dcme

Overview

The dcme package provides R functions to compute data complexity measures for classification data sets.

Installation

dcme is under development and not yet available on CRAN. You can install the development version using the devtools package as follows:

# install.packages("devtools")
devtools::install_github("RomeroBarata/dcme")

Data Complexity Measures

The following complexity measures are currently implemented:

Simple Measures

num_examples: Number of Observations
num_examples_majority: Number of Observations in the Majority Class
num_examples_minority: Number of Observations in the Minority Class
num_features: Number of Features
num_features_numeric: Number of Numeric Features
num_features_binary: Number of Binary Features
num_features_categorical: Number of Categorical Features
num_classes: Number of Classes
proportion_examples_majority: Proportion of Majority Examples
proportion_examples_minority: Proportion of Minority Examples
proportion_features_numeric: Proportion of Numeric Features
proportion_features_binary: Proportion of Binary Features
proportion_features_categorical: Proportion of Categorical Features
IR: Imbalance Ratio

num_examples_majority, num_examples_minority, proportion_examples_majority, proportion_examples_minority, and IR are defined only for binary data sets.
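To make the binary-only restriction concrete, here is a minimal base R sketch of an imbalance ratio, assuming the common definition (majority class count divided by minority class count). The helper name `imbalance_ratio` is hypothetical and is not part of dcme; the package's own `IR` may differ in its arguments and details.

```r
# Illustrative helper (hypothetical name, not the dcme implementation).
# Assumes IR = (observations in majority class) / (observations in minority class).
imbalance_ratio <- function(y) {
  counts <- table(y)
  # The ratio is only meaningful when exactly two classes are present.
  stopifnot(length(counts) == 2)
  max(counts) / min(counts)
}

# Toy label vector: 90 negative and 10 positive observations.
y <- factor(c(rep("neg", 90), rep("pos", 10)))
imbalance_ratio(y)
```

With more than two classes, "majority" and "minority" are no longer a single pair of classes, which is why a sketch like this stops early instead of returning a value.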

Statistical Measures

sd_ratio: Geometric Mean Ratio of Standard Deviations
corr_abs: Mean Absolute Correlation Coefficient
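As an illustration of the second measure, the sketch below computes a mean absolute correlation in base R. It assumes one possible reading of the definition: the absolute Pearson correlation is averaged over all pairs of numeric features within each class, and the per-class values are then averaged with equal weight. The helper name `mean_abs_correlation` is hypothetical, and dcme's `corr_abs` may be defined differently.

```r
# Illustrative helper (hypothetical name, not the dcme implementation).
# x: numeric features (data frame or matrix); y: class labels.
mean_abs_correlation <- function(x, y) {
  x <- as.matrix(x)
  per_class <- vapply(split(seq_len(nrow(x)), y), function(idx) {
    r <- cor(x[idx, , drop = FALSE])   # feature-by-feature correlation matrix
    mean(abs(r[upper.tri(r)]))         # average over distinct feature pairs
  }, numeric(1))
  mean(per_class)                      # unweighted average over classes
}

mean_abs_correlation(iris[, 1:4], iris$Species)
```

Values close to 1 suggest that the features carry largely redundant information; values close to 0 suggest they are nearly uncorrelated.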

Measures of Overlap of Individual Feature Values

F1: Fisher's Discriminant Ratio
F2: Volume of Overlap Region

Unfortunately, the F1 and F2 measures are currently implemented only for binary data sets. General versions will be made available soon.
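For readers unfamiliar with F1, the following base R sketch shows the two-class Fisher's discriminant ratio as described by Ho and Basu [2]: for each feature, the squared difference of the class means is divided by the sum of the class variances, and the maximum over features is reported. The helper name `fisher_ratio` is hypothetical; this is not the dcme code, and the package's `F1` may take different arguments.

```r
# Illustrative helper (hypothetical name, not the dcme implementation).
# Per feature: f = (mean_1 - mean_2)^2 / (var_1 + var_2); F1 = max over features.
fisher_ratio <- function(x, y) {
  y <- factor(y)
  stopifnot(nlevels(y) == 2)           # two-class version only
  x <- as.matrix(x)
  a <- x[y == levels(y)[1], , drop = FALSE]
  b <- x[y == levels(y)[2], , drop = FALSE]
  f <- (colMeans(a) - colMeans(b))^2 /
    (apply(a, 2, var) + apply(b, 2, var))
  max(f)
}

# Two-class subset of iris (versicolor vs. virginica).
two_class <- droplevels(iris[iris$Species != "setosa", ])
fisher_ratio(two_class[, 1:4], two_class$Species)
```

A larger value means that at least one feature separates the two classes well on its own.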

Measures of Separability of Classes

N2: Ratio of Average Intra/Inter Class 1-NN Distance
N3: Error Rate of 1-NN Classifier
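As a rough illustration of N3, the sketch below estimates the error rate of a 1-NN classifier by leave-one-out in base R, assuming Euclidean distance. The helper name `loo_1nn_error` is hypothetical; dcme's `N3` may use a different distance, tie-breaking rule, or estimation scheme.

```r
# Illustrative helper (hypothetical name, not the dcme implementation).
# Leave-one-out error of the 1-NN classifier with Euclidean distance.
loo_1nn_error <- function(x, y) {
  d <- as.matrix(dist(as.matrix(x)))   # pairwise Euclidean distances
  diag(d) <- Inf                       # a point may not be its own neighbor
  nearest <- apply(d, 1, which.min)    # ties go to the lowest row index
  mean(y[nearest] != y)                # fraction of misclassified points
}

loo_1nn_error(iris[, 1:4], iris$Species)
```

The full distance matrix needs memory quadratic in the number of observations, so this form is only practical for small data sets.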

Measures of Geometry, Topology, and Density of Manifolds

N4: Nonlinearity of the 1-NN Classifier
T2: Average Number of Points per Dimension

References

Definitions and explanations of most functions implemented in the dcme package can be found in the following literature:

[1] Michie, D., Spiegelhalter, D. J., & Taylor, C. C. (1994). Machine learning, neural and statistical classification.

[2] Ho, T. K., & Basu, M. (2002). Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 289-300.
