The dcme package provides functions to compute data complexity measures. dcme is under development and not yet available on CRAN. You can install the development version using the devtools package as follows:
# install.packages("devtools")
devtools::install_github("RomeroBarata/dcme")
The following complexity measures are currently implemented:
num_examples: Number of Observations
num_examples_majority: Number of Observations in the Majority Class
num_examples_minority: Number of Observations in the Minority Class
num_features: Number of Features
num_features_numeric: Number of Numeric Features
num_features_binary: Number of Binary Features
num_features_categorical: Number of Categorical Features
num_classes: Number of Classes
proportion_examples_majority: Proportion of Majority Examples
proportion_examples_minority: Proportion of Minority Examples
proportion_features_numeric: Proportion of Numeric Features
proportion_features_binary: Proportion of Binary Features
proportion_features_categorical: Proportion of Categorical Features
IR: Imbalance Ratio
num_examples_majority, num_examples_minority, proportion_examples_majority, proportion_examples_minority, and IR are defined only for binary data sets.
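As a rough illustration of why these measures need exactly two classes, here is a minimal base R sketch of an imbalance ratio. It is not the dcme implementation: the function name `imbalance_ratio` is hypothetical, and it assumes the common convention of dividing the majority class count by the minority class count, which may differ from the package's definition.

```r
# Hypothetical helper, not part of dcme.
# Assumption: IR = (majority class count) / (minority class count).
imbalance_ratio <- function(y) {
  counts <- table(y)
  # "Majority" and "minority" are only unambiguous with two classes
  stopifnot(length(counts) == 2)
  max(counts) / min(counts)
}

y <- factor(c("a", "a", "a", "b"))
imbalance_ratio(y)  # 3 majority examples for every minority example
```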
sd_ratio: Geometric Mean Ratio of Standard Deviations
corr_abs: Mean Absolute Correlation Coefficient
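To give a feel for the second measure, the sketch below averages the absolute Pearson correlation over all distinct pairs of numeric features. The name `mean_abs_correlation` is hypothetical, and the sketch ignores class labels as a simplifying assumption; the definition in [1], and therefore dcme's `corr_abs`, may differ, for example by computing correlations within each class.

```r
# Hypothetical helper, not part of dcme.
# Assumption: correlations are computed on the whole data set,
# ignoring class labels.
mean_abs_correlation <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), ncol(x) >= 2)
  r <- cor(x)                  # Pearson correlation matrix
  mean(abs(r[upper.tri(r)]))   # each distinct feature pair counted once
}

# Example on the four numeric columns of the built-in iris data
mean_abs_correlation(iris[, 1:4])
```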
F1: Fisher's Discriminant Ratio
F2: Volume of Overlap Region
Currently, the F1 and F2 measures are implemented only for binary data sets. General (multi-class) versions are planned.
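The two-class forms of these measures, as described in [2], can be sketched in base R as follows. This is an illustration under stated assumptions rather than dcme's code: the function names `fisher_ratio` and `overlap_volume` are hypothetical, the overlap in each feature is clamped at zero (a common variant), and constant features, which would cause a division by zero, are not handled.

```r
# Hypothetical helpers, not part of dcme.
# x: numeric matrix or data frame, y: class vector with exactly two classes.
split_classes <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  stopifnot(nlevels(y) == 2)
  list(a = x[y == levels(y)[1], , drop = FALSE],
       b = x[y == levels(y)[2], , drop = FALSE])
}

# F1: largest per-feature ratio (mean difference)^2 / (sum of class variances)
fisher_ratio <- function(x, y) {
  s <- split_classes(x, y)
  f <- (colMeans(s$a) - colMeans(s$b))^2 /
    (apply(s$a, 2, var) + apply(s$b, 2, var))
  max(f)
}

# F2: product over features of (overlap of class ranges) / (total range)
overlap_volume <- function(x, y) {
  s <- split_classes(x, y)
  lo_a <- apply(s$a, 2, min); hi_a <- apply(s$a, 2, max)
  lo_b <- apply(s$b, 2, min); hi_b <- apply(s$b, 2, max)
  # Assumption: a negative overlap (disjoint ranges) is treated as zero
  overlap <- pmax(0, pmin(hi_a, hi_b) - pmax(lo_a, lo_b))
  total <- pmax(hi_a, hi_b) - pmin(lo_a, lo_b)
  prod(overlap / total)
}

x <- cbind(f1 = c(0, 2, 10, 12), f2 = c(0, 4, 2, 6))
y <- c("a", "a", "b", "b")
fisher_ratio(x, y)    # f1 separates the classes well, so F1 is large
overlap_volume(x, y)  # class ranges are disjoint in f1, so the volume is 0
```

A large F1 or a small F2 both suggest that at least one feature separates the two classes well.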
N2: Ratio of Average Intra/Inter Class 1-NN Distance
N3: Error Rate of 1-NN Classifier

N4: Nonlinearity of the 1-NN Classifier
T2: Average Number of Points per Dimension
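The nearest-neighbour measures N2 and N3, together with T2, can be sketched from a single pairwise distance matrix. The function `nn_measures` below is a hypothetical helper and not dcme's API. It assumes Euclidean distance, a leave-one-out estimate for the 1-NN error rate, and at least two examples per class. N4 is left out because it depends on randomly interpolated test points.

```r
# Hypothetical helper, not part of dcme.
# x: numeric matrix or data frame, y: class vector.
nn_measures <- function(x, y) {
  x <- as.matrix(x)
  y <- as.character(y)
  d <- as.matrix(dist(x))   # pairwise Euclidean distances (assumption)
  diag(d) <- Inf            # a point is never its own neighbour
  same <- outer(y, y, "==") # TRUE where two points share a class

  # N3: leave-one-out error of the 1-NN classifier
  nn <- apply(d, 1, which.min)
  n3 <- mean(y[nn] != y)

  # N2: summed nearest same-class distance over summed
  # nearest other-class distance
  intra <- apply(ifelse(same, d, Inf), 1, min)
  inter <- apply(ifelse(same, Inf, d), 1, min)

  c(N2 = sum(intra) / sum(inter),
    N3 = n3,
    T2 = nrow(x) / ncol(x))  # T2: examples per feature
}

x <- matrix(c(0, 1, 10, 11), ncol = 1)
y <- c("a", "a", "b", "b")
nn_measures(x, y)
```

Low values of N2 and N3 indicate that examples sit closer to their own class than to the other classes.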
Definitions and explanations of most functions implemented in the dcme package can be found in the following literature:
[1] Michie, D., Spiegelhalter, D. J., & Taylor, C. C. (1994). Machine learning, neural and statistical classification.
[2] Ho, T. K., & Basu, M. (2002). Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 289-300.