
error when input data are only factors #13

@dylanbeaudette


I think that clhs should function without continuous variables. Currently, an error is raised when it attempts to compute the correlation.

library(clhs)

d <- data.frame(
  x=sample(letters[1:4], size = 100, replace = TRUE, prob = c(0.25, 0.25, 0.05, 0.15)),
  y=sample(LETTERS[1:4], size = 100, replace = TRUE, prob = c(0.25, 0.25, 0.25, 0.25))
)

d$x <- factor(d$x)
d$y <- factor(d$y)

# error
res <- clhs(d, size=10, simple = FALSE)
Error in cor(data_continuous, use = "complete.obs") : 
  no complete element pairs
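
One way to avoid the failing `cor()` call might be to check for numeric columns first. The sketch below is only an illustration under that assumption: `cor_if_continuous()` is a hypothetical helper, not part of clhs, and the package's internals may need the guard in a different place.

```r
# Hypothetical helper (not part of clhs): correlation matrix of the
# numeric columns only, or NULL when the data contain none.
cor_if_continuous <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  if (!any(is_num)) {
    return(NULL)  # nothing continuous, so skip cor() entirely
  }
  cor(df[, is_num, drop = FALSE], use = "complete.obs")
}

# Factor-only input, as in the example above
d_fac <- data.frame(
  x = factor(sample(letters[1:4], size = 100, replace = TRUE)),
  y = factor(sample(LETTERS[1:4], size = 100, replace = TRUE))
)

is.null(cor_if_continuous(d_fac))
```

The caller would still have to decide what to use in place of the correlation matrix when the result is `NULL`.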

Adding a condition for no continuous data would help, but correlation would still need to be computed (I think). vcd::assocstats() could be used to compute correlation from a cross-tabulation of all factors. I don't know how to adapt or interpret Cramér's V in the context of more than 2 factors.

library(vcd)

d <- data.frame(
  x=sample(letters[1:4], size = 100, replace = TRUE, prob = c(0.25, 0.25, 0.05, 0.15)),
  y=sample(LETTERS[1:4], size = 100, replace = TRUE, prob = c(0.25, 0.25, 0.25, 0.25)),
  z=sample(LETTERS[21:24], size = 100, replace = TRUE, prob = c(0.25, 0.25, 0.25, 0.25))
)

d$x <- factor(d$x)
d$y <- factor(d$y)
d$z <- factor(d$z)

# single pair-wise `V`
tab <- table(d$x, d$y)
assocstats(tab)

This post has some great ideas on efficient calculation of all pair-wise V.
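
For reference, here is a minimal base-R sketch of an all-pairs matrix of V. It is not the approach from the linked post and not anything clhs does; `cramers_v()` and `pairwise_v()` are hypothetical names. It assumes at least two columns, each with at least two observed levels, and it computes V from the uncorrected Pearson chi-squared statistic, which should agree with the value reported by `assocstats()`.

```r
# Cramer's V for one pair of categorical vectors.
# factor() drops unused levels so that empty rows/columns do not yield NaN.
cramers_v <- function(a, b) {
  tab <- table(factor(a), factor(b))
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  as.numeric(sqrt(chi2 / (n * k)))
}

# Symmetric matrix of V for every pair of columns in a data.frame.
pairwise_v <- function(df) {
  p <- ncol(df)
  stopifnot(p >= 2)
  m <- diag(1, p)  # V of a factor with itself is 1
  dimnames(m) <- list(names(df), names(df))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      m[i, j] <- m[j, i] <- cramers_v(df[[i]], df[[j]])
    }
  }
  m
}

set.seed(1)
d3 <- data.frame(
  x = factor(sample(letters[1:4], size = 100, replace = TRUE)),
  y = factor(sample(LETTERS[1:4], size = 100, replace = TRUE)),
  z = factor(sample(LETTERS[21:24], size = 100, replace = TRUE))
)

v_mat <- pairwise_v(d3)
v_mat
```

The result has the same shape as a correlation matrix, but V is unsigned (0 to 1), so whether it can stand in for `cor()` output is still an open question.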
