Reverse 'missingness' plot to 'complete entries' #182

Closed
steinjm opened this issue Jun 18, 2024 · 3 comments
Labels
N/A: wontfix Probably valid, but no plan to address in a foreseeable future


@steinjm

steinjm commented Jun 18, 2024

Hi,

Many of my colleagues don't have a data background, and they find the missingness plot quite confusing. Would it be possible to add an argument to the create_report() function that reverses this plot so that it shows complete entries instead? I've hacked together a quick-and-dirty version of the function that does this, but integrating it into create_report() is a bit much for me. Here's an example plot and the example function:

[Example plot: Completeness]

plot_missing2 <- function(data,
                          group = list(Good = 1, OK = 0.6, Bad = 0.2, Remove = 0),
                          group_color = list(Good = "#1B9E77", OK = "#E6AB02",
                                             Bad = "#D95F02", Remove = "#E41A1C"),
                          missing_only = FALSE, geom_label_args = list(), title = NULL,
                          ggtheme = theme_gray(),
                          theme_config = list(legend.position = c("bottom"))) {
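  num_missing <- pct_missing <- Band <- NULL
  missing_value <- data.table(profile_missing(data))
  # Filter on the original missing counts before they are flipped below
  if (missing_only) {
    missing_value <- missing_value[num_missing > 0]
  }
  # From here on, pct_missing and num_missing hold the share and count of complete entries
  missing_value$pct_missing <- 1 - missing_value$pct_missing
  missing_value$num_missing <- nrow(data) - missing_value$num_missing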
  group <- group[sort.list(unlist(group))]
  invisible(lapply(seq_along(group), function(i) {
    if (i == 1) {
      missing_value[pct_missing <= group[[i]], `:=`(Band, 
                                                    names(group)[i])]
    } else {
      missing_value[pct_missing > group[[i - 1]] & pct_missing <= 
                      group[[i]], `:=`(Band, names(group)[i])]
    }
  }))
  ordinal_levels <- names(group[sort.list(unlist(group))])
  missing_value[, `:=`(Band, factor(Band, levels = ordinal_levels, 
                                    ordered = TRUE))]
  if (length(setdiff(names(group), names(group_color))) > 0) {
    bar_fill <- scale_fill_discrete("Band")
  } else {
    bar_fill <- scale_fill_manual(values = group_color)
  }
  output <- ggplot(missing_value, aes_string(x = "feature", y = "num_missing", fill = "Band")) +
    geom_bar(stat = "identity") +
    bar_fill + coord_flip() + xlab("Features") + ylab("Complete Entries") +
    guides(fill = guide_legend(override.aes = aes(label = "")))
  geom_label_args_list <- list(mapping = aes(label = paste0(round(100 * pct_missing, 2), "%")))
  output <- output + do.call("geom_label", c(geom_label_args_list, geom_label_args))
  class(output) <- c("single", class(output))
  plotDataExplorer(plot_obj = output, title = title, ggtheme = ggtheme, 
                   theme_config = theme_config)
}
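
For anyone reading the banding step in the function above: the lapply loop puts each feature into the band whose threshold is the smallest one at or above its completeness share. A minimal base R sketch of the same interval logic using cut() might look like the following. The vector name pct_complete and its values are made up for illustration and are not part of DataExplorer.

```r
# Thresholds as in the function above, sorted ascending:
# Remove = 0, Bad = 0.2, OK = 0.6, Good = 1
group <- list(Good = 1, OK = 0.6, Bad = 0.2, Remove = 0)
group <- group[sort.list(unlist(group))]

# Illustrative completeness shares (assumed values, one per feature)
pct_complete <- c(0, 0.1, 0.2, 0.45, 0.6, 0.99, 1)

# cut() uses right-closed intervals by default, i.e. (lower, upper],
# which matches "pct > previous threshold & pct <= current threshold".
band <- cut(
  pct_complete,
  breaks = c(-Inf, unlist(group)),
  labels = names(group),
  ordered_result = TRUE
)

print(data.frame(pct_complete = pct_complete, band = band))
```

As in the loop version, any share above the largest threshold ends up as NA rather than in a band.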
@boxuancui
Owner

Thank you for the suggestion, but both profile_missing and plot_missing are designed for detecting "missingness". Unfortunately, there is no plan to add a similar function for "completeness", since it would essentially duplicate what plot_missing already does.

To address your concern, you can always update this code chunk so that you can incorporate your proposed plot_missing2. However, I'd propose naming it something like plot_completeness or similar, so that it doesn't create confusion later.
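
As a rough illustration of what a separately named completeness helper could compute, here is a minimal base R sketch. The name profile_completeness and its output columns are hypothetical and are not part of DataExplorer. They only mirror the feature / count / share layout used in the function posted above.

```r
# Hypothetical helper (not a DataExplorer function): per-column completeness.
profile_completeness <- function(data) {
  stopifnot(is.data.frame(data))
  # Count the non-NA entries in every column
  num_complete <- colSums(!is.na(data))
  data.frame(
    feature = names(data),
    num_complete = as.integer(num_complete),
    pct_complete = as.numeric(num_complete) / nrow(data),
    stringsAsFactors = FALSE
  )
}

# Small made-up data set for illustration
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, "x", "y"),
  c = 1:4
)
completeness <- profile_completeness(toy)
print(completeness)
```

A table like this could then be passed to a plot_completeness-style function in place of the inverted profile_missing output.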

boxuancui added the "N/A: wontfix" label (Probably valid, but no plan to address in a foreseeable future) on Jun 18, 2024
@steinjm
Author

steinjm commented Jun 19, 2024

Alright, that's fair! Thanks for the tips, and keep up the good work!

@boxuancui
Owner

Thank you for using DataExplorer!
