hf_load_dataset() shouldn't default to using labels #45

Open
samterfa opened this issue Jan 3, 2023 · 1 comment
Comments

@samterfa
Collaborator

samterfa commented Jan 3, 2023

Some datasets don't define labels but are still valid datasets (I'm using one for the ez functions vignette). For example, hf_load_dataset("sentiment140", split = "test") fails with KeyError: 'label'.

@samterfa samterfa changed the title hf_load_dataset() can't default to using labels hf_load_dataset() shouldn't default to using labels Jan 3, 2023
@jpcompartir
Collaborator

These are the offending blocks. We can re-implement them fairly straightforwardly to cover this dataset, but we would like something more general, so it needs some thought.

  # Set the default value of label_conversion to 'int2str' unless specified, in which case match the input
  label_conversion <- match.arg(if (missing(label_conversion)) "int2str" else label_conversion, c("str2int", "int2str"))
  # Get int2str & str2int, which can later be called directly on the label variable
  if(!is.null(label_conversion)){
    x <- splits[[1]]
    x <- .dataset[[x]]
    x <- x[["features"]]
    x <- x[["label"]]
    int2str <- x[["int2str"]]
    str2int <- x[["str2int"]]
  }

  if(label_conversion == "int2str"){
    label_names <- purrr::map(datasets, ~int2str(.x[["label"]]))
    datasets <- purrr::map2(.x = datasets, .y = label_names, .f = ~ .x %>% dplyr::mutate(label_name = .y))
  }
  if(label_conversion == "str2int"){
    label_ids <- purrr::map(datasets, ~str2int(.x[["label"]]))
    datasets <- purrr::map2(.x = datasets, .y = label_ids, .f = ~ .x %>% dplyr::mutate(label_id = .y))
  }
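
One possible direction for the more general version, as a rough sketch only: look for a "label" feature before touching it, and return the data unchanged when it is missing. The helper name convert_labels_if_present() is hypothetical and not part of the package. The sketch also assumes that features has already been turned into a plain named R list and that data is a data frame; with a raw reticulate object, an equivalent membership check would be needed before indexing.

```r
# Hypothetical helper (not part of the package): only convert labels when the
# dataset defines a "label" feature that carries the requested converter.
# Assumes `features` is a plain named R list and `data` is a data frame.
convert_labels_if_present <- function(data, features,
                                      label_conversion = c("int2str", "str2int")) {
  # Passing NULL explicitly means "skip label conversion entirely"
  if (is.null(label_conversion)) return(data)
  label_conversion <- match.arg(label_conversion)

  # `[[` on a plain list returns NULL for a missing name instead of erroring
  label_feature <- features[["label"]]
  converter <- if (is.null(label_feature)) NULL else label_feature[[label_conversion]]

  # No label column, or no usable converter: leave the data untouched
  if (!("label" %in% names(data)) || !is.function(converter)) return(data)

  new_col <- if (label_conversion == "int2str") "label_name" else "label_id"
  data[[new_col]] <- converter(data[["label"]])
  data
}

# Toy dataset with a label feature (made-up data and converter)
feats <- list(label = list(int2str = function(i) c("neg", "pos")[i + 1]))
df <- data.frame(text = c("a", "b"), label = c(0, 1))
with_names <- convert_labels_if_present(df, feats)

# Toy dataset without a label feature (column names are made up)
no_label <- data.frame(text = c("a", "b"), sentiment = c(0, 4))
unchanged <- convert_labels_if_present(no_label, list(text = list()))
```

In this sketch, the first call adds a label_name column, while the second returns the data frame as it came in rather than raising an error.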
