Variable labels disappearing after deriving scores #51

Open
Chris-M-P opened this issue Mar 1, 2022 · 4 comments

@Chris-M-P

Possibly related to #39 - in the code below I can see the variable labels and access them through the variable.labels attribute of the dataframe:

library(comorbidity)

set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)

# Charlson score based on ICD-10 diagnostic codes:
x1 <- comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE)
attributes(x1)

However, if I append the dataframe with the score as well (since I'm interested both in the scores and underlying comorbidities) then I lose the variable.labels attribute (using tidyverse since it's in my workflow):

library(comorbidity)
library(tidyverse)
set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)

# Charlson score based on ICD-10 diagnostic codes:
x1 <- comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE) %>%
  score(x = ., weights = "charlson", assign0 = FALSE)
attributes(x1)

This seems to be a result of applying the variable labels as an attribute of the dataframe, rather than of the variable. But this is harder to work around now that mapping and scoring are distinct functions.
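
To illustrate the distinction between a data-frame-level attribute and a per-variable one, here is a minimal base R sketch. It uses a toy data frame rather than real `comorbidity()` output, and `labels_to_columns()` is a hypothetical helper, not part of {comorbidity}. It assumes the labels are stored in the same order as the columns, as they appear to be in the `variable.labels` attribute.

```r
# Toy stand-in for the output of comorbidity(); the column names and labels
# are borrowed from the package output, the values are made up.
df <- data.frame(id = 1:3, ami = c(0, 1, 0), chf = c(1, 0, 0))
attr(df, "variable.labels") <- c(
  "ID", "Myocardial infarction", "Congestive heart failure"
)

# Hypothetical helper: copy each data-frame-level label onto its column as a
# "label" attribute, so that the label is stored on the column itself.
labels_to_columns <- function(data, attribute = "variable.labels") {
  labels <- attr(data, attribute, exact = TRUE)
  if (is.null(labels)) {
    return(data)
  }
  # Labels are matched to columns by position, so the lengths must agree.
  stopifnot(length(labels) == ncol(data))
  for (i in seq_along(labels)) {
    attr(data[[i]], "label") <- labels[[i]]
  }
  data
}

df_labelled <- labels_to_columns(df)
attr(df_labelled$ami, "label")
```

Whether a given downstream function keeps column-level attributes still depends on that function, so this is one possible direction rather than a guaranteed fix.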

@ellessenne
Owner

Hi,
x1 here should not have any labels, as it is just the vector of scores?
The example code above does not add the score to x:

library(comorbidity)
#> This is {comorbidity} version 1.0.0.
#> A lot has changed since the last release on CRAN, please check-out breaking changes here:
#> -> https://ellessenne.github.io/comorbidity/articles/C-changes.html
library(tidyverse)
set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)

# Charlson score based on ICD-10 diagnostic codes:
x1 <- comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE) %>%
  score(x = ., weights = "charlson", assign0 = FALSE)
attributes(x1)
#> $map
#> [1] "charlson_icd10_quan"
#> 
#> $weights
#> [1] "charlson"

x1
#>  [1] 2 6 0 2 0 0 4 0 3 2 0 0 3 0 2
#> attr(,"map")
#> [1] "charlson_icd10_quan"
#> attr(,"weights")
#> [1] "charlson"

Created on 2022-03-01 by the reprex package (v2.0.1)

@ellessenne ellessenne self-assigned this Mar 1, 2022
@Chris-M-P
Author

Sorry, I made a typo in my MWE. I've also tried it on another computer and cannot replicate the issue there, so there is clearly a clash somewhere. I'll dig further.

In the meantime here's the proper MWE:

library(comorbidity)
library(tidyverse)

set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)

# Charlson score based on ICD-10 diagnostic codes:
x1 <- comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE) 

x2 <- x1 %>% 
  mutate(score = score(x = ., weights = "charlson", assign0 = FALSE)) 

attributes(x1)
attributes(x2)
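
Rather than comparing the two `attributes()` printouts by eye, the difference can be listed directly. The following is a small base R sketch on a toy data frame; `lost_attributes()` is a hypothetical helper, and the step that drops the attribute is simulated by hand rather than produced by {dplyr}.

```r
# Hypothetical helper: names of attributes present on `before` but not `after`.
lost_attributes <- function(before, after) {
  setdiff(names(attributes(before)), names(attributes(after)))
}

# Toy stand-in for x1, with two custom attributes attached.
before <- data.frame(id = 1:3, ami = c(0, 1, 0))
attr(before, "variable.labels") <- c("ID", "Myocardial infarction")
attr(before, "map") <- "charlson_icd10_quan"

# Stand-in for a processing step that drops one custom attribute.
after <- before
attr(after, "variable.labels") <- NULL

lost_attributes(before, after)
#> [1] "variable.labels"
```

With the objects from the MWE, the equivalent call would be `lost_attributes(x1, x2)`.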

@Chris-M-P
Author

Update - it looks like it's linked to updating dplyr from 1.0.6 to 1.0.8 - sorry! No idea why it's happening though.

If you fancy trying it yourself:

library(devtools)
# Pin dplyr to 1.0.6 (the version in use before the update) to compare against 1.0.8
install_version("dplyr", version = "1.0.6", repos = "http://cran.us.r-project.org")

library(tidyverse)
library(comorbidity)
#> This is {comorbidity} version 1.0.0.
#> A lot has changed since the last release on CRAN, please check-out breaking changes here:
#> -> https://ellessenne.github.io/comorbidity/articles/C-changes.html

set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)

# Charlson score based on ICD-10 diagnostic codes:
x1 <- comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE) 

x2 <- x1 %>% 
  mutate(score = score(x = ., weights = "charlson", assign0 = FALSE)) %>% 
  rename_with(.fn = ~paste0(., "_cci"), -"id")
  
attributes(x1)
attributes(x2)

@ellessenne
Owner

Thanks, this seems to be an issue related to {dplyr}, as the following does not drop the attributes:

library(comorbidity)
set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)
x1 <- comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE)
x1$score <- score(x = x1, weights = "charlson", assign0 = FALSE)
attributes(x1)
#> $names
#>  [1] "id"       "ami"      "chf"      "pvd"      "cevd"     "dementia"
#>  [7] "copd"     "rheumd"   "pud"      "mld"      "diab"     "diabwc"  
#> [13] "hp"       "rend"     "canc"     "msld"     "metacanc" "aids"    
#> [19] "score"   
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
#> 
#> $variable.labels
#>  [1] "ID"                                    
#>  [2] "Myocardial infarction"                 
#>  [3] "Congestive heart failure"              
#>  [4] "Peripheral vascular disease"           
#>  [5] "Cerebrovascular disease"               
#>  [6] "Dementia"                              
#>  [7] "Chronic obstructive pulmonary disease" 
#>  [8] "Rheumatoid disease"                    
#>  [9] "Peptic ulcer disease"                  
#> [10] "Mild liver disease"                    
#> [11] "Diabetes without chronic complications"
#> [12] "Diabetes with chronic complications"   
#> [13] "Hemiplegia or paraplegia"              
#> [14] "Renal disease"                         
#> [15] "Cancer (any malignancy)"               
#> [16] "Moderate or severe liver disease"      
#> [17] "Metastatic solid tumour"               
#> [18] "AIDS/HIV"                              
#> 
#> $map
#> [1] "charlson_icd10_quan"
#> 
#> $class
#> [1] "comorbidity" "data.frame"

Created on 2022-03-02 by the reprex package (v2.0.1)

This has already been reported (I think); see e.g. tidyverse/dplyr#6100 and tidyverse/dplyr#6102. My understanding is that a fix is planned for {dplyr} 1.1.0.
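
Until a fix is available, one possible workaround is to save the custom attributes before the pipeline and reattach them afterwards. The sketch below is base R only and runs on a toy data frame. `keep_attributes()` and `lossy_step()` are hypothetical names, and `lossy_step()` merely simulates a step that drops `variable.labels`; it is not the actual {dplyr} behaviour.

```r
# Hypothetical helper: apply `f` to `data`, then put back any of the listed
# attributes that are missing from the result. Attributes that `f` kept or
# changed are left alone.
keep_attributes <- function(data, f, keep = c("variable.labels", "map")) {
  saved <- lapply(
    stats::setNames(keep, keep),
    function(nm) attr(data, nm, exact = TRUE)
  )
  out <- f(data)
  for (nm in keep) {
    if (is.null(attr(out, nm, exact = TRUE)) && !is.null(saved[[nm]])) {
      attr(out, nm) <- saved[[nm]]
    }
  }
  out
}

# Toy stand-in for the output of comorbidity().
x1 <- data.frame(id = 1:3, ami = c(0, 1, 0), chf = c(1, 0, 0))
labels <- c("ID", "Myocardial infarction", "Congestive heart failure")
attr(x1, "variable.labels") <- labels
attr(x1, "map") <- "charlson_icd10_quan"

# Stand-in for a pipeline that adds a column but loses the labels.
lossy_step <- function(d) {
  d$score <- d$ami + d$chf
  attr(d, "variable.labels") <- NULL
  d
}

x2 <- keep_attributes(x1, lossy_step)
attr(x2, "variable.labels")
```

As in the base R output above, the restored labels cover only the original columns, so a newly added column such as `score` has no label of its own.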
