Add and retrieve arbitrary metadata? #32

Open
jimjam-slam opened this issue Mar 10, 2025 · 1 comment

Comments

@jimjam-slam

Does this package support adding arbitrary metadata to a CSVY file and retrieving it again (even if it has to live under a specified key in the YAML)? If I add attributes to a data frame and write it out to a CSVY file, those attributes are included in the YAML front matter:

library(csvy)
library(tibble)  # provides tibble()

test_path <- "test.csvy"
test <- tibble(x = 1:4, y = x^2)

# add attributes and confirm they're present
attr(test, "fruit") <- "banana"
attr(test, "vegetable") <- "broccoli"
test |> attributes()

write_csvy(test, test_path)

# confirm that metadata was written out to file
test_path |> readLines() |> paste(collapse = "\n") |> cat()
# #---
# #profile: tabular-data-package
# #name: test
# #fruit: banana
# #vegetable: broccoli
# #fields:
# #- name: x
# #  type: integer
# #- name: 'y'
# #  type: number
# #--- 
# x,y
# 1,1
# 2,4
# 3,9
# 4,16

But if I read that file back in, the extra attributes are dropped:

test_path |> read_csvy() |> attributes()
# $names
# [1] "x" "y"
# 
# $row.names
# [1] 1 2 3 4
# 
# $profile
# [1] "tabular-data-package"
# 
# $name
# [1] "test"
# 
# $class
# [1] "data.frame"

I'm reading up on the Tabular Data Package schema to see if there's a place reserved in it for arbitrary metadata, but I'm having some trouble understanding it. Is this package intended to allow users to retrieve arbitrary metadata?

Session info
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] here_1.0.1      csvy_0.3.0      lubridate_1.9.3 forcats_1.0.0  
 [5] stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.5    
 [9] tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.5       jsonlite_1.8.9     compiler_4.4.1     tidyselect_1.2.1  
 [5] scales_1.3.0       yaml_2.3.10        R6_2.6.1           generics_0.1.3    
 [9] rprojroot_2.0.4    munsell_0.5.1      pillar_1.10.1.9000 tzdb_0.4.0        
[13] rlang_1.1.5        utf8_1.2.4         stringi_1.8.4      timechange_0.3.0  
[17] cli_3.6.4          withr_3.0.2        magrittr_2.0.3     grid_4.4.1        
[21] hms_1.1.3          lifecycle_1.0.4    vctrs_0.6.5        glue_1.8.0        
[25] data.table_1.16.2  colorspace_2.1-1   tools_4.4.1        pkgconfig_2.0.3  

Thanks!

@jimjam-slam
Author

Taking a look at add_dataset_metadata(), it seems like dataset-level metadata is restricted to a fixed set of keys, but the values under those keys aren't really processed or restricted any further!
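
In the meantime, one possible workaround is to parse the YAML header directly and copy any missing keys back onto the data frame. The following is only a rough sketch, not something csvy provides: read_csvy_header() and reattach_extras() are hypothetical helper names, and the code assumes the header looks like the file shown above (every header line prefixed with a single "#", fenced by "#---" lines) and that the yaml package is installed.

```r
library(yaml)

# Hypothetical helper (not part of csvy): parse the commented YAML header
# of a CSVY file and return its top-level keys as a named list.
read_csvy_header <- function(path) {
  lines <- readLines(path)
  # header fences look like "#---", possibly with trailing whitespace
  fences <- grep("^#?---[[:space:]]*$", lines)
  if (length(fences) < 2) stop("no YAML header found in ", path)
  # nothing between the two fences: return an empty header
  if (fences[2] - fences[1] < 2) return(list())
  body <- lines[(fences[1] + 1):(fences[2] - 1)]
  # drop the single leading "#" in front of each header line
  body <- sub("^#", "", body)
  yaml::yaml.load(paste(body, collapse = "\n"))
}

# Hypothetical helper: copy header keys onto the data frame as attributes,
# skipping "fields" and anything the data frame already carries.
reattach_extras <- function(df, header, skip = "fields") {
  extras <- setdiff(names(header), c(skip, names(attributes(df))))
  for (key in extras) attr(df, key) <- header[[key]]
  df
}
```

With the test file from the first comment, this would be used as reattach_extras(read_csvy(test_path), read_csvy_header(test_path)), after which the extra keys should be available again via attr(). Skipping "fields" is a deliberate choice here, since that key describes the columns rather than the dataset.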
