Add and retrieve arbitrary metadata? #32

jimjam-slam opened this issue Mar 10, 2025 · 1 comment

Does this package support adding arbitrary metadata to a CSVY file and retrieving it again (even if it has to live under a specified key in the YAML)? If I add attributes to a data frame and write it out to a CSVY file, those attributes are included in the YAML front matter:


library(tibble)
library(csvy)

test_path <- "test.csvy"
test <- tibble(x = 1:4, y = x^2)

# add attributes and confirm they're present
attr(test, "fruit") <- "banana"
attr(test, "vegetable") <- "broccoli"
test |> attributes()

write_csvy(test, test_path)

# confirm that metadata was written out to file
test_path |> readLines() |> paste(collapse = "\n") |> cat()
# #---
# #profile: tabular-data-package
# #name: test
# #fruit: banana
# #vegetable: broccoli
# #fields:
# #- name: x
# #  type: integer
# #- name: 'y'
# #  type: number
# #--- 
# x,y
# 1,1
# 2,4
# 3,9
# 4,16

But if I read that file back in, the extra attributes are dropped:

test_path |> read_csvy() |> attributes()
# $names
# [1] "x" "y"
# $row.names
# [1] 1 2 3 4
# $profile
# [1] "tabular-data-package"
# $name
# [1] "test"
# $class
# [1] "data.frame"

I'm reading up on the Tabular Data Package schema to see if there's a place reserved in it for arbitrary metadata, but I'm having some trouble understanding it. Is this package intended to allow users to retrieve arbitrary metadata?

Session info
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] here_1.0.1      csvy_0.3.0      lubridate_1.9.3 forcats_1.0.0  
 [5] stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.5    
 [9] tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.5       jsonlite_1.8.9     compiler_4.4.1     tidyselect_1.2.1  
 [5] scales_1.3.0       yaml_2.3.10        R6_2.6.1           generics_0.1.3    
 [9] rprojroot_2.0.4    munsell_0.5.1      pillar_1.10.1.9000 tzdb_0.4.0        
[13] rlang_1.1.5        utf8_1.2.4         stringi_1.8.4      timechange_0.3.0  
[17] cli_3.6.4          withr_3.0.2        magrittr_2.0.3     grid_4.4.1        
[21] hms_1.1.3          lifecycle_1.0.4    vctrs_0.6.5        glue_1.8.0        
[25] data.table_1.16.2  colorspace_2.1-1   tools_4.4.1        pkgconfig_2.0.3  


Taking a look at add_dataset_metadata(), it seems like dataset-level metadata is limited to a set of specified keys, but whatever sits under those keys isn't really processed or restricted!
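
In the meantime, one possible workaround is to parse the YAML header directly instead of relying on read_csvy() to return the extra keys. The sketch below is only an illustration: read_csvy_header() is a hypothetical helper (not part of csvy), it assumes the header is written in the commented `#`-prefixed form shown in the output above, and it uses yaml.load() from the yaml package. The example file is written by hand so the snippet runs without csvy.

```r
library(yaml)

# Hypothetical helper (not part of csvy): parse the commented YAML header of
# a CSVY file and return it as a named list.
# Assumption: every header line starts with "#", as in the file shown above.
read_csvy_header <- function(path) {
  lines <- readLines(path)
  # The header is the leading run of lines that start with "#"
  is_header <- cumsum(!startsWith(lines, "#")) == 0
  # Strip only the leading "#" so YAML indentation is preserved
  header <- sub("^#", "", lines[is_header])
  # Drop the "---" delimiters (the closing one may carry trailing whitespace)
  header <- header[trimws(header) != "---"]
  yaml.load(paste(header, collapse = "\n"))
}

# Recreate the header from the example above, with a shortened data section
test_path <- tempfile(fileext = ".csvy")
writeLines(c(
  "#---",
  "#profile: tabular-data-package",
  "#name: test",
  "#fruit: banana",
  "#vegetable: broccoli",
  "#fields:",
  "#- name: x",
  "#  type: integer",
  "#- name: 'y'",
  "#  type: number",
  "#--- ",
  "x,y",
  "1,1",
  "2,4"
), test_path)

meta <- read_csvy_header(test_path)
meta$fruit
meta$vegetable
```

The returned list could then be copied back onto a data frame read with read_csvy(), for example with attr(df, "fruit") <- meta$fruit, although that step is not shown here and depends on which keys you want to keep.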
