Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV extension ignores schema's data type. #414

Open
wibeasley opened this issue Mar 23, 2022 · 3 comments
Open

CSV extension ignores schema's data type. #414

wibeasley opened this issue Mar 23, 2022 · 3 comments
Labels

Comments

@wibeasley
Copy link
Contributor

wibeasley commented Mar 23, 2022

Thanks for adding the recent extensions that support CSV imports (eg, #407, #389).

I believe the data types passed to the schema argument aren't being respected. This is from the recent documentation (and I've added str(ds) to display the data types

library(DBI)
db <- dbConnect(RSQLite::SQLite())
RSQLite::initExtension(db, "csv")
# use the filename argument to mount CSV files from disk
sql <- paste0(
  "CREATE VIRTUAL TABLE tbl USING ",
  "csv(data='1,2', schema='CREATE TABLE x(a INT, b INT)')"
)
dbExecute(db, sql)
#> [1] 0
ds <- dbGetQuery(db, "SELECT * FROM tbl")


str(ds)
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1"
#>  $ b: chr "2"

Version:

RSQLite::rsqliteVersion()
#>   header  library 
#> "3.38.0" "3.38.0"

Created on 2022-03-23 by the reprex package (v2.0.1)

@krlmlr
Copy link
Member

krlmlr commented Mar 25, 2022

Thanks. This appears to be a limitation of the csv extension:

sqlite3_result_text(ctx, pCur->azVal[i], -1, SQLITE_TRANSIENT);

I suspect that type-converted output needs to call other functions: https://www.sqlite.org/c3ref/result_blob.html

@wibeasley
Copy link
Contributor Author

I suspect that type-converted output needs to call other functions

I'm not understanding that content --are you saying something needs to be added to csv.c??

If there's not a reasonable approach within RSQLite, does it make sense to document that limitation and present a supplemental approach? Something like:

library(DBI)
db <- dbConnect(RSQLite::SQLite())
RSQLite::initExtension(db, "csv")
# use the filename argument to mount CSV files from disk
sql <- paste0(
  "CREATE VIRTUAL TABLE tbl USING ",
  "csv(data='1,b,3', schema='CREATE TABLE x(a TEXT, b TEXT, c TEXT)')"
)
dbExecute(db, sql)
#> [1] 0

ds <- dbGetQuery(db, "SELECT * FROM tbl")
str(ds)
#> 'data.frame':    1 obs. of  3 variables:
#>  $ a: chr "1"
#>  $ b: chr "b"
#>  $ c: chr "3"

# The resulting data types are always characters,
#   which essentially ignores the defined schema.
#   One approach is to declare 'text' columns in the virtual table and use a
#   function like `readr::type_convert()` to cast to the desired type.
# \dontrun{
col_types <- 
  readr::cols(
    `a` = readr::col_integer(),
    `b` = readr::col_character(),
    `c` = readr::col_guess()
  )
ds <- readr::type_convert(ds, col_types = col_types)
str(ds)
#> 'data.frame':    1 obs. of  3 variables:
#>  $ a: int 1
#>  $ b: chr "b"
#>  $ c: num 3
# }

I don't think I can help much with the C interface, but please tell me if I can help with documentation or examples.

Created on 2022-03-25 by the reprex package (v2.0.1)

@krlmlr krlmlr added the docs label Mar 27, 2022
@krlmlr
Copy link
Member

krlmlr commented Mar 27, 2022

Thanks. Documenting the limitation will be helpful, until an update to csv.c becomes available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants