Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Cross-Post] Polygon/multipolygon columns: incorrect write, correct read #2206

Closed
pachadotdev opened this issue Jul 31, 2023 · 4 comments
Closed

Comments

@pachadotdev
Copy link

r-dbi/RPostgres#439

@dpprdan
Copy link
Contributor

dpprdan commented Aug 23, 2023

This only concerns the dbWriteTable() method for sf(c) objects, so I think this is better handled here than in RPostgres.

The issue is that dbWriteTable(append = TRUE) does not generate the appropriate field types in Postgres, if the table does not yet exist.

A more minimal example:

library(RPostgres)
library(sf)
#> Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE

# let's use the standard st_read example to make things simpler
nc <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source 
#>   `C:\Users\Daniel.AK-HAMBURG\AppData\Local\R\win-library\4.3\sf\shape\nc.shp' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27

# con-fig defined in https://www.postgresql.org/docs/current/libpq-pgservice.html
# password via https://www.postgresql.org/docs/current/libpq-pgpass.html
con <- dbConnect(Postgres(), service = "test")

# append = TRUE does not define table structure (and doesn't warn)
dbWriteTable(con, "nc_append", nc, append = TRUE)
dbGetQuery(con, r"(SELECT pg_typeof("geometry") FROM nc_append LIMIT 1)")
#>   pg_typeof
#> 1      text
no problems with dataframes and `append = FALSE`
# (non-geo) dataframes are fine
nc_df <- st_drop_geometry(nc)
dbWriteTable(con, "nc_df", nc_df, append = TRUE)
dbGetQuery(con, "
           SELECT column_name, data_type
           FROM   information_schema.columns
           WHERE  table_name = 'nc_df'
           ORDER  BY ordinal_position;")
#>    column_name        data_type
#> 1         AREA double precision
#> 2    PERIMETER double precision
#> 3        CNTY_ double precision
#> 4      CNTY_ID double precision
#> 5         NAME             text
#> 6         FIPS             text
#> 7       FIPSNO double precision
#> 8     CRESS_ID          integer
#> 9        BIR74 double precision
#> 10       SID74 double precision
#> 11     NWBIR74 double precision
#> 12       BIR79 double precision
#> 13       SID79 double precision
#> 14     NWBIR79 double precision

# works fine without append
dbWriteTable(con, "nc", nc)
#> Note: method with signature 'DBIObject#sf' chosen for function 'dbDataType',
#>  target signature 'PqConnection#sf'.
#>  "PqConnection#ANY" would also be valid
dbGetQuery(con, r"(SELECT pg_typeof("geometry") FROM nc LIMIT 1)")
#>   pg_typeof
#> 1  geometry
clean-up
lapply(c("nc", "nc_append", "nc_df"), dbRemoveTable, conn = con)
#> [[1]]
#> [1] TRUE
#> 
#> [[2]]
#> [1] TRUE
#> 
#> [[3]]
#> [1] TRUE
dbDisconnect(con)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16 ucrt)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language en
#>  collate  German_Germany.utf8
#>  ctype    German_Germany.utf8
#>  tz       Europe/Berlin
#>  date     2023-08-23
#>  pandoc   3.1.6.1 @ C:/Users/DANIEL~1.AK-/AppData/Local/Pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  bit           4.0.5      2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64         4.0.5      2020-08-30 [1] CRAN (R 4.3.0)
#>  blob          1.2.4      2023-03-17 [1] CRAN (R 4.3.0)
#>  class         7.3-22     2023-05-03 [2] CRAN (R 4.3.1)
#>  classInt      0.4-9      2023-02-28 [1] CRAN (R 4.3.0)
#>  cli           3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
#>  DBI           1.1.3      2022-06-18 [1] CRAN (R 4.3.0)
#>  digest        0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
#>  dplyr         1.1.2      2023-04-20 [1] CRAN (R 4.3.0)
#>  e1071         1.7-13     2023-02-01 [1] CRAN (R 4.3.0)
#>  evaluate      0.21       2023-05-05 [1] CRAN (R 4.3.0)
#>  fansi         1.0.4      2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
#>  generics      0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
#>  hms           1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.6      2023-08-10 [1] CRAN (R 4.3.1)
#>  KernSmooth    2.23-22    2023-07-10 [1] CRAN (R 4.3.1)
#>  knitr         1.43       2023-05-25 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.3.0)
#>  lubridate     1.9.2      2023-02-10 [1] CRAN (R 4.3.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
#>  proxy         0.4-27     2022-06-09 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2      2023-08-10 [1] CRAN (R 4.3.1)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2     2022-11-11 [1] CRAN (R 4.3.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
#>  Rcpp          1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1      2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.24       2023-08-14 [1] CRAN (R 4.3.1)
#>  RPostgres   * 1.4.5.9012 2023-08-23 [1] Github (r-dbi/RPostgres@58a052b)
#>  rstudioapi    0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>  sf          * 1.0-14     2023-07-11 [1] CRAN (R 4.3.1)
#>  styler        1.10.1     2023-06-05 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
#>  timechange    0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
#>  units         0.8-3      2023-08-10 [1] CRAN (R 4.3.1)
#>  utf8          1.2.3      2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.3      2023-06-14 [1] CRAN (R 4.3.1)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.40       2023-08-09 [1] CRAN (R 4.3.1)
#>  yaml          2.3.7      2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] C:/Users/Daniel.AK-HAMBURG/AppData/Local/R/win-library/4.3
#>  [2] C:/Program Files/R/R-4.3.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Now is that a problem or should the user be smart enough to use append = TRUE when initializing a table?

IMHO there should either be a warning , if the table does not exist.

Or dbWriteTable.sf should generate the correct field types (this is what vanilla RPostgres::dbWriteTable() does), e.g. change

sf/R/db.R

Lines 474 to 476 in 732525f

if (is.null(field.types)) field.types <- dbDataType(conn, value)
# DBI cannot set field types with append
if (append) field.types <- NULL

to

# DBI stops if append && !is.null(field.types)
if (append) {
  found <- dbExistsTable(conn, name)
  if (found) {
    append <- FALSE
  } else {
    field.types <- NULL
  }
}

(This all needs to happen because RPostgres::dbWriteTable() will stop() if append && !is.null(field.types) - but maybe it should only do this if a table exists? 🤔).

@edzer
Copy link
Member

edzer commented Aug 23, 2023

@etiennebr

@pachadotdev
Copy link
Author

This only concerns the dbWriteTable() method for sf(c) objects, so I think this is better handled here than in RPostgres.

The issue is that dbWriteTable(append = TRUE) does not generate the appropriate field types in Postgres, if the table does not yet exist.

A more minimal example:

library(RPostgres)
library(sf)
#> Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE

# let's use the standard st_read example to make things simpler
nc <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source 
#>   `C:\Users\Daniel.AK-HAMBURG\AppData\Local\R\win-library\4.3\sf\shape\nc.shp' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27

# con-fig defined in https://www.postgresql.org/docs/current/libpq-pgservice.html
# password via https://www.postgresql.org/docs/current/libpq-pgpass.html
con <- dbConnect(Postgres(), service = "test")

# append = TRUE does not define table structure (and doesn't warn)
dbWriteTable(con, "nc_append", nc, append = TRUE)
dbGetQuery(con, r"(SELECT pg_typeof("geometry") FROM nc_append LIMIT 1)")
#>   pg_typeof
#> 1      text

no problems with dataframes and append = FALSE
clean-up
Session info
Now is that a problem or should the user be smart enough to use append = TRUE when initializing a table?

IMHO there should either be a warning , if the table does not exist.

Or dbWriteTable.sf should generate the correct field types (this is what vanilla RPostgres::dbWriteTable() does), e.g. change

sf/R/db.R

Lines 474 to 476 in 732525f

if (is.null(field.types)) field.types <- dbDataType(conn, value)
# DBI cannot set field types with append
if (append) field.types <- NULL

to

# DBI stops if append && !is.null(field.types)
if (append) {
  found <- dbExistsTable(conn, name)
  if (found) {
    append <- FALSE
  } else {
    field.types <- NULL
  }
}

(This all needs to happen because RPostgres::dbWriteTable() will stop() if append && !is.null(field.types) - but maybe it should only do this if a table exists? 🤔).

thanks a lot! it looks I made a discovery here

@etiennebr
Copy link
Member

Thanks a lot @pachadotdev for the report and @dpprdan for the fix. I'll write a few tests and try to push a fix rapidly.

@edzer edzer closed this as completed in ac0a1dd Aug 29, 2023
edzer added a commit that referenced this issue Aug 29, 2023
Force field type when appending to unexisting table (Fix #2206)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants