Memory leak when querying from a packaged function #457

Closed
agilly opened this issue Feb 6, 2024 · 3 comments

agilly commented Feb 6, 2024

I am connecting to a Redshift Serverless deployment using RPostgres. Let's call that connection con:

r$> class(con)
[1] "PqConnection"
attr(,"package")
[1] "RPostgres"

I am trying to read a table with a WHERE clause. In my case, the result is about 144 MB:

r$> object.size(a)/1024/1024
144.7 bytes
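A side note on the output above: the value is in megabytes even though R labels it "bytes". `object.size()` returns an object of class `object_size`, and dividing it keeps that class, so it is still printed with the default "bytes" label. The sketch below uses a stand-in vector `x` instead of the real query result `a`. It shows the class being kept and two ways to get a correctly labelled size.

```r
# Stand-in data: one million doubles (8 bytes each), used instead of the query result `a`.
x <- numeric(1e6)
sz <- object.size(x)

# Dividing keeps the "object_size" class, so printing still labels the
# number "bytes" even though it is now in megabytes.
scaled <- sz / 1024 / 1024
print(class(scaled))

# Let format() convert and label the units instead ("Mb" = 1024^2 bytes).
print(format(sz, units = "Mb"))

# Or drop the class to get a plain numeric value in megabytes.
size_mb <- as.numeric(sz) / 1024^2
print(size_mb)
```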

If I run the following command:

a=DBI::dbGetQuery(con, "select * from schema.table where column = 'value'")

a few times, memory usage jumps sharply on the first run after connecting (to around 1.5 GB), then keeps increasing moderately as I rerun the statement, until it stabilizes at about 2 GB of total RAM.
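To put numbers on this growth, you can record memory after each rerun. The sketch below defines a hypothetical helper, `track_memory()`, which is not part of DBI or RPostgres. It runs a function `n` times and records two figures after each call. The first is memory managed by R's garbage collector, taken from `gc()`. The second is the process resident set size (RSS), read from `/proc/self/status`, which only works on Linux (for example an EC2 host). Comparing the two is useful here because memory that a C library such as libpq allocates outside R's heap appears in the RSS but not in `gc()`. The workload is a stand-in so the sketch runs without a database.

```r
# Resident set size of the current R process in MB (Linux only; NA elsewhere).
current_rss_mb <- function() {
  status_file <- "/proc/self/status"
  if (!file.exists(status_file)) return(NA_real_)
  line <- grep("^VmRSS:", readLines(status_file), value = TRUE)
  if (length(line) == 0) return(NA_real_)
  # The line looks like "VmRSS:   123456 kB"; keep only the number (kB).
  as.numeric(gsub("[^0-9]", "", line)) / 1024
}

# Hypothetical helper: call `f` n times, dropping each result, and record memory.
track_memory <- function(f, n = 5) {
  out <- data.frame(run = seq_len(n), r_heap_mb = NA_real_, rss_mb = NA_real_)
  for (i in seq_len(n)) {
    result <- f()
    rm(result)
    # Full collection, so only memory still reachable from R is counted.
    g <- gc(full = TRUE)
    # Column 2 of gc()'s matrix is memory in use (Mb) for Ncells and Vcells.
    out$r_heap_mb[i] <- sum(g[, 2])
    out$rss_mb[i] <- current_rss_mb()
  }
  out
}

# Stand-in workload. With the real connection this would be something like
# function() DBI::dbGetQuery(con, "select * from schema.table where column = 'value'")
workload <- function() data.frame(x = runif(1e5), y = sample(letters, 1e5, replace = TRUE))

mem <- track_memory(workload, n = 3)
print(mem)
```

RSS can stay high even after memory is freed, because the allocator does not always return memory to the operating system. Steady RSS growth across many runs while `r_heap_mb` stays flat is therefore a stronger signal than a single large jump.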

However, if I wrap the call in a function:

test_db_con=function(con){
  ret=dbGetQuery(con, "select * from schema.table where column = 'value'")
  return(ret)
}

and put it in a package, the following:

devtools::load_all()
## create con
a=test_db_con(con=con)

behaves differently. No matter how many times I run the last line, memory usage never levels off. Instead, it keeps increasing by increments larger than the size of the data (around 500 MB each), eventually running out of memory (OOM). To me, this points to a memory leak, but I don't know why it would happen only inside a package. FWIW, I am running this on an AWS EC2 instance, and the issue also happens when connecting to PostgreSQL (where the memory growth seems much more pronounced). Any help appreciated!
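One way to check whether the growth comes from result-set handling is to manage the result explicitly with DBI's lower-level calls (`dbSendQuery()`, `dbFetch()`, `dbClearResult()`) instead of `dbGetQuery()`, then compare memory between the two variants. DBI documents its default `dbGetQuery()` as a wrapper around these same steps, so a difference in behaviour between the two would tell you something. The sketch below assumes DBI is installed. The name `test_db_con_explicit()` and its `sql` argument are hypothetical; the function simply mirrors the packaged `test_db_con()`.

```r
# Hypothetical variant of test_db_con() that manages the result set
# explicitly instead of relying on dbGetQuery().
test_db_con_explicit <- function(con, sql) {
  res <- DBI::dbSendQuery(con, sql)
  # Release the result set even if fetching fails.
  on.exit(DBI::dbClearResult(res), add = TRUE)
  # n = -1 fetches all remaining rows.
  DBI::dbFetch(res, n = -1)
}

# Usage with the connection from the issue:
# a <- test_db_con_explicit(con, "select * from schema.table where column = 'value'")
```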

Version info:

r$> DBI::dbGetInfo(RPostgres::Postgres())
$driver.version
[1] ‘1.4.6’

$client.version
[1] ‘14.10’

agilly changed the title from "Memory leak when dbGetQuery in package" to "Memory leak when querying from a packaged function" on Feb 7, 2024.

krlmlr (Member) commented Feb 7, 2024

Thanks. Minor nitpick: RPostgres::Redshift() is preferred over RPostgres::Postgres() to connect to the database, but I don't think this will change the outcome here.
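For reference, here is a sketch of connecting through the Redshift driver mentioned above. The wrapper name `connect_redshift()` and the environment variable names are hypothetical; the host, database and credentials must come from your own deployment. The sketch uses 5439, Redshift's default port.

```r
# Hypothetical wrapper: open a DBI connection with RPostgres::Redshift()
# instead of RPostgres::Postgres(). All connection details are placeholders.
connect_redshift <- function(host = Sys.getenv("REDSHIFT_HOST"),
                             dbname = Sys.getenv("REDSHIFT_DB"),
                             user = Sys.getenv("REDSHIFT_USER"),
                             password = Sys.getenv("REDSHIFT_PASSWORD"),
                             port = 5439) {
  DBI::dbConnect(
    RPostgres::Redshift(),
    host = host,
    dbname = dbname,
    user = user,
    password = password,
    port = port
  )
}

# Usage (requires a reachable Redshift endpoint):
# con <- connect_redshift()
# ... run queries ...
# DBI::dbDisconnect(con)
```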

Do you have a way of reproducing this on a toy Redshift instance where you could share credentials?

agilly (Author) commented Feb 7, 2024

I didn't know about RPostgres::Redshift(), will update.

Unfortunately, I am not familiar with Redshift at all, as the database was provided to me "as is", so I don't really know how to spin up such a database.

I did compare against a MySQL database, and this did not occur there.

krlmlr (Member) commented Feb 8, 2024

What is the type of the columns in the result set?
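To answer this for a result that is already in R, you can list each column's class together with its approximate memory footprint. Character columns can be much larger than numeric ones, and RPostgres returns 64-bit integers as `bit64::integer64` by default, so the type mix is worth checking. The sketch below uses a stand-in data frame called `result`; with the real data you would run the same two `vapply()` calls on `a`.

```r
# Stand-in result; replace with the data frame returned by dbGetQuery().
result <- data.frame(
  id = seq_len(1000),
  value = runif(1000),
  label = sprintf("row-%04d", seq_len(1000)),
  stringsAsFactors = FALSE
)

# First class of each column (e.g. "integer", "numeric", "character").
col_types <- vapply(result, function(col) class(col)[1], character(1))

# Approximate memory used by each column, in bytes.
col_bytes <- vapply(result, function(col) as.numeric(object.size(col)), numeric(1))

summary_df <- data.frame(
  column = names(result),
  type = unname(col_types),
  size_mb = round(unname(col_bytes) / 1024^2, 3)
)
# Largest columns first.
print(summary_df[order(-summary_df$size_mb), ])
```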
