Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bbr::use_bbi() fails #317

Open
bmreilly opened this issue Apr 5, 2024 · 16 comments
Open

bbr::use_bbi() fails #317

bmreilly opened this issue Apr 5, 2024 · 16 comments

Comments

@bmreilly
Copy link

bmreilly commented Apr 5, 2024

Recently I tried to use bbr::use_bbi() and I am getting a JSON parsing error. I'm wondering if the location of files to download has changed and needs to be updated?
image

It looks like it is coming from bbr:::current_release_url(owner = 'metrumresearchgroup', repo = 'bbi')
image

@seth127
Copy link
Collaborator

seth127 commented Apr 10, 2024

Thanks for submitting this @bmreilly and for posting those screenshots. We're looking into this now and we'll let you know when we figure out the source of this.

@andersone1
Copy link

andersone1 commented Apr 10, 2024

@bmreilly

Can you please try running these three commands (in order) in your R console, and let me know if you receive an error on any of them:

test <- tempfile(fileext = '.json') 
download.file("https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest", destfile = test, quiet = TRUE)
jsonlite::fromJSON(test, simplifyDataFrame = FALSE)

@bmreilly
Copy link
Author

@andersone1

I did indeed get an error when attempting to download the file:
image

@andersone1
Copy link

andersone1 commented Apr 10, 2024

@bmreilly

Thanks for trying that.

Are you able to visit this link in your browser? (It should take you to a page of JSON text.)

https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest

@andersone1
Copy link

@bmreilly

Regardless of the result of that, can you also try visiting this link in your browser (it should prompt you to download a zip file, you can press Cancel - I just want to see if you will even get prompted):

https://github.com/metrumresearchgroup/bbi/releases/download/v3.3.0/bbi_linux_amd64.tar.gz

Thanks again for bringing this issue to our attention.

@bmreilly
Copy link
Author

This is what I see at https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest

image

The .tar.gz file is able to be downloaded at this link as expected

https://github.com/metrumresearchgroup/bbi/releases/download/v3.3.0/bbi_linux_amd64.tar.gz

@andersone1
Copy link

@bmreilly

Thanks for doing that. I was wondering if it was a firewall blocking the link, but that is not the case (at least at the browser level).

I am going to keep investigating.

@andersone1
Copy link

@bmreilly

What is a good email to reach you at? (I am thinking live debugging may help here.)

@bmreilly
Copy link
Author

On our call we discovered that the error is isolated to this piece, indicating that there is an issue downloading/accessing the bbi latest release via the github API:

> download.file('https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest', destfile = tempfile(fileext = '.json'))
trying URL 'https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest'
Error in download.file("https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest",  : 
  cannot open URL 'https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest'
In addition: Warning message:
In download.file("https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest",  :
  cannot open URL 'https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest': HTTP status was '403 Forbidden'

For posterity, this is on Metworx 21.08.3 using the following R / OS version:

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

@andersone1
Copy link

andersone1 commented Apr 11, 2024

Perfect, thank you for providing this @bmreilly. This will help us find the issue.

@seth127
Copy link
Collaborator

seth127 commented Apr 12, 2024

Thanks both of you. We'll keep looking into this. I'll just add a few relevant points:

  • The error causing this is HTTP status was '403 Forbidden'
  • As far as we know, no permissions were changed here, so it's not clear why that's happening
  • This same thing was reported offline by another user in another organization, so it doesn't seem like it's an issue with Brian's credentials or environment specifically
  • Both of these users had previously used this function in the same environments
  • We were wondering if this is related to rate limiting from GitHub, but @andersone1 ran the same thing and didn't see the error, so that indicates rate limiting is not the issue

@kyleam
Copy link
Collaborator

kyleam commented Apr 15, 2024

@seth127:

We were wondering if this is related to rate limiting from GitHub [...]

Yes, that'd be my bet (more details below).

but @andersone1 ran the same thing and didn't see the error, so that indicates rate limiting is not the issue

Hmm I'm not following why that would rule it out. The rate-limiting for unauthenticated requests (as they are for use_bbi()) is per IP.

demo of triggering the error after exceeding request limit

Here's a demonstration that I can trigger a consistent error by exhausting my limit.

$ while :; do sleep 0.1 && \
    curl -fSsL https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest >/dev/null || \
    break; done
curl: (22) The requested URL returned error: 403

# confirm I am at limit
$ curl -fsSL https://api.github.com/rate_limit | jq .rate
{
  "limit": 60,
  "remaining": 0,
  "reset": 1713207165,
  "used": 60,
  "resource": "core"
}

use_bbi() failure:

use_bbi("foo")
#> Installing bbi on a linux system
#>  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  Error: lexical error: invalid char in json text.
#>                                        /tmp/Rtmps2hjoB/file74526c727e66
#>                       (right here) ------^
#>  In addition: Warning message:
#>  In download.file(glue("https://api.github.com/repos/{owner}/{repo}/releases/latest"),  :
#>    cannot open URL 'https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest': HTTP status was '403 Forbidden'

follow-up questions

That doesn't necessarily mean this is what @bmreilly is hitting into, but I think it's the most likely culprit.

@bmreilly:

  • are you consistently hitting this error?

  • when you hit the error, can you check if curl -fsSL https://api.github.com/rate_limit | jq .rate indicates you've exhausted your allowed requests?

  • if that's the case, perhaps you're doing something else on the machine that's hitting api.github.com with a good number of unauthenticated requests (the limit is 60 per hour)?

current_release_url issues

https://github.com/metrumresearchgroup/bbr/blob/9097c894d62bdfd16adca6a318b372f9ef414437/R/use-bbi.R#L109-L118

The current_release_url code has a couple of issues in its handling. While fixing them wouldn't resolve a failure due to rate limiting, it'd at least lead to a clearer error:

  • the caught error looks for "HTTP error 403" in the error message. For the error I triggered (on Metworx 22.09.03 with R 4.1), that's not a part of the message.

    Browse[3]> e$message
     [1] "cannot open URL 'https://api.github.com/repos/metrumresearchgroup/bbi/releases/latest'"
    

    And either way there are a handful of possible download.file() methods (libcurl should be used on Metworx), and the code shouldn't assume what's in the message.

    And, regardless of whether it can find the match it's looking for, it should abort to avoid failing downstream with confusing JSON parsing error.

  • the download.file() documentation says that some methods may not error and that the return result should be checked.

So perhaps something like this (untested), though we could consider adding extra handling to check https://api.github.com/rate_limit:

diff
diff --git a/R/use-bbi.R b/R/use-bbi.R
index 374fa531..edad2290 100644
--- a/R/use-bbi.R
+++ b/R/use-bbi.R
@@ -106,17 +106,17 @@ current_release_url <- function(owner = 'metrumresearchgroup', repo = 'bbi'){

   on.exit(unlink(tmp),add = TRUE)

-  tryCatch(
-    {
-      download.file(glue('https://api.github.com/repos/{owner}/{repo}/releases/latest'), destfile = tmp, quiet = TRUE)
-    },
-    error = function(e) {
-      if (str_detect(e$message, "HTTP error 403")) {
-        stop(glue('`current_release_url({owner}, {repo})` failed, possibly because this IP is over the public Github rate limit of 60/hour.'))
-      }
-    }
+  res <- tryCatch(
+    download.file(
+      glue('https://api.github.com/repos/{owner}/{repo}/releases/latest'),
+      destfile = tmp, quiet = TRUE
+    ),
+    error = identity
   )

+  if (!identical(res, 0L)) {
+    stop(glue('`current_release_url({owner}, {repo})` failed, possibly because this IP is over the public Github rate limit of 60/hour.'))
+  }
   release_info <- jsonlite::fromJSON(tmp, simplifyDataFrame = FALSE)

   uris <- grep('gz$',sapply(release_info$assets,function(x) x$browser_download_url),value = TRUE)

avoiding failures due to rate limiting

If the error is due to rate limiting (and in particular if users are hitting into this regularly), one solution that Seth mentioned offline is that we could put the bbi binary in another location that we control (likely a public s3 bucket with http access) and update use_bbi() to download it from there.

@seth127
Copy link
Collaborator

seth127 commented Apr 15, 2024

Thanks for checking in on this @kyleam

The rate-limiting for unauthenticated requests (as they are for use_bbi()) is per IP.

I was forgetting about the "per IP" part, hence my earlier misfire on dismissing this as the problem. Still... it seems weird to me that several users would stumble onto this around the same time. Unless, as you suggest, they're doing something else (perhaps unintentionally) that's hitting the API too.

In terms of the error handling, Eric and I noticed that too. I think those improvements are a good idea, no matter what the underlying cause here is.

@bmreilly
Copy link
Author

@kyleam

Here is the output of the command:

curl -fsSL https://api.github.com/rate_limit | jq .rate
{
  "limit": 60,
  "remaining": 0,
  "reset": 1713280846,
  "used": 60,
  "resource": "core"
}

Looks like I have exhausted my allowed requests. I don't know of any other processes that would hit the github API with unauthenticated requests, let alone 60 per hour. Do you have any idea how I could search for processes that are doing this?

@bmreilly
Copy link
Author

@kyleam

As an additional test, I started a new workflow with a different IP address and I got the same issue as above.

I'm not exactly sure what this means, but I'm more confident that the github API requests were not due to something custom I had done.

@kyleam
Copy link
Collaborator

kyleam commented Apr 26, 2024

@bmreilly Thanks for the additional information.

I'm more confident that the github API requests were not due to something custom I had done.

Right, it seems unlikely to be triggered directly by you.

I suspect that all (or some subset) of workflows on your private subnet are configured in a way that they share the same public IP address (see here for a description of AWS NAT gateways). I believe you'd have to check with admins on your end about the details of the networking configuration.

A shared public IP for many workflows would make it less surprising that you're regularly exceeding the limit, although I still find it somewhat surprising. It's hard to say, though, and it just takes one user regularly hitting the public GitHub API with, say, an automated script.


Anyway, we're planning on making some adjustment to avoid failing due to the api.github.com. Some options:

a. deposit binaries and latest release info somewhere else to avoid the rate limit (solution mentioned above)

b. deposit just the latest version info elsewhere because that's what is being rate-limited, not the release tarball itself

c. get the latest version another way if pinging api.github.com fails

For a, the high-level steps would be something like:

  • populate new location with some number of existing releases
  • update the release pipeline to automatically deposit new binaries there
  • make the release pipeline update a file that reports what the latest release is
  • update use_bbi to pull from the new spot

For b, the publish elsewhere could be placing, say, a json file on a gh-pages site (which we're planning to setup soon anyway for another purpose). And then we'd have to update bbr:::current_release_url() to get the info from there rather than api.github.com.

For c, the main "another way" that comes to mind is using ls-remote:

git ls-remote --tags --refs --sort='-v:refname' https://github.com/metrumresearchgroup/bbi

Then current_release_url() would be relying on Git being present, at least as a fallback if the api.github.com attempt fails.

I'm undecided on which of those I think is the best way to go but am currently leaning toward b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants