Make rate limiting work with RxClass #331
Comments
@jrlegrand it looks like the code is failing because it does not handle cases where the API response lacks the 'rxclassDrugInfoList' key (meaning there is no class data associated with the concept), which leads to a KeyError. The problem is happening in the
Potential solutions off the top of my head:
Let me know what you think would be the best solution moving forward and I'll see how I can fix the code so that it works.
Fixes #331. Checks for rxclassDrugInfoList in the response before trying to process it. The API piece runs in about 1:45 and the DAG overall runs in about 2:30, which means the loading takes about 45 min. Where the key doesn't exist, the object returned is usually completely empty ({}). One weird thing: spot checking the results in the database doesn't exactly line up with the RxClass UI online.
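A minimal sketch of the check described above, assuming the response has already been parsed into a dict. `extract_drug_info` is a hypothetical helper, not the function in the branch, and the nested `rxclassDrugInfo` key is an assumption about the response shape:

```python
from typing import Any


def extract_drug_info(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the class entries from a parsed byRxcui response, or [] if none.

    Hypothetical helper; the nested key 'rxclassDrugInfo' is an assumed shape.
    """
    # Concepts with no class data come back as {} without 'rxclassDrugInfoList',
    # so use .get() with fallbacks instead of indexing, which raises KeyError.
    info_list = response.get("rxclassDrugInfoList") or {}
    return info_list.get("rxclassDrugInfo") or []


# An empty response yields no rows instead of raising KeyError.
print(extract_drug_info({}))  # prints []
```

Returning an empty list keeps the downstream loading step unchanged: it simply has nothing to insert for that concept.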
I pushed up some code to the branch. It works - see my most recent commit message. It runs in 2.5 hours, which could be optimized I'm sure. I noticed that when the key doesn't exist, it returns an empty object {}. Also, I spot checked against RxClass for may_treat "Multiple Myeloma" and SageRx had 3 fewer IN drugs than the RxClass UI online. These ones were missing in SageRx.
Hmm... I don't see an IN listed for the may_treat Multiple Myeloma relationship in the API (I'm only seeing the PIN), so maybe it's not an issue with our code. Maybe it's some weird thing with the RxClass UI?
API: https://rxnav.nlm.nih.gov/REST/rxclass/class/byRxcui.json?rxcui=612937
NOTE: the only may_treat relation is a PIN with RXCUI 72258.
I suspect what the RxClass UI is doing is mapping PIN to IN if an IN doesn't already exist in the list. In other words, I see a lot of PINs that kind of have "sister" INs... except for these 3. They only show up as PINs. But you can map from PIN to IN to get the IN if that's preferred.
RxNav: https://mor.nlm.nih.gov/RxNav/search?searchBy=RXCUI&searchTerm=72258
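One way to repeat this spot check in code is to group the may_treat entries of a parsed response by term type. This is only a sketch: `may_treat_by_tty` is a hypothetical helper, the field names (`rxclassDrugInfo`, `minConcept`, `rxcui`, `tty`, `rela`) are assumptions about the response shape, and the sample dict is hand-written for illustration rather than real API output.

```python
from collections import defaultdict
from typing import Any


def may_treat_by_tty(response: dict[str, Any]) -> dict[str, list[str]]:
    """Group the RXCUIs of may_treat entries by term type (e.g. IN, PIN)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    # Assumed shape: entries live under rxclassDrugInfoList -> rxclassDrugInfo.
    entries = (response.get("rxclassDrugInfoList") or {}).get("rxclassDrugInfo") or []
    for entry in entries:
        if entry.get("rela") != "may_treat":
            continue
        concept = entry.get("minConcept") or {}
        rxcui, tty = concept.get("rxcui"), concept.get("tty")
        # Skip incomplete entries and avoid listing the same RXCUI twice.
        if rxcui and tty and rxcui not in grouped[tty]:
            grouped[tty].append(rxcui)
    return dict(grouped)


# Hand-written sample (not real API output): one may_treat PIN, one unrelated entry.
sample = {
    "rxclassDrugInfoList": {
        "rxclassDrugInfo": [
            {"rela": "may_treat", "minConcept": {"rxcui": "72258", "tty": "PIN"}},
            {"rela": "some_other_relation", "minConcept": {"rxcui": "0", "tty": "IN"}},
        ]
    }
}
print(may_treat_by_tty(sample))  # prints {'PIN': ['72258']}
```

A result with a `PIN` key but no `IN` key would match what is described above: the relation exists only on the precise ingredient.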
#333 - potential optimization
Problem Statement
See related branch jrlegrand/rxclass-rework.
RxClass API has a rate limit of 20 calls / second.
There are about 123,246 API calls.
[2024-11-22, 01:14:28 CST] {logging_mixin.py:137} INFO - URL List created of length: 123246
I'm no mathematician, but 20 calls / second x 60 seconds / minute = 1200 calls / minute. 123,246 calls / 1200 calls / minute ≈ 103 minutes, or about 1 hour and 43 minutes.
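The same estimate as a quick check, using only the two numbers quoted above (the constant names are just for illustration):

```python
RATE_LIMIT_PER_SECOND = 20  # RxClass rate limit stated above
TOTAL_CALLS = 123_246       # URL list length from the log line

seconds = TOTAL_CALLS / RATE_LIMIT_PER_SECOND
minutes = seconds / 60
hours, remainder = divmod(round(minutes), 60)

print(f"{minutes:.1f} minutes, about {hours} h {remainder} min")
# prints: 102.7 minutes, about 1 h 43 min
```

This is a lower bound on the API portion: it assumes every call goes out exactly at the limit with no retries or pauses.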
When I run my branch locally, it runs for 1 hour and 43 minutes and errors out with the error below.
As I'm writing this, I think the issue is more about what happens after the API calls have completed - seeing as the time it ran is appropriate based on my #math above and the error seems to be about a KeyError.
Either way, this is not working. Maybe the problem isn't with my rate limiting code, but it would be great to have other eyes on this.
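For anyone comparing notes, here is a minimal sketch of one way to stay under 20 calls / second. It is not the code in the branch: `Throttle` is a hypothetical helper that spaces out call starts, and the injectable `clock` and `sleep` parameters exist only so it can be tested without waiting.

```python
import time
from typing import Callable


class Throttle:
    """Space out calls so that at most `rate` start per second (hypothetical helper)."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()  # the first call may start immediately

    def wait(self) -> None:
        """Block until the next slot is free, then reserve the following slot."""
        remaining = self.next_allowed - self.clock()
        if remaining > 0:
            self.sleep(remaining)
        # Re-read the clock so the gap between call starts is at least min_interval.
        self.next_allowed = self.clock() + self.min_interval


if __name__ == "__main__":
    throttle = Throttle(rate=20)
    start = time.monotonic()
    for _ in range(5):
        throttle.wait()  # an API request would go here
    print(f"5 calls took {time.monotonic() - start:.2f}s")
```

The sketch only paces when each call starts. If requests are sent one at a time and each takes longer than 50 ms, throughput drops below the limit and the run takes longer than the estimate above.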
Criteria for Success
RxClass DAG runs in about 1 hour 45 minutes and does not error out.
Additional Information
https://lhncbc.nlm.nih.gov/RxNav/TermsofService.html