-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1972 replace covidcast #2004
1972 replace covidcast #2004
Conversation
edited: runs: 171.824 secs runs: 265.703 |
1ff5866
to
6e22db8
Compare
@aysim319 can you explain your previous comment a little more? what do each of those files represent? the first two files appear to be telling me that this branch runs a little bit slower (281s) than the dev branch (267s)... but what operation are they performing (ie, what commands and arguments did you use to run these samples)? |
The first time I ran the comparison the profiler was taking more time because of the issues param that I thought I also needed to figure out to format to make the call to epidata. I later found out I didn't need to pass along the issues param which fixed both the speed and the difference between the validator result. |
323f481
to
9269621
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to make sure the tests we have are solid, so made some a few change requests.
8ed8c6b
to
76f1519
Compare
Couldn't find a way to elegantly run the whole thing while having 2 seperate logs so I ran the first half where it's calling with covidcast api and then saved the resulted into parquet with the rest commented it out, then ran the whole thing to get the logs: where the only logs except the initial metadata run is just from the epidata |
fa2208e
to
7f60275
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as correctness goes, this is looking good to me. I tested locally and the new port is parsing things identically to the old covidcast function (FWIW, the client logs aren't that important here, we just really want to make sure the DataFrame outputs from our API calls are absolutely identical).
I convinced myself that we don't need to test every signal: the covidcast response schema is the same for every signal, so testing a single signal API query gets you 99% coverage (as long as the query returns a representative data subset). The only snag is that time_value can be in two different formats (date or epiweek), so as long as we test NCHS along with the other sources, we get full coverage. I tweaked the test to test a single signal per source, so it runs much faster now (thank you for doing the comprehensive runs nonetheless, it's nice to have that extra safety!).
I also found and fixed some anti-patterns in the covidcast code you ported over, specifically I made an error here and fixing it made the code a lot uglier. I think clarity is more important, so I reverted it._parse_datetimes
. Should be a bit faster.
TODO:
- fix conflicts
- test with CI (for some reason CI isn't running?)
|
||
response = Epidata.covidcast_meta() | ||
|
||
if response["result"] != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@melange396 would this do the trick? or still add the conditional anyway?
4d6117f
to
d5be1bd
Compare
Screwed up with rebasing instead of merge: Continuation from last comment #2056 |
Description
refactored covidcast that under the hood uses a for loop for each day to grab signals
Changelog
Itemize code/test/documentation changes and files added/removed.
Associated Issue(s)