Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove 7dav cumulative signals from JHU documentation and from the DB #1527

Open
nmdefries opened this issue Aug 22, 2024 · 4 comments · May be fixed by #1586
Open

Remove 7dav cumulative signals from JHU documentation and from the DB #1527

nmdefries opened this issue Aug 22, 2024 · 4 comments · May be fixed by #1586
Assignees
Labels
data quality devops building, running, deploying, environment stuff, handy utils, repository-related, engineer QoL, etc documentation mysql mysql database related

Comments

@nmdefries
Copy link
Contributor

In JHU Cases and Deaths, the documentation still lists the silly signals confirmed_7dav_cumulative_num and confirmed_7dav_cumulative_prop -- it doesn’t make much sense to average cumulative data. We should remove these from the documentation, and also from the DB. Of note, the parallel source USAFacts does not have these (compare the two signal lists). I [Roni] have a vague recollection that I pointed this out during the pandemic, and it was removed, but maybe only from USAFacts and not from JHU.

@melange396 melange396 added devops building, running, deploying, environment stuff, handy utils, repository-related, engineer QoL, etc data quality mysql mysql database related labels Aug 22, 2024
@carlynvandyke carlynvandyke self-assigned this Dec 19, 2024
@carlynvandyke
Copy link

@RoniRos just want to confirm- we should remove deaths_7dav_cumulative_num and deaths_7dav_cumulative_prop, too, right?

@RoniRos
Copy link
Member

RoniRos commented Jan 16, 2025

Yes! And if you find the combination of "7dav" and "cumulative" anywhere else, that should probably be removed as well!

@carlynvandyke carlynvandyke linked a pull request Jan 27, 2025 that will close this issue
@melange396
Copy link
Collaborator

here are all of the cumulative and averaged signals i can find from the metadata:

    ('indicator-combination', 'confirmed_7dav_cumulative_num'),
    ('indicator-combination', 'confirmed_7dav_cumulative_prop'),
    ('indicator-combination', 'deaths_7dav_cumulative_num'),
    ('indicator-combination', 'deaths_7dav_cumulative_prop'),
    ('jhu-csse', 'confirmed_7dav_cumulative_num'),
    ('jhu-csse', 'confirmed_7dav_cumulative_prop'),
    ('jhu-csse', 'deaths_7dav_cumulative_num'),
    ('jhu-csse', 'deaths_7dav_cumulative_prop'),
    ('usa-facts', 'confirmed_7dav_cumulative_num'),
    ('usa-facts', 'confirmed_7dav_cumulative_prop'),
    ('usa-facts', 'deaths_7dav_cumulative_num'),
    ('usa-facts', 'deaths_7dav_cumulative_prop')

they still need to be removed from the database, so i am going to edit the associated PR to remove the "automatic close" linkage

@melange396
Copy link
Collaborator

melange396 commented Feb 13, 2025

signals deleted from db:

MySQL [covid]> select * from signal_dim where `signal` like '%_7dav_cumulative_%';                                      
+---------------+-----------------------+--------------------------------+
| signal_key_id | source                | signal                         |
+---------------+-----------------------+--------------------------------+
|            25 | indicator-combination | confirmed_7dav_cumulative_num  |
|            26 | indicator-combination | confirmed_7dav_cumulative_prop |
|            38 | indicator-combination | deaths_7dav_cumulative_num     |
|            39 | indicator-combination | deaths_7dav_cumulative_prop    |
|            65 | jhu-csse              | confirmed_7dav_cumulative_num  |
|            66 | jhu-csse              | confirmed_7dav_cumulative_prop |
|            70 | jhu-csse              | deaths_7dav_cumulative_num     |
|            71 | jhu-csse              | deaths_7dav_cumulative_prop    |
|            91 | usa-facts             | confirmed_7dav_cumulative_num  |
|            92 | usa-facts             | confirmed_7dav_cumulative_prop |
|            99 | usa-facts             | deaths_7dav_cumulative_num     |
|           100 | usa-facts             | deaths_7dav_cumulative_prop    |
+---------------+-----------------------+--------------------------------+
12 rows in set (0.00 sec)

MySQL [covid]> delete from epimetric_latest where signal_key_id in (select signal_key_id from signal_dim where `signal` like '%_7dav_cumulative_%');                                                                                           
Query OK, 25130802 rows affected (26 min 49.15 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=25;                                                       
Query OK, 19414643 rows affected (1 hour 5 min 36.07 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=26;                                                       
Query OK, 19179199 rows affected (54 min 22.43 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=38;                                                       
Query OK, 19364697 rows affected (1 hour 1 min 3.99 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=39;                                                       
Query OK, 19136785 rows affected (59 min 36.94 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=65;                                                       
Query OK, 33248185 rows affected (1 hour 34 min 26.39 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=66;                                                       
Query OK, 34082976 rows affected (1 hour 49 min 29.66 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=70;                                                       
Query OK, 32480185 rows affected (1 hour 56 min 45.80 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=71;                                                       
Query OK, 33571267 rows affected (1 hour 36 min 53.65 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=91;                                                       
Query OK, 3396687 rows affected (10 min 42.70 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=92;                                                       
Query OK, 3417835 rows affected (10 min 26.67 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=99;                                                       
Query OK, 3254267 rows affected (9 min 49.89 sec)

MySQL [covid]> delete from epimetric_full where signal_key_id=100;                                                      
Query OK, 3261662 rows affected (10 min 1.23 sec)

MySQL [covid]>  select * from epimetric_full_v where `signal` like '%_7dav_cumulative_%';                               
Empty set (56.04 sec)

MySQL [covid]> delete from signal_dim where `signal` like '%_7dav_cumulative_%';                                        
Query OK, 12 rows affected (0.02 sec)

re-linking #1586 to close this issue

@melange396 melange396 linked a pull request Feb 13, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data quality devops building, running, deploying, environment stuff, handy utils, repository-related, engineer QoL, etc documentation mysql mysql database related
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants