Release covidcast-indicators 0.3.55 #1991

Merged on Jul 10, 2024 (58 commits)
ea52ad7
Replace deprecated pkg_resources
rzats Jan 4, 2024
3ae53ad
Less strict version
rzats Jan 23, 2024
16a2128
Merge pull request #1957 from cmu-delphi/bot/sync-prod-main
melange396 Apr 24, 2024
20693a2
fix: errors in build-container-images.yml
dsweber2 Apr 24, 2024
3022a90
ci: update backfill-corr-ci.yml
dshemetov Apr 24, 2024
3fc3254
Merge pull request #1958 from cmu-delphi/backfillCI_fix
dshemetov Apr 30, 2024
8afaa63
refactor(geomap): add type hints, refactor test_archive
dshemetov May 3, 2024
d4b056e
lint: format geomap.py with black
dshemetov May 3, 2024
677131e
repo: ignore format commit blame
dshemetov May 3, 2024
0751162
Merge pull request #1706 from cmu-delphi/ds/geomap-refactor
dshemetov May 3, 2024
ff2d341
lint+doc: update geomap docs and minor lint
dshemetov May 6, 2024
a7fbb3e
feat(geomap): add aggregate_by_weighted_sum
dshemetov May 6, 2024
7359bf9
Update _delphi_utils_python/delphi_utils/geomap.py
dshemetov May 7, 2024
c249a3a
Update _delphi_utils_python/delphi_utils/geomap.py
dshemetov May 7, 2024
577a41e
Update _delphi_utils_python/delphi_utils/geomap.py
dshemetov May 7, 2024
912f58d
Update _delphi_utils_python/delphi_utils/geomap.py
dshemetov May 7, 2024
511322b
feat(geomap): fix and test aggregate_by_weighted_sum
dshemetov May 7, 2024
79072dc
lint: format test_geomap
dshemetov May 7, 2024
48e247f
repo: update blame-ignore
dshemetov May 7, 2024
0c65bb4
Merge pull request #1960 from cmu-delphi/ds/geomap
dshemetov May 7, 2024
6bccf68
specify solver
minhkhul May 30, 2024
74726ed
weekday test change
minhkhul May 31, 2024
f011857
Merge pull request #1966 from cmu-delphi/doctor_visit_clarabel
minhkhul Jun 3, 2024
6912077
feat+lint+ci: unify linters, add `make format`
dshemetov Oct 25, 2023
6f46f2b
lint: sort dependencies in all setup.py
dshemetov May 15, 2024
3794513
chore: sorting dependencies ignore blame
dshemetov Jun 5, 2024
a6ea003
Merge pull request #1905 from cmu-delphi/ds/lint
dshemetov Jun 5, 2024
bc6962e
lint(geomap): minor tweak
dshemetov Jun 7, 2024
8acd18b
Merge branch 'main' into rzatserkovnyi/pkg-resources
dshemetov Jun 7, 2024
195d7b2
lint: format
dshemetov Jun 7, 2024
38ea2da
Merge pull request #1921 from cmu-delphi/rzatserkovnyi/pkg-resources
dshemetov Jun 7, 2024
ae6f011
nssp pipeline code (#1952)
minhkhul Jun 10, 2024
445b583
Reformat NSSP county zip code (#1976)
minhkhul Jun 18, 2024
e2033c3
Add params to control solver, default Clarabel, pin cvxpy version (#1…
minhkhul Jun 25, 2024
84d0597
remove hhs and chng from sircal (#1971)
minhkhul Jun 25, 2024
d3bac9d
initial add pipeline manual
nmdefries Jul 1, 2024
7e4a4fc
links
nmdefries Jul 1, 2024
79738f6
formatting cleanup
nmdefries Jul 1, 2024
8ebb475
resource links
nmdefries Jul 1, 2024
027d269
statistical review
nmdefries Jul 1, 2024
100bc74
naming standards
nmdefries Jul 2, 2024
f26beea
documentation
nmdefries Jul 2, 2024
ae61a70
commenting and TODOs
nmdefries Jul 2, 2024
6f24e88
user links
nmdefries Jul 2, 2024
fe39ebb
receiving backticks
nmdefries Jul 2, 2024
8ecea58
wrap lines
nmdefries Jul 8, 2024
1526dc9
Doctor_visits patching code (#1977)
minhkhul Jul 8, 2024
c261a20
Format, wording recommendations
nmdefries Jul 9, 2024
fc9b00f
archive differ explanation
nmdefries Jul 9, 2024
1916191
drop location links
nmdefries Jul 9, 2024
8c81bae
don't mention R; naming conventions
nmdefries Jul 9, 2024
257179b
Merge pull request #1983 from cmu-delphi/ndefries/pipeline-manual
nmdefries Jul 9, 2024
5d4fdbd
attempt to fix jenkins builds (#1988)
melange396 Jul 10, 2024
e36057d
fix doctor_visits log location & export_dir (#1980)
melange396 Jul 10, 2024
3a6c411
preserve old weekday solver behavior for changehc (#1990)
melange396 Jul 10, 2024
8875bb1
chore: bump delphi_utils to 0.3.24
Jul 10, 2024
f4e1af2
chore: bump covidcast-indicators to 0.3.55
Jul 10, 2024
96d707a
[create-pull-request] automated change
melange396 Jul 10, 2024
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 0.3.54
+current_version = 0.3.55
commit = True
message = chore: bump covidcast-indicators to {new_version}
tag = False
6 changes: 6 additions & 0 deletions .git-blame-ignore-revs
@@ -0,0 +1,6 @@
# Format geomap.py
d4b056e7a4c11982324e9224c9f9f6fd5d5ec65c
# Format test_geomap.py
79072dcdec3faca9aaeeea65de83f7fa5c00d53f
# Sort setup.py dependencies
6912077acba97e835aff7d0cd3d64309a1a9241d
42 changes: 11 additions & 31 deletions .github/workflows/backfill-corr-ci.yml
@@ -10,57 +10,37 @@ name: R backfill corrections

on:
push:
branches: [ main, prod ]
branches: [main, prod]
pull_request:
types: [ opened, synchronize, reopened, ready_for_review ]
branches: [ main, prod ]
types: [opened, synchronize, reopened, ready_for_review]
branches: [main, prod]

jobs:
build:
runs-on: ubuntu-20.04
runs-on: ubuntu-latest
if: github.event.pull_request.draft == false
strategy:
matrix:
r-version: [4.2.1]
defaults:
run:
working-directory: backfill_corrections/delphiBackfillCorrection

steps:
- uses: actions/checkout@v2
- name: Set up R ${{ matrix.r-version }}
- uses: actions/checkout@v4

- name: Set up R 4.2
uses: r-lib/actions/setup-r@v2
with:
r-version: ${{ matrix.r-version }}
use-public-rspm: true
- name: Install linux dependencies
run: |
sudo apt-get install \
libcurl4-openssl-dev \
libgdal-dev \
libudunits2-dev \
libglpk-dev \
libharfbuzz-dev \
libfribidi-dev
- name: Get date
id: get-date
run: |
echo "::set-output name=date::$(/bin/date -u "+%Y%m%d")"
- name: Cache R packages
uses: actions/cache@v2
with:
path: ${{ env.R_LIBS_USER }}
key: ${{ runner.os }}-r-backfillcorr-${{ steps.get-date.outputs.date }}
restore-keys: |
${{ runner.os }}-r-backfillcorr-
r-version: 4.2

- name: Install and cache dependencies
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::rcmdcheck
working-directory: backfill_corrections/delphiBackfillCorrection
upgrade: 'TRUE'
upgrade: "TRUE"

- name: Check package
uses: r-lib/actions/check-r-package@v2
with:
5 changes: 3 additions & 2 deletions .github/workflows/build-container-images.yml
@@ -2,14 +2,15 @@ name: Build indicator container images and upload to registry

on:
push:
branches: [ main, prod ]
branches: [main, prod]
workflow_dispatch:

jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
packages: [ backfill_corrections ]
packages: [backfill_corrections]
steps:
- name: Checkout code
uses: actions/checkout@v2
51 changes: 35 additions & 16 deletions .github/workflows/python-ci.yml
@@ -16,28 +16,42 @@ jobs:
if: github.event.pull_request.draft == false
strategy:
matrix:
packages:
[
_delphi_utils_python,
changehc,
claims_hosp,
doctor_visits,
google_symptoms,
hhs_hosp,
nchs_mortality,
nwss_wastewater,
quidel_covidtest,
sir_complainsalot,
]
include:
- package: "_delphi_utils_python"
dir: "delphi_utils"
- package: "changehc"
dir: "delphi_changehc"
- package: "claims_hosp"
dir: "delphi_claims_hosp"
- package: "doctor_visits"
dir: "delphi_doctor_visits"
- package: "google_symptoms"
dir: "delphi_google_symptoms"
- package: "hhs_hosp"
dir: "delphi_hhs"
- package: "nchs_mortality"
dir: "delphi_nchs_mortality"
- package: "nssp"
dir: "delphi_nssp"
- package: "nwss_wastewater"
dir: "delphi_nwss"
- package: "quidel_covidtest"
dir: "delphi_quidel_covidtest"
- package: "sir_complainsalot"
dir: "delphi_sir_complainsalot"
defaults:
run:
working-directory: ${{ matrix.packages }}
working-directory: ${{ matrix.package }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Python 3.8
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: 3.8
cache: "pip"
cache-dependency-path: "setup.py"
- name: Install testing dependencies
run: |
python -m pip install --upgrade pip
@@ -51,3 +65,8 @@ jobs:
- name: Test
run: |
make test
- uses: akaihola/darker@v2.1.1
with:
options: "--check --diff --isort --color"
src: "${{ matrix.package }}/${{ matrix.dir }}"
version: "~=2.1.1"
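
The new `include` matrix pairs each `package` with an explicit `dir`, and the darker step builds its `src` from both. A minimal Python sketch of that expansion, using a subset of the entries above, suggests why the explicit `dir` is needed: the module directory cannot always be derived from the package name. The helper names `darker_src` and `irregular_packages` are hypothetical and are not part of the workflow.

```python
# Subset of the workflow's `include` entries, copied from the matrix above.
MATRIX_INCLUDE = [
    {"package": "_delphi_utils_python", "dir": "delphi_utils"},
    {"package": "changehc", "dir": "delphi_changehc"},
    {"package": "hhs_hosp", "dir": "delphi_hhs"},
    {"package": "nssp", "dir": "delphi_nssp"},
    {"package": "nwss_wastewater", "dir": "delphi_nwss"},
]


def darker_src(entry):
    """Mirror the workflow expression "${{ matrix.package }}/${{ matrix.dir }}"."""
    return f"{entry['package']}/{entry['dir']}"


def irregular_packages(entries):
    """Return packages whose module directory is not "delphi_" + package."""
    return [e["package"] for e in entries if e["dir"] != "delphi_" + e["package"]]


if __name__ == "__main__":
    for entry in MATRIX_INCLUDE:
        print(darker_src(entry))
    print("irregular:", irregular_packages(MATRIX_INCLUDE))
```

In this subset, three of the five packages do not follow a simple `delphi_<package>` pattern, which is consistent with listing `dir` explicitly rather than computing it.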
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -10,7 +10,7 @@
- TODO: #527 Get this list automatically from python-ci.yml at runtime.
*/

-def indicator_list = ["backfill_corrections", "changehc", "claims_hosp", "google_symptoms", "hhs_hosp", "nchs_mortality", "quidel_covidtest", "sir_complainsalot", "doctor_visits", "nwss_wastewater"]
+def indicator_list = ["backfill_corrections", "changehc", "claims_hosp", "google_symptoms", "hhs_hosp", "nchs_mortality", "quidel_covidtest", "sir_complainsalot", "doctor_visits", "nwss_wastewater", "nssp"]
def build_package_main = [:]
def build_package_prod = [:]
def deploy_staging = [:]
41 changes: 32 additions & 9 deletions README.md
@@ -4,7 +4,7 @@

In early April 2020, Delphi developed a uniform data schema for [a new Epidata endpoint focused on COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html). Our intent was to provide signals that would track in real-time and in fine geographic granularity all facets of the COVID-19 pandemic, aiding both nowcasting and forecasting. Delphi's long history in tracking and forecasting influenza made us uniquely situated to provide access to data streams not available anywhere else, including medical claims data, electronic medical records, lab test records, massive public surveys, and internet search trends. We also process commonly-used publicly-available data sources, both for user convenience and to provide data versioning for sources that do not track revisions themselves.

Each data stream arrives in a different format using a different delivery technique, be it sftp, an access-controlled API, or an email attachment. The purpose of each pipeline in this repository is to fetch the raw source data, extract informative aggregate signals, and output those signals---which we call **COVID-19 indicators**---in a common format for upload to the [COVIDcast API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html).
Each data stream arrives in a different format using a different delivery technique, be it sftp, an access-controlled API, or an email attachment. The purpose of each pipeline in this repository is to fetch the raw source data, extract informative aggregate signals, and output those signals---which we call **COVID-19 indicators**---in a common format for upload to the [COVIDcast API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html).

For client access to the API, along with a variety of other utilities, see our [R](https://cmu-delphi.github.io/covidcast/covidcastR/) and [Python](https://cmu-delphi.github.io/covidcast/covidcast-py/html/) packages.

@@ -13,18 +13,19 @@ For interactive visualizations (of a subset of the available indicators), see ou
## Organization

Utilities:
* `_delphi_utils_python` - common behaviors
* `_template_python` & `_template_r` - starting points for new data sources
* `ansible` & `jenkins` - automated testing and deployment
* `sir_complainsalot` - a Slack bot to check for missing data

- `_delphi_utils_python` - common behaviors
- `_template_python` & `_template_r` - starting points for new data sources
- `ansible` & `jenkins` - automated testing and deployment
- `sir_complainsalot` - a Slack bot to check for missing data

Indicator pipelines: all remaining directories.

Each indicator pipeline includes its own documentation.
Each indicator pipeline includes its own documentation.

* Consult README.md for directions to install, lint, test, and run the pipeline for that indicator.
* Consult REVIEW.md for the checklist to use for code reviews.
* Consult DETAILS.md (if present) for implementation details, including handling of corner cases.
- Consult README.md for directions to install, lint, test, and run the pipeline for that indicator.
- Consult REVIEW.md for the checklist to use for code reviews.
- Consult DETAILS.md (if present) for implementation details, including handling of corner cases.

## Development

@@ -35,6 +36,28 @@ Each indicator pipeline includes its own documentation.
3. Add new commits to your branch in response to feedback.
4. When approved, tag an admin to merge the PR. Let them know if this change should be released immediately, at a set future date, or if it can just go along for the ride whenever the next release happens.

### Linting and Formatting

Each indicator has a `make lint` command to check for linting errors and a
`make format` command to incrementally format your code (using
[darker](https://github.com/akaihola/darker)). These are both automated with a
[GitHub Action](.github/workflows/python-ci.yml).

If you get the error `ERROR:darker.git:fatal: Not a valid commit name <hash>`,
then it's likely because your local main branch is not up to date; either you
need to rebase or merge. Note that `darker` reads from `pyproject.toml` for
default settings.

If the lines you change are in a file that uses 2-space indentation, `darker`
will indent the lines around your changes and not the rest, which will likely
break the code; in that case, you should probably just pass the whole file
through black. You can do that with the following command (using the same
virtual environment as above):

```sh
env/bin/black <file>
```
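
To see why partial re-indentation is a problem, here is a small sketch that does not run `darker` at all: the hand-written strings below only imitate a file where one line inside a 2-space block has been re-indented to 4 spaces. The `compiles` helper is hypothetical.

```python
def compiles(source: str) -> bool:
    """Return True if the source parses, False if it has an indentation error."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True


# Original file: consistent 2-space indentation.
TWO_SPACE = "def f(x):\n  y = x + 1\n  return y\n"
# Only the changed line was re-indented to 4 spaces; its neighbor was not.
PARTLY_REINDENTED = "def f(x):\n    y = x + 1\n  return y\n"
# The whole file was re-indented, as running black on the file would do.
FULLY_REINDENTED = "def f(x):\n    y = x + 1\n    return y\n"

if __name__ == "__main__":
    for name, src in [
        ("two-space", TWO_SPACE),
        ("partly re-indented", PARTLY_REINDENTED),
        ("fully re-indented", FULLY_REINDENTED),
    ]:
        print(name, "->", "ok" if compiles(src) else "IndentationError")
```

Only the partly re-indented variant fails to parse, which is the situation that formatting the whole file avoids.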

## Release Process

The release process consists of multiple steps which can all be done via the GitHub website:
2 changes: 1 addition & 1 deletion _delphi_utils_python/.bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 0.3.23
+current_version = 0.3.24
commit = True
message = chore: bump delphi_utils to {new_version}
tag = False
22 changes: 0 additions & 22 deletions _delphi_utils_python/.pylintrc

This file was deleted.

5 changes: 4 additions & 1 deletion _delphi_utils_python/Makefile
@@ -14,9 +14,12 @@ install-ci: venv
pip install .

lint:
. env/bin/activate; pylint delphi_utils
. env/bin/activate; pylint delphi_utils --rcfile=../pyproject.toml
. env/bin/activate; pydocstyle delphi_utils

format:
. env/bin/activate; darker delphi_utils

test:
. env/bin/activate ;\
(cd tests && ../env/bin/pytest --cov=delphi_utils --cov-report=term-missing)
31 changes: 11 additions & 20 deletions _delphi_utils_python/data_proc/geomap/README.md
@@ -1,4 +1,4 @@
# Geocoding data processing pipeline
# Geocoding Data Processing

Authors: Jingjing Tang, James Sharpnack, Dmitry Shemetov

@@ -7,42 +7,37 @@ Authors: Jingjing Tang, James Sharpnack, Dmitry Shemetov
Requires the source files listed below.

Run the following to build the crosswalk tables in `covidcast-indicators/_delphi_utils_python/delphi_utils/data`:
```

```sh
$ python geo_data_proc.py
```

You can see consistency checks and diffs with old sources in ./consistency_checks.ipynb
Find data consistency checks in `./source-file-sanity-check.ipynb`.

## Geo Codes

We support the following geocodes.

- The ZIP code and the FIPS code are the most granular geocodes we support.
- The [ZIP code](https://en.wikipedia.org/wiki/ZIP_Code) is a US postal code used by the USPS and the [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code) is an identifier for US counties and other associated territories. The ZIP code is five digit code (with leading zeros).
- The FIPS code is a five digit code (with leading zeros), where the first two digits are a two-digit state code and the last three are a three-digit county code (see this [US Census Bureau page](https://www.census.gov/library/reference/code-lists/ansi.html) for detailed information).
- The Metropolitan Statistical Area (MSA) code refers to regions around cities (these are sometimes referred to as CBSA codes). More information on these can be found at the [US Census Bureau](https://www.census.gov/programs-surveys/metro-micro/about.html).
- We are reserving 10001-10099 for states codes of the form 100XX where XX is the FIPS code for the state (the current smallest CBSA is 10100). In the case that the CBSA codes change then it should be verified that these are not used.
- The [ZIP code](https://en.wikipedia.org/wiki/ZIP_Code) is a US postal code used by the USPS and the [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code) is an identifier for US counties and other associated territories. The ZIP code is a five-digit code (with leading zeros).
- The FIPS code is a five-digit code (with leading zeros), where the first two digits are a two-digit state code and the last three are a three-digit county code (see this [US Census Bureau page](https://www.census.gov/library/reference/code-lists/ansi.html) for detailed information).
- The Metropolitan Statistical Area (MSA) code refers to regions around cities (these are sometimes referred to as CBSA codes). More information on these can be found at the [US Census Bureau](https://www.census.gov/programs-surveys/metro-micro/about.html). We reserve 10001-10099 for state codes of the form 100XX, where XX is the FIPS code for the state (the current smallest CBSA is 10100). If the CBSA codes change, verify that these reserved values are still unused.
- State codes are a series of equivalent identifiers for US states. They include the state name, the state number (state_id), and the state two-letter abbreviation (state_code). The state number is the state FIPS code. See [here](https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations) for more.
- The Hospital Referral Region (HRR) and the Hospital Service Area (HSA). More information [here](https://www.dartmouthatlas.org/covid-19/hrr-mapping/).
FIPS codes depart in some special cases, so we produce manual changes listed below.
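
The code formats described above can be sketched in a few lines of Python. This is an illustration only, not the `delphi_utils` API: the function names `normalize_fips`, `split_fips`, and `state_pseudo_msa` are hypothetical, and the validation rules are assumptions drawn from the descriptions in this list.

```python
from typing import Tuple


def normalize_fips(code) -> str:
    """Return a five-character county FIPS string, restoring leading zeros."""
    text = str(code).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {code!r}")
    return text.zfill(5)


def split_fips(code) -> Tuple[str, str]:
    """Split a county FIPS code into (two-digit state, three-digit county)."""
    fips = normalize_fips(code)
    return fips[:2], fips[2:]


def state_pseudo_msa(state_fips) -> str:
    """Build the reserved 100XX code, where XX is the state's FIPS code."""
    text = str(state_fips).strip().zfill(2)
    if not text.isdigit() or len(text) != 2 or text == "00":
        raise ValueError(f"not a state FIPS code: {state_fips!r}")
    return "100" + text


if __name__ == "__main__":
    print(normalize_fips(1001))    # leading zero restored
    print(split_fips("42003"))     # state part, county part
    print(state_pseudo_msa("06"))  # falls inside the reserved 10001-10099 range
```

Keeping the codes as strings throughout avoids silently dropping the leading zeros that integer parsing would remove.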

## Source files
## Source Files

The source files are requested from a government URL when `geo_data_proc.py` is run (see the top of said script for the URLs). Below we describe the locations to find updated versions of the source files, if they are ever needed.

- ZIP -> FIPS (county) population tables available from [US Census](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#par_textimage_674173622). This file contains the population of the intersections between ZIP and FIPS regions, allowing the creation of a population-weighted transform between the two. As of 4 February 2022, this source did not include population information for 24 ZIPs that appear in our indicators. We have added those values manually using information available from the [zipdatamaps website](www.zipdatamaps.com).
- ZIP -> HRR -> HSA crosswalk file comes from the 2018 version at the [Dartmouth Atlas Project](https://atlasdata.dartmouth.edu/static/supp_research_data).
- FIPS -> MSA crosswalk file comes from the September 2018 version of the delineation files at the [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html).
- State Code -> State ID -> State Name comes from the ANSI standard at the [US Census](https://www.census.gov/library/reference/code-lists/ansi.html#par_textimage_3). The first two digits of a FIPS codes should match the state code here.
- State Code -> State ID -> State Name comes from the ANSI standard at the [US Census](https://www.census.gov/library/reference/code-lists/ansi.html#par_textimage_3).


## Derived files
## Derived Files

The rest of the crosswalk tables are derived from the mappings above. We provide crosswalk functions from granular to coarser codes, but not the other way around. This is because there is no information gained when crosswalking from coarse to granular.
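
As a rough sketch of what a granular-to-coarse, population-weighted crosswalk does, the following pure-Python example turns ZIP/FIPS intersection populations into weights and uses them to aggregate ZIP-level values to counties. It is not the code in `geo_data_proc.py` or `geomap.py`; the function names and the codes and populations in the demo are made up for illustration, and each (ZIP, FIPS) pair is assumed to appear only once.

```python
from collections import defaultdict


def zip_to_fips_weights(intersections):
    """Convert (zip, fips, population) rows into per-ZIP weights that sum to 1."""
    rows = list(intersections)
    zip_totals = defaultdict(float)
    for zip_code, _fips, population in rows:
        zip_totals[zip_code] += population
    return {
        (zip_code, fips): population / zip_totals[zip_code]
        for zip_code, fips, population in rows
        if zip_totals[zip_code] > 0
    }


def aggregate_to_fips(values_by_zip, weights):
    """Distribute each ZIP's value across counties by weight, then sum per county."""
    totals = defaultdict(float)
    for (zip_code, fips), weight in weights.items():
        if zip_code in values_by_zip:
            totals[fips] += weight * values_by_zip[zip_code]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical intersections: ZIP 00001 straddles two counties.
    demo_rows = [
        ("00001", "01001", 300),
        ("00001", "01003", 100),
        ("00002", "01003", 200),
    ]
    demo_weights = zip_to_fips_weights(demo_rows)
    print(aggregate_to_fips({"00001": 8, "00002": 5}, demo_weights))
```

Because the weights for each ZIP sum to 1, the total across counties equals the total across ZIPs. Going the other way would require splitting a county value among ZIPs without any ZIP-level information, which is the reason given above for not providing coarse-to-granular crosswalks.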



## Deprecated source files
## Deprecated Source Files

- ZIP to FIPS to HRR to states: `02_20_uszips.csv` comes from a version of the table [here](https://simplemaps.com/data/us-zips) modified by Jingjing to include population weights.
- The `02_20_uszips.csv` file is based on the newest consensus data including 5-digit zipcode, fips code, county name, state, population, HRR, HSA (I downloaded the original file from [here](https://simplemaps.com/data/us-zips)). This file matches best to the most recent (2020) situation in terms of the population. But there still exist some matching problems. I manually checked and corrected those lines (~20) with [zip-codes](https://www.zip-codes.com/zip-code/58439/zip-code-58439.asp). The mapping from 5-digit zipcode to HRR is based on the 2017 version of the file downloaded from [here](https://atlasdata.dartmouth.edu/static/supp_research_data).
@@ -51,7 +46,3 @@ The rest of the crosswalk tables are derived from the mappings above. We provide
- CBSA -> FIPS crosswalk from [here](https://data.nber.org/data/cbsa-fips-county-crosswalk.html) (the file is `cbsatocountycrosswalk.csv`).
- MSA tables from March 2020 [here](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). This file seems to differ in a few fips codes from the source for the 02_20_uszip file which Jingjing constructed. There are at least 10 additional fips in 03_20_msa that are not in the uszip file, and one of the msa codes seems to be incorrect: 49020 (a google search confirms that it is incorrect in uszip and correct in the census data).
- MSA tables from 2019 [here](https://apps.bea.gov/regional/docs/msalist.cfm)

## Notes

- The NAs in the coding currently zero-fills.