Commit

Add tax year 2023 data (#40)
* Add updated agency workbooks and code

* Add new CPI PDF

* Add 2023 equalizers

* Bump DB version in DESCRIPTION and DB

* Add 2023 tax bills (#41)

* add 2023 tax bills

* lint

* styler?

* attempt 2 to style

* add rpm tif

* update data

* Revert "update data"

This reverts commit d0e31ea.

* fixing tax bill summary / adding input procedure

Capture manual reviewed fields from tax bills via terminal entry

* Drop extraneous readline code

* Swap tabulapdf for deprecated tabulizer dependency

* Update TIF agency names output

* Add 2023 TIF report data

* Fix pct formatting for pct_burden, reduction_pct

* Fix TIF records with missing agency number

* Update README figures and links

* Update sample summary with additional bill

* Update test thresholds

* Bump package version

* Cleanup pkgdown manifest

---------

Co-authored-by: Eric Langowski <33432469+erhla@users.noreply.github.com>
dfsnow and erhla authored Aug 6, 2024
1 parent de3d26b commit f0e9763
Showing 35 changed files with 118 additions and 73 deletions.
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -2,6 +2,6 @@ message: "If you use this software, please cite it as below."
authors:
- family-names: "Cook County Assessor's Office"
title: "PTAXSIM"
version: 0.6.1
version: 0.6.2
date-released: 2021-12-31
url: https://github.com/ccao-data/ptaxsim"
10 changes: 5 additions & 5 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: ptaxsim
Type: Package
Title: Calculate Cook County Property Tax Bills and Simulate Scenarios
Version: 0.6.1
Version: 0.6.2
Authors@R: c(
person(given = "Dan", family = "Snow", email = "daniel.snow@cookcountyil.gov", role = c("aut", "cre")),
person(given = "Rob", family = "Ross", role = c("aut", "ctb")),
@@ -22,7 +22,7 @@ Imports:
glue,
RSQLite,
utils
RoxygenNote: 7.3.0
RoxygenNote: 7.3.2
Suggests:
arrow,
covr,
@@ -51,7 +51,7 @@ Suggests:
sf,
snakecase,
stringr,
tabulizer,
tabulapdf,
testthat,
tidyr,
units,
@@ -60,6 +60,6 @@ Depends:
R (>= 2.10)
Remotes:
paleolimbot/geoarrow,
ropensci/tabulizer
ropensci/tabulapdf
Config/Requires_DB_Version: 2021.0.4
Config/Wants_DB_Version: 2022.0.0
Config/Wants_DB_Version: 2023.0.0
10 changes: 7 additions & 3 deletions README.Rmd
@@ -173,7 +173,7 @@ There are some minor differences between PTAXSIM and the real bill. The taxing d
We can also look at a single property over multiple years, in this case broken out by taxing district. To do so, pass a vector of multiple years to the `year_vec` argument of `tax_bill()`:

```{r multi_year_1, message=FALSE, warning=FALSE}
multiple_years <- tax_bill(2010:2021, "14081020210000")
multiple_years <- tax_bill(2010:2023, "14081020210000")
multiple_years
```

@@ -203,7 +203,10 @@ library(ggplot2)
# Plot the amount of taxes going to each district over time
multiple_years_plot <- ggplot(data = multiple_years_summ) +
geom_area(aes(x = year, y = final_tax, fill = agency_minor_type)) +
geom_area(
aes(x = year, y = final_tax, fill = agency_minor_type),
alpha = 0.7
) +
geom_vline(xintercept = 2016, linetype = "dashed", alpha = 0.3) +
annotate(
"text",
@@ -215,7 +218,8 @@ multiple_years_plot <- ggplot(data = multiple_years_summ) +
scale_y_continuous(
name = "Total Tax Amount",
labels = scales::dollar,
expand = c(0, 0)
expand = c(0, 0),
n.breaks = 8
) +
scale_x_continuous(name = "Year", n.breaks = 7) +
scale_fill_manual(values = scales::hue_pal()(10)) +
50 changes: 27 additions & 23 deletions README.md
@@ -37,13 +37,13 @@ Table of Contents
> installation](#database-installation) for details.
>
> [**Link to PTAXSIM
> database**](https://ccao-data-public-us-east-1.s3.amazonaws.com/ptaxsim/ptaxsim-2022.0.0.db.bz2)
> (DB version: 2022.0.0; Last updated: 2024-01-19 04:40:35)
> database**](https://ccao-data-public-us-east-1.s3.amazonaws.com/ptaxsim/ptaxsim-2023.0.0.db.bz2)
> (DB version: 2023.0.0; Last updated: 2024-08-05 19:43:42)
PTAXSIM is an R package/database to approximate Cook County property tax
bills. It uses real assessment, exemption, TIF, and levy data to
generate historic, line-item tax bills (broken out by taxing district)
for any property from 2006 to 2022. Given some careful assumptions and
for any property from 2006 to 2023. Given some careful assumptions and
data manipulation, it can also provide hypothetical, but factually
grounded, answers to questions such as:

@@ -173,9 +173,9 @@ database:

1. Download the compressed database file from the CCAO’s public S3
bucket. [Link
here](https://ccao-data-public-us-east-1.s3.amazonaws.com/ptaxsim/ptaxsim-2022.0.0.db.bz2).
here](https://ccao-data-public-us-east-1.s3.amazonaws.com/ptaxsim/ptaxsim-2023.0.0.db.bz2).
2. (Optional) Rename the downloaded database file by removing the
version number, i.e. ptaxsim-2022.0.0.db.bz2 becomes
version number, i.e. ptaxsim-2023.0.0.db.bz2 becomes
`ptaxsim.db.bz2`.
3. Decompress the downloaded database file. The file is compressed
using [bzip2](https://sourceware.org/bzip2/).
@@ -438,7 +438,7 @@ broken out by taxing district. To do so, pass a vector of multiple years
to the `year_vec` argument of `tax_bill()`:

``` r
multiple_years <- tax_bill(2010:2021, "14081020210000")
multiple_years <- tax_bill(2010:2023, "14081020210000")
multiple_years
#> Key: <year, pin, agency_num>
#> year pin class tax_code av eav agency_num
@@ -449,11 +449,11 @@ multiple_years
#> 4: 2010 14081020210000 206 73001 69062 227905 030210001
#> 5: 2010 14081020210000 206 73001 69062 227905 030210002
#> ---
#> 122: 2021 14081020210000 206 73105 70000 210189 043030000
#> 123: 2021 14081020210000 206 73105 70000 210189 044060000
#> 124: 2021 14081020210000 206 73105 70000 210189 050200000
#> 125: 2021 14081020210000 206 73105 70000 210189 050200001
#> 126: 2021 14081020210000 206 73105 70000 210189 080180000
#> 144: 2023 14081020210000 206 73105 70000 211141 043030000
#> 145: 2023 14081020210000 206 73105 70000 211141 044060000
#> 146: 2023 14081020210000 206 73105 70000 211141 050200000
#> 147: 2023 14081020210000 206 73105 70000 211141 050200001
#> 148: 2023 14081020210000 206 73105 70000 211141 080180000
#> agency_name agency_major_type agency_minor_type
#> <char> <char> <char>
#> 1: COUNTY OF COOK COOK COUNTY COOK
@@ -462,11 +462,11 @@ multiple_years
#> 4: CITY OF CHICAGO LIBRARY F... MUNICIPALITY/TOWNSHIP LIBRARY
#> 5: CITY OF CHICAGO SCHOOL BL... MUNICIPALITY/TOWNSHIP MISC
#> ---
#> 122: CHICAGO COMMUNITY COLLEGE... SCHOOL COMM COLL
#> 123: BOARD OF EDUCATION SCHOOL UNIFIED
#> 124: CHICAGO PARK DISTRICT MISCELLANEOUS PARK
#> 125: CHICAGO PARK DISTRICT AQU... MISCELLANEOUS BOND
#> 126: METRO WATER RECLAMATION D... MISCELLANEOUS WATER
#> 144: CHICAGO COMMUNITY COLLEGE... SCHOOL COMM COLL
#> 145: BOARD OF EDUCATION SCHOOL UNIFIED
#> 146: CHICAGO PARK DISTRICT MISCELLANEOUS PARK
#> 147: CHICAGO PARK DISTRICT AQU... MISCELLANEOUS BOND
#> 148: METRO WATER RECLAMATION D... MISCELLANEOUS WATER
#> agency_tax_rate final_tax
#> <num> <num>
#> 1: 0.00423 964.04
@@ -475,11 +475,11 @@ multiple_years
#> 4: 0.00102 232.46
#> 5: 0.00116 264.37
#> ---
#> 122: 0.00145 225.36
#> 123: 0.03517 4984.70
#> 124: 0.00311 483.37
#> 125: 0.00000 0.00
#> 126: 0.00382 593.71
#> 144: 0.00158 245.36
#> 145: 0.03829 5411.54
#> 146: 0.00318 493.83
#> 147: 0.00000 0.00
#> 148: 0.00345 535.76
```

The `tax_bill()` function will automatically combine the years and PIN
@@ -512,7 +512,10 @@ library(ggplot2)

# Plot the amount of taxes going to each district over time
multiple_years_plot <- ggplot(data = multiple_years_summ) +
geom_area(aes(x = year, y = final_tax, fill = agency_minor_type)) +
geom_area(
aes(x = year, y = final_tax, fill = agency_minor_type),
alpha = 0.7
) +
geom_vline(xintercept = 2016, linetype = "dashed", alpha = 0.3) +
annotate(
"text",
@@ -524,7 +527,8 @@ multiple_years_plot <- ggplot(data = multiple_years_summ) +
scale_y_continuous(
name = "Total Tax Amount",
labels = scales::dollar,
expand = c(0, 0)
expand = c(0, 0),
n.breaks = 8
) +
scale_x_continuous(name = "Year", n.breaks = 7) +
scale_fill_manual(values = scales::hue_pal()(10)) +
35 changes: 18 additions & 17 deletions _pkgdown.yml
@@ -1,5 +1,7 @@
destination: public
url: https://github.com/ccao-data/ptaxsim
template:
bootstrap: 5

authors:
sidebar:
@@ -10,21 +12,20 @@ authors:
href: https://github.com/rross0

reference:
- title: Functions
- subtitle: Calculate tax bills
contents:
- tax_bill
- subtitle: Lookup tax data
contents:
- starts_with("lookup_")
- subtitle: Check database validity
contents:
- starts_with("check_")

- title: Functions
- subtitle: Calculate tax bills
- contents:
- tax_bill
- subtitle: Lookup tax data
- contents:
- starts_with("lookup_")
- subtitle: Check database validity
- contents:
- starts_with("check_")

- title: Data
- subtitle: Sample tax bills
desc: Real Cook County tax bill data used for testing and examples
- contents:
- sample_tax_bills_detail
- sample_tax_bills_summary
- title: Data
- subtitle: Sample tax bills
desc: Real Cook County tax bill data used for testing and examples
contents:
- sample_tax_bills_detail
- sample_tax_bills_summary
3 changes: 3 additions & 0 deletions data-raw/agency/Agency Rate Report 2023.xlsx
Git LFS file not shown
10 changes: 7 additions & 3 deletions data-raw/agency/agency.R
@@ -268,7 +268,7 @@ agency <- map_dfr(file_names, function(file) {
agency_name = str_trim(str_squish(agency_name)),
agg_ext_base_year = as.integer(agg_ext_base_year),
agg_ext_base_year = na_if(agg_ext_base_year, 0),
home_rule_ind = ifelse(home_rule_ind %in% c("Y", "No PTELL"), TRUE, FALSE),
home_rule_ind = home_rule_ind %in% c("Y", "HR", "No PTELL"),
home_rule_ind = replace_na(home_rule_ind, FALSE),
across(
c(
@@ -286,8 +286,12 @@ agency <- map_dfr(file_names, function(file) {
across(starts_with("cty_"), ~ replace_na(.x, 0)),
# Make all percentages decimals
across(
c(pct_burden, reduction_pct),
~ ifelse(year != 2017, .x / 100, .x)
pct_burden,
~ ifelse(!year %in% c(2017, 2023), .x / 100, .x)
),
across(
reduction_pct,
~ ifelse(!year %in% c(2017), .x / 100, .x)
),
reduction_type = ifelse(
!toupper(reduction_type) %in% c("NO REDUCTION", "NONE"),
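
Note on the `agency.R` hunks above: the two fixes can be illustrated on toy data. This is a minimal base R sketch, not the package's code. The data frame `toy` and its values are invented, and the reading that the 2017 and 2023 workbooks already store `pct_burden` as a decimal is inferred from the new condition, not stated in the commit.

```r
# Hypothetical toy data; values are invented for illustration
toy <- data.frame(
  year = c(2016, 2017, 2022, 2023),
  home_rule_ind = c("Y", "HR", NA, "N"),
  pct_burden = c(50, 0.5, 50, 0.5),
  reduction_pct = c(10, 0.1, 10, 10)
)

# %in% already returns a logical vector and never returns NA,
# so wrapping it in ifelse(..., TRUE, FALSE) is redundant
toy$home_rule_ind <- toy$home_rule_ind %in% c("Y", "HR", "No PTELL")

# Scale each column only in years where it is not already a decimal
toy$pct_burden <- ifelse(
  !toy$year %in% c(2017, 2023), toy$pct_burden / 100, toy$pct_burden
)
toy$reduction_pct <- ifelse(
  !toy$year %in% 2017, toy$reduction_pct / 100, toy$reduction_pct
)

print(toy)
```

Splitting the single `across()` call in two lets each column carry its own list of exception years, which is what the diff does for `pct_burden` and `reduction_pct`.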
4 changes: 2 additions & 2 deletions data-raw/agency/tif_agency_names.csv
Git LFS file not shown
Binary file modified data-raw/cpi/cpihistory.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion data-raw/create_db.R
@@ -37,7 +37,7 @@ db_send_queries <- function(conn, sql) {
# changes. This is checked against Config/Requires_DB_Version in the DESCRIPTION
# file via check_db_version(). Schema is:
# "MAX_YEAR_OF_DATA.MAJOR_VERSION.MINOR_VERSION"
db_version <- "2022.0.0"
db_version <- "2023.0.0"

# Set the package version required to use this database. This is checked against
# Version in the DESCRIPTION file. Basically, we have a two-way check so that
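
Note on the version schema described in the comment above: a rough sketch of how a `MAX_YEAR_OF_DATA.MAJOR_VERSION.MINOR_VERSION` string could be split and compared. The helper `parse_db_version()` is hypothetical, and how `check_db_version()` actually performs the comparison is not shown in this diff, so the `numeric_version()` comparison here is an assumption.

```r
# Hypothetical helper: split "MAX_YEAR_OF_DATA.MAJOR_VERSION.MINOR_VERSION"
parse_db_version <- function(x) {
  parts <- as.integer(strsplit(x, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  setNames(parts, c("max_year", "major", "minor"))
}

db_version <- "2023.0.0"
requires_db_version <- "2021.0.4" # Config/Requires_DB_Version in DESCRIPTION

parsed <- parse_db_version(db_version)

# numeric_version() compares component by component, not as strings
db_ok <- numeric_version(db_version) >= numeric_version(requires_db_version)
```

Because the first component is the latest year of data, bumping `db_version` from 2022.0.0 to 2023.0.0 matches the addition of tax year 2023 in this commit.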
4 changes: 2 additions & 2 deletions data-raw/eq_factor/eq_factor.csv
Git LFS file not shown
Binary file not shown.
4 changes: 2 additions & 2 deletions data-raw/sample_tax_bills/agency_name_match.csv
Git LFS file not shown
8 changes: 6 additions & 2 deletions data-raw/sample_tax_bills/sample_tax_bills_detail.R
@@ -26,9 +26,11 @@ row_to_names <- function(df) {

# Different tax bills can have different table sizes depending on the number of
# taxing district.

extract_tax_bill <- function(file) {
base_file <- basename(file)
tbl <- pdf_text(file)[[1]] %>%
tbl <- pdf_text(file) %>%
paste(., collapse = "\n") %>%
str_extract(., regex("MISCELLANEOUS TAXES.*", dotall = TRUE)) %>%
str_split(., "\n") %>%
unlist() %>%
@@ -61,7 +63,8 @@ extract_tax_bill <- function(file) {
agency_name,
paste0(
"TAXES|Assess|Property|EAV|Local Tax|",
"Total Tax|Do not|Equalizer|cookcountyclerk.com"
"Total Tax|Do not|Equalizer|cookcountyclerk.com|",
"Pursuant|meaning of|If paying later|\\d{15}+|By \\d{2}/"
)
)
)
@@ -76,6 +79,7 @@ extract_tax_bill <- function(file) {
return(out)
}


# Collect all scanned tables + meta data in a data frame
bills <- map(list_pdf_inputs, extract_tax_bill)
bills_df <- bind_rows(bills)
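
Note on the `extract_tax_bill()` change above: `pdf_text()` returns one string per page, so indexing `[[1]]` only ever saw the first page, while collapsing the pages first lets the table continue past a page break. The sketch below mimics that with a made-up two-page character vector and base R only. `pages` and `extract_lines()` are hypothetical stand-ins, not code from the repository.

```r
# Stand-in for pdftools::pdf_text(file): one string per page (made-up text)
pages <- c(
  "Header\nMISCELLANEOUS TAXES\nAgency A 1.00",
  "Agency B 2.00\nTotal Tax 3.00"
)

# Hypothetical helper: keep everything from the table header onward
extract_lines <- function(txt) {
  # (?s) lets "." match newlines, like regex(..., dotall = TRUE) in stringr
  m <- regmatches(txt, regexpr("(?s)MISCELLANEOUS TAXES.*", txt, perl = TRUE))
  unlist(strsplit(m, "\n", fixed = TRUE))
}

# Old behavior: only the first page is searched
first_page_only <- extract_lines(pages[[1]])

# New behavior: pages are joined before the table is extracted
all_pages <- extract_lines(paste(pages, collapse = "\n"))
```

Joining the pages also pulls in footer text from later pages, which is presumably why the same commit extends the exclusion pattern with entries such as `Pursuant` and `If paying later`.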
4 changes: 2 additions & 2 deletions data-raw/sample_tax_bills/sample_tax_bills_detail.csv
Git LFS file not shown
4 changes: 2 additions & 2 deletions data-raw/sample_tax_bills/sample_tax_bills_summary.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data-raw/tax_code/2023 Tax Code Agency Rate.xlsx
Git LFS file not shown
Git LFS file not shown
3 changes: 3 additions & 0 deletions data-raw/tif/main/2023 Cook County TIF Summary.xlsx
Git LFS file not shown
19 changes: 16 additions & 3 deletions data-raw/tif/tif.R
@@ -7,7 +7,7 @@ library(purrr)
library(readxl)
library(snakecase)
library(stringr)
library(tabulizer)
library(tabulapdf)
library(tidyr)

calc_mode <- function(x) {
@@ -112,7 +112,7 @@ tif_main_pdf <- map_dfr(summ_file_names_pdf, function(file) {

# Extract tables from PDFs. Some tables get an extra 3rd column which we can
# drop
tables <- extract_tables(file) %>%
tables <- extract_tables(file, method = "stream") %>%
map(function(x) if (ncol(x) > 6) x[, c(1:2, 4:7)] else x) %>%
.[lapply(., nrow) > 1]

@@ -122,7 +122,10 @@ tif_main_pdf <- map_dfr(summ_file_names_pdf, function(file) {
"agency_num", "tif_name", "first_year",
"curr_year_revenue", "prev_year_revenue", "pct_diff"
)) %>%
filter(agency_num != "AGENCY") %>%
filter(
agency_num != "AGENCY" | is.na(agency_num),
first_year != "Year"
) %>%
mutate(across(where(is.character), ~ na_if(.x, "-"))) %>%
mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
mutate(
@@ -190,6 +193,16 @@ tif_main <- bind_rows(
"030320500",
agency_num
),
agency_num = ifelse(
agency_num == "003300777700501" & year == 2006,
"030770501",
agency_num
),
agency_num = ifelse(
agency_num == "003300777700509" & year %in% c(2007, 2008),
"030770509",
agency_num
),
agency_num = ifelse(
agency_num == "030770502/507" & year %in% 2011:2012,
"030770502",
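
Note on the `filter()` change in `tif.R` above: comparing `NA` with `!=` yields `NA`, and `dplyr::filter()` drops rows whose condition is `NA`, so the old filter silently removed records with a missing agency number. The sketch below reproduces this with base R `subset()`, which treats `NA` conditions the same way. The data frame `tif` is made up for illustration.

```r
# Made-up rows mimicking a parsed TIF table with a repeated header row
tif <- data.frame(
  agency_num = c("030210500", "AGENCY", NA),
  first_year = c("1998", "Year", "2005"),
  stringsAsFactors = FALSE
)

# NA != "AGENCY" evaluates to NA, and subset() (like dplyr::filter())
# drops rows where the condition is NA
old <- subset(tif, agency_num != "AGENCY")

# Adding is.na() keeps rows with a missing agency number, while the
# first_year check still removes the repeated header row
new <- subset(
  tif,
  (agency_num != "AGENCY" | is.na(agency_num)) & first_year != "Year"
)
```

The second condition on `first_year` matters once `NA` agency numbers are allowed through, since a header row whose first cell failed to parse would otherwise survive the filter.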
Binary file modified data/sample_tax_bills_detail.rda
Binary file not shown.
Binary file modified data/sample_tax_bills_summary.rda
Binary file not shown.
Binary file modified man/figures/README-multi_year_4-1.png
2 changes: 1 addition & 1 deletion tests/testthat/test-lookup.R
@@ -232,7 +232,7 @@ test_that("lookup values/data are correct", {
)
expect_known_hash(
lookup_agency(sum_df$year, sum_df$tax_code),
"c4d062201d"
"30ede4ede0"
)
})
