-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathREADME.Rmd
300 lines (215 loc) · 25 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
---
title: "Table of Contents"
output:
github_document:
toc: true
toc_depth: 3
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r opts, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
options(datatable.prettyprint.char = 25L)
```
```{r setup, include = FALSE}
library(ptaxsim)
ptaxsim_db_conn <- DBI::dbConnect(RSQLite::SQLite(), "./ptaxsim.db")
metadata <- DBI::dbGetQuery(ptaxsim_db_conn, "SELECT * FROM metadata")
ptaxsim::check_db_conn(ptaxsim_db_conn)
ptaxsim::check_db_sync(ptaxsim_db_conn)
stopifnot(
utils::compareVersion(
metadata$db_version,
packageDescription("ptaxsim")[["Config/Requires_DB_Version"]]
) >= 0
)
```
## PTAXSIM
<a href='https://github.com/ccao-data/ptaxsim'><img src='man/figures/logo.png' align="right" height="139"/></a>
[![R-CMD-check](https://github.com/ccao-data/ptaxsim/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ccao-data/ptaxsim/actions/workflows/R-CMD-check.yaml)
[![test-coverage](https://github.com/ccao-data/ptaxsim/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/ccao-data/ptaxsim/actions/workflows/test-coverage.yaml)
[![lint](https://github.com/ccao-data/ptaxsim/actions/workflows/lint.yaml/badge.svg)](https://github.com/ccao-data/ptaxsim/actions/workflows/lint.yaml)
[![pre-commit](https://github.com/ccao-data/ptaxsim/actions/workflows/pre-commit.yaml/badge.svg)](https://github.com/ccao-data/ptaxsim/actions/workflows/pre-commit.yaml)
[![codecov](https://codecov.io/gh/ccao-data/ptaxsim/branch/master/graph/badge.svg)](https://codecov.io/gh/ccao-data/ptaxsim)
> :warning: NOTE: PTAXSIM relies on a separate SQLite database to function correctly. You must download and decompress the database before using this package!
> See [Database installation](#database-installation) for details.
>
> [**Link to PTAXSIM database**](`r metadata$source_url_database`) (DB version: `r metadata$db_version`; Last updated: `r metadata$created_at`)
PTAXSIM is an R package/database to approximate Cook County property tax bills. It uses real assessment, exemption, TIF, and levy data to generate historic, line-item tax bills (broken out by taxing district) for any property from `r metadata$data_year_min` to `r metadata$data_year_max`. Given some careful assumptions and data manipulation, it can also provide hypothetical, but factually grounded, answers to questions such as:
* [What would my property tax bill be if my assessed value was $50K lower? What if my school district's levy goes up?](https://ccao-data.github.io/ptaxsim/articles/introduction.html)
* [How do appeals affect tax bills? What if nobody appeals?](https://ccao-data.github.io/ptaxsim/articles/appeals.html)
* [How do exemptions affect tax bills? What if a current exemption amount is increased?](https://ccao-data.github.io/ptaxsim/articles/exemptions.html)
* [How do TIF districts affect tax bills? What if a nearby TIF district did not exist?](https://ccao-data.github.io/ptaxsim/articles/tifs.html)
* [Where have tax bills changed the most? Where would they change the most given X policy change?](https://ccao-data.github.io/ptaxsim/articles/mapping.html)
PTAXSIM can generate hundreds, or even millions, of tax bills in a single function call, which enables complex tax analysis on a municipality or even whole-county level. PTAXSIM is accurate (within $10 of the real bill) for >99% of historic property tax bills. However, it is currently an experimental tool only and is *not* recommended for critical use. See [Notes](#notes-and-caveats) and [Disclaimer](#disclaimer) for more information.
For detailed documentation on included functions and data, [**visit the full reference list**](https://ccao-data.github.io/ptaxsim/reference/) or the [**introduction vignette**](https://ccao-data.github.io/ptaxsim/articles/introduction.html).
For examples of PTAXSIM's functionality and usage, click one of the questions above or see the [**vignettes page**](https://ccao-data.github.io/ptaxsim/articles/index.html).
## FAQs
**Q: Who is the target audience for PTAXSIM?**
PTAXSIM is a currently a developer and researcher-focused tool. Its intended audience is academics and policymakers interested in tax policy analysis or the history of the property tax system. It is not intended to predict or explain individual bills. In the future, we plan to make PTAXSIM more accessible via a web frontend and/or API.
**Q: I got my Assessment Notice with a *new* assessed value. Can I input the *new* assessed value into PTAXSIM to predict next year’s property tax bill?**
*No.* Assessments sent by the Assessor are not final. A property’s assessed value can change at multiple stages: at the Assessor’s Office due to a reassessment, at the Assessor’s Office due to an appeal, and at the Board of Review due to an appeal. Assessments are not finalized until these stages are complete/certified, and only finalized assessments are used to calculate bills.
Additionally, an increased assessed value does not necessarily result in an increased bill. There are many other factors that contribute to a property’s tax bill – including the assessed value of other properties.
**Q: I know my property's *final* assessed value. Can I input the *final* assessed value into PTAXSIM to predict next year’s property tax bill?**
*No.* Even if you know your property’s final assessed value with certainty, precisely predicting a future tax bill in Cook County is difficult because tax rates stem from multiple agencies (and the numbers they produce each year). These agencies include various taxing districts (typically 10 to 14 per property, including school districts, municipal/township/city governments, and Cook County), multiple Cook County property tax offices (Assessor, Board of Review, and Clerk), and the Illinois Department of Revenue (IDOR).
It is true, however, that PTAXSIM is technically capable of predicting bills. To do this, you must have technical competency in the R programming language, and must make explicit predictions for each of the numeric inputs listed in the table below.
**Q: I can code in R. What other numbers, besides assessed value, do I need to input into PTAXSIM to generate predictions?**
To predict next year’s bill for one property (PIN), you must predict the PIN's taxable value (EAV), as well as what will happen to the individual levies and tax bases of all taxing districts associated with the PIN. The table below lists each input, along with some complications and options:
| Input | What this means | Complications and possible implementation option(s) |
|:-------------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| The PIN’s **taxable value** | The PIN’s Equalized Assessed Value (EAV), which incorporates the assessed value, exemptions, and IDOR’s equalizer. | *Complications*: In a tax year, AVs can change at the Assessor’s Office and at the Board of Review. Also, IDOR calculates a new equalizer every year. <br><br> *Option*: Predict what you think is an accurate AV, apply what exemptions you qualify for, and use past equalizers issued by IDOR. |
| **Tax extensions**, also called levies, for each taxing agency associated with that PIN’s tax code (typically, 10-14 agencies) | The total extension is the total dollar amount each taxing agency decides to collect from property owners within the boundaries of its district. | *Complication*: Each agency sets their own tax levy. <br><br> *Option*: For each agency associated with that PIN’s tax code, you might use the past year extension, and apply a percentage increase or decrease. |
| **Tax bases** for each taxing agency associated with that PIN’s tax code (typically, 10-14 agencies) | For each agency, the tax base equals the sum of the EAVs of all properties in its district, except for any region of the agency that is in a TIF (where the base is frozen). | *Complication*: Even after a township has been reassessed, many agencies span across township boundaries. (For example: the taxing district of the City of Chicago is divided up into eight assessment townships, but property tax bills for Chicagoans depend on assessments throughout all eight townships). <br><br> *Option*: For each agency associated with that PIN’s tax code, you might use the past year total EAV, and apply a percentage increase or decrease. |
## Installation
### Package installation
You can install the released version of `ptaxsim` directly from GitHub with one of the following commands:
```{r, eval=FALSE}
# Using remotes
remotes::install_github("ccao-data/ptaxsim")
# Using renv
renv::install("ccao-data/ptaxsim")
# Using pak
pak::pak("ccao-data/ptaxsim")
# Append the @ symbol for a specific version
remotes::install_github("ccao-data/ptaxsim@0.6.0")
```
> :warning: NOTE: Windows users may need to install Rtools in order to build and use this package. Please follow the [instructions here](https://cran.r-project.org/bin/windows/Rtools/) to install Rtools for your version of R. Once Rtools is installed, installation can proceed normally using the code above.
### Database installation
PTAXSIM relies on a separate SQLite database to function correctly. This database contains the information about properties, taxing districts, and TIF districts necessary to calculate tax bills. To use this database:
1. Download the compressed database file from the CCAO's public S3 bucket. [Link here](`r metadata$source_url_database`).
2. (Optional) Rename the downloaded database file by removing the version number, i.e. `r paste0("ptaxsim-", metadata$db_version, ".db.bz2")` becomes `ptaxsim.db.bz2`.
3. Decompress the downloaded database file. The file is compressed using [bzip2](https://sourceware.org/bzip2/).
- On Windows, you can easily decompress bzip2 files using [7-Zip](https://www.7-zip.org/download.html).
- On *nix systems, bzip2 is typically installed by default and can be used via the command line i.e. `bzip2 -d ptaxsim.db.bz2`. If bzip2 is not installed, use the package manager on your system (brew, apt, etc.) to install it first.
4. Place the decompressed database file (`ptaxsim.db`) in a convenient location, preferably at the root of your R project.
5. At the beginning of your project, instantiate a [DBI](https://dbi.r-dbi.org/) connection to the database file with the name `ptaxsim_db_conn`. The PTAXSIM R functions look for this connection object name by default. If you wish to change the name, you can pass the custom named object to the `conn` argument of each PTAXSIM function. Below is a sample DBI connection:
```{r db_conn, message=FALSE}
library(ptaxsim)
# Create the DB connection with the default name expected by PTAXSIM functions
ptaxsim_db_conn <- DBI::dbConnect(RSQLite::SQLite(), "./ptaxsim.db")
```
## Usage
PTAXSIM has a single primary function - `tax_bill()` - with two required arguments:
1. `year_vec` - A numeric vector of tax years
2. `pin_vec` - A character vector of Property Index Numbers (PINs)
The output is a `data.table` containing the tax amount directed to each taxing district, by PIN and year. By default, `tax_bill()` can only generate *historic* tax bills; it cannot generate future or counterfactual bills. To generate future/counterfactual bills, you must provide additional data to `tax_bill()` via its secondary arguments. See the [introduction page](https://ccao-data.github.io/ptaxsim/articles/introduction.html) for more details.
### Single bill, single year
The simplest use of `tax_bill()` is to calculate a single bill for a single year:
```{r single_bill_1, message=FALSE}
single_bill <- tax_bill(year_vec = 2020, pin_vec = "17341020511001")
single_bill
```
To compare this output to a real tax bill, we can reorder the rows and keep only the columns that appear on an actual printed bill.
```{r single_bill_2, message=FALSE}
library(dplyr)
single_bill %>%
select(agency_name, final_tax, agency_tax_rate) %>%
mutate(agency_tax_rate = agency_tax_rate * 100) %>%
arrange(-row_number()) %>%
setNames(c("Agency", "2020 Tax", "2020 Rate")) %>%
knitr::kable("html", digits = 3)
```
Here's the real 2020 tax bill for this PIN for comparison:
![](man/figures/README-sample_bill.png)
There are some minor differences between PTAXSIM and the real bill. The taxing district names may not be identical. Additionally, PTAXSIM aggregates the different Cook County agencies (Public Safety, Health Facilities, and County of Cook) into a single line-item (COUNTY OF COOK).
### Single bill, multiple years
We can also look at a single property over multiple years, in this case broken out by taxing district. To do so, pass a vector of multiple years to the `year_vec` argument of `tax_bill()`:
```{r multi_year_1, message=FALSE, warning=FALSE}
multiple_years <- tax_bill(2010:2023, "14081020210000")
multiple_years
```
The `tax_bill()` function will automatically combine the years and PIN into their Cartesian product. The result is a tax amount per taxing district, per PIN, per year. We can collapse these amounts and then plot them to see how a single PIN has changed over time:
```{r multi_year_2, message=FALSE, warning=FALSE}
multiple_years_summ <- multiple_years %>%
group_by(year, agency_minor_type) %>%
summarize(final_tax = sum(final_tax)) %>%
mutate(
agency_minor_type = factor(
agency_minor_type,
levels = c(
"TIF", "BOND", "COOK", "LIBRARY", "MUNI", "PARK",
"UNIFIED", "COMM COLL", "WATER", "MISC"
)
)
)
```
<details>
<summary><strong>Click here</strong> to show plot code</summary>
```{r multi_year_3, message=FALSE, warning=FALSE}
library(ggplot2)
# Plot the amount of taxes going to each district over time
multiple_years_plot <- ggplot(data = multiple_years_summ) +
geom_area(
aes(x = year, y = final_tax, fill = agency_minor_type),
alpha = 0.7
) +
geom_vline(xintercept = 2016, linetype = "dashed", alpha = 0.3) +
annotate(
"text",
x = 2015.8,
y = 12500,
label = "RPM TIF enacted",
hjust = 1
) +
scale_y_continuous(
name = "Total Tax Amount",
labels = scales::dollar,
expand = c(0, 0),
n.breaks = 8
) +
scale_x_continuous(name = "Year", n.breaks = 7) +
scale_fill_manual(values = scales::hue_pal()(10)) +
theme_minimal() +
guides(fill = guide_legend(title = "District Type"))
```
</details>
```{r multi_year_4, out.width="85%", message=FALSE, warning=FALSE, echo=FALSE}
multiple_years_plot
```
For more advanced usage, such as counterfactual analysis, see the [vignettes page](https://ccao-data.github.io/ptaxsim/articles/index.html).
## Data
The PTAXSIM backend database contains cleaned data from the Cook County Clerk, Treasurer, and Assessor. The database is updated whenever all the data necessary to calculate a new tax year becomes available. Typically this occurs roughly a year after assessments are mailed i.e. 2019 bill data was available in mid-2020.
### Data sources
| Table Name | Source Agency | Source Link | Ingest Script | Contains |
|------------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|-------------------------------------------------------------------|
| agency | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Taxing district extensions, limits, and base EAV |
| agency_info | Clerk + imputed | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Taxing district name, type, and subtype |
| agency_fund | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Funds and line-items that contribute to each district's extension |
| agency_fund_info | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/agency/agency.R](data-raw/agency/agency.R) | Fund name and whether the fund is statutorily capped |
| cpi | IDOR | [History of CPI's Used for the PTELL](https://www2.illinois.gov/rev/localgovernments/property/Documents/cpihistory.pdf) | [data-raw/cpi/cpi.R](data-raw/cpi/cpi.R) | CPI-U used to calculate PTELL limits |
| eq_factor | Clerk | [Tax Extension - Agency Tax Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/eq_factor/eq_factor.R](data-raw/eq_factor/eq_factor.R) | Equalization factor applied to AV to get EAV |
| pin | Clerk + Treasurer | CLERKVALUES and TAXBILLAMOUNTS internal SQL tables | [data-raw/pin/pin.R](data-raw/pin/pin.R) | PIN-level tax code, AV, and exemptions |
| tax_code | Clerk | [Tax Extension - Tax Code Agency Rate Reports](https://www.cookcountyclerkil.gov/property-taxes/tax-extension-and-rates) | [data-raw/tax_code/tax_code.R](data-raw/tax_code/tax_code.R) | Crosswalk of tax codes by district |
| tif | Clerk | [TIF Reports - Cook County Summary Reports](https://www.cookcountyclerkil.gov/property-taxes/tifs-tax-increment-financing/tif-reports) | [data-raw/tif/tif.R](data-raw/tif/tif.R) | TIF revenue, start year, and cancellation year |
| tif_crosswalk | Clerk | Manually created from TIF summary and distribution reports | [data-raw/tif/tif.R](data-raw/tif/tif.R) | Fix for data issue identified in #39 |
| tif_distribution | Clerk | [TIF Reports - Tax Increment Agency Distribution Reports](https://www.cookcountyclerkil.gov/property-taxes/tifs-tax-increment-financing/tif-reports) | [data-raw/tif/tif.R](data-raw/tif/tif.R) | TIF EAV, frozen EAV, and distribution percentage by tax code |
### Database diagram
> :warning:
> [Click here](https://ccao-data.github.io/ptaxsim/mermaid/er-diagram-big.svg)
> for a more detailed version of the diagram below.
![](https://ccao-data.github.io/ptaxsim/mermaid/er-diagram-small.svg)
## Notes and caveats
- Currently, the per-district tax calculations for properties in the Red-Purple Modernization (RPM) TIF are slightly flawed. However, the total tax bill per PIN is still accurate. See issue [#4](https://github.com/ccao-data/ptaxsim/issues/4) for more information.
- Special Service Area (SSA) rates must be calculated manually when creating counterfactual bills. See issue [#3](https://github.com/ccao-data/ptaxsim/issues/3) for more information.
- In rare instances, a TIF can have multiple `agency_num` identifiers (usually there's only one per TIF). The `tif_crosswalk` table determines what the "main" `agency_num` is for each TIF and pulls the name and TIF information using that identifier. See issue [GitLab #39](https://gitlab.com/ccao-data-science---modeling/packages/ptaxsim/-/issues/39) for more information.
- PTAXSIM is relatively memory-efficient and can calculate every district line-item for every tax bill for the last 15 years (roughly 350 million rows). However, the memory required for this calculation is substantial (around 100 GB).
- PTAXSIM's accuracy is measured automatically with an [integration test](tests/testthat/test-accuracy.R). The test takes a random sample of 1 million PINs, calculates the total bill for each PIN, and compares it to the real total bill.
- This repository contains an edited version of PTAXSIM's commit history. Historical Git LFS and other data files (.csv, .xlsx, etc.) were removed in the transition to GitHub. The most current version of these files is available starting in commit [1f06639](https://github.com/ccao-data/ptaxsim/commit/1f06639d98a720999222579b7ff61bcce061f1ec). If you need the historical LFS files for any reason, please visit the [GitLab archive](https://gitlab.com/ccao-data-science---modeling/packages/ptaxsim) of this repository.
## Disclaimer
This package and the included database are for educational purposes only. The Assessor's office releases the package and database without any representations or warranties of any kind, whether express or implied. Any data, figures, or amounts contained within the package/database, used by the package/database, or produced by the package are solely for illustrative purposes.
Any results produced by this package as distributed are not official, as they are hypothetical, and should not be relied upon for any business or commercial purpose. The Assessor's office expressly disclaims any liability for any entity's reliance on this package and/or database.
## Release procedures
The PTAXSIM package consists of two components, a package and a database, each of which can be updated independently. Both components have built-in checks to ensure that minimum version expectations are met, i.e. package version `0.5.4` expects database version `2021.0.1` or higher, and database version `2021.0.1` expects package version `0.5.3` or higher.
### Schema
The package uses the [SemVer](https://semver.org/) schema for versioning: `<MAJOR VERSION>.<MINOR VERSION>.<PATCH>`.
The database uses a custom schema for versioning: `<TAX YEAR>.<MAJOR VERSION>.<MINOR VERSION>`. Where `TAX YEAR` denotes the most recent year of data in the database.
### Timing
The PTAXSIM database is updated whenever [all the data necessary](#data-sources) to calculate a new tax year becomes available. Typically this occurs about a year after initial assessments are mailed. Once all necessary data is available, it is manually incorporated into the database, and the database tax year version is updated to reflect the new data.
### Checklists
The process of updating the package and/or database can be somewhat involved. As such, please use the following release checklists when creating a new version:
- [Database release checklist](https://github.com/ccao-data/ptaxsim/issues/new?assignees=&labels=release&projects=&template=release-database.md&title=Database+release+%5BVERSION%5D)
- [Package release checklist](https://github.com/ccao-data/ptaxsim/issues/new?assignees=&labels=release&projects=&template=release-package.md&title=Package+release+%5BVERSION%5D)