Bus Cost Analysis Refactor 2 #1175

Merged 17 commits on Jul 12, 2024

Commits
34791cc
minor updates to narrative. experimenting with exporting to pdf and html
csuyat-dot Jul 2, 2024
b81c2bd
testing weasyprint to export NB to pdf
csuyat-dot Jul 2, 2024
b43c380
still experimenting converting NB > HTML > PDF using nbconvert and we…
csuyat-dot Jul 3, 2024
cc43299
final attempt at converting NB to pdf. also testing papermill and nbc…
csuyat-dot Jul 8, 2024
37c1c12
edited and renamed weasyprint files
csuyat-dot Jul 8, 2024
770fe50
testing LaTeX page break. got page break to work with straight nbconv…
csuyat-dot Jul 8, 2024
07ab332
tested papermill and LaTeX but ended up using weasyprint and <br> tag…
csuyat-dot Jul 9, 2024
4686e2c
renamed files in pseudo-chronological order, updated Makefile
csuyat-dot Jul 10, 2024
17822c5
renamed files again to a working syntax. updated makefile, ran makefi…
csuyat-dot Jul 10, 2024
c456f70
tested refactored outlier flag function. added updated func to affect…
csuyat-dot Jul 10, 2024
eb1b777
fixed keyerror. ran Makefile and everything is running well
csuyat-dot Jul 10, 2024
824a757
added new column for # of projects to pivot_source table
csuyat-dot Jul 10, 2024
781d04f
created a new agg function that also returns the count of projects an…
csuyat-dot Jul 11, 2024
c97266d
more changes to NB from feedback
csuyat-dot Jul 11, 2024
26a0b82
final additions and edits: appendix, removed dupe tables, expanded int…
csuyat-dot Jul 12, 2024
7e90fa8
fixed the spacing of the final NB, Makefile is running/producing the …
csuyat-dot Jul 12, 2024
101d8a5
updated readme, ran make file, deleted blank front page of pdf, delet…
csuyat-dot Jul 12, 2024
19 changes: 13 additions & 6 deletions bus_procurement_cost/Makefile
@@ -1,8 +1,15 @@
# runs all scripts for bus procurement cost
all_bus_scripts:
- python fta_data_cleaner.py
- python tircp_data_cleaner.py
- python dgs_data_cleaner.py
- python cost_per_bus_cleaner.py
- jupyter nbconvert --to notebook --execute --inplace cost_per_bus_analysis.ipynb
- jupyter nbconvert --to html --no-input --no-prompt cost_per_bus_analysis.ipynb
+ python _01_fta_data_cleaner.py
+ python _02_tircp_data_cleaner.py
+ python _03_dgs_data_cleaner.py
+ python _04_cost_per_bus_cleaner.py
+
+ #execute NB
+ jupyter nbconvert --to notebook --execute --inplace _05_cost_per_bus_analysis.ipynb
+
+ #convert NB to HTML then to PDF
+ jupyter nbconvert --to html --no-input --no-prompt _05_cost_per_bus_analysis.ipynb
+ pip install WeasyPrint
+ weasyprint _05_cost_per_bus_analysis.html cost_per_bus_analysis.pdf
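
The notebook-to-PDF steps in the updated Makefile can also be sketched as a small Python driver. This is a minimal sketch and not part of the PR: `build_commands` and `run_pipeline` are hypothetical helper names, and it assumes `jupyter` and `weasyprint` are already installed and on `PATH` (the Makefile instead runs `pip install WeasyPrint` on every invocation).

```python
import subprocess
from pathlib import Path


def build_commands(notebook: str, pdf_name: str) -> list[list[str]]:
    """Build the three commands the Makefile runs for the notebook.

    Hypothetical helper: mirrors the Makefile steps, nothing more.
    """
    # nbconvert writes the HTML next to the notebook, with the same stem.
    html_name = str(Path(notebook).with_suffix(".html"))
    return [
        # 1. execute the notebook in place
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook],
        # 2. export to HTML, hiding code cells and prompts
        ["jupyter", "nbconvert", "--to", "html", "--no-input", "--no-prompt", notebook],
        # 3. convert the HTML to PDF
        ["weasyprint", html_name, pdf_name],
    ]


def run_pipeline(notebook: str, pdf_name: str) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in build_commands(notebook, pdf_name):
        subprocess.run(cmd, check=True)


commands = build_commands("_05_cost_per_bus_analysis.ipynb", "cost_per_bus_analysis.pdf")
```

Calling `run_pipeline("_05_cost_per_bus_analysis.ipynb", "cost_per_bus_analysis.pdf")` would then execute the steps; `check=True` makes a failed step raise instead of silently continuing, which is the same fail-fast behavior `make` gives by default.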

15 changes: 10 additions & 5 deletions bus_procurement_cost/README.md
@@ -57,28 +57,28 @@ Analyze bus procurement projects to see how much transit agencies pay for them.
Executing `make all_bus_scripts` will run the following scripts
<br></br>

- - **fta_data_cleaner.py:**
+ - **_01_fta_data_cleaner.py:**
* Reads in and cleans FTA data
* outputs 2 files:
* cleaned, all projects: `clean_fta_all_projects.parquet`
* cleaned, bus only projects:`clean_fta_bus_only.parquet`
<br></br>

- - **tircp_data_cleaner.py**
+ - **_02_tircp_data_cleaner.py**
* Reads in and cleans tircp data
* outputs 2 files:
* cleaned, all projects: `clean_tircp_all_project.parquet`
* cleaned, bus only projects:`clean_tircp_bus_only.parquet`
<br></br>

- - **dgs_data_cleaner.py**
+ - **_03_dgs_data_cleaner.py**
* Reads in and cleans DGS data
* outputs 2 files:
* cleaned, bus only projects: `clean_dgs_all_projects.parquet`
* cleaned, bus only projects with options:`clean_dgs_bus_only_w_options.parquet`
<br></br>

- - **cost_per_bus_cleaner.py**
+ - **_04_cost_per_bus_cleaner.py**
* Reads in and merges all the bus only datasets
* updates column names
* calculates `cost_per_bus`, z-score and identifies outliers.
@@ -97,4 +97,9 @@ Executing `make all_bus_scripts` will run the following scripts
* hides the code cells and prompts
<br></br>

- output files are saved to GCS at: `calitp-analytics-data/data-analyses/bus_procurement_cost`
+ - **weasyprint ...html ...pdf**
+ * converts the HTML file to PDF, preserving the same fonts, tables, and charts.
+
+ Output data files are saved to GCS at: `calitp-analytics-data/data-analyses/bus_procurement_cost`
+
+ Final deliverable: `cost_per_bus_analysis.pdf`
@@ -2,7 +2,7 @@
import pandas as pd
import shared_utils
from calitp_data_analysis.sql import to_snakecase
- from bus_cost_utils import *
+ from _bus_cost_utils import GCS_PATH, new_prop_finder, new_bus_size_finder, project_type_finder, col_row_updater

def col_splitter(
df: pd.DataFrame,
@@ -2,7 +2,7 @@
import pandas as pd
import shared_utils
from calitp_data_analysis.sql import to_snakecase
- from bus_cost_utils import *
+ from _bus_cost_utils import GCS_PATH, new_prop_finder, new_bus_size_finder, project_type_finder, col_row_updater

def clean_tircp_columns() -> pd.DataFrame:
"""
@@ -2,7 +2,7 @@
import pandas as pd
import shared_utils
from calitp_data_analysis.sql import to_snakecase
- from bus_cost_utils import *
+ from _bus_cost_utils import GCS_PATH, new_prop_finder, new_bus_size_finder

def calculate_total_cost(row):
"""
@@ -1,5 +1,5 @@
import pandas as pd
- from bus_cost_utils import *
+ from _bus_cost_utils import GCS_PATH, new_outlier_flag_v2
from scipy.stats import zscore


@@ -81,7 +81,7 @@ def prepare_all_data() ->pd.DataFrame:
merge2["zscore_cost_per_bus"] = zscore(merge2["cost_per_bus"])

#flag any outliers
- merge2["is_cpb_outlier?"] = merge2["zscore_cost_per_bus"].apply(outlier_flag)
+ merge2["is_cpb_outlier?"] = new_outlier_flag_v2(merge2,'zscore_cost_per_bus')
return merge2
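
The change above swaps a row-wise `.apply(outlier_flag)` for a call that takes the whole DataFrame and a column name. The body of `new_outlier_flag_v2` is not shown in this diff, so the following is only a sketch of what a vectorized flag with that call shape might look like. `flag_outliers` is a hypothetical name, the toy prices are made up, and the cutoff of 3 standard deviations is an assumption rather than a value taken from the PR.

```python
import pandas as pd


def flag_outliers(df: pd.DataFrame, zscore_col: str, threshold: float = 3.0) -> pd.Series:
    """Return True for rows whose absolute z-score exceeds `threshold`.

    Hypothetical stand-in for `new_outlier_flag_v2`; the 3.0 default is an assumption.
    """
    return df[zscore_col].abs() > threshold


# Toy data (made up for illustration): 20 similar prices and one extreme value.
buses = pd.DataFrame({"cost_per_bus": [800_000] * 20 + [5_000_000]})

# Population z-score (ddof=0), which matches the default of scipy.stats.zscore.
cost = buses["cost_per_bus"]
buses["zscore_cost_per_bus"] = (cost - cost.mean()) / cost.std(ddof=0)

# One boolean per row, computed on the whole column at once.
buses["is_cpb_outlier?"] = flag_outliers(buses, "zscore_cost_per_bus")
```

Operating on the column directly avoids a Python-level function call per row, and passing the column name keeps the helper reusable for other z-score columns.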

