Bus cost refactor #1161

Merged
merged 36 commits into from
Jun 28, 2024
36 commits
258d87d
add refactor concepts, notes
Apr 5, 2024
50b56d7
started NB for all refactor work
csuyat-dot Jun 13, 2024
0326a43
renamed old NBs to separate from current work
csuyat-dot Jun 13, 2024
da829dd
started new bus_cost_utils.py to start dropping in shared functions. …
csuyat-dot Jun 13, 2024
de61eae
testing new function to flag if cpb is outlier
csuyat-dot Jun 13, 2024
16fc0df
comparing outliers in new and old DFs
csuyat-dot Jun 17, 2024
f6a8867
improved cpb aggregate function
csuyat-dot Jun 17, 2024
359e707
tested new cpb_aggregate function against old version. bus count, tot…
csuyat-dot Jun 18, 2024
0b239b9
testing new ways to reduce variables in favor of pivot tables
csuyat-dot Jun 18, 2024
82d6181
comparing pivot tables against new cpb agg function. updated input an…
csuyat-dot Jun 19, 2024
173694d
ran updated scripts, everything exported to gcs with no errors. new m…
csuyat-dot Jun 19, 2024
cd2c247
made sure charts and graphs are still working. started work on trimmi…
csuyat-dot Jun 19, 2024
519b07b
more changes
csuyat-dot Jun 20, 2024
1887b14
switching the weighted average calculation for average cost per bus a…
csuyat-dot Jun 20, 2024
7312850
more organization
csuyat-dot Jun 20, 2024
877d41f
started writing conclusion. created new function to min/max values of…
csuyat-dot Jun 21, 2024
8ece2d7
consolidated some of the summary cells down to 1 cell per section
csuyat-dot Jun 21, 2024
167d356
small edits
csuyat-dot Jun 21, 2024
dc485c4
more organizing of cells and creating headings for better navigation
csuyat-dot Jun 24, 2024
b9bc6c1
turned zeb projects list to a variable, updated all the proceeding va…
csuyat-dot Jun 24, 2024
8b58b8e
added bus size chart that excluded the not-specified responses
csuyat-dot Jun 24, 2024
2ea2948
final changes before overwriting initial scripts
csuyat-dot Jun 25, 2024
9ba8be8
overwrote fta cleaner script. double checked and ensured script is go…
csuyat-dot Jun 25, 2024
5786212
overwrote TIRCP cleaner script. ran with no errors, files saving to G…
csuyat-dot Jun 25, 2024
d0b4691
overwrote dgs cleaner script. ran with no errors. wrote to GCS. GTG
csuyat-dot Jun 25, 2024
b44a240
added min max summary and outlier flag to utils file. cpb cleaner scr…
csuyat-dot Jun 25, 2024
50fa908
started to copy over cells, functions, variables and tables to the fi…
csuyat-dot Jun 25, 2024
c2cedd7
minor bug fixed for markdown to work in final nb
csuyat-dot Jun 25, 2024
2a4120e
moved charts over to final NB
csuyat-dot Jun 26, 2024
7be2f2c
moved min max function to NB. reorganized the charts and disabled the…
csuyat-dot Jun 26, 2024
d021015
updating Makefile with additional commands, was able to run makefile …
csuyat-dot Jun 26, 2024
13ac4ac
full run of Makefile. analysis nb now shows mainly ZEB metrics
csuyat-dot Jun 26, 2024
a433c2c
updated output file name for TIRCP cleaner to be consistent with othe…
csuyat-dot Jun 26, 2024
492cbc1
update readme
csuyat-dot Jun 28, 2024
95787fb
removed old, initial exploratory notebooks
csuyat-dot Jun 28, 2024
8322540
left notes on refacor_notes
csuyat-dot Jun 28, 2024
moved min max function to NB. reorganized the charts and disabled the overall charts
csuyat-dot committed Jun 26, 2024
commit 7be2f2c8ecfab26131d088b6f9e39b4d2b0491de
24 changes: 12 additions & 12 deletions bus_procurement_cost/bus_cost_utils.py
@@ -235,20 +235,20 @@ def col_row_updater(df: pd.DataFrame, col1: str, val1, col2: str, new_val):

     return

-def bus_min_max_summary(data:pd.DataFrame, col1:str, col_list=["transit_agency",
-        "total_agg_cost",
-        "total_bus_count",
-        "new_cost_per_bus"]):
-    """
-    function to display min/max of specific column in aggregated bus df.
+#def bus_min_max_summary(data:pd.DataFrame, col1:str, col_list=["transit_agency",
+#        "total_agg_cost",
+#        "total_bus_count",
+#        "new_cost_per_bus"]):
+#    """
+#    function to display min/max of specific column in aggregated bus df.

-    """
+#    """

-    return display(Markdown(f"**Max {col1}**"),
-        data[data[col1] == data[col1].max()][col_list],
-        Markdown(f"**Min {col1}**"),
-        data[data[col1] == data[col1].min()][col_list]
-        )
+#    return display(Markdown(f"**Max {col1}**"),
+#        data[data[col1] == data[col1].max()][col_list],
+#        Markdown(f"**Min {col1}**"),
+#        data[data[col1] == data[col1].min()][col_list]
+#        )

 def outlier_flag(col):
Member

@tiffanychu90 tiffanychu90 Jul 1, 2024

def outlier_flag(df: pd.DataFrame, col: str) -> pd.Series:

    # This applies the lambda function for you already, and works in the absolute value.
    # There are 2 ways to write this that do the same thing.

    # Option 1: apply row-wise on the DataFrame
    return df.apply(lambda x: True if abs(x[col]) > 3 else False, axis=1)

    # OR something like this (double check): apply on the single column
    # return df[col].apply(lambda x: True if abs(x) > 3 else False)

    # Also, if you don't like booleans, you can add `.astype(int)`, which changes True/False to 1/0 (in that order)

In your notebook:

df_agg["new_is_cpb_outlier"] = outlier_flag(df_agg, "new_zscore_cost_per_bus")
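
For reference, here is a minimal runnable sketch of the suggestion above, not the code that was merged. The toy `df_agg` values and agency names are made up for illustration; only the function name, the column names, and the threshold of 3 come from this thread. It also shows a vectorized comparison, which is an alternative I'm assuming would be acceptable here since it gives the same flags without a lambda.

```python
import pandas as pd


def outlier_flag(df: pd.DataFrame, col: str) -> pd.Series:
    """Return a boolean Series: True where |df[col]| is greater than 3."""
    # Row-wise form: the lambda receives one row at a time (axis=1).
    return df.apply(lambda x: abs(x[col]) > 3, axis=1)


# Hypothetical toy data, for illustration only.
df_agg = pd.DataFrame(
    {
        "transit_agency": ["Agency A", "Agency B", "Agency C", "Agency D"],
        "new_zscore_cost_per_bus": [0.5, -3.2, 3.0, 4.1],
    }
)

# Usage as suggested for the notebook.
df_agg["new_is_cpb_outlier"] = outlier_flag(df_agg, "new_zscore_cost_per_bus")

# Column-wise form: the lambda receives one value at a time.
col_wise = df_agg["new_zscore_cost_per_bus"].apply(lambda x: abs(x) > 3)

# Vectorized form (an assumed alternative, not from the review): no lambda needed.
vectorized = df_agg["new_zscore_cost_per_bus"].abs() > 3

# Booleans to integers: True -> 1, False -> 0.
as_int = df_agg["new_is_cpb_outlier"].astype(int)
```

All three forms should produce the same flags. Note that a z-score of exactly 3.0 is not flagged, because the comparison is strictly greater than 3.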

Member

@tiffanychu90 tiffanychu90 Jul 1, 2024

The .apply takes a lambda function. There are 2 ways to write it, and which one you use depends on what you need to access in the row. If you only need 1 column (the z-score), you can write it either way: df[col].apply(...) passes each value in that column to the lambda, while df.apply(..., axis=1) passes each row. If you need to access 2 column values, you have to write it like df.apply(lambda x: some condition, axis=1)

Here, we want to access 2 columns in the lambda condition (state and temperature)
Ex: df.apply(lambda x: 1 if ( (x.state == "CA" ) and (x.temperature < 80) ) else 0, axis=1)

The difference in syntax is where you place the .apply (on the column or on the whole dataframe), plus the axis=1 (operate on each row) that the row-wise form needs.
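
A small runnable sketch of the two-column case, assuming a made-up dataframe with `state` and `temperature` columns (the values and the `mild_ca` column name are hypothetical). The boolean-mask version at the end is an assumed equivalent, included only for comparison.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame(
    {
        "state": ["CA", "CA", "NV"],
        "temperature": [75, 85, 70],
    }
)

# Row-wise apply: axis=1 hands each row to the lambda, so both columns are reachable.
df["mild_ca"] = df.apply(
    lambda x: 1 if (x.state == "CA") and (x.temperature < 80) else 0,
    axis=1,
)

# Assumed equivalent without a lambda: combine two boolean masks, then cast to int.
mild_ca_mask = ((df["state"] == "CA") & (df["temperature"] < 80)).astype(int)
```

Only the first row meets both conditions; the second is in CA but too warm, and the third is cool enough but not in CA.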

"""