add prepare_sector_split.R #30

jacobvjk · 2024-07-30T14:18:49Z

closes #9
closes #10

- workflow.multi.loanbook gains the module prepare_sector_split.R, which calculates equal-weight and primary-energy-based sector weights for companies.
- Deprecates the "worst_case" sector split, which was an option in the previous workflow.aggregate.loanbooks. This option has not been used and is unnecessarily complicated.
- Beyond that deprecation, the script retains the functionality of workflow.aggregate.loanbooks.
- run_match_prioritize.R gains an option to split loan values based on the outputs of prepare_sector_split.R. The option is set via config.yml.
- The auxiliary data sets (primary_energy_efficiency and unit_conversion) needed to calculate the primary-energy-based sector split are integrated directly into prepare_sector_split.R. Previously, they were generated in data-raw/. Because we are currently not using a package structure and the data sets are not expected to be used elsewhere, they live in the script for now.
- prepare_sector_split.R can remove inactive companies based on the output of prepare_abcd.R if the option remove_inactive_companies is set to TRUE.
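The two weighting schemes and the optional loan split in run_match_prioritize.R can be illustrated with a minimal R sketch. It is not the code in prepare_sector_split.R. The toy data and the column names sector, primary_energy, sector_split, id_loan and loan_size_outstanding are hypothetical. The sketch also skips the conversion of production to primary energy via primary_energy_efficiency and unit_conversion and simply assumes a primary_energy value per company and sector.

```r
library(dplyr)

# Hypothetical input: one row per company and sector. primary_energy is
# assumed to be already converted; the conversion via primary_energy_efficiency
# and unit_conversion is NOT reproduced here.
company_sectors <- tibble(
  company_id = c(1, 1, 2),
  sector = c("power", "coal", "automotive"),
  primary_energy = c(300, 100, 80)
)

# Equal weights: each sector of a company gets 1 / (number of its sectors)
sector_split_equal <- company_sectors %>%
  group_by(company_id) %>%
  mutate(sector_split = 1 / n()) %>%
  ungroup() %>%
  select(company_id, sector, sector_split)

# Primary-energy-based weights: each sector's share of the company's total
sector_split_energy <- company_sectors %>%
  group_by(company_id) %>%
  mutate(sector_split = primary_energy / sum(primary_energy)) %>%
  ungroup() %>%
  select(company_id, sector, sector_split)

# Hypothetical matched loans: one row per loan, linked to one company
matched_loans <- tibble(
  id_loan = c("L1", "L2"),
  company_id = c(1, 2),
  loan_size_outstanding = c(100, 50)
)

# Split each loan across its company's sectors. Totals are preserved
# because the weights of each company sum to 1.
split_loans <- function(loans, sector_split) {
  loans %>%
    inner_join(sector_split, by = "company_id") %>%
    mutate(loan_size_outstanding = loan_size_outstanding * sector_split)
}

loans_equal <- split_loans(matched_loans, sector_split_equal)
loans_energy <- split_loans(matched_loans, sector_split_energy)
```

With equal weights, loan L1 is split 50/50 between power and coal. With the primary-energy-based weights, the same loan is split 75/25, because power accounts for 300 of the company's 400 units of primary energy.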

jdhoffa

Mainly a rubber stamp / review of syntax, but LGTM.

jdhoffa · 2024-08-01T10:14:36Z

helper_functions.R

+  abcd_id <- abcd %>%
+    dplyr::distinct(.data$company_id, .data$name_company)


NIT: single-line pipe (I don't really care that much, just making a note)

jdhoffa · 2024-08-01T10:14:51Z

helper_functions.R

+  # identify lost_companies_sector_split and write to csv for inspection
+  lost_companies_sector_split <- companies_sector_split %>%
+    dplyr::anti_join(
+      abcd_id,
+      by = c("company_id")
+    )


NIT: single-line pipe
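The two reviewed snippets can be combined into a small, self-contained R example that uses single-line pipes, as the NIT suggests. The toy data, the sector and sector_split columns, and the output path are hypothetical. The diff itself only says that lost companies are written to csv for inspection.

```r
library(dplyr)

# Hypothetical toy inputs; only company_id and name_company come from the diff
abcd <- tibble(
  company_id = c(1, 1, 2),
  name_company = c("a", "a", "b"),
  sector = c("power", "coal", "automotive")
)
companies_sector_split <- tibble(
  company_id = c(1, 2, 3),
  sector_split = c(1, 1, 1)
)

# single-line pipe: one row per company in abcd
abcd_id <- abcd %>% dplyr::distinct(.data$company_id, .data$name_company)

# companies in the sector split without a match in abcd
lost_companies_sector_split <- companies_sector_split %>% dplyr::anti_join(abcd_id, by = c("company_id"))

# write for inspection (hypothetical output path in a temporary directory)
utils::write.csv(
  lost_companies_sector_split,
  file.path(tempdir(), "lost_companies_sector_split.csv"),
  row.names = FALSE
)
```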

jdhoffa · 2024-08-01T10:19:35Z

run_match_prioritize.R

+config_files <- config::get("file_names")
+config_match_prio <- config::get("match_prioritize")
+config_prepare_sector_split <- config::get("sector_split")


Does this add new/necessary keys to the config? Should that be documented somewhere?

No, they are already in example.config.yml.

It should be documented, but at this point I prefer to get the full workflow set up before writing extensive documentation for these keys. I will open a task for it, though.

Sounds good.

We should consider how exactly we want to slice and dice the architecture here, and what should be documented where (e.g. {config.yml + scripts} could also be {function args + functions} in a package).

The technical profile document will inform this, and ideally we can discuss it a bit more in the Tech Review.

jacobvjk added 10 commits July 30, 2024 16:18

- add prepare_sector_split.R (c0e34bb)
- Merge branch 'main' into 9-prepare-sector-split (f39c112)
- Merge branch 'main' into 9-prepare-sector-split (5e77d5a)
- add optional sector split to match_prioritize (90eaa25)
- Merge branch 'main' into 9-prepare-sector-split (50bbf4e)
- clarify sector split script (cb6f2de)
- Merge branch 'main' into 9-prepare-sector-split (bf37671)
- simplify (646ca7b)
- remove unnecessary dependencies (c99824c)
- clean up helper functions (3ea94d8)

jacobvjk marked this pull request as ready for review July 31, 2024 11:36

jacobvjk requested review from jdhoffa and cjyetman July 31, 2024 11:36

jdhoffa approved these changes Aug 1, 2024


jacobvjk mentioned this pull request Aug 1, 2024

config.yml keys gain documentation #35 (Closed)

jacobvjk merged commit c3e825c into main Aug 1, 2024

jacobvjk deleted the 9-prepare-sector-split branch August 1, 2024 12:02
