Polities harmonization by afuenteshinojosa · Pull Request #16 · eduaguilera/whep

afuenteshinojosa · 2025-06-23T13:04:11Z

Closes #38.


eduaguilera

Nice work, @afuenteshinojosa.
I have added some comments on the code, and here are some more general ones. Sorry, @afuenteshinojosa, if it feels like a lot of comments. You are the first team contributor and we are still setting up the procedures!

- Regarding the routes to files in the input and output folders, I have a few comments:
  - The routes @afuenteshinojosa wrote would not work on my computer, as I have a different folder structure with OneDrive (e.g. I don't have the "Desktop" folder).
  - I would avoid defining the user in each script, as it would mean making many changes just to get the code running whenever more than one script has been modified. That's why I use a "Common_data.R" script for this kind of setup, but there are probably better ways to do it.
  - Any suggestion to address these problems? @lbm364dl
- I think the Methods description document (now a Google Doc) should be placed here in the GitHub repository as a markdown document (not sure where, though). In addition, I like having more explanations in the code itself, to avoid having to look at the documentation. As I see it, reading the code with its comments should be enough to understand it, and the documentation should explain the origin of the input files (or, even better, a sort of label could be added to these files if they are located in GitHub).
- It seems that the FT_cleaned.R and FT_and_WHEP.R scripts should be executed sequentially. For me, it is useful to have a script (or it could also be in the README or another documentation document, but in any case it should be accessible and easy to see) showing the order in which scripts have to be run.
- We should establish conventions regarding file names and script names.

inst/scripts/FT_cleaned.R

eduaguilera · 2025-06-25T06:07:51Z


```diff
+    polity_name_FT = polity_name_FT_raw,
+    start_year, end_year, `Comments FT`, polity_name_full
+  ) |>
+  # Manually changing polity names
```


Some comments on this:

- I think these changes should not be done in the code, but directly in Excel/Google Sheets.
- I think we should not modify the original column, but rather create a new column with the names we want.
  - In this case, the new column would be "polity_name" (which is the variable name we have chosen for WHEP polity names).
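A minimal sketch of the second point, using toy data and a hypothetical `add_polity_name()` helper (not part of WHEP): the source column `polity_name_FT` is left untouched, and the harmonized name goes into a new `polity_name` column looked up from a table. Following the first point, that table would be maintained in the spreadsheet/CSV rather than hard-coded.

```r
# Hypothetical helper (not part of WHEP): add a harmonized polity_name
# column without overwriting the original polity_name_FT column.
add_polity_name <- function(ft, lookup) {
  # Position of each FT name in the lookup table (NA if it is not listed)
  idx <- match(ft$polity_name_FT, lookup$polity_name_FT)
  ft$polity_name <- lookup$polity_name[idx]
  ft
}

# Toy lookup; in practice it would be read from the manually edited sheet
lookup <- data.frame(
  polity_name_FT = c("Kingdom of France", "Spain"),
  polity_name = c("France", "Spain")
)
ft <- data.frame(polity_name_FT = c("Spain", "Kingdom of France", "Atlantis"))

res <- add_polity_name(ft, lookup)
# Names still missing from the lookup, to be added to the sheet
res$polity_name_FT[is.na(res$polity_name)]
```

Unmatched names come out as `NA` instead of silently keeping the raw value, which makes gaps in the lookup easy to spot.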


lbm364dl · 2025-06-25T08:13:20Z

You're doing a great job on your first PR, @afuenteshinojosa! I know you need to get this done fast, so I won't add my own review yet (I will do it later), but I wanted to follow up on some of @eduaguilera's comments.

> Regarding the routes to files in the input and output folders, I have a few comments:

> The routes @afuenteshinojosa wrote would not work on my computer, as I have a different folder structure with OneDrive (e.g. I don't have the "Desktop" folder).

> I would avoid defining the user in each script, as it would mean making many changes just to get the code running whenever more than one script has been modified. That's why I use a "Common_data.R" script for this kind of setup, but there are probably better ways to do it.

> Any suggestion to address these problems? @lbm364dl

Yes, there's definitely a clean way to do it. There's a function that lets us get the path of package files programmatically:

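```r
# Resolves inst/extdata/input/processed/polities/whep-polities.xlsx
# when called from inside a package function
system.file("extdata", "input/processed/polities/whep-polities.xlsx", package = utils::packageName())
```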

This will only work if called inside functions defined in your package. In that case, utils::packageName() will automatically get the name of your package, and the call above will look for files inside the specially named folder inst/extdata, which is where we put input files. So the path in the second argument must start from inside extdata.
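A small sketch of how this could be wrapped, assuming a hypothetical private helper `.input_path()` (not an existing WHEP function). By default `system.file()` returns `""` when nothing matches; passing `mustWork = TRUE` makes it raise an error instead, so a mistyped path fails immediately rather than at read time.

```r
# Hypothetical wrapper, not an existing WHEP function.
# mustWork = TRUE turns the silent "" returned for missing files into an error.
.input_path <- function(..., package = utils::packageName()) {
  system.file(..., package = package, mustWork = TRUE)
}

# Inside a package function, the call above would become:
# .input_path("extdata", "input/processed/polities/whep-polities.xlsx")

# Outside the package (e.g. in a loose script) packageName() returns NULL,
# so the package has to be named explicitly:
.input_path("DESCRIPTION", package = "stats")
```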

Again, this will not work if you write it directly in the script file, because that's not recognized as part of the package (we will have to move this code to the R folder to follow package function practices anyway, but more on that later). Similarly, I have already defined this private function:

https://github.com/eduaguilera/WHEP/blob/ce86fb24ed195fb5540b8cba10a789eff0994480/R/input_files.R#L118

I suggest looking at it and creating another one called .read_local_xlsx in the same file. Recall that you can only use it after calling devtools::load_all() to load the package functions. Also, your R session should ideally be in the main folder of the project (in RStudio, open the project instead of opening individual files).
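One possible shape for `.read_local_xlsx`, as a sketch only: the private helper linked above is not reproduced in this thread, so the argument names here are assumptions, not its actual signature. It resolves the file under inst/extdata and forwards any extra arguments (e.g. `sheet`) to `readxl::read_excel()`.

```r
# Sketch only: argument names are assumptions, not the signature of the
# existing private helper in R/input_files.R.
.read_local_xlsx <- function(file_path, ..., package = utils::packageName()) {
  # Resolve the file inside inst/extdata; error early if it does not exist
  path <- system.file("extdata", file_path, package = package, mustWork = TRUE)
  # Extra arguments (e.g. sheet, range) are passed on to readxl
  readxl::read_excel(path, ...)
}

# Intended use from inside a package function:
# .read_local_xlsx("input/processed/polities/whep-polities.xlsx", sheet = 1)
```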

That being said, I have my reasons for not wanting to use Excel files as inputs here. The main reason is their lack of transparency when tracking changes in git. As you can see in this Pull Request itself, binary files (and Excel files are binary) can't be previewed, and their changes won't be shown either, because they are not plain text files. That's why I would prefer always using CSVs. They are also easier to work with programmatically. I won't enforce this decision, though. Tell me what you think.
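If the inputs move to CSV, a one-off conversion could look like this sketch. `xlsx_to_csv()` is a hypothetical helper; it only relies on `readxl::read_excel()` and base `utils::write.csv()`. The resulting CSV is plain text, so git can show line-by-line diffs for it.

```r
# Hypothetical one-off conversion helper (requires the readxl package).
xlsx_to_csv <- function(xlsx_path, csv_path, sheet = 1) {
  data <- readxl::read_excel(xlsx_path, sheet = sheet)
  # Plain-text output that git can diff line by line
  utils::write.csv(data, csv_path, row.names = FALSE)
  invisible(csv_path)
}

# Example with a file that ships with readxl:
# xlsx_to_csv(readxl::readxl_example("datasets.xlsx"), "datasets.csv")
```

Converting once and committing only the CSV keeps later edits reviewable in the Pull Request view.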

Lastly, remember we should be using renv for tracking dependencies, so if we use new ones (like readxl here), we should add them both in renv and in the DESCRIPTION file. If in doubt about how to do it, you can check my guide or ask me.
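For reference, a sketch of the usual steps. The console commands in the comments are standard renv/usethis functions; `declared_imports()` is a hypothetical helper (not part of WHEP or usethis) for double-checking that a new dependency actually made it into the Imports field of DESCRIPTION.

```r
# Typical one-off console steps (not package code):
#   renv::install("readxl")        # install into the project library
#   usethis::use_package("readxl") # add readxl to Imports in DESCRIPTION
#   renv::snapshot()               # record the version in renv.lock

# Hypothetical helper: packages listed under Imports in a DESCRIPTION file.
declared_imports <- function(desc_path = "DESCRIPTION") {
  imports <- read.dcf(desc_path, fields = "Imports")[1, "Imports"]
  if (is.na(imports)) {
    return(character(0))
  }
  # Collapse line breaks, split on commas, drop version constraints
  imports <- gsub("\\s+", " ", imports)
  pkgs <- trimws(strsplit(imports, ",")[[1]])
  pkgs <- sub("\\s*\\(.*\\)$", "", pkgs)
  pkgs[nzchar(pkgs)]
}

# e.g. "readxl" %in% declared_imports()
```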

> I think the Methods description document (now a Google Doc) should be placed here in the GitHub repository as a markdown document (not sure where, though). In addition, I like having more explanations in the code itself, to avoid having to look at the documentation. As I see it, reading the code with its comments should be enough to understand it, and the documentation should explain the origin of the input files (or, even better, a sort of label could be added to these files if they are located in GitHub).

@afuenteshinojosa suggested adding it as an R markdown article in the package, the same way I created the workflow guide. I agree this is a good idea for a broader explanation. It's also true that this will end up being a function in the package and it must be documented, but this function documentation could focus more on the actual structure of the output and leave the methodology explanation for the R markdown article. Tell me what you think!

> It seems that the FT_cleaned.R and FT_and_WHEP.R scripts should be executed sequentially. For me, it is useful to have a script (or it could also be in the README or another documentation document, but in any case it should be accessible and easy to see) showing the order in which scripts have to be run.

When we move this code to the R folder as package functions, I agree that there should be a single function which inside calls both parts.
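A rough sketch of that structure, with hypothetical function names and deliberately trivial step bodies (the real logic lives in inst/scripts/FT_cleaned.R and inst/scripts/FT_and_WHEP.R and is not shown here). The point is that users only call one public function, so the two steps cannot be run out of order.

```r
# Hypothetical step 1 (stands in for FT_cleaned.R): keep the raw column,
# add a cleaned copy.
.clean_ft_polities <- function(ft_raw) {
  ft_raw$polity_name_FT <- trimws(ft_raw$polity_name_FT_raw)
  ft_raw
}

# Hypothetical step 2 (stands in for FT_and_WHEP.R): attach WHEP names.
.add_whep_polities <- function(ft_clean, whep_lookup) {
  merge(ft_clean, whep_lookup, by = "polity_name_FT", all.x = TRUE)
}

# Single public entry point that always runs both steps in order.
build_polities <- function(ft_raw, whep_lookup) {
  ft_raw |>
    .clean_ft_polities() |>
    .add_whep_polities(whep_lookup)
}
```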

> We should establish conventions regarding file names and script names.

The Tidyverse style guide covers this. The most important choices are using all lowercase and separating words with _ or - (I decided to use _ in my files). You can check the link if you're interested in more.
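As a quick illustration of the convention chosen here (lowercase, words joined by `_`, `.R` extension), a hypothetical checker could be a single regular expression. Under it, the current FT_cleaned.R would need to become something like ft_cleaned.R.

```r
# Hypothetical checker for the naming convention discussed above:
# lowercase letters/digits, words separated by "_", ".R" extension.
is_conventional_script_name <- function(file_names) {
  grepl("^[a-z0-9]+(_[a-z0-9]+)*\\.R$", file_names)
}

is_conventional_script_name(
  c("ft_cleaned.R", "FT_cleaned.R", "FT and WHEP.R", "input_files.R")
)
# returns TRUE FALSE FALSE TRUE
```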


Alejandra Fuentes added 4 commits June 23, 2025 10:40

- polities (fd9a552)
- data in csv (14bf545)
- polities updated (dca9507)
- polities updated (8b011cd)

afuenteshinojosa requested review from eduaguilera, lbm364dl and rasmuse June 23, 2025 13:04

Alejandra Fuentes added 5 commits June 23, 2025 15:42

- updating polities and script (4c737f2)
- some changes (d9ffa64)
- updating polities (dbb3585)
- script short-lines (894a1fc)
- script changes (339de4c)

afuenteshinojosa marked this pull request as draft June 23, 2025 14:13

Alejandra Fuentes added 5 commits June 23, 2025 16:16

- script changes (4ad8c3f)
- Merge branch 'main' of https://github.com/eduaguilera/WHEP into alejandra/polities-harmonization (0ce5ad3)
- edited script and output (9a0c3a8)
- script and csv changes (1a2e649)
- polities updated (ce6d893)

eduaguilera reviewed Jun 25, 2025


lbm364dl reviewed Jun 25, 2025


inst/scripts/FT_and_WHEP.R (outdated)

Alejandra Fuentes and others added 9 commits June 25, 2025 12:49

- better organized polities scripts and sources (b6e3037)
- update description in polities.R document (01b1d14)
- describing get_polities function (b3f4c77)
- improving code (84058ed)
- adding package to DESCRIPTION file (3245a20)
- update (polity_code) (9e51521)
- update polity_names in cases continent-name Other (bac73f8)
- Merge branch 'main' into alejandra/polities-harmonization (4bac5a4)
- Put back R 4.5.0 version (c8a45bc)

lbm364dl force-pushed the alejandra/polities-harmonization branch from 7a5fa09 to f13bf36 August 12, 2025 10:44

lbm364dl added 3 commits August 12, 2025 16:08

- Make get_polities output an sf object (534a2d2)
- Update get_polities docs (f6b251d)
- Update dependencies (9253b88)

lbm364dl force-pushed the alejandra/polities-harmonization branch from 93ce22c to 9253b88 August 13, 2025 09:07

lbm364dl added 3 commits August 13, 2025 12:06

- Add get_polity_sources docs (2cbc208)
- Remove old TODOs (3742f4a)
- Remove sf before testing (too slow) (9220344)

lbm364dl marked this pull request as ready for review September 8, 2025 16:04

lbm364dl requested review from eduaguilera and lbm364dl and removed request for lbm364dl September 8, 2025 16:04

lbm364dl added 2 commits September 10, 2025 16:27

- Add useful error messages for inconsistent polity data (28f7f38)
- Add script to run when manually updating polities (57351ba)

lbm364dl modified the milestones: v0.2.0, v0.3.0 Oct 15, 2025

lbm364dl added 9 commits October 28, 2025 10:00

- Use only CShapes in polities table (8113e80)
- Update polity code format (b8be28e)
- Add intermediate alias table (d5966e7)
- Create expanded alias table (654e364)
- Add get_polity_code function with mismatch analysis (c36194b)
- Add FAOSTAT polity mapping example (d86c52e)
- Make polities static R package data (4122693)
- Update polities tests (a2f91e9)
- Merge branch 'main' into alejandra/polities-harmonization (0387a2c)

lbm364dl force-pushed the alejandra/polities-harmonization branch 2 times, most recently from 78b8c81 to 89b342c November 19, 2025 09:47

- Remove old code (4dfe371)

lbm364dl force-pushed the alejandra/polities-harmonization branch from 89b342c to 4dfe371 November 19, 2025 09:50

lbm364dl removed this from the v0.3.0 milestone Feb 27, 2026

afuenteshinojosa wants to merge 90 commits into main from alejandra/polities-harmonization

afuenteshinojosa commented Jun 23, 2025 • edited by lbm364dl


eduaguilera left a comment


eduaguilera Jun 25, 2025


lbm364dl commented Jun 25, 2025 • edited
