
Polities harmonization #16

Open
afuenteshinojosa wants to merge 90 commits into main from alejandra/polities-harmonization

Conversation

@afuenteshinojosa
Collaborator

@afuenteshinojosa afuenteshinojosa commented Jun 23, 2025

Closes #38.

@afuenteshinojosa afuenteshinojosa marked this pull request as draft June 23, 2025 14:13
Owner

@eduaguilera eduaguilera left a comment


Nice work, @afuenteshinojosa.
I have added some comments on the code, and here are some more general ones. Sorry, @afuenteshinojosa, if there seem to be a lot of comments: you are the first team contributor and we are still setting up the procedures!

  1. Regarding the paths to files in the input and output folders, I have a few comments:
  • The paths @afuenteshinojosa wrote would not work on my computer, as I have a different folder structure with OneDrive (e.g. I don't have the "Desktop" folder).
  • I would avoid defining the user in each script, as it would mean making many changes just to get the code running if more than one script has been modified. That's why I use a "Common_data.R" script for this kind of setup, but there are probably better ways to do it.
  • Any suggestions to address these problems, @lbm364dl?

  2. I think the Methods description document (now a Google Doc) should be placed here in the GitHub repository as a markdown document (not sure where, though). In addition, I like having more explanations in the code itself, to avoid having to look at the documentation. As I see it, reading the code with its comments should be enough to understand it, and the documentation should explain the origin of the input files (or, even better, a sort of label could be added to these files if they are located in GitHub).

  3. It seems that the FT_cleaned.R and FT_and_WHEP.R scripts should be executed sequentially. I find it useful to have a script (or a section in the README or other documentation, as long as it is accessible and easy to find) showing the order in which scripts have to be run.

  4. We should establish conventions regarding file names and script names.

polity_name_FT = polity_name_FT_raw,
start_year, end_year, `Comments FT`, polity_name_full
) |>
# Manually changing polity names
Owner


Some comments on this:

  1. I think these changes should not be done in the code, but directly in Excel/Google Sheets
  2. I think we should not modify the original column, but rather create a new column with the names we want
  3. In this case, the new column would be "polity_name" (which is the variable name we have chosen for WHEP polity names)
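
One way to follow points 2 and 3 is to keep the manual renames in a small lookup table maintained in the spreadsheet and join it in, leaving the original column untouched. A minimal sketch in base R, assuming a hypothetical lookup table `name_map` with columns `polity_name_FT` and `polity_name` (the helper name `add_polity_name` and the example values are illustrative, not taken from the project data):

```r
# Hypothetical helper: add a `polity_name` column from a lookup table
# without modifying the original `polity_name_FT` column.
add_polity_name <- function(ft, name_map) {
  idx <- match(ft$polity_name_FT, name_map$polity_name_FT)
  mapped <- name_map$polity_name[idx]
  # Fall back to the original name when the lookup has no entry.
  ft$polity_name <- ifelse(is.na(mapped), ft$polity_name_FT, mapped)
  ft
}

# Made-up example rows, for illustration only.
ft <- data.frame(polity_name_FT = c("Old name A", "Name B"))
name_map <- data.frame(
  polity_name_FT = "Old name A",
  polity_name = "New name A"
)
ft_named <- add_polity_name(ft, name_map)
```

With this shape, renaming a polity means editing one row of the lookup table rather than the code.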

@lbm364dl
Collaborator

lbm364dl commented Jun 25, 2025

You're doing a great job for a first PR, @afuenteshinojosa! I know you need to get this done fast, so I won't add my own review yet. I will do it later, but I wanted to follow up on some of @eduaguilera's comments.

  1. Regarding the paths to files in the input and output folders, I have a few comments:
  • The paths @afuenteshinojosa wrote would not work on my computer, as I have a different folder structure with OneDrive (e.g. I don't have the "Desktop" folder).
  • I would avoid defining the user in each script, as it would mean making many changes just to get the code running if more than one script has been modified. That's why I use a "Common_data.R" script for this kind of setup, but there are probably better ways to do it.
  • Any suggestions to address these problems, @lbm364dl?

Yes, there's definitely a clean way to do it. There's a function that lets us get the path of package files programmatically:

system.file("extdata", "input/processed/polities/whep-polities.xlsx", package = utils::packageName())

This will only work if called inside functions defined in your package. In that case, utils::packageName() automatically returns the name of your package, and the call above looks for files inside the specially named folder inst/extdata, which is where we put input files. The path in the second argument must therefore be relative to extdata.

Again, this will not work if you write it directly in a script file, because a script is not recognized as part of the package (we will have to move this code to the R folder to follow package function practices anyway, but more on that later). Along the same lines, I have already defined this private function:

https://github.com/eduaguilera/WHEP/blob/ce86fb24ed195fb5540b8cba10a789eff0994480/R/input_files.R#L118

I suggest looking at it and creating another one called .read_local_xlsx in the same file. Recall that you can only use it after calling devtools::load_all() to load the package functions. Also, your R session should ideally be started in the main folder of the project (in RStudio, open the project instead of opening individual files).
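
To make the suggestion concrete, here is a rough sketch of what such a helper could look like. It is an assumption about the shape, not a copy of the existing function in R/input_files.R: the `.extdata_path` helper and its `package` argument are hypothetical, and readxl::read_excel() is only called when the reader function is actually used.

```r
# Hypothetical helper: resolve a path relative to inst/extdata.
# The default for `package` only works when this function is defined
# inside a package, since utils::packageName() returns NULL elsewhere.
.extdata_path <- function(relative_path, package = utils::packageName()) {
  path <- system.file("extdata", relative_path, package = package)
  # system.file() returns "" when the file does not exist.
  if (!nzchar(path)) {
    stop("File not found under extdata: ", relative_path, call. = FALSE)
  }
  path
}

# Sketch of the suggested reader; extra arguments (e.g. `sheet`) are
# passed through to readxl::read_excel().
.read_local_xlsx <- function(relative_path, ...) {
  readxl::read_excel(.extdata_path(relative_path), ...)
}
```

After devtools::load_all(), a call such as .read_local_xlsx("input/processed/polities/whep-polities.xlsx") would then resolve the file without any user-specific path.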

That being said, I have my reasons for not wanting to use Excel files as inputs here. The main one is the lack of transparency when tracking changes in git. As you can see in this very Pull Request, binary files (and Excel files are binary) can't be previewed, and the diff won't show what changed in the file either, because it's not plain text. That's why I would prefer always using CSVs. They are also easier to work with programmatically. I won't enforce this decision, though. Tell me what you think.
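
As a small illustration of the point about plain text, the sketch below writes a table to CSV and reads it back with base R. The rows are made up for the example; only the column names `polity_name`, `start_year` and `end_year` come from this PR.

```r
# Made-up rows, for illustration only.
polities <- data.frame(
  polity_name = c("Example A", "Example B"),
  start_year = c(1800L, 1900L),
  end_year = c(1899L, 2000L)
)

csv_path <- tempfile(fileext = ".csv")
utils::write.csv(polities, csv_path, row.names = FALSE)

# Each row is one line of text, so git can show row-level diffs.
csv_lines <- readLines(csv_path)

# Reading it back needs no extra dependency.
polities_back <- utils::read.csv(csv_path)
```

Changing one polity then shows up in the diff as a single changed line, which is not possible with an .xlsx file.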

Lastly, remember we should be using renv for tracking dependencies, so if we use new ones (like readxl here), we should add them both in renv and in the DESCRIPTION file. If in doubt about how to do it, you can check my guide or ask me.
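
For reference, a sketch of that step. The interactive commands are shown as comments because they modify the project, and the `declared_imports` helper is hypothetical: it is just one possible way to double-check that a package ended up in the Imports field of DESCRIPTION.

```r
# Typical interactive steps (run from the project root):
#   renv::install("readxl")        # install into the project library
#   usethis::use_package("readxl") # add it to Imports in DESCRIPTION
#   renv::snapshot()               # record it in renv.lock

# Hypothetical helper: list the packages declared under Imports.
declared_imports <- function(description_path) {
  fields <- read.dcf(description_path, fields = "Imports")
  raw <- fields[1, "Imports"]
  if (is.na(raw)) {
    return(character(0))
  }
  pkgs <- trimws(strsplit(raw, ",", fixed = TRUE)[[1]])
  # Drop version constraints such as "(>= 1.1.0)".
  pkgs <- sub("\\s*\\(.*\\)$", "", pkgs)
  pkgs[nzchar(pkgs)]
}
```

A check like `"readxl" %in% declared_imports("DESCRIPTION")` can then confirm the dependency was declared.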

  2. I think the Methods description document (now a Google Doc) should be placed here in the GitHub repository as a markdown document (not sure where, though). In addition, I like having more explanations in the code itself, to avoid having to look at the documentation. As I see it, reading the code with its comments should be enough to understand it, and the documentation should explain the origin of the input files (or, even better, a sort of label could be added to these files if they are located in GitHub).

@afuenteshinojosa suggested adding it as an R markdown article in the package, the same way I created the workflow guide. I agree this is a good idea for a broader explanation. It's also true that this will end up being a function in the package and it must be documented, but this function documentation could focus more on the actual structure of the output and leave the methodology explanation for the R markdown article. Tell me what you think!

  3. It seems that the FT_cleaned.R and FT_and_WHEP.R scripts should be executed sequentially. I find it useful to have a script (or a section in the README or other documentation, as long as it is accessible and easy to find) showing the order in which scripts have to be run.

When we move this code to the R folder as package functions, I agree that there should be a single function that calls both parts internally.
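
A minimal sketch of that structure, with placeholder bodies: `.clean_ft`, `.merge_ft_whep` and `harmonize_polities` are hypothetical names standing in for whatever FT_cleaned.R and FT_and_WHEP.R actually do, and the join key is an assumption.

```r
# Placeholder for the logic currently in FT_cleaned.R.
.clean_ft <- function(ft_raw) {
  ft_raw$polity_name_FT <- trimws(ft_raw$polity_name_FT_raw)
  ft_raw
}

# Placeholder for the logic currently in FT_and_WHEP.R.
.merge_ft_whep <- function(ft_clean, whep) {
  merge(ft_clean, whep, by = "polity_name_FT", all.x = TRUE)
}

# Single entry point: the execution order lives in the code itself.
harmonize_polities <- function(ft_raw, whep) {
  ft_raw |>
    .clean_ft() |>
    .merge_ft_whep(whep)
}
```

With a single entry point, the run order no longer has to be documented separately.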

  4. We should establish conventions regarding file names and script names.

The Tidyverse style guide covers this. The most important choices are using all lowercase and separating words with _ or - (I decided to use _ in my files). You can check the link if you're interested in more.
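
If we want to check this automatically, something along these lines could work. It is a sketch of the convention as I described it (lowercase, words separated by _, ending in .R), not an official rule set, and `is_tidy_file_name` is a hypothetical helper.

```r
# Hypothetical check: lowercase letters and digits, words separated
# by "_", and a ".R" extension.
is_tidy_file_name <- function(file_name) {
  grepl("^[a-z0-9]+(_[a-z0-9]+)*\\.R$", file_name)
}

script_names <- c("FT_cleaned.R", "ft_cleaned.R", "ft_and_whep.R")
is_tidy_file_name(script_names)
# FALSE TRUE TRUE
```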

@lbm364dl lbm364dl force-pushed the alejandra/polities-harmonization branch from 7a5fa09 to f13bf36 Compare August 12, 2025 10:44
@lbm364dl lbm364dl force-pushed the alejandra/polities-harmonization branch from 93ce22c to 9253b88 Compare August 13, 2025 09:07
@lbm364dl lbm364dl marked this pull request as ready for review September 8, 2025 16:04
@lbm364dl lbm364dl requested review from eduaguilera and lbm364dl and removed request for lbm364dl September 8, 2025 16:04
@lbm364dl lbm364dl modified the milestones: v0.2.0, v0.3.0 Oct 15, 2025
@lbm364dl lbm364dl force-pushed the alejandra/polities-harmonization branch 2 times, most recently from 78b8c81 to 89b342c Compare November 19, 2025 09:47
@lbm364dl lbm364dl force-pushed the alejandra/polities-harmonization branch from 89b342c to 4dfe371 Compare November 19, 2025 09:50
@lbm364dl lbm364dl removed this from the v0.3.0 milestone Feb 27, 2026

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Polities and maps in the WHEP project

3 participants