Conversation
eduaguilera
left a comment
There was a problem hiding this comment.
Nice work, @afuenteshinojosa.
I have added some comments on the code, and I add here some more general comments. Sorry @afuenteshinojosa if you find there are many comments. You are the first team contributor and we are still setting up the procedures!
- Regarding the routes to files in the input and output folders, I have a few comments:
- The routes @afuenteshinojosa wrote would not work in my computer, as I have a different folder structure with OneDrive (e.g. I don't have the "Desktop" folder)
- I would avoid defining the user in each script, as it would imply having to do many changes just to get the code running if more than one script has been modified... That's why I use a "Common_data.R" script to do this kind of setup, but probably there are other better ways to do it...
- Any suggestion to address these problems? @lbm364dl
-
I think the Methods description document (now a Google Doc) should be placed here in the GitHub repository as a markdown document (not sure where, though). In addition, I like having more explanations in the code itself, to avoid having to look at the documentation. As I see it, reading the code with its comments should be enough to understand it, and the documentation should include explanations on the origin of the input files (or even better if a sort of label could be added to these files if they are located in GitHub)
-
It seems that the FT_cleaned.R and FT_and_WHEP.R scripts should be excuted sequentially... For me, it is useful to have a script (or it could be also in the read_me or other documentation document... but in any case it should be accessible and easy to see) showing the order in which scripts have to be run.
-
We should establish conventions regarding file names and script names
inst/scripts/FT_cleaned.R
Outdated
| polity_name_FT = polity_name_FT_raw, | ||
| start_year, end_year, `Comments FT`, polity_name_full | ||
| ) |> | ||
| # Manually changing polity names |
There was a problem hiding this comment.
Some comments on this:
- I think these changes should not be done in the code, but directly in Excel/Google Sheets
- I think we should not modify the original column, but rather to create a new column with the names we want
- In this case, the new column would be "polity_name" (which is the variable name we have chosen for WHEP polity names)
|
You're doing a great job for the first PR @afuenteshinojosa! I know you need to get this done fast so I won't add my own review yet, I will do it later, but I wanted to follow-up some of @eduaguilera's comments.
Yes, there's definitely a clean way to do it. There's a function that lets us get the path of package files programatically: system.file("extdata", "input/processed/polities/whep-polities.xlsx", package = utils::packageName())This will only work if called inside functions defined in your package. If that's true, then Again, this will not work if you write it directly in the script file, because that's not recognized as part of the package (we will have to move this code to R folder to follow the package function practices anyway, but more on that later). Similarly, I have already defined this private function: I suggest looking at it and creating another one called That being said, I have my reasons for not wanting to use Excel files as inputs here. The main reason is its lack of transparency when tracking changes in git. You can see in this Pull Request itself, binary files (and Excel ones are binary) can't be previewed, and it won't show the file changes either, because it's not a plain text file. That's why I would prefer always using CSVs. They are also easier to work with programatically. I won't enforce this decision though. Tell me what you think. Lastly, remember we should be using
@afuenteshinojosa suggested adding it as an R markdown article in the package, the same way I created the workflow guide. I agree this is a good idea for a broader explanation. It's also true that this will end up being a function in the package and it must be documented, but this function documentation could focus more on the actual structure of the output and leave the methodology explanation for the R markdown article. Tell me what you think!
When we move this code to the R folder as package functions, I agree that there should be a single function which inside calls both parts.
The Tidyverse style explains this. The most important choices are all lowercase and separating words with |
7a5fa09 to
f13bf36
Compare
93ce22c to
9253b88
Compare
78b8c81 to
89b342c
Compare
89b342c to
4dfe371
Compare
Closes #38.