From 76a4b59eb28684791d4cda72068bd85cfa5f32ac Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Wed, 7 Feb 2024 14:54:28 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll                |  2 +-
 mod_reproducibility.html |  8 +++-----
 search.json              |  2 +-
 sitemap.xml              | 38 +++++++++++++++++++-------------------
 4 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index 8c8fed9..151c26d 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-dee67d66
\ No newline at end of file
+16adff2a
\ No newline at end of file
diff --git a/mod_reproducibility.html b/mod_reproducibility.html
index e54bae2..946a08c 100644
--- a/mod_reproducibility.html
+++ b/mod_reproducibility.html
@@ -378,11 +378,9 @@

File Names

Documentation

- +

Documenting a project can feel like a Sisyphean task but it is often not as hard as one might imagine and well worth the effort! One simple practice you can adopt to dramatically improve the reproducibility of your project is to create a “README” file in the top-level of your project’s folder system. This file can be formatted however you’d like but generally READMEs should include (1) a project overview written in plain language, (2) a basic table of contents for the primary folders in your project folder, and (3) a brief description of the file naming scheme you’ve adopted for this project.

+

Your project’s README becomes the ‘landing page’ for those navigating your repository and makes it easy for team members to know where documentation should go (in the README!). You may also choose to create a README file for some of the sub-folders of your project. This can be particularly valuable for your “data” folder(s) as it is an easy place to store data source/provenance information that might be overwhelming to include in the project-level README file.

+

Finally, you should choose a place to keep track of ideas, conversations, and decisions about the project. While you can take notes on these topics on a piece of paper, adopting a digital equivalent is often helpful because you can much more easily search a lengthy document when it is machine readable. We will discuss GitHub during the Version Control module but GitHub offers something called Issues that can be a really effective place to record some of this information.

Best Practices / Recommendations

diff --git a/search.json b/search.json index eea1290..71212d5 100644 --- a/search.json +++ b/search.json @@ -293,7 +293,7 @@ "href": "mod_reproducibility.html#project-organization-documentation", "title": "Reproducibility Best Practices", "section": "Project Organization & Documentation", - "text": "Project Organization & Documentation\nMuch of the popular conversation around reproducibility centers on reproducibility as it pertains to code. That is definitely an important facet but before we write even a single line it is critically important that we first discuss what factors go into project-wide reproducibility. “Perfect” code in a project that isn’t structured thoughtfully can still result in a synthesis project that is not reproducible while even “bad” code can be made more intelligible when it is placed in a well-documented/organized project!\n\nFolder Structure\nThe simplest and often most effective way of beginning a reproducible project is adopting (and sticking to) a good file organization system. There is no single “best” way of organizing your projects’ files as long as you are consistent. Consistency allows those navigating your system to deduce where particular files are likely to be without having in-depth knowledge of the entire suite of materials.\nTo begin, it is best to have a single folder for each project. This makes it simple to find the project’s inputs and outputs and also makes collaboration and documentation much cleaner. Later in your project’s life cycle, this ‘one folder’ approach will also make it easier to share your project with external reviewers or new team members. For researchers used to working alone there can be a temptation to think about your leadership of a project as the fundamental unit rather than the individual projects’ scopes. This method works fine when working alone but greatly increases the difficulty of communication and co-working in projects led by teams. 
RStudio (the primary Integrated Development Environment for R) and most version control systems assume that each project’s materials will be placed in a single folder and either of these systems can confer significant benefits to your work (well worth any potential reorganization difficulty).\nWithin your project folder, it is valuable to structure your folders and files hierarchically. Having a folder with dozens of mixed file types of various purposes that may be either inputs or outputs is cumbersome to document and difficult to navigate. If instead you adopt a system of sub-folders that group files based on purpose and/or source, engagement becomes much simpler. You need not use an intricate web of sub-folders either; often just a single layer of these sub-folders provides sufficient structure to meet your project’s organizational needs.\n\n\n\n\n\n\nDiscussion: Folder Structure\n\n\n\nWith a partner discuss (some of) the following questions:\n\nHow do you typically organize your projects’ files?\nWhat benefits do you see of your current approach?\nWhat–if any–limitations to your system have you experienced?\nDo you think your structure would work well in a team environment?\n\nIf not, what changes might you make to better fit that context?\n\n\n\n\n\n\nFile Names\nBeyond the structure and degree of nestedness you adopt for your folders, your files can (and should) include a lot of helpful contextual information about themselves. An ideal file name should be very informative about that file’s contents, purpose, and relation to other project files. Some or all of that information may be reinforced by the folder(s) in which the file is placed, but the file name itself should also confer that information. 
This may feel redundant but if late in your project’s lifecycle you decide a different folder system is needed, information-dense file names will allow you to change file locations without excessive difficulty.\nYou should also consider how ‘machine readable’ your file names are. One fundamental way in which this changes users’ experience is how file management applications (e.g., Apple’s Finder) visually display files. By default files are typically sorted alphabetically and numerically. So, even if the script “wrangle.R” should be run first in your workflow, most file explorers would put that script last or at the bottom. If instead you changed its name to “01_wrangle.R” now it would likely be sorted towards the top and encountered earlier by those interested in your workflow. Notice too in that example that we have “zero padded” the script so that if we eventually had a tenth script file explorers would correctly sort it (“10…” would be before “1…” in most file sorting systems).\nYou should also avoid spaces and accented characters (e.g., é, ü, etc.) as some computers will not be able to recognize these characters. Windows operating systems in particular have a very difficult time parsing folder names with spaces (e.g., “raw data” versus “raw_data”). Using a mix of upper and lowercase letters can be effective when done carefully but also requires a lot of attention to detail on the part of those creating new files. It may be simplest to stick with all lowercase or all uppercase for your file names.\nBe consistent with any delimiters you use in file names! Two common ones are the hyphen (-) and underscore (_). If you use one instead of spaces, be sure to only use that one for that use-case rather than using them interchangeably. You may find it useful to use one delimiter to separate a type of information and then the other in lieu of spaces. 
For example, “fxn_calc-diversity.R” uses the prefix “fxn_” to indicate that the script contains a function while the words to the right of the underscore briefly describe the purpose of that function.\nIn that same vein, you may want to consider using “slugs” in your file names. Slugs are human-readable, unique pieces of file names that are shared between files and the outputs that they create. For example, the files created by “01_wrangle.R” could all begin with “01_” (the slug in this case). The benefit of this approach is that diagnosing strange outputs–or simply finding the source of a given file or graph–is a straightforward matter of looking for the matching slug.\n\n\nDocumentation\n\nInclude a README with (A) project overview, (B) basic table of contents of rest of folder hierarchy, and (C) file naming scheme explanation\nMay also want folder-specific READMEs (at least for the first level of subfolders) to give greater detail / data provenance information\nKeep track of ideas, discussions, and decisions about the project\n\n\n\nBest Practices / Recommendations\n\nQuarantine inputs from others until you can rename / repurpose for consistency with your chosen organization schema\nThe raw data and products of scripts should be separated into different folders\nNever touch raw data", + "text": "Project Organization & Documentation\nMuch of the popular conversation around reproducibility centers on reproducibility as it pertains to code. That is definitely an important facet but before we write even a single line it is critically important that we first discuss what factors go into project-wide reproducibility. 
“Perfect” code in a project that isn’t structured thoughtfully can still result in a synthesis project that is not reproducible while even “bad” code can be made more intelligible when it is placed in a well-documented/organized project!\n\nFolder Structure\nThe simplest and often most effective way of beginning a reproducible project is adopting (and sticking to) a good file organization system. There is no single “best” way of organizing your projects’ files as long as you are consistent. Consistency allows those navigating your system to deduce where particular files are likely to be without having in-depth knowledge of the entire suite of materials.\nTo begin, it is best to have a single folder for each project. This makes it simple to find the project’s inputs and outputs and also makes collaboration and documentation much cleaner. Later in your project’s life cycle, this ‘one folder’ approach will also make it easier to share your project with external reviewers or new team members. For researchers used to working alone there can be a temptation to think about your leadership of a project as the fundamental unit rather than the individual projects’ scopes. This method works fine when working alone but greatly increases the difficulty of communication and co-working in projects led by teams. RStudio (the primary Integrated Development Environment for R) and most version control systems assume that each project’s materials will be placed in a single folder and either of these systems can confer significant benefits to your work (well worth any potential reorganization difficulty).\nWithin your project folder, it is valuable to structure your folders and files hierarchically. Having a folder with dozens of mixed file types of various purposes that may be either inputs or outputs is cumbersome to document and difficult to navigate. If instead you adopt a system of sub-folders that group files based on purpose and/or source, engagement becomes much simpler. 
You need not use an intricate web of sub-folders either; often just a single layer of these sub-folders provides sufficient structure to meet your project’s organizational needs.\n\n\n\n\n\n\nDiscussion: Folder Structure\n\n\n\nWith a partner discuss (some of) the following questions:\n\nHow do you typically organize your projects’ files?\nWhat benefits do you see of your current approach?\nWhat–if any–limitations to your system have you experienced?\nDo you think your structure would work well in a team environment?\n\nIf not, what changes might you make to better fit that context?\n\n\n\n\n\n\nFile Names\nBeyond the structure and degree of nestedness you adopt for your folders, your files can (and should) include a lot of helpful contextual information about themselves. An ideal file name should be very informative about that file’s contents, purpose, and relation to other project files. Some or all of that information may be reinforced by the folder(s) in which the file is placed, but the file name itself should also confer that information. This may feel redundant but if late in your project’s lifecycle you decide a different folder system is needed, information-dense file names will allow you to change file locations without excessive difficulty.\nYou should also consider how ‘machine readable’ your file names are. One fundamental way in which this changes users’ experience is how file management applications (e.g., Apple’s Finder) visually display files. By default files are typically sorted alphabetically and numerically. So, even if the script “wrangle.R” should be run first in your workflow, most file explorers would put that script last or at the bottom. If instead you changed its name to “01_wrangle.R” now it would likely be sorted towards the top and encountered earlier by those interested in your workflow. 
Notice too in that example that we have “zero padded” the script so that if we eventually had a tenth script file explorers would correctly sort it (“10…” would be before “1…” in most file sorting systems).\nYou should also avoid spaces and accented characters (e.g., é, ü, etc.) as some computers will not be able to recognize these characters. Windows operating systems in particular have a very difficult time parsing folder names with spaces (e.g., “raw data” versus “raw_data”). Using a mix of upper and lowercase letters can be effective when done carefully but also requires a lot of attention to detail on the part of those creating new files. It may be simplest to stick with all lowercase or all uppercase for your file names.\nBe consistent with any delimiters you use in file names! Two common ones are the hyphen (-) and underscore (_). If you use one instead of spaces, be sure to only use that one for that use-case rather than using them interchangeably. You may find it useful to use one delimiter to separate a type of information and then the other in lieu of spaces. For example, “fxn_calc-diversity.R” uses the prefix “fxn_” to indicate that the script contains a function while the words to the right of the underscore briefly describe the purpose of that function.\nIn that same vein, you may want to consider using “slugs” in your file names. Slugs are human-readable, unique pieces of file names that are shared between files and the outputs that they create. For example, the files created by “01_wrangle.R” could all begin with “01_” (the slug in this case). The benefit of this approach is that diagnosing strange outputs–or simply finding the source of a given file or graph–is a straightforward matter of looking for the matching slug.\n\n\nDocumentation\nDocumenting a project can feel like a Sisyphean task but it is often not as hard as one might imagine and well worth the effort! 
One simple practice you can adopt to dramatically improve the reproducibility of your project is to create a “README” file in the top-level of your project’s folder system. This file can be formatted however you’d like but generally READMEs should include (1) a project overview written in plain language, (2) a basic table of contents for the primary folders in your project folder, and (3) a brief description of the file naming scheme you’ve adopted for this project.\nYour project’s README becomes the ‘landing page’ for those navigating your repository and makes it easy for team members to know where documentation should go (in the README!). You may also choose to create a README file for some of the sub-folders of your project. This can be particularly valuable for your “data” folder(s) as it is an easy place to store data source/provenance information that might be overwhelming to include in the project-level README file.\nFinally, you should choose a place to keep track of ideas, conversations, and decisions about the project. While you can take notes on these topics on a piece of paper, adopting a digital equivalent is often helpful because you can much more easily search a lengthy document when it is machine readable. 
We will discuss GitHub during the Version Control module but GitHub offers something called Issues that can be a really effective place to record some of this information.\n\n\nBest Practices / Recommendations\n\nQuarantine inputs from others until you can rename / repurpose for consistency with your chosen organization schema\nThe raw data and products of scripts should be separated into different folders\nNever touch raw data", "crumbs": [ "Quantitative Modules", "Reproducibility" diff --git a/sitemap.xml b/sitemap.xml index f0a05e5..cf3cfd6 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,78 +2,78 @@ https://lter.github.io/ssecr/mod_facilitation.html - 2024-02-07T14:45:39.355Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/topic_interactivity.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_spatial.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/topic_spatial.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_data-viz.html - 2024-02-07T14:45:39.355Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/index.html - 2024-02-07T14:45:39.355Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_reports.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_version-control.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_data-disc.html - 2024-02-07T14:45:39.355Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_reproducibility.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/CONTRIBUTING.html - 2024-02-07T14:45:39.339Z + 2024-02-07T14:54:06.631Z https://lter.github.io/ssecr/mod_project-mgmt.html - 2024-02-07T14:45:39.355Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_wrangle.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_findings.html - 
2024-02-07T14:45:39.355Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_credit.html - 2024-02-07T14:45:39.355Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_thinking.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_stats.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_logic-models.html - 2024-02-07T14:45:39.355Z + 2024-02-07T14:54:06.651Z https://lter.github.io/ssecr/mod_team-sci.html - 2024-02-07T14:45:39.359Z + 2024-02-07T14:54:06.651Z