diff --git a/README.md b/README.md
index 6a0ada22..eeb89c7e 100644
--- a/README.md
+++ b/README.md
@@ -10,14 +10,14 @@
 ### Summary
-The **`gcamfaostat`** R package is designed for the preparation, processing, and synthesis of the Food and Agriculture Organization (FAO) Statistics ([FAOSTAT](https://www.fao.org/faostat/en/#data)) agroeconomic data. The primary purpose is to facilitate FAOSTAT data use in global economic and multisector dynamic models while ensuring transparency, traceability, and reproducibility. Here, we provide an overview of the development of **`gcamfaostat`** (v1.0) and demonstrate its capabilities in generating and maintaining agroeconomic data required for the Global Change Analysis Model ([GCAM](https://jgcri.github.io/gcam-doc/toc.html)). Our initiative seeks to enhance the quality and accessibility of data for the global agroeconomic modeling community, with the aim of fostering more robust and harmonized outcomes in a collaborative, efficient, and open-source framework. One of the important features of the package is the possibility to construct the FAO Food Balance Sheets at the disaggregated commodity level (with over 500 commodities), which provides a comprehensive and detailed data input for a variety of analytical and modeling applications. The processed data and visualizations offered by **`gcamfaostat`** can also be valuable to a broader audience interested in gaining insights into the intricacies of global agriculture.
+The **`gcamfaostat`** R package is designed for the preparation, processing, and synthesis of the Food and Agriculture Organization (FAO) Statistics ([FAOSTAT](https://www.fao.org/faostat/en/#data)) agroeconomic data. The primary purpose is to facilitate FAOSTAT data use in global economic and multisector dynamic models while ensuring transparency, traceability, and reproducibility. Here, we provide an overview of the development of **`gcamfaostat`** (v1.0.0) and demonstrate its capabilities in generating and maintaining agroeconomic data required for the Global Change Analysis Model ([GCAM](https://jgcri.github.io/gcam-doc/toc.html)). Our initiative seeks to enhance the quality and accessibility of data for the global agroeconomic modeling community, with the aim of fostering more robust and harmonized outcomes in a collaborative, efficient, and open-source framework. One of the important features of the package is the possibility to construct the FAO Food Balance Sheets at the disaggregated commodity level (with over 500 commodities), which provides a comprehensive and detailed data input for a variety of analytical and modeling applications. The processed data and visualizations offered by **`gcamfaostat`** can also be valuable to a broader audience interested in gaining insights into the intricacies of global agriculture.
 This tool bridges a crucial gap in the literature by offering several key features and capabilities.
 1. **Transparency and Reproducibility**: **`gcamfaostat`** incorporates functions for downloading, cleaning, synthesizing, and balancing agroeconomic datasets in a traceable, transparent, and reproducible manner. This enhances the credibility of the processing and allows for better scrutiny of the methods. We have documented and demonstrated the use of the package in generating and updating agroeconomic data needed for the GCAM.
 2. **Expandability and Consistency**: **`gcamfaostat`** can be used to flexibly process and update agroeconomic data for any agroeconomic model. The package framework can be also easily expanded to include new modules for consistently processing new data.
-3. **Community Collaboration and Efficiency**: The package provides an open-source platform for researchers to continually enhance the processing methods. This collaborative approach, which establishes a standardized and streamlined process for data preparation and processing, carries benefits that extend to all modeling groups. By reducing the efforts required for data processing and fostering harmonized base data calibration, it contributes to a reduction in modeling uncertainty and enhances the overall research efficiency.
+3. **Community Collaboration and Efficiency**: The package provides an open-source platform for researchers to continually enhance the processing methods. This collaborative approach, which establishes a standardized and streamlined process for data preparation and processing, carries benefits that extend to all modeling groups. By reducing the effort required for data processing and fostering harmonized base data calibration, it contributes to a reduction in modeling uncertainty and enhances the overall research efficiency.
 4. **User Accessibility**: Where applicable, the processed data can be mapped and aggregated to user-specified regions and sectors for agroeconomic modeling. However, beyond the modeling community, **`gcamfaostat`** can be valuable to a broader range of users interested in understanding global agriculture trends and dynamics, as it provides user-friendly data processing and visualization tools.
 ***
@@ -82,7 +82,7 @@ Data development is never a once and for all task, and continued efforts are nee
 1. **Sustain processing functions for updated raw data**: ensuring that our processing functions remain up-to-date when raw data undergoes revisions is imperative.
 2. **Evaluate and enhance assumptions**: a critical examination of the assumptions utilized in processes like interpolation, extrapolation, aggregation, disaggregation, and mapping is essential and should be an ongoing endeavor.
 3. **Revise assumptions in low-quality data zones**: regions and sectors with little or low-quality data require careful consideration. We will need to adjust our assumptions when improved data becomes available.
-4. **Promoting broader applications**: leveraging data processed by gcamfaostat can significantly contribute to harmonizing input data in global agroeconomic modeling. Encouraging the utilization of this data and fostering collaboration to enhance data processing is of significant importance.
+4. **Promoting broader applications**: leveraging data processed by gcamfaostat can significantly contribute to harmonizing input data in global agroeconomic modeling. Encouraging the utilization of this data and fostering collaboration to enhance data processing is crucial.
 5. **Assess sensitivity in downstream applications**: understanding the sensitivity of downstream data applications, e.g., global agroeconomic projections, to upstream data processing assumptions is crucial. This awareness empowers us to make informed decisions and refinements.
 We welcome and value community contributions to gcamfaostat. Please read our [Contributing Guidelines](CONTRIBUTING.md) for information on how to contribute to this package. Through collective and collaborative efforts, we hope to improve the interface between raw data, modeling community, and broader audience. We would be grateful for the feedback and suggestions on potential improvements of the developed data processing framework.
@@ -102,4 +102,4 @@ We welcome and value community contributions to gcamfaostat. Please read our [Co
-Copyright 2019 Battelle Memorial Institute; see the LICENSE file.
+Copyright 2023 Battelle Memorial Institute; see the LICENSE file.
diff --git a/docs/articles/vignette_getting_started.html b/docs/articles/vignette_getting_started.html
index 5ffc7481..d8339072 100644
--- a/docs/articles/vignette_getting_started.html
+++ b/docs/articles/vignette_getting_started.html
@@ -112,7 +112,7 @@
vignettes/vignette_getting_started.Rmd
vignette_getting_started.Rmd
gcamfaostat
) for the preparation,
processing, and synthesis of FAOSTAT data for global agroeconomic
@@ -178,9 +178,9 @@ gcamfaostat
(Bond-Lamberty et al. 2019)gcamfaostat
(Zhao and Wise
-2023). The gcamfaostat
package is
-aimed to address these limitations (Figure
+2023). The gcamfaostat
package aims
+to address these limitations (Figure
2). The targeted approach incorporates data preparation,
-processing, and synthesizing capabilities within a dedicated package,
+processing, and synthesis capabilities within a dedicated package,
gcamfaostat, while regional and sectoral aggregation functions in the
model data system are implemented using standalone routines within the
gcamdata package. This strategy not only ensures the streamlined
diff --git a/docs/articles/vignette_news.html b/docs/articles/vignette_news.html
index 8ffd4f57..c3437571 100644
--- a/docs/articles/vignette_news.html
+++ b/docs/articles/vignette_news.html
@@ -112,7 +112,7 @@
vignettes/vignette_news.Rmd
vignette_news.Rmd
Beta release * The beta release of gcamfaostat 1.0.0
-includes the data generated for the Global Change Analysis Model v7.0 GCAM
-v7.0. *
2023-11-10
+2023-11-09 # FAOSTAT (in the recent Oct release) provides 2021 data #
+We will now update gcamfaostat to include the latest data and for GCAM
+BYU # A new BYU branch is created for this task
2023-11-08
Raw data archive 2023-08-19
+2023-08-19
The source data downloaded from FAOSTAT is archived at a Zenodo repository.
vignettes/vignette_preparing_data.Rmd
vignette_preparing_data.Rmd
vignettes/vignette_processing_flow.Rmd
vignette_processing_flow.Rmd
As an illustrative example, the first tier comprises 168 commodities, generated by combining production data from QCL, trade data from TM, and other essential balancing elements (such as opening and closing stocks,
@@ -229,7 +228,8 @@
# Load gcamfaostat
-devtools::load_all()
+# devtools::load_all()
+library(gcamfaostat)
# package data, GCAM_DATA_MAP, will be loaded
# Get information for an intermediate data frame QCL_CROP_PRIMARY
diff --git a/docs/articles/vignette_use_cases.html b/docs/articles/vignette_use_cases.html
index 53f60cfa..a8dc9e04 100644
--- a/docs/articles/vignette_use_cases.html
+++ b/docs/articles/vignette_use_cases.html
@@ -112,7 +112,7 @@
Use Cases
- 2023-11-07
+ 2023-11-10
Source: vignettes/vignette_use_cases.Rmd
vignette_use_cases.Rmd
@@ -198,7 +198,7 @@ Country aggregation and disaggre
functions to deal with changes in the country classifications.
-- The function removes nonexistent FAO region using area_code, e.g.,
+
- The function removes nonexistent FAO regions using area_code, e.g.,
USSR after 1991.
- All nonexistent countries due to dissolution are removed by
diff --git a/docs/articles/vignette_visualization.html b/docs/articles/vignette_visualization.html
index c2f85c59..937b7d12 100644
--- a/docs/articles/vignette_visualization.html
+++ b/docs/articles/vignette_visualization.html
@@ -112,7 +112,7 @@
Visualization
- 2023-11-07
+ 2023-11-10
Source: vignettes/vignette_visualization.Rmd
vignette_visualization.Rmd
diff --git a/docs/index.html b/docs/index.html
index 00d82b68..850aaeef 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -119,7 +119,7 @@
Summary
-
The gcamfaostat
R package is designed for the preparation, processing, and synthesis of the Food and Agriculture Organization (FAO) Statistics (FAOSTAT) agroeconomic data. The primary purpose is to facilitate FAOSTAT data use in global economic and multisector dynamic models while ensuring transparency, traceability, and reproducibility. Here, we provide an overview of the development of gcamfaostat
(v1.0) and demonstrate its capabilities in generating and maintaining agroeconomic data required for the Global Change Analysis Model (GCAM). Our initiative seeks to enhance the quality and accessibility of data for the global agroeconomic modeling community, with the aim of fostering more robust and harmonized outcomes in a collaborative, efficient, and open-source framework. One of the important features of the package is the possibility to construct the FAO Food Balance Sheets at the disaggregated commodity level (with over 500 commodities), which provides a comprehensive and detailed data input for a variety of analytical and modeling applications. The processed data and visualizations offered by gcamfaostat
can also be valuable to a broader audience interested in gaining insights into the intricacies of global agriculture.
+The gcamfaostat
R package is designed for the preparation, processing, and synthesis of the Food and Agriculture Organization (FAO) Statistics (FAOSTAT) agroeconomic data. The primary purpose is to facilitate FAOSTAT data use in global economic and multisector dynamic models while ensuring transparency, traceability, and reproducibility. Here, we provide an overview of the development of gcamfaostat
(v1.0.0) and demonstrate its capabilities in generating and maintaining agroeconomic data required for the Global Change Analysis Model (GCAM). Our initiative seeks to enhance the quality and accessibility of data for the global agroeconomic modeling community, with the aim of fostering more robust and harmonized outcomes in a collaborative, efficient, and open-source framework. One of the important features of the package is the possibility to construct the FAO Food Balance Sheets at the disaggregated commodity level (with over 500 commodities), which provides a comprehensive and detailed data input for a variety of analytical and modeling applications. The processed data and visualizations offered by gcamfaostat
can also be valuable to a broader audience interested in gaining insights into the intricacies of global agriculture.
This tool bridges a crucial gap in the literature by offering several key features and capabilities.
-
@@ -129,7 +129,7 @@
Summary
Expandability and Consistency: gcamfaostat
can be used to flexibly process and update agroeconomic data for any agroeconomic model. The package framework can be also easily expanded to include new modules for consistently processing new data.
-
-Community Collaboration and Efficiency: The package provides an open-source platform for researchers to continually enhance the processing methods. This collaborative approach, which establishes a standardized and streamlined process for data preparation and processing, carries benefits that extend to all modeling groups. By reducing the efforts required for data processing and fostering harmonized base data calibration, it contributes to a reduction in modeling uncertainty and enhances the overall research efficiency.
+Community Collaboration and Efficiency: The package provides an open-source platform for researchers to continually enhance the processing methods. This collaborative approach, which establishes a standardized and streamlined process for data preparation and processing, carries benefits that extend to all modeling groups. By reducing the effort required for data processing and fostering harmonized base data calibration, it contributes to a reduction in modeling uncertainty and enhances the overall research efficiency.
-
User Accessibility: Where applicable, the processed data can be mapped and aggregated to user-specified regions and sectors for agroeconomic modeling. However, beyond the modeling community,
gcamfaostat
can be valuable to a broader range of users interested in understanding global agriculture trends and dynamics, as it provides user-friendly data processing and visualization tools.
@@ -232,7 +232,7 @@ Future work and contribution
Related publications
https://doi.org/10.5281/zenodo.8260225
-
Copyright 2019 Battelle Memorial Institute; see the LICENSE file.
+Copyright 2023 Battelle Memorial Institute; see the LICENSE file.
diff --git a/docs/paper.html b/docs/paper.html
index 7cfcb80f..08e0522b 100644
--- a/docs/paper.html
+++ b/docs/paper.html
@@ -91,17 +91,17 @@ Summary
-The gcamfaostat
R package is designed for the preparation, processing, and synthesis of the Food and Agriculture Organization (FAO) Statistics (FAOSTAT) agroeconomic data. The primary purpose is to facilitate FAOSTAT data use in global economic and multisector dynamic models while ensuring transparency, traceability, and reproducibility. Here, we provide an overview of the development of gcamfaostat v1.0
and demonstrate its capabilities in generating and maintaining agroeconomic data required for the Global Change Analysis Model (GCAM). Our initiative seeks to enhance the quality and accessibility of data for the global agroeconomic modeling community, with the aim of fostering more robust and harmonized outcomes in a collaborative, efficient, and open-source framework. One of the important features of the package is the possibility to construct the FAO Food Balance Sheets at the disaggregated commodity level (with over 500 commodities), which provides a comprehensive and detailed data input for a variety of analytical and modeling applications. The processed data and visualizations offered by gcamfaostat
can also be valuable to a broader audience interested in gaining insights into the intricacies of global agriculture.
+The gcamfaostat
R package is designed for the preparation, processing, and synthesis of the Food and Agriculture Organization (FAO) Statistics (FAOSTAT) agroeconomic data. The primary purpose is to facilitate FAOSTAT data use in global economic and multisector dynamic models while ensuring transparency, traceability, and reproducibility. Here, we provide an overview of the development of gcamfaostat v1.0.0
and demonstrate its capabilities in generating and maintaining agroeconomic data required for the Global Change Analysis Model (GCAM). Our initiative seeks to enhance the quality and accessibility of data for the global agroeconomic modeling community, with the aim of fostering more robust and harmonized outcomes in a collaborative, efficient, and open-source framework. One of the important features of the package is the possibility to construct the FAO Food Balance Sheets at the disaggregated commodity level (with over 500 commodities), which provides a comprehensive and detailed data input for a variety of analytical and modeling applications. The processed data and visualizations offered by gcamfaostat
can also be valuable to a broader audience interested in gaining insights into the intricacies of global agriculture.
Statement of need
-Global economic and multisector dynamic models have become pivotal tools for investigating complex interactions between human activities and the environment, as evident in recent research [@Doelman2022Quantifying;@Fujimori2022Land-based;@IPCC2022Annex;@Ven2023multimodel]. Agriculture and land use (AgLU) plays a critical role in these models, particularly when used to address key agroeconomic questions [@Graham2023Agricultural;@Yarlagadda2023Trade;@Zhang2023Agriculture;@Zhao2021Global;@Zhao2020critical]. Sound economic modeling hinges significantly upon the accessibility and quality of data [@Bruckner2019FABIO;@Calvin2022GMD;@Chepeliev2022JGEA]. The FAOSTAT serves as one of the key global data sources, offering open-access data on country-level agricultural production, land use, trade, food consumption, nutrient content, prices, and more [@FAOSTAT2023FAOSTAT]. However, the raw data from FAOSTAT requires cleaning, balancing, and synthesis, involving assumptions such as interpolation and mapping, which can introduce uncertainties. In addition, some of the core datasets reported by FAOSTAT, such as FAO’s Food Balance Sheets (FBS), are compiled at a specific level of aggregation, combining together primary and processed commodities (e.g., wheat and flour), which creates additional data processing challenges for the agroeconomic modeling community [@Chepeliev2022JGEA]. It is noteworthy that each agroeconomic modeling team typically develops its own assumptions and methods to prepare and process FAOSTAT data [@bond2019gcamdata]. While largely overlooked, the uncertainty in the base data calibration approach likely contributed to the disparities in model outcomes [@Lampe2014AgMIP;@zhao2021role]. Hence, our motivation is to create an open-source tool (gcamfaostat
) for the preparation, processing, and synthesis of FAOSTAT data for global agroeconomic modeling. This tool bridges a crucial gap in the literature by offering several key features and capabilities.
+Global economic and multisector dynamic models have become pivotal tools for investigating complex interactions between human activities and the environment, as evident in recent research [@Doelman2022Quantifying;@Fujimori2022Land-based;@IPCC2022Annex;@Ven2023multimodel]. Agriculture and land use (AgLU) plays a critical role in these models, particularly when used to address key agroeconomic questions [@Graham2023Agricultural;@Yarlagadda2023Trade;@Zhang2023Agriculture;@Zhao2021Global;@Zhao2020critical]. Sound economic modeling hinges significantly upon the accessibility and quality of data [@Bruckner2019FABIO;@Calvin2022GMD;@Chepeliev2022JGEA]. The FAOSTAT serves as one of the key global data sources, offering open-access data on country-level agricultural production, land use, trade, food consumption, nutrient content, prices, and more [@FAOSTAT2023FAOSTAT]. However, the raw data from FAOSTAT requires cleaning, balancing, and synthesis, involving assumptions such as interpolation and mapping, which can introduce uncertainties. In addition, some of the core datasets reported by FAOSTAT, such as FAO’s Food Balance Sheets (FBS), are compiled at a specific level of aggregation, combining together primary and processed commodities (e.g., wheat and flour), which creates additional data processing challenges for the agroeconomic modeling community [@Chepeliev2022JGEA]. It is noteworthy that each agroeconomic modeling team typically develops its own assumptions and methods to prepare and process FAOSTAT data [@bond2019gcamdata]. While largely overlooked, the uncertainty in the base data calibration approach likely contributes to the disparities in model outcomes [@Lampe2014AgMIP;@zhao2021role]. Hence, our motivation is to create an open-source tool (gcamfaostat
) for the preparation, processing, and synthesis of FAOSTAT data for global agroeconomic modeling. This tool bridges a crucial gap in the literature by offering several key features and capabilities.
-
Transparency and Reproducibility:
gcamfaostat
incorporates functions for downloading, cleaning, synthesizing, and balancing agroeconomic datasets in a traceable, transparent, and reproducible manner. This enhances the credibility of the processing and allows for better scrutiny of the methods. We have documented and demonstrated the use of the package in generating and updating agroeconomic data needed for the GCAM.
-
Expandability and Consistency:
gcamfaostat
can be used to flexibly process and update agroeconomic data for any agroeconomic model. The package framework can be also easily expanded to include new modules for consistently processing new data.
-
-Community Collaboration and Efficiency: The package provides an open-source platform for researchers to continually enhance the processing methods. This collaborative approach, which establishes a standardized and streamlined process for data preparation and processing, carries benefits that extend to all modeling groups. By reducing the efforts required for data processing and fostering harmonized base data calibration, it contributes to a reduction in modeling uncertainty and enhances the overall research efficiency.
+Community Collaboration and Efficiency: The package provides an open-source platform for researchers to continually enhance the processing methods. This collaborative approach, which establishes a standardized and streamlined process for data preparation and processing, carries benefits that extend to all modeling groups. By reducing the effort required for data processing and fostering harmonized base data calibration, it contributes to a reduction in modeling uncertainty and enhances the overall research efficiency.
-
User Accessibility: Where applicable, the processed data can be mapped and aggregated to user-specified regions and sectors for agroeconomic modeling. However, beyond the modeling community,
gcamfaostat
can be valuable to a broader range of users interested in understanding global agriculture trends and dynamics, as it provides user-friendly data processing and visualization tools.
@@ -109,8 +109,8 @@ Statement of need
Design and Functionality
Bridging the gap between FAOSTAT and global economic modeling
- shows a standard framework of using FAOSTAT data in GCAM. GCAM is a widely recognized global economic and multisector dynamic model complemented by the gcamdata R package, which serves as its data processing system. Particularly, gcamdata includes modules (data processing chunks) and functions to convert raw data inputs into hundreds of XML files needed by GCAM [@bond2019gcamdata]. As an illustration, in the latest GCAM version, GCAM v7 [@bond_lamberty_2023], about 280 XML files, with a combined size of 4.1 GB, are generated. Although AgLU-related XMLs represent only about 10% of the total number of files, they contribute over 50% in size (~2.1 GB). The majority of AgLU-related data, whether directly or indirectly, rely on raw data sourced from the FAOSTAT.
-Nonetheless, the FAOSTAT data employed within gcamdata has traditionally involved manual downloads and may have undergone preprocessing. In light of the increasing data needs, maintaining the FAOSTAT data processing tasks in gcamdata has become increasingly challenging. In addition, the processing of FAOSTAT data in the AgLU modules of gcamdata is tailored specifically for GCAM. Consequently, the integration of FAOSTAT data updates has proven to be a non-trivial task, and the data processed by the AgLU module has limited applicability in other modeling contexts [@zhao_cmp360]. The gcamfaostat
package is aimed to address these limitations (). The targeted approach incorporates data preparation, processing, and synthesizing capabilities within a dedicated package, gcamfaostat, while regional and sectoral aggregation functions in the model data system are implemented using standalone routines within the gcamdata package. This strategy not only ensures the streamlined operation of gcamfaostat
but also contributes to keeping model data system lightweight and more straightforward to maintain.
+ shows a standard framework of using FAOSTAT data in GCAM. GCAM is a widely recognized global economic and multisector dynamic model complemented by the gcamdata R package, which serves as its data processing system. Particularly, gcamdata includes modules (data processing chunks) and functions to convert raw data inputs into hundreds of XML input files used by GCAM [@bond2019gcamdata]. As an illustration, in the latest GCAM version, GCAM v7 [@bond_lamberty_2023], about 280 XML files, with a combined size of 4.1 GB, are generated. Although AgLU-related XMLs represent only about 10% of the total number of files, they contribute over 50% in size (~2.1 GB). The majority of AgLU-related data, whether directly or indirectly, rely on raw data sourced from the FAOSTAT.
+Nonetheless, the FAOSTAT data employed within gcamdata has traditionally involved manual downloads and may have undergone preprocessing. In light of the increasing data needs, maintaining the FAOSTAT data processing tasks in gcamdata has become increasingly challenging. In addition, the processing of FAOSTAT data in the AgLU modules of gcamdata is tailored specifically for GCAM. Consequently, the integration of FAOSTAT data updates has proven to be a non-trivial task, and the data processed by the AgLU module has limited applicability in other modeling contexts [@zhao_cmp360]. The gcamfaostat
package aims to address these limitations (). The targeted approach incorporates data preparation, processing, and synthesis capabilities within a dedicated package, gcamfaostat, while regional and sectoral aggregation functions in the model data system are implemented using standalone routines within the gcamdata package. This strategy not only ensures the streamlined operation of gcamfaostat
but also contributes to keeping the model data system lightweight and more straightforward to maintain.
The original framework of utilizing FAOSTAT data in GCAM and similar large-scale models. Note that FAOSTAT data is mainly processed in the AgLU modules in gcamdata while there could be interdependency across data processing modules.
@@ -138,7 +138,7 @@ Data preparation
The archived data is consistent with the Prebuilt package data.
- The function downloads the latest raw data from FAOSTAT.
-
Table 1. FAOSTAT dataset processed in gcamfaostat v1.0.
+Table 1. FAOSTAT dataset processed in gcamfaostat v1.0.0.