Health & Education solutions #23
The Health and Education part of the model is one of the integration steps: each individual solution produces its results, then the overall set of solutions is brought into integration to do a number of things:
Hi Denton, here are my notes from our call with Chad on Tuesday:
https://population.un.org/wpp/Download/Standard/Population/
I found and studied the TAM module. I will work on generating and saving TAM for the following solutions (let me know if I'm missing any):
Is there any particular format these TAM data should be stored in? Regarding emissions, should I dump them into the same CSV file as TAM or keep them separate? Also, I'm curious: what does VMA stand for?
For the current phase of the effort, where the Excel files are still being used and we need the Python implementation to match Excel, I think we don't want to download new data from the UN. Once we've retired the Excel files and are looking to the future, we'd work on downloading the freshest data. For now, I think we want to use the population data used in Excel. I think that is from the Unit Adoption Data tab, the tables starting at cells P17 and P69.
To be honest I'd start with one. SolarPVUtility tends to work well, as it is a relatively straightforward solution.
For two-dimensional data we typically use CSV, as it is dramatically faster to load than Excel files. If the file is one which a human researcher is likely to open and modify, we tend to prefer xlsx, as it preserves formatting, column widths, and other things which make it more pleasant to work with. For these files, which are generated automatically and used internally, I'd say CSV.
I think it will be the two dimensional data, with rows from 2012-2060 and columns for each Drawdown region.
I'd expect each would be a unique CSV file.
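As a concrete sketch of that layout, assuming each solution's TAM is already held as a pandas DataFrame with years as the index and region names as columns, one file per solution could be written and read back as shown below. The region list and the `<solution>_tam.csv` naming are assumptions for illustration, not existing conventions in the repo.

```python
from pathlib import Path

import pandas as pd

# Assumed Drawdown region column names; verify against the solution code.
REGIONS = ["World", "OECD90", "Eastern Europe", "Asia (Sans Japan)",
           "Middle East and Africa", "Latin America", "China", "India", "EU", "USA"]
YEARS = list(range(2012, 2061))  # 2012 through 2060 inclusive


def save_tam_csv(tam: pd.DataFrame, solution_name: str, out_dir: Path) -> Path:
    """Write one solution's TAM (rows = years, columns = regions) to its own CSV."""
    # Reindex so every file has identical rows and columns; absent cells become NaN.
    tam = tam.reindex(index=YEARS, columns=REGIONS)
    tam.index.name = "Year"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{solution_name}_tam.csv"  # hypothetical file naming
    tam.to_csv(path)
    return path


def load_tam_csv(path: Path) -> pd.DataFrame:
    """Read a CSV written by save_tam_csv back into a year-indexed DataFrame."""
    return pd.read_csv(path, index_col="Year")
```

Reindexing before writing means every TAM file has the same 49 year rows and the same region columns, so files from different solutions can be loaded and combined without extra alignment.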
Drawdown tends to use CO2eq for everything. CH4 and F-gases are converted into the equivalent amount of CO2.
Yes, I think so.
Variable Meta-Analysis. Multiple sources for a given data point, like the fuel efficiency of hybrid cars or the amount of CO2 which an acre of degraded land could sequester, are entered into the VMA row by row. It then calculates a single value, typically the mean, though the human researcher can apply judgement to adjust it.
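The core reduction a VMA performs can be sketched in a few lines: take the per-source values and return their mean, unless the researcher supplies an adjusted value. `vma_value` and its `override` parameter are hypothetical names for illustration; the VMA code in the model may do more than this.

```python
from statistics import mean
from typing import Optional, Sequence


def vma_value(source_values: Sequence[float], override: Optional[float] = None) -> float:
    """Reduce the per-source rows of a VMA to a single value.

    By default this is the plain mean of the sources; `override` stands in for
    a researcher's judgement call replacing that mean.
    """
    if override is not None:
        return override
    if not source_values:
        raise ValueError("VMA needs at least one source value")
    return mean(source_values)
```

For example, three sources reporting 1.0, 2.0, and 3.0 give `vma_value([1.0, 2.0, 3.0])`, which is 2.0.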
c2.co2_mmt_reduced() and c2.co2eq_mmt_reduced() are the best routines to call to get emissions reductions for a solution. The result is in millions of metric tons. co2_mmt_reduced() returns CO2 results, co2eq_mmt_reduced() includes methane and other GHGs by converting them to CO2-equivalents.
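To illustrate the difference between the two routines, here is a minimal sketch of a CO2-equivalent conversion. Each gas's mass is multiplied by a 100-year global warming potential (GWP), and the products are summed. The GWP values below are IPCC AR5 figures used only for illustration, since the factors Drawdown's model actually applies may differ. `to_co2eq` is a hypothetical helper and not part of the codebase.

```python
# Illustrative 100-year GWPs (IPCC AR5, without climate-carbon feedbacks).
# Assumption: Drawdown's own conversion factors may differ.
GWP100 = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}


def to_co2eq(emissions_by_gas: dict[str, float],
             gwp: dict[str, float] = GWP100) -> float:
    """Collapse per-gas emission masses (all in the same unit, e.g. Mt) into one CO2eq mass."""
    missing = set(emissions_by_gas) - set(gwp)
    if missing:
        raise KeyError(f"no GWP factor for: {sorted(missing)}")
    return sum(mass * gwp[gas] for gas, mass in emissions_by_gas.items())
```

With these factors, 10 Mt of CO2 plus 1 Mt of CH4 gives `to_co2eq({"CO2": 10.0, "CH4": 1.0})`, which is 38.0 Mt CO2eq. The CO2-only figure would be 10.0.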
Most of the energy solutions have good data at the regional level, because IEA and IANA publish extensive energy generation information. Many of the other solutions lack regional data; they only have data for the World. In Excel, adoption data is often implemented via interpolation between two years, typically datapoints in 2018 and 2050. In the spreadsheet, even when there is no regional adoption data, the 2018 year is populated with zero. For tests/test_excel_integration.py to pass, we also populate 2018 with zero in Python, but we make the rest of the years NaN to ensure that calculations involving the nonexistent regional data will not produce a result. The airplanes solution is one that doesn't have any data at the regional level, so its regional results are mostly NaN.
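The World-only pattern described above can be sketched as a small table builder. It assumes linear interpolation between the 2018 and 2050 datapoints (the spreadsheets may use a different curve) and an assumed list of region names. `world_only_adoption` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Assumed names of the Drawdown regions other than "World".
REGIONS = ["OECD90", "Eastern Europe", "Asia (Sans Japan)", "Middle East and Africa",
           "Latin America", "China", "India", "EU", "USA"]


def world_only_adoption(world_2018: float, world_2050: float,
                        years: range = range(2018, 2051)) -> pd.DataFrame:
    """Adoption table for a solution that only has World datapoints in 2018 and 2050.

    World is linearly interpolated between the two datapoints. Regional columns are
    0.0 in 2018 (to match the spreadsheet) and NaN in every other year.
    """
    table = pd.DataFrame(np.nan, index=list(years), columns=["World"] + REGIONS)
    table.loc[2018, "World"] = world_2018
    table.loc[2050, "World"] = world_2050
    # Fill only the gap between the two datapoints; years outside it stay NaN.
    table["World"] = table["World"].interpolate(method="index", limit_area="inside")
    table.loc[2018, REGIONS] = 0.0
    return table
```

Because the regional columns are NaN after 2018, any downstream arithmetic on them also yields NaN. That is the behaviour the Python code relies on to avoid producing results from nonexistent regional data.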
This is where we're in uncharted territory: the process where the researchers work out constraints between the solutions (e.g. that together they don't add up to more than the total demand for energy) has not been implemented in Python, and I'm not very familiar with how it is done. I know there is a separate spreadsheet where the researchers paste their results. I imagine we'd need to import that spreadsheet and write a new test which compares its results with those from the Python equivalent.
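Because the researchers' actual method isn't documented here, the following sketch shows only one possible rule for the "don't exceed total demand" constraint. In any year where the solutions' combined adoption exceeds the TAM, it scales every solution down by the same factor. Proportional scaling and the `cap_to_total_demand` name are assumptions for illustration.

```python
import pandas as pd


def cap_to_total_demand(adoptions: pd.DataFrame, tam: pd.Series) -> pd.DataFrame:
    """Scale solutions down proportionally in years where their sum exceeds the TAM.

    adoptions: rows = years, columns = solutions (same units as tam).
    tam: total demand per year, indexed like adoptions.
    Years within the limit are left unchanged.
    """
    total = adoptions.sum(axis=1)
    factor = (tam / total).clip(upper=1.0)  # below 1 only where the total exceeds demand
    factor = factor.fillna(1.0)            # 0/0 when nothing is adopted: leave as-is
    return adoptions.mul(factor, axis=0)
```

A real implementation might instead prioritise some solutions over others, so this should be checked against the integration spreadsheet before being relied on.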
I imagine the tests we need to write would look similar to, but probably not exactly like, verify_tam_data() and verify_tam_data_eleven_sources(). Those two routines test that the Python code combining all of the sources for TAM data matches the result that Excel gets. There are two versions because some of the Excel files shifted their columns around to make room for more sources. The test we need to write here wouldn't test the TAM Data tab in the spreadsheets, as those are now inputs. It would test that the result, the new TAM with limits imposed by the other solutions, matches.
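Here is a minimal sketch of the comparison step such a test might use. It assumes that both the Python result and the values read from Excel are pandas DataFrames with matching year rows and region columns. `check_against_excel` is a hypothetical helper, and the tolerance is a guess rather than what the existing tests use.

```python
import pandas as pd


def check_against_excel(actual: pd.DataFrame, expected: pd.DataFrame,
                        rtol: float = 1e-6) -> None:
    """Raise AssertionError if the Python result differs from values captured from Excel."""
    # Excel and Python accumulate floating-point error differently, so compare with
    # a relative tolerance. Dtypes of values, index, and columns often differ after
    # reading a spreadsheet, so those are not checked strictly.
    pd.testing.assert_frame_equal(actual, expected, check_exact=False, rtol=rtol,
                                  check_dtype=False, check_index_type=False,
                                  check_column_type=False)
```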
For step 3:
I think scaling down the TAM according to population sounds right, but I don't think it can be directly compared to the TAM Data tab in a test case. The TAM Data tab contains the unscaled, original TAM. tests/test_excel_integration.py currently compares the Python output to TAM Data, and passes, so we know that after scaling it won't match. I think the test would instead read the "FamPlanning" tab from the spreadsheet, call equivalent routines in the Python code you're providing, and compare that they match. It may be that this can all be done as part of tests/test_excel_integration.py by adding another list of cells in the "FamPlanning" tab to verify[], but I'm not quite sure of that.
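One way to express "scaling down the TAM according to population" is a cell-by-cell ratio. It assumes demand is proportional to population in each year and region, and that the TAM and both population tables share the same year index and region columns. Which of the REF and PDS populations goes in which argument would follow from the FamPlanning tab. `scale_tam_by_population` is a hypothetical helper.

```python
import pandas as pd


def scale_tam_by_population(tam: pd.DataFrame, pop_from: pd.DataFrame,
                            pop_to: pd.DataFrame) -> pd.DataFrame:
    """Rescale a TAM computed for one population projection to another.

    Assumes demand is proportional to population, so each cell of the TAM is
    multiplied by pop_to / pop_from for the same year and region.
    """
    return tam * (pop_to / pop_from)
```

For example, a World TAM of 100 in a year where the population drops from 4 to 3 (in any consistent unit) becomes 75.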
I was referring to (at least one) entirely separate spreadsheet used during integration. The researchers paste CO2 emissions reductions and costs into a separate spreadsheet. I think the equivalent in Python would be tools to iterate through all solutions gathering the c2.co2eq_mmt_reduced() and other outputs. I think this CO2 emissions spreadsheet is probably not needed for the Health and Education solutions you're working on here; the FamPlanning tab in the individual solutions is likely to be the thing to use to write a test. The "FamPlanning" tab appears to implement the handling of data/unitadoption_pds_population.csv vs data/unitadoption_ref_population.csv. If so, I think we'd want Python code which can produce a Pandas dataframe in the same format, allowing it to be compared in tests/test_excel_integration.py.
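A gathering tool of that kind might look like the sketch below. It assumes the solutions are already loaded (how to load them isn't shown) and that each exposes `c2.co2eq_mmt_reduced()` returning a DataFrame indexed by year with region columns. `gather_co2eq` is a hypothetical name.

```python
import pandas as pd


def gather_co2eq(solutions, region: str = "World") -> pd.DataFrame:
    """Collect one region's CO2eq reductions (Mt) from each solution into one table.

    `solutions` maps a solution name to an already-loaded solution object. Each
    object is assumed to expose c2.co2eq_mmt_reduced() returning a DataFrame
    indexed by year with region columns.
    """
    columns = {name: sol.c2.co2eq_mmt_reduced()[region] for name, sol in solutions.items()}
    table = pd.DataFrame(columns)  # rows = years (aligned), one column per solution
    table["Total"] = table.sum(axis=1, min_count=1)
    return table
```

Keeping one column per solution, rather than only the total, makes it easy to see which solution changed when the combined figure moves.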
I imagine that walking through the energy solutions to compute a new TAM and generate results would look like:
I'd recommend, at least for now, adding a new Python file in tools/.py for this integration code. We'll eventually add it as a GitHub Action to run every night, and have it send us a CL if the results have changed because someone checked in a change to one of these solutions during the previous day.
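The change-detection part of such a nightly job could be sketched as follows. It compares freshly generated result CSVs against a stored baseline and reports which ones differ. The directory layout, the tolerance, and the `changed_results` name are assumptions; the GitHub Action and the CL step are not shown.

```python
from pathlib import Path

import pandas as pd


def changed_results(new_dir: Path, baseline_dir: Path, rtol: float = 1e-9) -> list[str]:
    """Return names of result CSVs in new_dir that differ from the baseline copies.

    A file counts as changed if it has no baseline, has different labels or shape,
    or any value differs beyond the tolerance. Files that exist only in the
    baseline are not reported by this sketch.
    """
    changed = []
    for new_path in sorted(new_dir.glob("*.csv")):
        old_path = baseline_dir / new_path.name
        if not old_path.exists():
            changed.append(new_path.name)
            continue
        new = pd.read_csv(new_path, index_col=0)
        old = pd.read_csv(old_path, index_col=0)
        try:
            pd.testing.assert_frame_equal(new, old, check_exact=False, rtol=rtol)
        except AssertionError:
            changed.append(new_path.name)
    return changed
```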
I took a closer look at the FamPlanning tab and have a few follow up questions:
Would the list of cells I add to the test be B10:G56? I suppose E10:G56 will depend on the answer to #2 from above, since it's dynamic based on the conventional selection.
Are you referring to the "Summary Family Planning Model File" that's referred to in cells R3:U5 in the FamPlanning tab? If so, would giving me access to those files help me complete this issue?
Great, thanks for jotting down those requirements. I will keep the design in mind as I build this out for SolarPVUtility.
The formulae refer to Unit Adoption Calculations cells in the Q19 range and subtract the Q71 range. Those are labeled REF1 and PDS population, so yes, I'd assume the text in cell B8 is outdated or incorrect.
I think we can start with just Coal, and get something working. I don't understand how the Family Planning outputs from each solution get combined into an overall Health and Education solution in the results. Non-energy solutions won't have 'Coal', they presumably will have something else.
I don't know. I guess we ignore it for now and implement it if it becomes clear that it is important.
There is an overview of the RRS solutions, like energy, in the Documentation directory.
Yes, I think these should be verified in tests/test_excel_integration.py. That would be a good first step. Combining the FamPlanning results from all of the solutions to come up with a total can come later.
Yes, I'd say B10:G56. If we do need to support more than just 'Coal' then we'll have to figure out how to do that. Right now, test_excel_integration uses an expected.zip file which contains the values from every sheet of the Excel file, for every scenario. We'd need to additionally store multiple FamPlanning sheets for the different conventional energy sources.
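To read a block like B10:G56 out of a sheet, here is a small sketch that converts A1-style references to zero-based positions and slices a DataFrame. It assumes the sheet was read with `header=None` (e.g. `pd.read_excel(path, sheet_name="FamPlanning", header=None)`) so that DataFrame positions line up with spreadsheet cells. How the sheets are stored inside expected.zip is not covered here, and both helper names are hypothetical.

```python
import re

import pandas as pd


def a1_to_index(cell: str) -> tuple[int, int]:
    """Convert an A1-style reference like 'G56' to zero-based (row, column) positions."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", cell.upper())
    if match is None:
        raise ValueError(f"not an A1 cell reference: {cell!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters:
        col = col * 26 + (ord(ch) - ord("A") + 1)  # base 26 with A=1 .. Z=26
    return int(digits) - 1, col - 1


def cell_range(sheet: pd.DataFrame, a1_range: str) -> pd.DataFrame:
    """Slice a rectangular range such as 'B10:G56' out of a sheet read with header=None."""
    start, end = a1_range.split(":")
    r0, c0 = a1_to_index(start)
    r1, c1 = a1_to_index(end)
    return sheet.iloc[r0:r1 + 1, c0:c1 + 1]
```

B10:G56 covers 47 rows and 6 columns, so `cell_range(sheet, "B10:G56").shape` is `(47, 6)` for any sheet at least that large.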
I don't have that file; Chad probably does.
I took a look through the
The above raises the question: what would be the equivalent of cells B10:G56 from FamPlanning, which we earlier agreed this new test would read in as validation data? Here is an exchange I had with @chadfrisch regarding some clarifications on these new Excel files:
Since the Electricity_cluster sheets consolidate all energy solutions, I don't believe I'll be able to start by writing a test for a single solution, correct? It seems like the test will need to take the sum of TAM & CO2eq from the 8 energy solutions and validate it against the cells in this sheet. Thanks for your patience with all the requirements clarifications this issue has called for. I'm certainly antsy to start writing some code soon... Assuming an affirmative response to all my questions above, I have a rough conceptual idea of what the implementation will look like. Would you like to schedule a 30-minute sync this coming week to crystallize the approach?
I didn't initially have
It seems so, yes. It will need to sum the TAM and emissions from all of the energy solutions.
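Summing per-solution tables raises one design question: what to do when some solutions have NaN regional data. The sketch below makes that choice explicit. By default, any missing input makes the total NaN, consistent with the intent that nonexistent regional data should not produce a result. Whether the Electricity_cluster sheets behave this way is an assumption to check against the spreadsheet, and `sum_across_solutions` is a hypothetical helper.

```python
import pandas as pd


def sum_across_solutions(frames: list[pd.DataFrame],
                         nan_if_any_missing: bool = True) -> pd.DataFrame:
    """Add up per-solution tables (rows = years, columns = regions) cell by cell.

    With nan_if_any_missing=True, a cell is NaN unless every solution has a value
    there. With False, NaNs are skipped and a cell is NaN only if all are missing.
    """
    stacked = pd.concat(frames)
    min_count = len(frames) if nan_if_any_missing else 1
    return stacked.groupby(level=0).sum(min_count=min_count)
```

The same function works for both TAM and emissions tables, since each is a year-by-region DataFrame.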
I don't really know; however, one guiding principle at this stage of the project is that the goal is to match the results of the model as it is, even in cases where we make a note to follow up later on how things should be fixed or done differently. So I'd say that implementing LLDC as 0.0 for those fields would be best.
If B26:G72 from the Electricity_cluster_mdc sheet seems reasonable, then by all means we can pursue that. I suspect that it will become clearer as we work on it.
The models for the Family Planning and Educating Girls solutions are not separate Excel files but instead are constructed as an analysis built in to all of the other solutions. The difference in results from these two factors are computed for each of the other ~78 solutions, and summed.
This will need to be implemented in Python, likely in the same way by adding the handling to the model and computed for each of the other solutions.
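Here is a minimal sketch of that per-solution structure. It assumes each solution can be run against a given population table and returns its emissions as a year-by-region DataFrame. The callable interface and the name `health_and_education_total` are assumptions for illustration. The sign convention is also assumed: REF population minus PDS population, so a positive result means emissions avoided.

```python
from typing import Callable, Mapping

import pandas as pd

# Hypothetical interface: given a population table, return that solution's
# emissions (rows = years, columns = regions).
EmissionsFn = Callable[[pd.DataFrame], pd.DataFrame]


def health_and_education_total(solutions: Mapping[str, EmissionsFn],
                               ref_pop: pd.DataFrame,
                               pds_pop: pd.DataFrame) -> pd.DataFrame:
    """Sum over solutions of (emissions with REF population - emissions with PDS population)."""
    total = None
    for name, emissions_for in solutions.items():
        delta = emissions_for(ref_pop) - emissions_for(pds_pop)
        total = delta if total is None else total + delta
    if total is None:
        raise ValueError("no solutions given")
    return total
```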