Replies: 6 comments
-
@dalonsoa @alexdewar Thoughts on this?
-
Wow! This is a big change! To be honest, I - we all - will need to sit together and think about the implications of these changes. For a long time. A few comments that come to mind:

As inputs are still provided as CSV files, I feel this affects the inner workings of MUSE, which will use a DB to store all of that information, much more than the input layer. So, I feel this will definitely need to be left for MUSEv2. The number of things that will need changing is enormous and equivalent to re-writing the code from scratch.

A DB is, in practice, a global variable and when MUSE was being re-written from v0, an explicit effort was made to not use global variables to keep the state of the simulation, but rather have the MCA own most of the information and pass the relevant bits to one or other function. I don't have any problem with the DB idea, I think it makes sense - at least for static stuff - but just commenting on the possible implications. Having said that, MUSEv1 uses

On the aspect of re-organising the input structure, your version probably makes more sense for the user, making things simpler for them, but we need to consider how those files map to the internal data structures of MUSE. They were designed with the original input structure in mind - they were developed simultaneously - so changing one without changing the other will require some interface layer to make them work together. MUSEv1, for example, relies heavily on the concept of

Considering all of this, I would suggest we park the full implementation of this for the design of MUSEv2, and we try to figure out how (or which of) the advantages you describe can be retrofitted into MUSEv1.
-
@tsmbland This looks v well considered and seems to make sense (at least from my fairly shallow understanding of the current input format). I think I agree with @dalonsoa that this is probably something that should wait till MUSE v2 though. A major change in the input file format is essentially a breaking change, so if we were doing semantic versioning that would be a major version bump in any case. Given that MUSE v1 is likely to live on as the "legacy" version for a while, we should probably aim to maintain backwards compatibility for the foreseeable future. Once I'm a bit more familiar with the codebase I might have some more thoughts about this and we should def keep it in mind when reworking the input layer.
-
Thanks @dalonsoa and @alexdewar. I agree that we definitely don't want to break backwards compatibility in v1. In theory, we could support both formats in v1 by implementing a layer that converts input files from the new format back to the original format, so none of the inner workings would need to change. Obviously this would have no benefits in terms of performance, but it would allow users to get comfortable with the new format, and lower the barriers to switching to v2 (for both users and developers). Not saying we need to do this, just an option.

In terms of implementation, we wouldn't necessarily need to use an actual database (although it may be a good idea to). I like the database-style format as I think it's an intuitive way of structuring data, but the data could still be read in and stored using pandas and xarray without building an actual database (Update: or an in-memory database).
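To make the in-memory database option a bit more concrete, here is a minimal sketch using only the Python standard library. The column names (Name, Sector, Technology, Region, CapitalCost) and the example rows are made up for illustration and are not taken from the proposed schema or from MUSE; the CSV text stands in for files on disk.

```python
import csv
import io
import sqlite3

# Hypothetical table contents; column names and values are illustrative only.
TECHNOLOGY_CSV = """Name,Sector
windturbine,power
gasboiler,residential
"""

INSTALLATION_CSV = """Technology,Region,CapitalCost
windturbine,UK,1200
windturbine,FR,1100
gasboiler,UK,300
"""


def load_table(conn, name, csv_text):
    """Create table `name` from CSV text; every value is stored as text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{col}"' for col in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', data)


# The database lives only in memory; nothing is written to disk.
conn = sqlite3.connect(":memory:")
load_table(conn, "Technology", TECHNOLOGY_CSV)
load_table(conn, "TechnologyInstallation", INSTALLATION_CSV)

# Join region-specific installation data with the core technology properties.
uk_power = conn.execute(
    """
    SELECT i.Technology, i.Region, CAST(i.CapitalCost AS REAL)
    FROM TechnologyInstallation AS i
    JOIN Technology AS t ON t.Name = i.Technology
    WHERE t.Sector = 'power' AND i.Region = 'UK'
    """
).fetchall()
print(uk_power)
```

Users would keep editing CSV files as before; the database would only exist for the duration of a run, so this doesn't require anyone to install or manage a database server.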
-
This looks good! I think we need a meeting about it too though.
-
I've been thinking about ways that we could restructure the model inputs, and I think we could massively improve things by using a database structure, whilst still maintaining a set of human-readable csv files. See details below.
Database schema
This is one possible way that the model inputs could be represented as a database:
Each box is a table, with fields and data types indicated. Arrows represent relationships between tables. I've coloured the tables as follows:
All parameters are related to existing parameters in the current input files, so I won't include full explanations here (although see further down for a basic description of each table). There are a few parameters I'm not sure about (either I don't understand what they're for, or not sure which table they should belong in), which I've coloured in red.
One point to note is the distinction between a technology (e.g. wind turbines) and a technology installation, which is an installation of a technology in a specific region (e.g. wind turbines in the UK). The Technology table holds core properties of the technology (which won't differ between regions), and the TechnologyInstallation table holds properties related to a specific installation of a technology in a region (which may differ between regions). Some parameters are allowed to vary between timeslices, which can be indicated in the TechnologyInstallationTimeslice table. This schema no longer contains distinct tables for different sectors. Instead, Sector is added as a field in the Technology table.
A few of the tables would contain time series data (black border) for parameters that can change over the course of the simulation. However, in reality, basic users may want to keep most parameters fixed over the course of the simulation, and just define them for the base year (which would be mandatory). Therefore, an alternative way of representing the data could be as follows:
In this case, the main tables only store data for the base year. If any parameters change over time, these changes can be stored in various 'trend' tables (blue). For many simple simulations, in which most parameters are held constant over time, these tables will be all/mostly empty. It may seem like an unnecessary duplication of tables, but I think it could simplify things for most users, and also make clearer what's mandatory to define (base year) and what's optional (future years).
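As a rough illustration of how a base-year table and a 'trend' table could be combined when reading the inputs, here is a small standard-library sketch. The column names, the example values, the base year of 2020 and the rule that a trend value holds from its year until the next trend entry are all assumptions made for this example, not part of the proposal.

```python
import csv
import io

# Hypothetical contents; column names and values are illustrative only.
BASE_CSV = """Commodity,Region,Price
gas,UK,5.0
electricity,UK,20.0
"""

TREND_CSV = """Commodity,Region,Year,Price
gas,UK,2030,6.5
gas,UK,2040,8.0
"""

BASE_YEAR = 2020  # assumed base year


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def price_in_year(commodity, region, year, base_rows, trend_rows):
    """Return the base-year price, overridden by the latest trend row at or before `year`."""
    value = None
    for row in base_rows:
        if (row["Commodity"], row["Region"]) == (commodity, region):
            value = float(row["Price"])
    if value is None:
        raise KeyError(f"no base-year price for {commodity} in {region}")

    latest = BASE_YEAR
    for row in trend_rows:
        row_year = int(row["Year"])
        matches = (row["Commodity"], row["Region"]) == (commodity, region)
        # Assumption: a trend value applies from its year until the next trend entry.
        if matches and latest < row_year <= year:
            latest, value = row_year, float(row["Price"])
    return value


base_rows = read_rows(BASE_CSV)
trend_rows = read_rows(TREND_CSV)
print(price_in_year("gas", "UK", 2035, base_rows, trend_rows))
print(price_in_year("electricity", "UK", 2050, base_rows, trend_rows))
```

With an empty trend table every lookup falls back to the base-year value, which is the simple case most users would start from.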
Input file schema
Many users won't be comfortable with using databases directly, so we'll want to keep using a system of csv files that users can manually edit with Excel or a text editor. This can be done without too many deviations from the above schema. I've grouped tables into folders according to the colouring scheme above (although I haven't included a table for regions, as these are already defined as a list in the config file).
(See example files at the bottom, which is probably easier than reading through the schema)
Agent/
Agent.csv
technodata/Agents.csv
AgentShare.csv
technodata/Agents.csv
Objective.csv
technodata/Agents.csv
Commodity/
Commodity.csv
input/GlobalCommodities.csv. I've excluded CommodityEmissionFactor_CO2 and HeatRate as these aren't used by the model, but can add them back if helpful
CommodityPrice.csv
input/Projections.csv
CommodityTrade.csv
input/BaseYearExport.csv and input/BaseYearImport.csv. I've combined imports and exports to have a single field NetImport as I assume this is all that matters, but I may be wrong
Technology/
Technology.csv
technodata/{sector}/Technodata.csv
TechnologyInstallation.csv
technodata/{sector}/Technodata.csv
Flow.csv
technodata/{sector}/CommIn.csv and technodata/{sector}/CommOut.csv. I've combined inputs and outputs into a single field for net flow as I assume that's all that matters, but again I may be wrong
Allocation.csv
technodata/{sector}/Technodata.csv
Consumption.csv
technodata/preset/
ExistingCapacity.csv
technodata/{sector}/ExistingCapacity.csv
Trend/
AllocationTrend.csv
CommodityPriceProjection.csv
CommodityTradeTrend.csv
ConsumptionTrend.csv
FlowTrend.csv
TechnologyInstallationTrend.csv
TechnologyInstallationTimeslice.csv
UtilizationFactor and MinimumServiceFactor to vary during different timeslices (e.g. specific months, times of day etc.)
TechnologyInstallation and/or TechnologyInstallationTrend
technodata/{sector}/TechnodataTimeslices.csv
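For anyone wondering what going from the combined net-flow table back to separate input and output tables might involve, here is a minimal sketch. The column names (Technology, Commodity, NetFlow, Quantity), the example values and the sign convention (negative means consumed, positive means produced) are assumptions for illustration; the output is a simplified long-format table, not the actual layout of the existing CommIn.csv and CommOut.csv files.

```python
import csv
import io

# Hypothetical net-flow table; names, values and sign convention are assumed.
FLOW_CSV = """Technology,Commodity,NetFlow
gasboiler,gas,-1.2
gasboiler,heat,1.0
gasboiler,CO2,0.06
"""

FIELDS = ["Technology", "Commodity", "Quantity"]


def split_flows(flow_text):
    """Split net flows into (inputs, outputs) according to the sign of NetFlow."""
    comm_in, comm_out = [], []
    for row in csv.DictReader(io.StringIO(flow_text)):
        flow = float(row["NetFlow"])
        record = {
            "Technology": row["Technology"],
            "Commodity": row["Commodity"],
            "Quantity": abs(flow),
        }
        if flow < 0:
            comm_in.append(record)   # assumed: negative net flow = consumption
        elif flow > 0:
            comm_out.append(record)  # assumed: positive net flow = production
    return comm_in, comm_out


def to_csv(records):
    """Serialise records to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


comm_in, comm_out = split_flows(FLOW_CSV)
print(to_csv(comm_in))
print(to_csv(comm_out))
```

One thing this makes visible: if a technology both consumes and produces the same commodity, a single net value can't be split back into the two original quantities, so the conversion only works one way in that case.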
I've had a go at restructuring the default model into this format, and attached below. This should clarify what I have in mind. I've also included the default model in the original format for comparison:
new_default.zip
old_default.zip
I haven't modified the settings file yet, but this should be mostly unchanged.
Advantages compared to current system
Notes
Discussion points