Extract most important information from XML into simplistic format #13
Comments
@okworx is the ILCD importer mostly finished or worked out? Where can I test it, and where is the working repo or fork for it?
@shirubana in this fork: https://github.com/mfastudillo/brightway2-io. Try:
from pathlib import Path
import bw2io
import bw2calc
import bw2data
from bw2io.importers.ilcd import ILCDImporter
import pandas as pd
# Create (or switch to) a project and install the default biosphere data
bw2data.projects.set_current('ilcd_import')
bw2io.bw2setup()
path_to_example = Path('bw2io/data/examples/ilcd-example.zip')
so = ILCDImporter(dirpath=path_to_example, dbname='example_ilcd')
so.apply_strategies()
# Link elementary flows to biosphere3, then link exchanges within the imported database
so.match_database('biosphere3', fields=['database', 'code'])
so.match_database(fields=['database', 'code'])
so.statistics()
# Drop whatever is still unlinked, then write the database
so.drop_unlinked(True)
so.write_database()
You can pick an ILCD example from the GLAD website; I tried a few and it works. Quite a number of elementary flows are not matched and need to be dropped.
I had to do a similar process in another project, Lavoisier, an LCI data format converter (https://github.com/JosePauloSavioli/Lavoisier). The process worked with a reading function and a mapping class. The mapping class had an output dictionary to populate and a mapping dictionary whose keys are XML fields and whose values are function calls that modify the data from those fields and populate the output dictionary. It was something like this (the reading function takes out namespaces automatically):
mapping = {
    "/processDataSet/processInformation/dataSetInformation/UUID": lambda x: setattr(self.output_dict, "UUID", self.modify_UUID(x))
}
The mapping dictionary is passed to the reading function. The reading function parses the XML and checks whether each element is in the mapping. If it is, it organizes the XML data in a dictionary (like the xmltodict library does) and calls the function bound to that element in the mapping dict with the data. The called function (here modify_UUID) then modifies the data for the new format and returns it to the lambda function, which sets it in the output dictionary. This is the basic flow, but for Lavoisier the output dict would be an abstraction of the output format of the conversion.
The process worked well with LCA inventory data, as it could perform the conversion on single fields or on sets of fields (like passing all data from one exchange to a class that could modify the data for the new format). Still, it had some minor drawbacks related to parsing (the XML is treated as a continuous flow of information, so the dataset is not loaded in memory).
I saw @mfastudillo is ahead on developing an ILCD importer for Brightway. This interests me a lot, since one of the issues I have with converting datasets is that there is no software where I can import a .spold and an ILCD .zip file to compare the information in them (JosePauloSavioli/Lavoisier#3). I also had to study the ILCD (and Ecospold 2) formats a lot to make the conversion possible, so I have extensive knowledge of the format and of reading and working with data in it, and I have been through the struggle of mapping elementary flows between formats D:
@mfastudillo If you are interested, I would like to help you develop the importer. I'm open to a meeting or an exchange of emails if you want (my GitHub page has the email). I could fork it, but I still have limited knowledge of Brightway, so I can help better in other ways.
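The mapping-dictionary idea from the comment above can be sketched roughly as follows. This is not Lavoisier's code: `read_with_mapping`, `ProcessMapping` and `modify_uuid` are hypothetical names, the transformation is a placeholder, and the sketch loads the whole tree with `xml.etree.ElementTree` instead of streaming it as the comment describes.

```python
import xml.etree.ElementTree as ET


def strip_namespace(tag):
    """Turn '{http://...}UUID' into 'UUID'."""
    return tag.rsplit("}", 1)[-1]


def read_with_mapping(xml_text, mapping):
    """Walk the XML tree and call the mapped function for every known path."""
    root = ET.fromstring(xml_text)

    def walk(element, parent_path):
        path = f"{parent_path}/{strip_namespace(element.tag)}"
        handler = mapping.get(path)
        if handler is not None:
            handler((element.text or "").strip())
        for child in element:
            walk(child, path)

    walk(root, "")


class ProcessMapping:
    """Hypothetical mapping class: owns the output dict and the path-to-handler table."""

    def __init__(self):
        self.output = {}
        self.mapping = {
            "/processDataSet/processInformation/dataSetInformation/UUID":
                lambda x: self.output.__setitem__("UUID", self.modify_uuid(x)),
        }

    def modify_uuid(self, value):
        # Placeholder transformation; a real converter would adapt the value
        # to whatever the target format expects.
        return value.lower()


example = """<processDataSet xmlns="http://lca.jrc.it/ILCD/Process"
    xmlns:common="http://lca.jrc.it/ILCD/Common">
  <processInformation>
    <dataSetInformation>
      <common:UUID>ABCDEF00-0000-0000-0000-000000000001</common:UUID>
    </dataSetInformation>
  </processInformation>
</processDataSet>"""

converter = ProcessMapping()
read_with_mapping(example, converter.mapping)
print(converter.output)
```

Adding a new field then only means adding one more path and handler to the mapping, which is the property the comment highlights.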
Hi @JosePauloSavioli, sure, contributions are more than welcome! I'll try to update the issues. The importer follows an extract - transform - load logic, and one of the trickiest parts is the "extract" step, where we parse different fields of the ILCD zip file into a list of dictionaries. This does not require much Brightway knowledge, but knowledge of the ILCD format is very useful.
Hmmm, this is really a tricky part. I saw difficulties in 4 ways:
I think this can become pretty specific. @mfastudillo, would you mind sharing more about the difficulties that you are having with the extraction? Would you prefer that I discuss this in the issues of your fork?
Hi @JosePauloSavioli, yes, I think the issues in the forked repository are a better place to discuss the main issues.
Issue closed during cleanup for Brightcon 2023
Extract the information items from the XML into the data structure from #12.
See the "ILCD Format in a nutshell" guide for details on what to find where.
First, we need to parse all the flows in order to have the information ready for looking it up later when we read the process(es).
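A minimal sketch of that ordering, assuming the archive keeps flow and process files in folders named `flows` and `processes` (the usual ILCD layout, but an assumption here). `split_ilcd_archive` is a hypothetical helper, not part of bw2io.

```python
import io
import zipfile
from pathlib import PurePosixPath


def split_ilcd_archive(zip_file):
    """Return (flow_documents, process_documents) as lists of raw XML bytes."""
    flows, processes = [], []
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            if len(parts) < 2 or not name.lower().endswith(".xml"):
                continue
            folder = parts[-2].lower()  # direct parent folder of the file
            if folder == "flows":
                flows.append(archive.read(name))
            elif folder == "processes":
                processes.append(archive.read(name))
    return flows, processes


# Build a tiny in-memory archive to show the call.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ILCD/flows/f1.xml", "<flowDataSet/>")
    archive.writestr("ILCD/processes/p1.xml", "<processDataSet/>")
    archive.writestr("ILCD/flowproperties/fp1.xml", "<flowPropertyDataSet/>")
buffer.seek(0)

flow_docs, process_docs = split_ilcd_archive(buffer)
# Parse every document in flow_docs first, then the ones in process_docs.
```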
Process
default namespace
http://lca.jrc.it/ILCD/Process
xmlns:common="http://lca.jrc.it/ILCD/Common"
Metadata
UUID (string)
/processDataSet/processInformation/dataSetInformation/common:UUID
name (string)
This consists of 4 parts:
/processDataSet/processInformation/dataSetInformation/name/baseName
/processDataSet/processInformation/dataSetInformation/name/treatmentStandardsRoutes
/processDataSet/processInformation/dataSetInformation/name/mixAndLocationTypes
/processDataSet/processInformation/dataSetInformation/name/functionalUnitFlowProperties
which we want to concatenate with a semicolon + a space ("; ") as separator characters.
reference year (number)
/processDataSet/processInformation/time/common:referenceYear
valid until year (number)
/processDataSet/processInformation/time/common:dataSetValidUntil
geographical representativity (location code, string)
/processDataSet/processInformation/geography/locationOfOperationSupplyOrProduction/@location
reference product(s)
The exchange with the reference flow is this one
/processDataSet/exchanges/exchange[@dataSetInternalID=/processDataSet/processInformation/quantitativeReference/referenceToReferenceFlow]
reference product internal id (integer)
/processDataSet/processInformation/quantitativeReference/referenceToReferenceFlow
we will need this internally for parsing and processing
reference product name (string)
/processDataSet/exchanges/exchange[@dataSetInternalID=/processDataSet/processInformation/quantitativeReference/referenceToReferenceFlow]/referenceToFlowDataSet/common:shortDescription
reference product amount (number)
/processDataSet/exchanges/exchange[@dataSetInternalID=/processDataSet/processInformation/quantitativeReference/referenceToReferenceFlow]/resultingAmount
This will need to be multiplied by the amount from the flow.
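The metadata paths above could be read with the standard library roughly as follows. This is a sketch under assumptions: `extract_process_metadata` and the output keys are hypothetical names, only the first language variant of each name part is taken, missing name parts are skipped when joining, and the reference product amount is returned as found in the exchange (the multiplication with the flow amount is not done here). The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://lca.jrc.it/ILCD/Process", "common": "http://lca.jrc.it/ILCD/Common"}
NAME_PARTS = ("baseName", "treatmentStandardsRoutes",
              "mixAndLocationTypes", "functionalUnitFlowProperties")


def text_of(element, path):
    """Stripped text of the first match, or None if missing or empty."""
    found = element.find(path, NS)
    return found.text.strip() if found is not None and found.text else None


def as_number(text, cast):
    return cast(text) if text is not None else None


def extract_process_metadata(xml_text):
    root = ET.fromstring(xml_text)
    info = root.find("p:processInformation", NS)
    parts = [text_of(info, f"p:dataSetInformation/p:name/p:{part}") for part in NAME_PARTS]
    location = info.find("p:geography/p:locationOfOperationSupplyOrProduction", NS)
    ref_id = text_of(info, "p:quantitativeReference/p:referenceToReferenceFlow")
    # ElementTree's XPath subset cannot compare an attribute with another
    # node's text, so the reference exchange is resolved in Python instead.
    ref_exchange = next((e for e in root.findall("p:exchanges/p:exchange", NS)
                         if e.get("dataSetInternalID") == ref_id), None)
    ref_name = ref_amount = None
    if ref_exchange is not None:
        ref_name = text_of(ref_exchange, "p:referenceToFlowDataSet/common:shortDescription")
        ref_amount = as_number(text_of(ref_exchange, "p:resultingAmount"), float)
    return {
        "uuid": text_of(info, "p:dataSetInformation/common:UUID"),
        "name": "; ".join(part for part in parts if part),
        "reference_year": as_number(text_of(info, "p:time/common:referenceYear"), int),
        "valid_until": as_number(text_of(info, "p:time/common:dataSetValidUntil"), int),
        "location": location.get("location") if location is not None else None,
        "reference_product_id": as_number(ref_id, int),
        "reference_product_name": ref_name,
        "reference_product_amount": ref_amount,
    }


sample = """<processDataSet xmlns="http://lca.jrc.it/ILCD/Process"
    xmlns:common="http://lca.jrc.it/ILCD/Common">
  <processInformation>
    <dataSetInformation>
      <common:UUID>process-1</common:UUID>
      <name>
        <baseName>Steel sheet</baseName>
        <treatmentStandardsRoutes>hot rolled</treatmentStandardsRoutes>
        <mixAndLocationTypes>production mix, at plant</mixAndLocationTypes>
        <functionalUnitFlowProperties>1 kg</functionalUnitFlowProperties>
      </name>
    </dataSetInformation>
    <quantitativeReference>
      <referenceToReferenceFlow>0</referenceToReferenceFlow>
    </quantitativeReference>
    <time>
      <common:referenceYear>2015</common:referenceYear>
      <common:dataSetValidUntil>2020</common:dataSetValidUntil>
    </time>
    <geography>
      <locationOfOperationSupplyOrProduction location="DE"/>
    </geography>
  </processInformation>
  <exchanges>
    <exchange dataSetInternalID="0">
      <referenceToFlowDataSet refObjectId="flow-steel">
        <common:shortDescription>Steel sheet</common:shortDescription>
      </referenceToFlowDataSet>
      <resultingAmount>1.0</resultingAmount>
    </exchange>
  </exchanges>
</processDataSet>"""

metadata = extract_process_metadata(sample)
print(metadata["name"])
```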
Inventory
for each exchange:
here is the list of exchanges:
/processDataSet/exchanges
Each of them is uniquely identified by its dataSetInternalID attribute.
One (or multiple, we only need to support one for now) of them is the reference product - the one whose @dataSetInternalID attribute matches the "reference product internal id" from above.
internal ID (integer)
exchange/@dataSetInternalID
we need that for internal processing
flow name (string)
exchange/referenceToFlowDataSet/common:shortDescription
flow UUID (string)
exchange/referenceToFlowDataSet/@refObjectId
exchange direction (string)
exchange/exchangeDirection
exchange amount (double)
exchange/resultingAmount
For each exchange, we'll need to look up the actual flow that is referenced (from the list of flows that we have parsed before) by its UUID and then read the flow's name, compartment, flow amount and unit.
The amount from the exchange and the amount from the flow need to be multiplied to yield the actual resulting amount for this exchange.
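The per-exchange extraction and the lookup against the previously parsed flows might look like the sketch below. `extract_exchanges` is a hypothetical name, and `flows_by_uuid` is assumed to be a dictionary keyed by flow UUID whose values hold `amount`, `unit` and `compartment`. The sketch assumes every exchange has a `resultingAmount`, and it leaves the resulting amount empty when the referenced flow was not parsed. The flow amount of 2.0 is purely illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://lca.jrc.it/ILCD/Process", "common": "http://lca.jrc.it/ILCD/Common"}


def extract_exchanges(process_xml, flows_by_uuid):
    """Return one dict per exchange, joined with the flow data parsed earlier."""
    root = ET.fromstring(process_xml)
    ref_id = root.findtext(
        "p:processInformation/p:quantitativeReference/p:referenceToReferenceFlow",
        namespaces=NS,
    )
    results = []
    for exchange in root.findall("p:exchanges/p:exchange", NS):
        internal_id = exchange.get("dataSetInternalID")
        flow_ref = exchange.find("p:referenceToFlowDataSet", NS)
        flow_uuid = flow_ref.get("refObjectId")
        exchange_amount = float(exchange.findtext("p:resultingAmount", namespaces=NS))
        flow = flows_by_uuid.get(flow_uuid)  # None when the flow was not parsed
        results.append({
            "internal_id": int(internal_id),
            "is_reference_product": ref_id is not None and internal_id == ref_id.strip(),
            "flow_uuid": flow_uuid,
            "flow_name": flow_ref.findtext("common:shortDescription", namespaces=NS),
            "direction": exchange.findtext("p:exchangeDirection", namespaces=NS),
            "exchange_amount": exchange_amount,
            # Resulting amount = exchange amount x flow amount; None if unlinked.
            "amount": exchange_amount * flow["amount"] if flow else None,
            "unit": flow["unit"] if flow else None,
            "compartment": flow["compartment"] if flow else None,
        })
    return results


sample = """<processDataSet xmlns="http://lca.jrc.it/ILCD/Process"
    xmlns:common="http://lca.jrc.it/ILCD/Common">
  <processInformation>
    <quantitativeReference>
      <referenceToReferenceFlow>0</referenceToReferenceFlow>
    </quantitativeReference>
  </processInformation>
  <exchanges>
    <exchange dataSetInternalID="0">
      <referenceToFlowDataSet refObjectId="flow-steel">
        <common:shortDescription>Steel sheet</common:shortDescription>
      </referenceToFlowDataSet>
      <exchangeDirection>Output</exchangeDirection>
      <resultingAmount>1.0</resultingAmount>
    </exchange>
    <exchange dataSetInternalID="1">
      <referenceToFlowDataSet refObjectId="flow-co2">
        <common:shortDescription>Carbon dioxide</common:shortDescription>
      </referenceToFlowDataSet>
      <exchangeDirection>Output</exchangeDirection>
      <resultingAmount>2.5</resultingAmount>
    </exchange>
  </exchanges>
</processDataSet>"""

# Flow data as it might come out of the flow-parsing step (values invented).
flows = {"flow-co2": {"amount": 2.0, "unit": "kg", "compartment": "Emissions to air"}}
exchanges = extract_exchanges(sample, flows)
```

Keeping both `exchange_amount` and the multiplied `amount` makes it easy to see afterwards which exchanges could not be linked to a parsed flow.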
Flow
default namespace
http://lca.jrc.it/ILCD/Flow
xmlns:common="http://lca.jrc.it/ILCD/Common"
name (string)
/flowDataSet/flowInformation/dataSetInformation/name/baseName
UUID (string)
/flowDataSet/flowInformation/dataSetInformation/common:UUID
compartment (string)
/flowDataSet/flowInformation/dataSetInformation/classificationInformation/common:elementaryFlowCategorization/common:category[@level=2]
type of flow (string)
/flowDataSet/modellingAndValidation/LCIMethod/typeOfDataSet
reference flow property amount (double)
/flowDataSet/flowProperties/flowProperty[@dataSetInternalID=/flowDataSet/flowInformation/quantitativeReference/referenceToReferenceFlowProperty]/meanValue
reference flow property UUID (string)
/flowDataSet/flowProperties/flowProperty[@dataSetInternalID=/flowDataSet/flowInformation/quantitativeReference/referenceToReferenceFlowProperty]/referenceToFlowPropertyDataSet/@refObjectId
With the UUID of the reference flow property, we can use the lookup function @grain11 wrote to look up the unit.
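A sketch of the flow extraction using the paths listed above. `extract_flow` is a hypothetical name, and `lookup_unit` stands in for the lookup function mentioned above, whose real name and signature are not given here; it is assumed to take a flow property UUID and return a unit string. The sample document and its identifiers are invented.

```python
import xml.etree.ElementTree as ET

NS = {"f": "http://lca.jrc.it/ILCD/Flow", "common": "http://lca.jrc.it/ILCD/Common"}
COMPARTMENT_PATH = ("f:dataSetInformation/f:classificationInformation/"
                    "common:elementaryFlowCategorization/common:category[@level='2']")


def text_of(element, path):
    """Stripped text of the first match, or None if missing or empty."""
    found = element.find(path, NS)
    return found.text.strip() if found is not None and found.text else None


def extract_flow(xml_text, lookup_unit):
    root = ET.fromstring(xml_text)
    info = root.find("f:flowInformation", NS)
    ref_id = text_of(info, "f:quantitativeReference/f:referenceToReferenceFlowProperty")
    # Resolve the reference flow property in Python, since ElementTree cannot
    # compare an attribute with another node's text inside a predicate.
    ref_prop = next((p for p in root.findall("f:flowProperties/f:flowProperty", NS)
                     if p.get("dataSetInternalID") == ref_id), None)
    prop_uuid = amount = None
    if ref_prop is not None:
        prop_ref = ref_prop.find("f:referenceToFlowPropertyDataSet", NS)
        prop_uuid = prop_ref.get("refObjectId") if prop_ref is not None else None
        mean_value = text_of(ref_prop, "f:meanValue")
        amount = float(mean_value) if mean_value is not None else None
    return {
        "uuid": text_of(info, "f:dataSetInformation/common:UUID"),
        "name": text_of(info, "f:dataSetInformation/f:name/f:baseName"),
        "compartment": text_of(info, COMPARTMENT_PATH),
        "type": text_of(root, "f:modellingAndValidation/f:LCIMethod/f:typeOfDataSet"),
        "amount": amount,
        "flow_property_uuid": prop_uuid,
        "unit": lookup_unit(prop_uuid) if prop_uuid is not None else None,
    }


sample = """<flowDataSet xmlns="http://lca.jrc.it/ILCD/Flow"
    xmlns:common="http://lca.jrc.it/ILCD/Common">
  <flowInformation>
    <dataSetInformation>
      <common:UUID>flow-co2</common:UUID>
      <name><baseName>Carbon dioxide</baseName></name>
      <classificationInformation>
        <common:elementaryFlowCategorization>
          <common:category level="0">Emissions</common:category>
          <common:category level="1">Emissions to air</common:category>
          <common:category level="2">Emissions to air, unspecified</common:category>
        </common:elementaryFlowCategorization>
      </classificationInformation>
    </dataSetInformation>
    <quantitativeReference>
      <referenceToReferenceFlowProperty>0</referenceToReferenceFlowProperty>
    </quantitativeReference>
  </flowInformation>
  <modellingAndValidation>
    <LCIMethod><typeOfDataSet>Elementary flow</typeOfDataSet></LCIMethod>
  </modellingAndValidation>
  <flowProperties>
    <flowProperty dataSetInternalID="0">
      <referenceToFlowPropertyDataSet refObjectId="prop-mass"/>
      <meanValue>1.0</meanValue>
    </flowProperty>
  </flowProperties>
</flowDataSet>"""

# Stand-in for the real unit lookup: a plain dict keyed by flow property UUID.
units = {"prop-mass": "kg"}
flow = extract_flow(sample, units.get)
flows_by_uuid = {flow["uuid"]: flow}
```

Collecting the results in a dictionary keyed by UUID gives the process step a direct lookup when it resolves each exchange.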