You are working as an AI engineer at a reputable organisation. You have a client from the green energy sector who needs to evaluate the impact of various green and sustainable energy sources on CO2 emissions in the UK. They have acquired data from two independent sources. The first covers CO2 emissions by country over time, detailing the contributions of various fossil fuels and other activities. The second charts energy consumption as a 30-year series, with a detailed breakdown of the contribution from renewable energy sources. The challenge is to make a meaningful connection between the two data sets in order to build a predictive model that demonstrates the impact on CO2 emissions of replacing non-renewable energy sources with renewable ones. You have carried out an initial data exploration and found:
- The data provided is taken from the two independent sources and is significantly imbalanced.
- The data contains both numeric and nominal attributes.
- There are a few missing values in the data.

You have met with your client and agreed to model the data using artificial intelligence techniques, namely supervised learning and feature selection optimisation. Feature selection is important because it removes irrelevant attributes and helps reduce computational cost. You are expected to present a report to your client based on two robust models, constructed according to the guidelines below:
- Design and build a supervised learning model on the full data.
- Use optimisation techniques (learned in this module) to find a subset of relevant features.
- Design and build a supervised learning model on the derived subset of features.
- Critically evaluate the two learning models (with and without feature selection).
- Evaluate the robustness of the generated models by applying appropriate validation techniques (and identifying a suitable subset of data for validation).

While setting the parameters of the optimisation methods, pay special attention to selecting an appropriate fitness function (evaluation criteria). The fitness function plays an important role in the relative success or failure of the potential solutions and in setting the direction of the search.
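To make the role of the fitness function concrete, here is a minimal sketch in plain Python of one possible wrapper-style fitness: cross-validated error of a model trained on the selected features, plus a small cost per feature. Everything in it is an assumption made for illustration and not part of the brief: the k-nearest-neighbour regressor, the interleaved 5-fold split, the `penalty` weight, the synthetic data, and the bit-flip hill climber, which merely stands in for whichever optimisation technique the module covers.

```python
import random

def knn_predict(train_X, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = scored[:k]
    return sum(target for _, target in nearest) / len(nearest)

def cv_mse(X, y, mask, folds=5, k=3):
    """Cross-validated mean squared error using only the features where mask is 1."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty feature subset cannot predict anything
    Xs = [[row[i] for i in cols] for row in X]
    n = len(Xs)
    total = 0.0
    for f in range(folds):
        # Interleaved folds: row i is held out in fold i % folds.
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        tX = [Xs[i] for i in train]
        ty = [y[i] for i in train]
        for i in test:
            total += (knn_predict(tX, ty, Xs[i], k) - y[i]) ** 2
    return total / n

def fitness(X, y, mask, penalty=0.01):
    """Lower is better: validation error plus a small cost per selected feature."""
    return cv_mse(X, y, mask) + penalty * sum(mask)

def hill_climb(X, y, n_features, iters=50, seed=0):
    """Stand-in optimiser: flip one bit at a time and keep strict improvements."""
    rng = random.Random(seed)
    best = [1] * n_features
    best_fit = fitness(X, y, best)
    for _ in range(iters):
        cand = best[:]
        j = rng.randrange(n_features)
        cand[j] = 1 - cand[j]
        cand_fit = fitness(X, y, cand)
        if cand_fit < best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit

# Synthetic data (hypothetical): features 0 and 1 drive the target, 2 and 3 are noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(4)] for _ in range(60)]
y = [2.0 * r[0] - 1.0 * r[1] + data_rng.gauss(0, 0.05) for r in X]

best_mask, best_fit = hill_climb(X, y, n_features=4)
print("selected mask:", best_mask, "fitness:", round(best_fit, 4))
```

The penalty term is what steers the search towards smaller subsets: without it, a subset carrying an irrelevant feature can score about as well as one without, and the optimiser has no reason to drop it. For a time-ordered series such as the 30-year consumption data, interleaved folds let the model see years on both sides of each held-out year, so a split that validates only on later years than it trains on may give a more honest robustness estimate.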