Extract: read the data, often from mupltiple sources/formats.
Transform: clean and structure the data to suit business needs.
Load: load the data into a database for storage that can be used for future analysis or business use.
Each member of our project group chose two data sources to analyze Annual International Energy Source Consumption throughout several years and countries.
- Natural Gas Consumption (EIA)
- Natural Gas Consumption (UN Data)
- Total Electricity Consumption (UN Data)
- Electricity Consumption by State (EIA)
All data extracted were in CSV format
- Wrote a function to get the list of countries from UN database, it creates a country ID for unique countries, and this is used for joining purpose.
- Renaming column names since SQL columns cannot start with an integer
- drop rows that contained ‘NA’ data
- converting string to nummeric
- All of our data were in a CSV format, so we went with SQL to store the data.
- Compare energy consumption based on countries and year, create bar chart to see the trend of increasing energy usage, conduct analysis on why certain countries may consume more energy compare to other – this will require other data sets (country population for example).
- finding data that can be used for the project (relatable to what everyone else were finding year wise)
- finding data based on countries instead of areas/continents
- dropping unnecessary data and renaming columns in order to input it into SQL database
- converting values from string to numeric data type
- creating table in relation to SQL and jupyter notebook
- learned how to use lambda