From ec8f77ac7ddabc28cf42c7d7c4cde5acffe905ab Mon Sep 17 00:00:00 2001
From: Matthew Marcos
Date: Wed, 29 Jun 2016 14:35:59 +0800
Subject: [PATCH 1/3] ETL basic

---
 README.md        |  4 ++++
 etl/etl-basic.md | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)
 create mode 100644 etl/etl-basic.md

diff --git a/README.md b/README.md
index 5e47840..ca1fdc4 100644
--- a/README.md
+++ b/README.md
@@ -56,6 +56,10 @@ This repository serves as a collection of things learned by people working in AG
 
 - [Basic SQLite3](sqlite3/basic-sqlite3.md)
 
+### ETL
+
+- [Basic SQLite3](etl/basic-etl.md)
+
 ## Contributing
 
 If you want to share your learnings for today, please check out CONTRIBUTING.md
diff --git a/etl/etl-basic.md b/etl/etl-basic.md
new file mode 100644
index 0000000..eb9e20a
--- /dev/null
+++ b/etl/etl-basic.md
@@ -0,0 +1,21 @@
+# ETL - Basic
+
+**Date: (June 29, 2016)**
+
+ETL or *Extract - Transform - Load* is a process done in data warehousing. It is used when:
+- You want to aggregate data from different sources into one collection (the warehouse).
+- You want to select and arrange only the information pertinent to a particular kind of analysis, and give each user their own view of the data (a data mart).
+- You want to clean the data and make meaningful sense out of it.
+
+ETL can be broken down into three major steps:
+1. Extract
+    - The part where you gather data from different data sources (CSV files, databases, etc.). Data can also come from other data warehouses.
+    - It should be designed to avoid negative effects on the source system, such as degraded performance, slower response times, or any kind of data locking.
+2. Transform
+    - Making the extracted data usable.
+    - This includes mapping the data, matching rows, enhancing data, summarizing data, etc.
+    - Transformation also includes standardizing data (such as currency and time formats) and handling encoding.
+3. Load
+    - Takes the prepared data and stores it in the data warehouse, database, or data mart.
+
+Source: [ETL Tutorial | Extract Transform and Load](https://www.youtube.com/watch?v=WZw0OTgCBOY)

From 7007b9761538eeb2df2f9de9d93ce0d38d6803a7 Mon Sep 17 00:00:00 2001
From: Matthew Marcos
Date: Wed, 29 Jun 2016 14:37:53 +0800
Subject: [PATCH 2/3] Fixed link name for ETL Basic

---
 README.md                          | 2 +-
 etl/{etl-basic.md => basic-etl.md} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename etl/{etl-basic.md => basic-etl.md} (100%)

diff --git a/README.md b/README.md
index ca1fdc4..f0dd4d1 100644
--- a/README.md
+++ b/README.md
@@ -58,7 +58,7 @@ This repository serves as a collection of things learned by people working in AG
 
 ### ETL
 
-- [Basic SQLite3](etl/basic-etl.md)
+- [Basic ETL](etl/basic-etl.md)
 
 ## Contributing
 
diff --git a/etl/etl-basic.md b/etl/basic-etl.md
similarity index 100%
rename from etl/etl-basic.md
rename to etl/basic-etl.md

From 089ac6c1623feaed64c9aaca0bf6b754acaa6b0d Mon Sep 17 00:00:00 2001
From: Matthew Marcos
Date: Wed, 29 Jun 2016 14:41:06 +0800
Subject: [PATCH 3/3] Update formatting

---
 etl/basic-etl.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/etl/basic-etl.md b/etl/basic-etl.md
index eb9e20a..2488386 100644
--- a/etl/basic-etl.md
+++ b/etl/basic-etl.md
@@ -8,13 +8,16 @@ ETL or *Extract - Transform - Load* is a process done in data warehousing. It is
 - You want to clean the data and make meaningful sense out of it.
 
 ETL can be broken down into three major steps:
+
 1. Extract
     - The part where you gather data from different data sources (CSV files, databases, etc.). Data can also come from other data warehouses.
     - It should be designed to avoid negative effects on the source system, such as degraded performance, slower response times, or any kind of data locking.
+
 2. Transform
     - Making the extracted data usable.
     - This includes mapping the data, matching rows, enhancing data, summarizing data, etc.
     - Transformation also includes standardizing data (such as currency and time formats) and handling encoding.
+
 3. Load
     - Takes the prepared data and stores it in the data warehouse, database, or data mart.
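
Review note: the three steps described in `etl/basic-etl.md` could be illustrated with a small example. The sketch below is only one possible illustration, not something taken from the patch or its source video. It assumes an in-memory CSV string as the data source and a SQLite database as the "warehouse"; the column names, the `RATES_TO_USD` table (made-up rates, not real exchange rates), and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the columns and values are invented for illustration.
RAW_CSV = """name,amount,currency
alice,10.50,USD
bob,200,PHP
alice,4.50,USD
"""

# Assumed, made-up conversion rates to USD (not real exchange rates).
RATES_TO_USD = {"USD": 1.0, "PHP": 0.02}


def extract(text):
    """Extract: read rows from a CSV source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: standardize amounts to USD and summarize them per name."""
    totals = {}
    for row in rows:
        usd = float(row["amount"]) * RATES_TO_USD[row["currency"]]
        name = row["name"].strip().title()
        totals[name] = totals.get(name, 0.0) + usd
    return [(name, round(total, 2)) for name, total in sorted(totals.items())]


def load(records, conn):
    """Load: store the prepared records in the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS totals (name TEXT PRIMARY KEY, usd REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO totals VALUES (?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return what ended up in the table."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute("SELECT name, usd FROM totals ORDER BY name").fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

Keeping each step as its own function mirrors the Extract / Transform / Load split in the note. A real pipeline would read from actual source systems rather than a string, and would take care not to lock or slow them down, as the Extract section warns.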