- Project Name: Analyze Apartment Sales in Poland
- Scenario: The sales department of a company called “(Not So) Real Estate Poland” gathers information about apartments for sale in Poland.
- Problem Statement: They want a tool to:
  - “dump” their CSV files into, and
  - analyze the data later (in SQL, or in a reporting tool like Power BI)
The source is the Apartment Prices in Poland dataset by Krzysztof Jamroz, a monthly-updated dataset of apartment sale listings.
- I extracted the data in its original data types and kept all of the information.
- I loaded all the data from the files into a BigQuery table, apartment_sales (sketched below).
- Then I created custom views.
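As a minimal sketch of that load step (placeholder project and dataset names, not the exact code from cloud_function), assuming the CSV files have a header row:

```python
from google.cloud import bigquery

# Placeholder identifier -- substitute your own project and dataset.
TABLE_ID = "your-project.apartment_sales.apartment_sales"

def load_csv_to_bigquery(bucket_name: str, file_name: str) -> None:
    """Append one CSV file from GCS into the main BigQuery table."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the CSV header row
        autodetect=True,      # or supply the explicit schema instead
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    uri = f"gs://{bucket_name}/{file_name}"
    client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()
```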
Within the bucket, I also implemented some simple logic to archive files that have already been processed.
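As a minimal sketch of that logic (the repo's exact naming may differ; the archive/ prefix is an assumption), the file is moved under an archive prefix inside the same bucket after a successful load:

```python
from google.cloud import storage

def archive_file(bucket_name: str, file_name: str) -> None:
    """Move a processed file under an assumed archive/ prefix."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(file_name)
    # "archive/" is an assumed prefix -- adjust to whatever the function uses.
    bucket.copy_blob(blob, bucket, new_name=f"archive/{file_name}")
    blob.delete()  # remove the original so it is not processed twice
```

One caveat with this pattern: copying into archive/ fires the storage trigger again, so the function has to ignore files under that prefix (see the entry-point sketch in the setup steps below).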
| Technology | Used for |
| --- | --- |
| Cloud Storage | Storing & archiving files |
| Cloud Functions | 1. Monitoring files arriving in the GCS bucket, 2. Loading data into BigQuery, 3. Archiving processed files |
| BigQuery | Data warehouse |
| (optional) Looker | Visualizing data. You can also visualize with simple views in BigQuery. |
- No local setup required
- Google Cloud Platform:
  - a free trial account is sufficient
  - a user with the Owner role
  - enabled APIs: Cloud Storage API & BigQuery API
- Create bucket gcs_housing_in_poland (for raw & processed files)
- Create a Cloud Function named function_apartment_sales and set Cloud Storage as the trigger:
  - Set the runtime to Python 3.10
  - Replace function.py and requirements.txt with the versions from cloud_function, and change the variables marked as #TODO (a sketch of the expected entry point follows these steps)
  - Save & deploy
- Create dataset apartment_sales
- Create the main table apartment_sales & the views (see the bigqueryschema.sql file); an illustrative view definition is sketched below
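The authoritative schema and views live in bigqueryschema.sql. Purely as an illustration (the city and price column names are assumptions about the dataset), a simple view can be created from Python like this:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical view -- the real definitions are in bigqueryschema.sql.
view = bigquery.Table("your-project.apartment_sales.avg_price_per_city")
view.view_query = """
    SELECT city, AVG(price) AS avg_price
    FROM `your-project.apartment_sales.apartment_sales`
    GROUP BY city
"""
client.create_table(view)
```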
- Download and unzip the Apartment Prices in Poland dataset by Krzysztof Jamroz
- Upload the files to your GCS bucket
- Refresh the BigQuery table to verify the data was loaded (a quick check from Python is sketched after these steps)
- (if there are any errors, refer to the Logs section of your Cloud Function)
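If you prefer verifying from code instead of the console, a quick row count works (the project id is a placeholder):

```python
from google.cloud import bigquery

client = bigquery.Client()
query = "SELECT COUNT(*) AS row_count FROM `your-project.apartment_sales.apartment_sales`"
for row in client.query(query).result():
    print(f"Rows loaded: {row.row_count}")
```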
I also made a .ppt presentation about this project; feel free to take a look!