Capstone Project (Mlops-Zoomcamp) - House Price Prediction

Problem Statement

This is a capstone project associated with MLOps Zoomcamp, and it will be peer reviewed and scored.

The end goal is to build an end-to-end machine learning project for house price prediction, covering feature engineering, training, validation, experiment tracking, model deployment, hosting, and general engineering best practices.

Dataset

This data set has 414 rows and 7 columns. It contains historical market data on real estate valuations collected from Sindian Dist., New Taipei City, Taiwan. It is recommended for learning and practicing exploratory data analysis, data visualization, and regression modelling. Feel free to explore it with multiple supervised and unsupervised learning techniques. The following data dictionary gives more details:

Data Dictionary

Column Position	Attribute Name	Definition	Data Type	Example	% Null Ratios
1	X1 transaction date	The transaction date (for example, 2013.250=2013 March, 2013.500=2013 June, etc.)	Qualitative	2013.500, 2013.500, 2013.333	0
2	X2 house age	The house age (unit: year)	Quantitative	19.5, 13.3, 5.0	0
3	X3 distance to the nearest MRT station	The distance to the nearest MRT station (unit: meter)	Quantitative	390.5684, 405.21340, 23.38284	0
4	X4 number of convenience stores	The number of convenience stores in the living circle on foot	Quantitative	6, 8, 1	0
5	X5 latitude	The geographic coordinate, latitude (unit: degree)	Quantitative	24.97937, 24.97544, 24.94925	0
6	X6 longitude	The geographic coordinate, longitude (unit: degree)	Quantitative	121.54243, 121.49587, 121.51151	0
7	Y house price of unit area	The house price of unit area (10000 New Taiwan Dollar/Ping, where Ping is a local unit, 1 Ping = 3.3 meter squared); for example, 29.3 = 293,000 New Taiwan Dollar/Ping	Quantitative	29.3, 33.6, 47.7
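
The encodings in the data dictionary can be decoded with a few lines of Python. The sketch below is illustrative only and is not part of the repository. The function names are hypothetical, and the month rule is an assumption inferred from the two examples in the dictionary (2013.250 = March, 2013.500 = June).

```python
"""Illustrative helpers for two columns of the data dictionary."""

PING_IN_SQUARE_METERS = 3.3  # from the data dictionary: 1 Ping = 3.3 m^2
PRICE_UNIT_NTD = 10_000      # Y is expressed in units of 10,000 NTD per Ping


def decode_transaction_date(value: float) -> tuple[int, int]:
    """Convert a fractional year such as 2013.250 into (year, month).

    Assumption: the fraction counts twelfths of the year, so .250 maps to
    March and .500 to June, matching the data dictionary examples. Under
    that rule a fraction of .000 falls on December of the previous year.
    """
    year = int(value)
    month = round((value - year) * 12)
    if month == 0:
        return year - 1, 12
    return year, month


def price_ntd_per_ping(y: float) -> float:
    """Convert the target Y into New Taiwan Dollars per Ping."""
    return y * PRICE_UNIT_NTD


def price_ntd_per_square_meter(y: float) -> float:
    """Convert the target Y into New Taiwan Dollars per square meter."""
    return price_ntd_per_ping(y) / PING_IN_SQUARE_METERS


if __name__ == "__main__":
    print(decode_transaction_date(2013.250))
    print(round(price_ntd_per_ping(29.3)))
    print(round(price_ntd_per_square_meter(29.3)))
```

Keeping the unit constants in one place makes it harder to mix up per-Ping and per-square-meter prices when reporting predictions.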

Design & flow architecture

The architecture below depicts the system design:

Languages, frameworks, libraries, services, and tools used to bootstrap this project:

: Container
: Prediction service (web app)
: S3 for storage, RDS as database, EC2 as virtual machine
: Experiment tracking and model registry
: Workflow orchestration
: open source app framework in Python language
: Monitoring
: Monitoring Dashboard
: Monitoring Database
Pylint + Black + isort : Linter and code formatters

Flow

Training, Orchestration, Tracking, Model Registry & Deployment

make train

Prediction service setup, Monitoring service setup, Integration Test, Streamlit provisioning

make build

Batch Prediction

python stream_send.py
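
As a rough idea of what a batch sender can look like, the sketch below reads a CSV file and posts each row to a prediction endpoint as JSON. It is a hypothetical illustration, not the contents of the repository's stream_send.py. The URL, port, and route in `PREDICT_URL`, the function names, and the assumption that the CSV has a header row with purely numeric columns are all placeholders.

```python
"""Hypothetical batch sender sketch; not the repository's stream_send.py."""
import csv
import io
import json
import urllib.request

# Assumed endpoint: the real host, port, and route are not given in this README.
PREDICT_URL = "http://localhost:9696/predict"


def rows_to_payloads(csv_text: str) -> list[dict]:
    """Parse CSV text (with a header row) into JSON-ready dicts of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


def send(payload: dict, url: str = PREDICT_URL) -> dict:
    """POST one record as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def send_batch(path: str, url: str = PREDICT_URL) -> list[dict]:
    """Send every row of the CSV file at `path` and collect the responses."""
    with open(path, newline="", encoding="utf-8") as handle:
        payloads = rows_to_payloads(handle.read())
    return [send(payload, url) for payload in payloads]
```

With the prediction service running, `send_batch("data/batch_test.csv")` would return one response per row. Parsing is kept separate from sending so the payload format can be checked without a live service.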

Prediction

http://localhost:8501

Project Tree Structure

The following is the resulting repo structure:

|-- Makefile
|-- README.md
|-- Test
|   `-- integration_test
|       `-- run.sh
|-- Tracking_Orchestration
|   |-- Pipfile
|   |-- Pipfile.lock
|   |-- test.py
|   |-- track.sh
|   `-- train.py
|-- data
|   |-- batch_test.csv
|   |-- data.xlsx
|   `-- train.csv
|-- images
|   |-- MLFLOW_EXPER.PNG
|   |-- deploy.PNG
|   |-- docker.PNG
|   |-- drift.PNG
|   |-- mlflow_model.PNG
|   |-- train.PNG
|   `-- web_page_STREAMLIT.PNG
|-- pyproject.toml
|-- streamlit
|   |-- Dockerfile
|   |-- Pipfile
|   |-- Pipfile.lock
|   |-- frontend.py
|   `-- images
|       `-- house.jpg
`-- web_service_monitoring
    |-- Pipfile
    |-- Pipfile.lock
    |-- docker-compose.yml
    |-- evidently_service
    |   |-- Dockerfile
    |   |-- app.py
    |   |-- config
    |   |   |-- grafana_dashboards.yaml
    |   |   |-- grafana_datasources.yaml
    |   |   `-- prometheus.yml
    |   |-- config.yaml
    |   |-- dashboards
    |   |   |-- cat_target_drift.json
    |   |   |-- classification_performance.json
    |   |   |-- data_drift.json
    |   |   |-- num_target_drift.json
    |   |   `-- regression_performance.json
    |   |-- datasets
    |   |   `-- train.csv
    |   `-- requirements.txt
    |-- prediction_service
    |   |-- Dockerfile
    |   |-- app.py
    |   `-- requirements.txt
    |-- requirements.txt
    |-- stream_send.py
    `-- test.py

   13 directories, 46 files

Acknowledgments

I am extremely grateful to this wonderful group of people for the time they invested in helping us understand the various aspects of data and analytical engineering.


OLAMIDE100/Capstone-Project-Mlops-ZoomCamp
