Boston-Crime-Data-ETL

The aim of this project is to create an automated pipeline that ingests the Boston Crime Data (https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system), performs the appropriate transformations, and loads the result into a SQL Server database.

The pipeline is broken down as follows:

• Write a function to extract Boston Crime Data Files.

• Transform the data while maintaining the control number.

• Load the data into SQL Server.

• Maintain a log file containing timestamps for every stage of the ETL.
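
The four steps above can be sketched as follows. This is a minimal illustration, not the project's actual code. The column subset, the table name `crime_incidents`, the log path `etl.log`, and the helper names are all hypothetical. It also assumes that the "control number" is a per-run identifier stamped on every row, which the README does not confirm. The load step works with any DB-API connection that uses `?` placeholders; the demo uses an in-memory SQLite database as a stand-in for SQL Server.

```python
import csv
import io
import logging
import sqlite3

log = logging.getLogger("boston_crime_etl")

# Assumed subset of columns; the real file has more.
COLUMNS = ["INCIDENT_NUMBER", "OFFENSE_CODE", "DISTRICT", "OCCURRED_ON_DATE"]


def configure_logging(path="etl.log"):
    """Write timestamped log lines to a file (hypothetical default path)."""
    logging.basicConfig(
        filename=path,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )


def extract(stream):
    """Read CSV text from a file-like object into a list of dicts."""
    rows = list(csv.DictReader(stream))
    log.info("extract: read %d rows", len(rows))
    return rows


def transform(rows, control_number):
    """Trim whitespace, drop exact duplicate rows, stamp the control number."""
    seen = set()
    cleaned = []
    for row in rows:
        values = tuple((row.get(col) or "").strip() for col in COLUMNS)
        if values in seen:
            continue
        seen.add(values)
        # Assumption: the control number identifies the ETL run.
        cleaned.append(values + (control_number,))
    log.info("transform: kept %d of %d rows", len(cleaned), len(rows))
    return cleaned


def load(rows, conn, table="crime_incidents"):
    """Insert rows through a DB-API connection that uses '?' placeholders."""
    columns = COLUMNS + ["CONTROL_NUMBER"]
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    cur = conn.cursor()
    cur.executemany(sql, rows)
    conn.commit()
    log.info("load: inserted %d rows into %s", len(rows), table)


def demo():
    """Run the three stages on made-up sample rows against in-memory SQLite."""
    sample = io.StringIO(
        "INCIDENT_NUMBER,OFFENSE_CODE,DISTRICT,OCCURRED_ON_DATE\n"
        "I001, 3115 ,B2,2023-01-01 10:00:00\n"
        "I001, 3115 ,B2,2023-01-01 10:00:00\n"
        "I002,1402,C11,2023-01-02 11:30:00\n"
    )
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crime_incidents (INCIDENT_NUMBER TEXT, OFFENSE_CODE TEXT, "
        "DISTRICT TEXT, OCCURRED_ON_DATE TEXT, CONTROL_NUMBER TEXT)"
    )
    load(transform(extract(sample), control_number="RUN-0001"), conn)
    return conn.execute(
        "SELECT * FROM crime_incidents ORDER BY INCIDENT_NUMBER"
    ).fetchall()
```

Calling `configure_logging()` before `demo()` would send the timestamped stage messages to the log file. In a real run, the stream passed to `extract` would come from the downloaded file rather than an in-memory string.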


Steps to run:

• Paste the link from the website.

• Enter the server name, database name, and any other details needed to create a folder and connect to SQL Server.

• Run all the cells to perform the ETL.
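
As a rough illustration of how the server and database names might be turned into a connection, the sketch below builds an ODBC connection string. The helper name `build_connection_string`, the driver name, and the use of Windows authentication (`Trusted_Connection=yes`) are assumptions; the project may connect differently.

```python
def build_connection_string(server, database,
                            driver="ODBC Driver 17 for SQL Server"):
    """Build an ODBC connection string (assumes Windows authentication)."""
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        "Trusted_Connection=yes;"
    )


# Hypothetical usage, assuming the pyodbc package is installed:
#   import pyodbc
#   conn = pyodbc.connect(build_connection_string("localhost", "BostonCrime"))
```

Keeping the string construction in one small function means the server and database names are entered once and reused by every cell that opens a connection.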

Insights:

• Resolved data quality issues by minimizing redundancy, disparity, and errors.

• Acquired valuable insights and experience in automating data ingestion and data quality validation.

Future Scope:

• Include data from various sources.

• Carry out analysis of the transformed data on Power BI (https://github.com/saran820/Boston-Crime-Data-Analysis-Report).