Data Wrangling A3: Data Integration and Data Reshaping

  • Assignment_Specifications.pdf: Assignment specifications
  • Assignment_Solutions.ipynb/pdf: Assignment solutions. Python code that integrates several datasets into a single schema, then finds and fixes problems in the data.
  • Input data: 7 datasets in various formats, containing housing information for Victoria, Australia.
  • Input files: GTFS_Melbourne_Train_Information.zip, vic_suburb_boundary.zip, 30945305.zip.
  • Output files: 30945305_A3_solution.zip

Tasks completed:

  1. Task 1: Data Integration

    • Integrated the 7 input files into one dataset that follows the schema given in the assignment specifications.
    • File types: .txt, .xlsx, .json, .xml, .html and .pdf
  2. Task 2: Data Reshaping

    • Studied the effects of different normalization/transformation methods (i.e. standardization, min-max normalization, log, power and Box-Cox transformations) on various attributes.
    • Observed and explained their effects.
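
The transformations named in Task 2 can be sketched as follows. This is a minimal, standard-library-only illustration, not the notebook's code: the sample values are made up, the function names are hypothetical, and the Box-Cox lambda is passed in by hand (libraries such as scipy can estimate it from the data instead).

```python
import math
import statistics


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values):
    """Natural log; requires strictly positive inputs."""
    return [math.log(v) for v in values]


def power_transform(values, exponent=0.5):
    """Raise each value to a fixed power (square root by default)."""
    return [v ** exponent for v in values]


def box_cox(values, lam):
    """Box-Cox with a fixed lambda; lambda = 0 reduces to the natural log."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Made-up, right-skewed sample (hypothetical; not taken from the assignment data).
prices = [1.0, 2.0, 4.0, 8.0, 16.0]

for name, result in [
    ("standardized", standardize(prices)),
    ("min-max", min_max(prices)),
    ("log", log_transform(prices)),
    ("sqrt", power_transform(prices)),
    ("box-cox (lambda=0.5)", box_cox(prices, 0.5)),
]:
    print(name, [round(r, 3) for r in result])
```

Standardization and min-max scaling only shift and rescale the values, so the shape of the distribution is unchanged; the log, power and Box-Cox transformations compress large values and so can reduce right skew.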

Libraries used: pandas, numpy, re, json, bs4, tabula, scipy, matplotlib, sklearn, sklearn.model_selection, sklearn.metrics, sklearn.linear_model
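
For the integration step in Task 1, the general idea of mapping records from different formats onto one schema can be sketched as below. This is an illustration only, using the standard library rather than pandas: the column names, the sample records and the helper functions are all hypothetical, and the real schema is the one defined in Assignment_Specifications.pdf.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical target schema; the real one is in the assignment specifications.
TARGET_COLUMNS = ["property_id", "suburb", "price"]

# Made-up sample inputs standing in for two of the source files.
json_source = '[{"property_id": 1, "suburb": "Carlton", "price": 850000}]'
xml_source = """
<properties>
  <property><id>2</id><suburb>Richmond</suburb><cost>920000</cost></property>
  <property><id>1</id><suburb>Carlton</suburb><cost>850000</cost></property>
</properties>
"""


def rows_from_json(text):
    """Parse JSON records and keep only the target columns, in schema order."""
    return [{col: rec[col] for col in TARGET_COLUMNS} for rec in json.loads(text)]


def rows_from_xml(text):
    """Parse XML records, renaming source fields to the target column names."""
    rows = []
    for node in ET.fromstring(text.strip()).iter("property"):
        rows.append({
            "property_id": int(node.findtext("id")),
            "suburb": node.findtext("suburb"),
            "price": int(node.findtext("cost")),
        })
    return rows


def integrate(*row_sets):
    """Concatenate row sets, keeping the first record seen for each property_id."""
    seen = set()
    merged = []
    for rows in row_sets:
        for row in rows:
            if row["property_id"] not in seen:
                seen.add(row["property_id"])
                merged.append(row)
    return merged


combined = integrate(rows_from_json(json_source), rows_from_xml(xml_source))
print(combined)
```

Each source gets its own small reader that renames and casts fields, so format-specific quirks stay isolated and the merge step only ever sees rows in the common schema.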