This repository contains a Python notebook and an R script that perform the same data cleaning and wrangling task, to demonstrate equivalent code structures in the two languages. The pre-processing tasks include, but are not limited to:
- Reading in .sav data files
- Dealing with labelled data and value labels
- Basic frequency tables
- Filtering by group
- Removing missing data
- Creating new variables or recoding existing ones in place
- Calculating group-centered/scaled variables
- Removing outliers based on within-group quartiles
- Replacing missing values with group means
- Exporting data to CSV
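
As a rough illustration of the group-wise steps listed above, here is a minimal Python sketch using pandas. It is not taken from the notebook in this repository: the toy data, the column names `group` and `score`, and the 1.5 × IQR outlier rule are all assumptions made for the example. Reading the `.sav` file is left out; `pandas.read_spss` (which needs the `pyreadstat` package) is one option for that step.

```python
import pandas as pd

# Toy data standing in for a survey extract. The column names "group" and
# "score" are hypothetical and do not come from the PIAAC files.
df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"],
    "score": [10.0, 11.0, 12.0, 13.0, 100.0, 20.0, 21.0, None, 23.0, 24.0],
})

# Basic frequency table: number of rows per group.
freq = df["group"].value_counts()

# Remove outliers based on within-group quartiles.
# Assumption: the common 1.5 * IQR fences; the repository may use another rule.
by_group = df.groupby("group")["score"]
q1 = by_group.transform(lambda s: s.quantile(0.25))
q3 = by_group.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1
in_bounds = df["score"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
# Keep rows with a missing score for now so they can be imputed below.
clean = df[in_bounds | df["score"].isna()].copy()

# Replace missing values with the mean of the remaining values in the same group.
group_mean = clean.groupby("group")["score"].transform("mean")
clean["score"] = clean["score"].fillna(group_mean)

# Group-centered and group-scaled versions of the variable.
by_group = clean.groupby("group")["score"]
clean["score_centered"] = clean["score"] - by_group.transform("mean")
clean["score_scaled"] = clean["score_centered"] / by_group.transform("std")

# Export to CSV. With no path, to_csv returns the CSV text;
# passing a file path would write it to disk instead.
csv_text = clean.to_csv(index=False)
```

The order of the steps here is illustrative. Removing outliers before imputing keeps extreme values from distorting the group means used for imputation, but other orderings are reasonable depending on the analysis.
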

Data used: the U.S. public-use PIAAC data (2012-2014), available at https://nces.ed.gov/surveys/piaac/datafiles.asp