Submission for the Getting and Cleaning Data Coursera class project.
This GitHub repository is part of the submission for the project detailed here:
The data this repository is concerned with was downloaded from:
The repo contains the following files:
- this file --- provides an overview of the analysis
- codebook.txt --- describes the processed data
- data_download.R --- Windows script to download the original data
- run_analysis.R --- R script performing the analysis as per the instruction notes linked above
The analysis steps are explained in comments in the run_analysis.R file, but the general process is to:
- load both data tables (test and train) and attach columns indicating the performed activity and the subject
- merge the tables by appending one to another
- perform a join to activity labels
- select only columns with mean and std (standard deviation) measurement data
- average the measurement data by activity and subject, that is, take the mean of all values for each combination of subject and activity
- sort the resulting table
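The steps above can be sketched on toy data as follows. This is only an illustration in base R, not the code from run_analysis.R: the tables, their values, and the column names (subject, activity_id, tBodyAcc.mean.X and so on) are made up for the example, and the real script may use different functions and names.

```r
# Toy stand-ins for the test and train tables, with the subject and
# activity columns already attached (values and names are made up).
test <- data.frame(
  subject = c(1, 1, 2),
  activity_id = c(1, 1, 2),
  tBodyAcc.mean.X = c(0.2, 0.4, 0.6),
  tBodyAcc.std.X = c(0.1, 0.3, 0.5),
  tBodyAcc.max.X = c(0.9, 0.8, 0.7)
)
train <- data.frame(
  subject = c(3, 3),
  activity_id = c(2, 2),
  tBodyAcc.mean.X = c(1.0, 2.0),
  tBodyAcc.std.X = c(0.5, 1.5),
  tBodyAcc.max.X = c(3.0, 4.0)
)
# Hypothetical lookup table mapping activity ids to readable labels.
activity_labels <- data.frame(
  activity_id = c(1, 2),
  activity = c("WALKING", "SITTING")
)

# Merge the tables by appending one to the other.
combined <- rbind(test, train)

# Join to the activity labels.
labelled <- merge(combined, activity_labels, by = "activity_id")

# Keep only the identifier columns and the mean/std measurement columns.
measures <- names(labelled)[grepl("mean|std", names(labelled))]
selected <- labelled[, c("subject", "activity", measures)]

# Average every measurement for each subject/activity combination.
averaged <- aggregate(
  selected[measures],
  by = list(subject = selected$subject, activity = selected$activity),
  FUN = mean
)

# Sort the resulting table by subject, then activity.
sorteddata <- averaged[order(averaged$subject, averaged$activity), ]
```

In this sketch the tBodyAcc.max.X column is dropped by the mean/std filter, and the result has one row per subject and activity pair that occurs in the toy data.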
You can read the final dataset by using:
library(data.table)  # provides fread()
sorteddata <- fread('sorteddata.txt')