PA1_template.md

Reproducible Research: Peer Assessment 1

Loading and preprocessing the data

options(scipen = 1, digits = 7)
library(data.table)
data <- read.csv("activity.csv")
data$day <- as.Date(data$date, "%Y-%m-%d")
data$weekday <- weekdays(data$day)
# Classify by ISO weekday number (1 = Monday ... 7 = Sunday) so the result
# does not depend on the locale or on which weekday the data starts with.
data$dayType <- ifelse(format(data$day, "%u") %in% c("6", "7"), "Weekend", "Weekday")

Remove the NA values

data.noNA <- data[!is.na(data$steps),]

What is mean total number of steps taken per day?

Sum the steps for each day.

total.steps.by.day <- xtabs(steps ~ day, data=data.noNA)
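Dropping the NA rows before summing affects which days are counted: a day with no recorded steps is left out of the table instead of appearing as a zero-step day. A minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not taken from `activity.csv`):

```r
# Hypothetical toy data: 2012-10-02 has no recorded steps at all.
toy <- data.frame(
  day   = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02",
                    "2012-10-03", "2012-10-03")),
  steps = c(10, 20, NA, NA, 5, 15)
)

# Approach used above: drop NA rows first, then sum per day.
toy.noNA <- toy[!is.na(toy$steps), ]
totals.dropped <- xtabs(steps ~ day, data = toy.noNA)

# Alternative: keep every row and ignore NAs inside sum().
# An all-NA day then gets a total of 0 rather than being left out.
totals.zeroed <- tapply(toy$steps, toy$day, sum, na.rm = TRUE)

mean.dropped <- mean(as.vector(totals.dropped))  # averages over 2 days
mean.zeroed  <- mean(as.vector(totals.zeroed))   # averages over 3 days, one of them 0
```

With the second approach the zero totals pull the mean down, so the two choices can give noticeably different summaries when whole days are missing.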

Create a histogram of the total number of steps per day.

hist(total.steps.by.day, main="Histogram of Total Number of Steps per Day", xlab="")

[Plot: histogram of the total number of steps per day]

Calculate the mean of the total number of steps per day.

mean(total.steps.by.day)
## [1] 10766.19

Calculate the median of the total number of steps per day.

median(as.vector(total.steps.by.day))
## [1] 10765

What is the average daily activity pattern?

Average the number of steps in each 5-minute interval across all days, then plot the result.

library(plyr)
avgsteps <- ddply(data.noNA, .(interval), summarize, avg=mean(steps))
with(avgsteps, plot(interval, avg, type="l", 
                                   main="Average Daily Activity Pattern", 
                                   xlab="Interval (5 minute increment)",
                                   ylab="Average Number of Steps"))

[Plot: time series of the average number of steps per 5-minute interval]

Find the 5-minute interval with the highest average number of steps across all days.

avgsteps[avgsteps$avg == max(avgsteps$avg),"interval"]
## [1] 835
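The equality filter above returns every interval that reaches the maximum, whereas `which.max()` returns only the first position of the maximum. A small sketch on a hypothetical `toy.avg` data frame (values made up so that two intervals tie):

```r
# Hypothetical per-interval averages; intervals 835 and 840 tie for the maximum.
toy.avg <- data.frame(
  interval = c(830, 835, 840, 845),
  avg      = c(10, 50, 50, 20)
)

# Equality filter: returns every interval whose average equals the maximum.
all.peaks <- toy.avg[toy.avg$avg == max(toy.avg$avg), "interval"]

# which.max(): returns the index of the first maximum only.
first.peak <- toy.avg$interval[which.max(toy.avg$avg)]
```

Here `all.peaks` holds both tied intervals and `first.peak` holds only the earlier one. Which behavior is preferable depends on whether ties should be reported.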

Imputing missing values

Total number of missing values in the original data.

sum(is.na(data$steps))
## [1] 2304

For simplicity, replace each missing value with the average number of steps for that 5-minute interval, taken across all days. Round the average to a whole number, since a step count cannot be fractional.

nareplaced <- merge(data, avgsteps, by="interval")
na.idx <- which(is.na(nareplaced$steps))
nareplaced$steps[na.idx] <- round(nareplaced[na.idx,"avg"],0)
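The same imputation rule can be sketched without `merge()`, which sorts the result by the key column and so changes the row order. The version below uses `ave()` to compute the per-interval mean in place; the `toy` data frame and its values are hypothetical and chosen only to make the rounding visible.

```r
# Hypothetical toy data: two intervals observed on four days, with two gaps.
toy <- data.frame(
  day      = rep(c("d1", "d2", "d3", "d4"), each = 2),
  interval = rep(c(0, 5), times = 4),
  steps    = c(2, 10, NA, 13, 6, 12, 5, NA)
)

# Mean of the observed steps for each interval (NAs ignored),
# returned as a vector aligned with the rows of toy.
interval.mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))

# Fill each gap with the rounded mean of its interval; keep observed values.
filled <- toy
gaps <- is.na(filled$steps)
filled$steps[gaps] <- round(interval.mean[gaps], 0)
```

Interval 0 averages 13/3 and interval 5 averages 35/3, so the two gaps are filled with 4 and 12, and the rows stay in their original order.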

Find the total number of steps per day in the new data set.

intervalsteps <- ddply(nareplaced, .(day), summarize, steps=sum(steps))

Histogram with the total number of steps taken per day

hist(intervalsteps$steps, main="Total Number of Steps per Day by 5-Minute Interval (NAs Replaced)", xlab="")

[Plot: histogram of the total number of steps per day, NAs replaced]

Calculate the mean of the total number of steps per day.

mean(intervalsteps$steps)
## [1] 10765.64

Calculate the median of the total number of steps per day.

median(intervalsteps$steps)
## [1] 10762

Brief discussion of changes due to replacing NA values

The overall shape of the distribution is unchanged, but the central peak is higher. The mean barely moves (10766.19 before, 10765.64 after), which is expected because each missing value was filled with the rounded average for its interval. The median also shifts only slightly (10765 before, 10762 after): the mean and median of the observed data were already very close, so adding values near the mean leaves the median nearly where it was.
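One way to see why the mean is so stable: if the missing values fall on entirely missing days, each imputed day receives a total equal to the sum of the interval means, which is the mean daily total of the observed days. The count of 2304 missing values equals 8 × 288 (288 five-minute intervals per day), which is consistent with eight fully missing days, although the report does not verify this. A sketch under that assumption, on hypothetical toy data and without rounding:

```r
# Hypothetical toy data: 3 intervals per day, day "d3" entirely missing.
toy <- data.frame(
  day      = rep(c("d1", "d2", "d3"), each = 3),
  interval = rep(c(0, 5, 10), times = 3),
  steps    = c(0, 30, 60, 10, 50, 40, NA, NA, NA)
)

# Mean daily total over the observed days only.
observed <- toy[!is.na(toy$steps), ]
mean.before <- mean(tapply(observed$steps, observed$day, sum))

# Impute with the unrounded interval means, then recompute the daily totals.
interval.mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))
filled.steps <- ifelse(is.na(toy$steps), interval.mean, toy$steps)
totals.after <- tapply(filled.steps, toy$day, sum)
mean.after <- mean(totals.after)
```

In this sketch `mean.before` and `mean.after` are both 95, and the imputed day sits exactly at the mean, which also explains the taller central peak in the histogram. Under the same assumption, the small shift seen in the report would come from rounding each interval mean.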

Are there differences in activity patterns between weekdays and weekends?

Convert the day type variable created during the initial processing to a factor. Compute the average number of steps per interval from the imputed data, then plot weekdays and weekends in separate panels for comparison.

library(ggplot2)
nareplaced$dayType <- factor(nareplaced$dayType)
intervalsteps <- aggregate(steps ~ interval + dayType, data=nareplaced, FUN=mean)
ggplot(intervalsteps, aes(interval, steps)) +
    geom_line() +
    xlab("Interval") +
    ylab("Number of steps") +
    facet_grid(dayType ~ .)
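The `aggregate()` call with two grouping variables produces one row per (interval, day type) pair, which is the long format that `facet_grid()` expects. A minimal sketch of that shape on hypothetical toy data (values made up for illustration):

```r
# Hypothetical toy data: two intervals, with weekday and weekend observations.
toy <- data.frame(
  interval = c(0, 0, 5, 5, 0, 5),
  dayType  = factor(c("Weekday", "Weekday", "Weekday", "Weekday",
                      "Weekend", "Weekend")),
  steps    = c(2, 4, 10, 20, 7, 1)
)

# One row per (interval, dayType) pair, holding the mean of steps.
toy.avg <- aggregate(steps ~ interval + dayType, data = toy, FUN = mean)

# Look up a single group mean, e.g. interval 0 on weekdays.
weekday.0 <- with(toy.avg, steps[interval == 0 & dayType == "Weekday"])
```

The result has four rows and the columns `interval`, `dayType`, and `steps`, so each facet panel draws from the rows of one day type.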

[Plot: average number of steps per interval, faceted by weekday and weekend]