Sawatch iOS Telemetry Data Report

Jane Stout

2 December 2018

This report analyzes N = 1000 instances of trip information for five vehicles, collected via iOS telemetry. My goal was to explore and visualize patterns in the data and ultimately build a model that predicts fuel usage. Table 1 presents the variables used in my analysis.

Table 1. Variable Labels and Definitions Used in the Current Analysis

Variable label	Definition
fuel_used (Target)	Liters of fuel used since last report
kmph	Kilometers per hour (i.e., speed)
celsius	Ambient air temperature
altitude_delta	Delta in meters between last altitude report and current
g_force	Calculated g net gravity
kml	Kilometers per liter
kilometers	Kilometers since last report
weather_type_Bad	Ambient weather was Bad (1: mist, overcast clouds, broken clouds) or Good (0: few clouds, clear sky)
vehicle_type	Vehicle Type: SUV (1) or Sedan (0)
latitude	Degrees latitude
longitude	Degrees longitude

Geospatial Analysis

I first plotted latitude and longitude coordinates for each datapoint in the dataset (the code is found here). As seen in Figure 1, there were five distinct data collection locations.

Figure 1. Geospatial Data Collection Locations

Note. Red dots represent data collection locations.

Exploratory Data Analysis

Figure 2 displays a scatter matrix containing (a) scatterplots showing the correlations between continuous variables (excluding latitude and longitude) and (b) histograms for each variable along the diagonal. As seen in the histograms, several variables contained a large number of zero values. These zero values correspond to instances where vehicles were idle (e.g., at a stop light). The scatterplots also point to some clearly collinear relationships. For instance, kilometers since last report (kilometers) and kilometers per hour (kmph) have nearly a 1:1 relationship. This becomes relevant during the modeling phase of my analysis.

Figure 2. Scatter Matrix of Continuous Variables
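
The near 1:1 relationship between kilometers and kmph can also be checked numerically with a Pearson correlation. The following is a minimal sketch on synthetic stand-in data, not the report's dataset; the helper name pearson_r and the 10-second reporting interval are assumptions made for illustration only.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic stand-in data (NOT the report's data).
rng = np.random.default_rng(0)
kmph = rng.uniform(0, 100, size=200)
kmph[:40] = 0.0  # idle instances, e.g., at a stop light

# Hypothetical assumption: one report every 10 seconds, so the distance
# covered since the last report is proportional to speed.
report_interval_h = 10 / 3600
kilometers = kmph * report_interval_h

r = pearson_r(kmph, kilometers)
print(f"r(kmph, kilometers) = {r:.3f}")
```

When reports arrive at a fixed interval, distance since the last report is just speed multiplied by a constant, which is one plausible reason the two variables carry almost the same information.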

Figure 3 displays fuel usage as a function of whether the weather was bad (e.g., cloudy; misty) or good (e.g., sunny) at each time of data collection. Here, we see greater fuel usage when the weather is good.

Figure 3. Average Fuel Usage by Weather Type

Figure 4 displays fuel usage as a function of vehicle type: SUV vs. Sedan. In this particular sample, Sedans used more fuel, on average, than SUVs.

Figure 4. Average Fuel Usage by Vehicle Type
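
The group averages behind Figures 3 and 4 amount to a mean of fuel_used within each category. Below is a small sketch of that computation; the function name mean_by_group and the four example rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

def mean_by_group(rows, group_key, value_key):
    """Return {group: mean of value_key} over a list of dict rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical rows for illustration only (1 = SUV, 0 = Sedan).
rows = [
    {"vehicle_type": 1, "fuel_used": 0.02},
    {"vehicle_type": 1, "fuel_used": 0.04},
    {"vehicle_type": 0, "fuel_used": 0.05},
    {"vehicle_type": 0, "fuel_used": 0.07},
]

means = mean_by_group(rows, "vehicle_type", "fuel_used")
for group, value in sorted(means.items()):
    print(group, round(value, 3))
```

The same helper applies to weather_type_Bad by changing the group key.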

Next, I built a multiple linear regression model in which I regressed fuel_used on kmph, celsius, altitude_delta, g_force, kml, weather_type, and vehicle_type. Note that, given the strong relationship between kmph and kilometers (see Figure 2), I omitted kilometers from the model.

I first split the full dataset into training and testing data (75% and 25% of the full dataset, respectively) and assessed collinearity, if any, among the features in the model. To do so, I examined the Variance Inflation Factor (VIF) for each feature. A VIF above 10 indicates that a feature is highly correlated with the other features, which may result in an unreliable model. As seen in Table 2, the VIFs were all acceptable (<10), although kmph and kml came close to the threshold.

Table 2. VIFs for Features in Model

Variable label	VIF coefficient
kmph	9.65
celsius	3.15
altitude_delta	1.30
g_force	1.83
kml	9.87
weather_type	3.63
vehicle_type	4.37
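
The split and VIF steps can be sketched as follows. The report does not say which tool produced Table 2, so this is an illustration under assumptions rather than the author's method: it uses only NumPy, the function names are hypothetical, and the data are synthetic. Each VIF is computed as 1 / (1 - R^2), where R^2 comes from regressing one feature on all the others with an intercept; other implementations may treat the intercept differently and give different values.

```python
import numpy as np

def train_test_split_rows(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices and split them into training and testing sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    values = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        values.append(1.0 / (1.0 - r2))
    return np.array(values)

# Synthetic features (NOT the report's data): c is nearly a copy of a.
rng = np.random.default_rng(1)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
c = a + 0.1 * rng.normal(size=1000)
X = np.column_stack([a, b, c])
y = 2.0 * a + rng.normal(size=1000)

X_train, X_test, y_train, y_test = train_test_split_rows(X, y)
vifs = vif(X_train)
print(X_train.shape, X_test.shape)
print(np.round(vifs, 2))
```

In this synthetic example the two nearly duplicate features produce VIFs well above 10 while the independent feature stays near 1, which is the pattern the threshold in the text is meant to catch.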

Next, I assessed heteroscedasticity of the model residuals by plotting the residuals against the predicted values (see Figure 5). Residuals were reasonably homoscedastic (i.e., spread evenly across the range of predicted values), though there is evidence of some outliers in the model fit (i.e., points that deviate notably in the positive or negative direction along the y axis).

Figure 5. Scatterplot of Predicted Values Versus Residuals
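
The quantities plotted in Figure 5 can be reproduced by fitting the regression and subtracting predictions from observed values. The sketch below does this on synthetic data and adds a crude numerical companion to the visual check: the ratio of residual spread in the upper versus lower half of the predicted values. That ratio is an informal heuristic introduced here for illustration, not a formal test and not part of the original analysis; all function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Predicted values for a model fitted by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def spread_ratio(predicted, residuals):
    """Residual std. dev. in the upper half of predictions over the lower half."""
    order = np.argsort(predicted)
    half = len(order) // 2
    low = residuals[order[:half]].std()
    high = residuals[order[half:]].std()
    return high / low

# Synthetic data with constant noise variance (NOT the report's data).
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(500, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 1, size=500)

beta = fit_ols(X, y)
predicted = predict(beta, X)
residuals = y - predicted
ratio = spread_ratio(predicted, residuals)
print(f"spread ratio = {ratio:.2f}")
```

A ratio near 1 is consistent with evenly spread residuals; a ratio far from 1 would suggest the funnel shape that a predicted-versus-residual plot reveals visually.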

Next, I compared my model for training versus testing data. Table 3 displays model fit information and coefficients.

As seen in Table 3, the model fit both the training and testing data well (adjusted R-squared values >= .76). Given the small absolute amount of fuel used per trip, the b coefficients are difficult to interpret. However, the t statistics provide a measure of the strength of each effect, and the p values indicate whether each effect was statistically significant. The following is an interpretation of the coefficients for the testing model:

Higher vehicle speed (kmph) predicts significantly greater fuel usage; this is a strong predictor
Higher ambient air temperature (celsius) significantly predicts less fuel usage; this is a relatively weak predictor
As elevation level rises (altitude_delta), fuel usage also rises significantly
As g net gravity increases (g_force), fuel usage also increases; this is a relatively weak predictor
More kilometers per liter (kml) are associated with less fuel usage
Outdoor weather (weather_type) does not reliably predict fuel usage
Whether a vehicle is a SUV versus Sedan (vehicle_type) does not reliably predict fuel usage
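
The fit statistics discussed above (adjusted R-squared and t statistics) follow from standard ordinary least squares formulas: t is each coefficient divided by its standard error, and adjusted R-squared penalizes R-squared for the number of features. The sketch below computes them with NumPy on synthetic data; the function name ols_summary is hypothetical, and the report does not state which library produced Table 3.

```python
import numpy as np

def ols_summary(X, y):
    """Coefficients, t statistics, R^2, and adjusted R^2 for y ~ intercept + X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - p - 1                                  # residual degrees of freedom
    sigma2 = (resid @ resid) / dof                   # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
    return {"coef": beta, "t": beta / se, "r2": r2, "adj_r2": adj_r2}

# Synthetic data (NOT the report's data): one real effect, one pure-noise feature.
rng = np.random.default_rng(3)
strong = rng.normal(size=400)
unrelated = rng.normal(size=400)
y = 3.0 * strong + rng.normal(size=400)

summary = ols_summary(np.column_stack([strong, unrelated]), y)
print(np.round(summary["t"], 2))       # order: intercept, strong, unrelated
print(round(summary["adj_r2"], 3))
```

Because t statistics are scale-free, they allow a strong predictor to be told apart from a weak one even when the raw coefficients are tiny, as is the case with per-report fuel usage.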

Table 3. Fit Statistics and Coefficients for Testing Versus Training Regression Models

Future Directions

In the future, I would like to analyze telemetric data for electric vehicles to gauge battery efficiency. My personal experience with electric vehicles is that battery charge drains quickly in cold weather and at high speeds. I wonder what other telemetrics can address this type of research question.

It would also be interesting to compare gas-powered and electric vehicles in terms of financial cost. For instance, given the same distance traveled, the same altitude change, etc., which car is more expensive to drive (i.e., cost of fuel versus cost of electricity to charge the battery)?

In summary, telemetrics provide an exciting data source that can address practical research questions for the fleet industry.
