This data set includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay Area.
Bay Area Bike Share is a company that provides on-demand bike rentals for customers in San Francisco, Redwood City, Palo Alto, Mountain View, and San Jose. Users can unlock bikes from a variety of stations throughout each city, and return them to any station within the same city. Users pay for the service either through a yearly subscription or by purchasing 3-day or 24-hour passes. Users can make an unlimited number of trips, with trips under thirty minutes in length having no additional charge; longer trips will incur overtime fees.
The features included in the data are as follows:
- Member Year of Birth
- Member Gender
- User Type (“Subscriber” = member; “Customer” = casual or one-time user)
- Trip Duration (in seconds)
- Bike ID
- Start Time and Date
- End Time and Date
- Start Station ID
- End Station ID
- Start Station Name
- End Station Name
- End Station Latitude
- End Station Longitude
- Start Station Latitude
- Start Station Longitude
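Several of the comparisons in this project rely on values derived from the raw features rather than on the features themselves. The following is a minimal sketch of how those derived columns might be built with pandas. The column names (`duration_sec`, `start_time`, `member_birth_year`) and the sample rows are assumptions made for illustration, not taken from the data set.

```python
import pandas as pd

# Hand-made sample rows; column names are assumed, values are illustrative only.
trips = pd.DataFrame({
    "duration_sec": [420, 1980, 615],
    "start_time": ["2019-02-01 08:15:00", "2019-02-02 14:30:00", "2019-02-03 10:05:00"],
    "member_birth_year": [1985, 1950, 1999],
})

# Parse the timestamp so that date parts can be extracted.
trips["start_time"] = pd.to_datetime(trips["start_time"])

# Trip duration is stored in seconds; minutes are easier to read.
trips["duration_min"] = trips["duration_sec"] / 60

# Day of the week on which the trip started.
trips["day_of_week"] = trips["start_time"].dt.day_name()

# Approximate rider age at the time of the trip, then a coarse age group.
trips["age"] = trips["start_time"].dt.year - trips["member_birth_year"]
trips["age_group"] = pd.cut(
    trips["age"], bins=[17, 60, float("inf")], labels=["18-60", "60+"]
)

print(trips[["duration_min", "day_of_week", "age_group"]])
```

The age is only approximate because the data records a year of birth, not a full date of birth.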
During the exploration, I found that the average trip duration was about 10 minutes. Looking at how trip duration relates to some of the other features, I also found that:
- Although most trips are taken on weekdays, riders spend more time on trips at weekends.
- The user type distribution shows more subscribers than customers, but customers tend to spend more time on trips than subscribers.
- More males than females rent bikes, but females spend more time on trips than males.
- People aged 18-60 rent bikes more often, but those over 60 tend to spend more time on trips.
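Each finding above compares two things per group: how many trips the group takes and how long those trips last. A minimal sketch of that comparison for user type, assuming hypothetical column names `user_type` and `duration_sec` and made-up sample values (not the real results), might look like this:

```python
import pandas as pd

# Made-up sample rows; column names are assumptions, values are not real results.
trips = pd.DataFrame({
    "user_type": ["Subscriber", "Subscriber", "Subscriber", "Customer", "Customer"],
    "duration_sec": [480, 540, 600, 1500, 1800],
})
trips["duration_min"] = trips["duration_sec"] / 60

# Count the trips and average their duration for each user type.
summary = trips.groupby("user_type")["duration_min"].agg(
    n_trips="count", mean_minutes="mean"
)
print(summary)
```

Swapping `user_type` for a gender, age group, or day-of-week column would give the other comparisons in the same way.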
For the presentation, the focus is on trip duration and its relationship with user type and day of the week, which led to insights that helped me understand the data.
I'll start by introducing the variables of interest in univariate plots. I then show bivariate visualizations of trip duration against the other variables of interest, drawn as boxplots.
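As an illustration of the bivariate step, here is a minimal sketch of a boxplot of trip duration by user type using seaborn. The column names and sample values are assumptions made for this example, and the actual presentation may format its plots differently.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample rows; column names are assumptions, values are not real results.
trips = pd.DataFrame({
    "user_type": ["Subscriber"] * 4 + ["Customer"] * 4,
    "duration_min": [6, 8, 9, 12, 14, 19, 25, 31],
})

fig, ax = plt.subplots(figsize=(6, 4))

# One box per user type, showing the median and spread of trip duration.
sns.boxplot(data=trips, x="user_type", y="duration_min", ax=ax)

ax.set_xlabel("User type")
ax.set_ylabel("Trip duration (minutes)")
ax.set_title("Trip duration by user type")

# Written to a file so the sketch also works outside a notebook.
fig.savefig("duration_by_user_type.png")
```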
Finally, I'll show a multivariate visualization (a point plot) of trip duration against day of the week and user type. These visualizations have been formatted to communicate their message clearly.
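The multivariate step could be sketched along the following lines with seaborn's `pointplot`, which by default draws the mean of the y variable for each category. The column names and the generated sample values are assumptions for illustration only and do not reproduce the real data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Build made-up sample rows: two trips per day and user type (not real results).
rows = []
for day in days:
    weekend_extra = 5 if day in ("Saturday", "Sunday") else 0
    for user_type, base in (("Subscriber", 9), ("Customer", 20)):
        for offset in (-1, 1):
            rows.append({
                "day_of_week": day,
                "user_type": user_type,
                "duration_min": base + offset + weekend_extra,
            })
trips = pd.DataFrame(rows)

fig, ax = plt.subplots(figsize=(8, 4))

# One line per user type; each point is the mean duration for that day.
sns.pointplot(
    data=trips, x="day_of_week", y="duration_min",
    hue="user_type", order=days, ax=ax,
)

ax.set_xlabel("Day of week")
ax.set_ylabel("Mean trip duration (minutes)")
ax.set_title("Trip duration by day of week and user type")

# Written to a file so the sketch also works outside a notebook.
fig.savefig("duration_by_day_and_user_type.png")
```

Passing `order=days` keeps the days in calendar order instead of the order in which they first appear in the data.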
Check out the presentation here