Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles for short trips, typically 30 minutes or less. Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.
In this project, we will focus on the record of individual trips taken in 2016 from our selected cities: New York City, Chicago, and Washington, DC. Each of these cities has a page where we can freely download the trip data.
If you visit these pages, you will notice that each city delivers its data on a different schedule: Chicago publishes new data twice a year, Washington, DC quarterly, and New York City monthly.
(Image is from a copyright-free website: https://www.pexels.com/royalty-free-images/.)
Table of Contents |
---|
Prerequisites 🔍📜 |
Design 📐 |
Conclusions 📌 |
License 🔖 |
- Python 3.6.3
- Jupyter Notebook
- Anaconda-Navigator
- Exploratory analysis
  - Data is provided by Motivate, a bike-share system provider for many major cities in the United States.
  - Compare the system usage between three large cities: New York City, Chicago, and Washington, DC.
  - Examine whether there are any differences within each system between registered, regular users and short-term, casual users.
- Visualization
  - In this project, Python is the main tool used to explore data from three major bikeshare systems in the United States, to perform the data wrangling that unifies the format of the data from the three systems, and to compute descriptive statistics. External packages beyond the Python standard library are used to help visualize the data.
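The wrangling and summary steps described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code. The field names (`tripduration`, `usertype`), the assumption that raw durations are in seconds, the label mapping, and the sample records are all hypothetical.

```python
from collections import defaultdict


def condense_trip(raw_trip, duration_key, user_type_key, user_type_map):
    """Convert one raw trip record (a dict) into a common format.

    All key names and the label mapping are hypothetical; real column
    names differ by city and must be checked against the downloaded files.
    """
    return {
        # Assumes the raw duration is in seconds; convert to minutes.
        "duration": float(raw_trip[duration_key]) / 60.0,
        # Map each city's labels onto 'Subscriber' / 'Customer'.
        "user_type": user_type_map[raw_trip[user_type_key]],
    }


def summarize(trips):
    """Return trip count and mean duration (minutes) per user type."""
    totals = defaultdict(lambda: {"count": 0, "minutes": 0.0})
    for trip in trips:
        entry = totals[trip["user_type"]]
        entry["count"] += 1
        entry["minutes"] += trip["duration"]
    return {
        user_type: {
            "count": entry["count"],
            "mean_duration": entry["minutes"] / entry["count"],
        }
        for user_type, entry in totals.items()
    }


# Made-up records standing in for rows read with csv.DictReader.
sample_raw = [
    {"tripduration": "600", "usertype": "Subscriber"},
    {"tripduration": "1800", "usertype": "Customer"},
    {"tripduration": "1200", "usertype": "Subscriber"},
]
identity_map = {"Subscriber": "Subscriber", "Customer": "Customer"}
sample_trips = [
    condense_trip(row, "tripduration", "usertype", identity_map)
    for row in sample_raw
]
summary = summarize(sample_trips)
print(summary)
```

Passing the key names and label mapping as arguments keeps one function usable for all three cities, at the cost of having to look up each city's column names by hand.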
We have carried out a fair amount of analysis on a limited set of data. However, many further analyses remain, some of which would require data beyond what is provided here. For example, detailed location data have not been investigated. Where are the most commonly used docks? What are the most common routes? As another example, the weather could have a large impact on daily ridership. How much is ridership affected when there is rain or snow? Are subscribers or customers affected more by changes in weather?
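As a rough illustration of the route question, a simple frequency count would be one place to start. The sketch below is hypothetical: the `start_station` and `end_station` field names and the sample trips are invented for the example, and nothing here was run on the project data.

```python
from collections import Counter


def most_common_routes(trips, n=3):
    """Count (start, end) station pairs and return the n most frequent."""
    routes = Counter(
        (trip["start_station"], trip["end_station"]) for trip in trips
    )
    return routes.most_common(n)


# Invented sample trips; station names are placeholders.
sample_trips = [
    {"start_station": "A", "end_station": "B"},
    {"start_station": "A", "end_station": "B"},
    {"start_station": "B", "end_station": "C"},
]
top_routes = most_common_routes(sample_trips, n=1)
print(top_routes)
```

Counting only the start station instead of the pair would give the most commonly used docks in the same way.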
We can also apply these techniques to medical information processing or drug development, for example managing health data and medical records, estimating the effect of one drug relative to another or to a placebo, and evaluating toxicity. More importantly, computer-based drug design is developing rapidly, and machine learning based on substantial clinical data plays an increasingly important role in the design of new drugs.