This project is implemented with Hadoop Streaming and Python. Before running it:
1. The trip_data and trip_fare data must be placed in the `input` folder.
2. `hadoop` must be on your system path.
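
The two prerequisites above can be checked before launching a job. The following is a minimal sketch, not part of the project itself: the helper name `check_setup` is hypothetical, and it only verifies that the `input` folder exists and that a `hadoop` executable can be found on the path.

```python
import os
import shutil


def check_setup(input_dir="input"):
    """Return a list of problems found; an empty list means the setup looks fine."""
    problems = []
    # Prerequisite 1: the data folder must exist (file names inside are not checked).
    if not os.path.isdir(input_dir):
        problems.append(f"missing input folder: {input_dir}")
    # Prerequisite 2: the hadoop executable must be discoverable on PATH.
    if shutil.which("hadoop") is None:
        problems.append("hadoop not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print(problem)
```

Running the script from the project root prints one line per unmet prerequisite and prints nothing when both are satisfied.
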
Our topic is "Understanding taxi economics". The problems we solved include the following:
1. How does revenue vary across neighborhoods and how does it correlate with the median household income in the neighborhood?
2. How does revenue vary over time? Are there months or seasons when taxi companies make more (or less) money?
3. How long do cab drivers drive without passengers? How does this vary over time?
4. Are revenues affected during major events (e.g., parades, presidential visits, storms)?
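
As an illustration of how a question such as revenue over time (problem 2) maps onto Hadoop Streaming, here is a minimal sketch of a map step and a reduce step in Python. It is not the project's actual code. It assumes the trip_fare data is comma-separated, and the column positions `PICKUP_COL` and `TOTAL_COL` as well as the function names are hypothetical and must be adjusted to the real file layout.

```python
from itertools import groupby

# Hypothetical column positions in a trip_fare CSV row (assumptions, not from the project).
PICKUP_COL = 3   # pickup timestamp, e.g. "2013-01-05 10:00:00"
TOTAL_COL = 10   # total amount paid for the trip


def map_line(line):
    """Turn one CSV row into a (month, revenue) pair; return None for headers or bad rows."""
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    try:
        month = fields[PICKUP_COL][:7]        # keep "YYYY-MM"
        revenue = float(fields[TOTAL_COL])    # fails on the header row
    except (IndexError, ValueError):
        return None
    return month, revenue


def reduce_pairs(pairs):
    """Sum revenue per month; pairs must arrive sorted by key, as Hadoop Streaming delivers them."""
    for month, group in groupby(pairs, key=lambda kv: kv[0]):
        yield month, sum(value for _, value in group)


if __name__ == "__main__":
    # Local dry run on a few made-up rows, with sorted() standing in for the shuffle phase.
    sample = [
        "a,b,c,2013-01-05 10:00:00,CRD,10,0,0.5,2,0,12.5",
        "a,b,c,2013-02-01 08:30:00,CSH,4,0,0.5,0,0,4.5",
    ]
    pairs = sorted(p for p in map(map_line, sample) if p is not None)
    for month, total in reduce_pairs(pairs):
        print(f"{month}\t{total}")
```

In an actual streaming job the map step would read rows from standard input and write tab-separated `month` and `revenue` lines to standard output, and the reduce step would read the sorted lines back from standard input. Grouping by neighborhood or by event date instead of by month would follow the same pattern with a different key.
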
Shaopeng Zhang
Hao Chen
Guang Xiong