Data Science Portfolio

A repository of projects I have worked on or am currently working on. It is updated regularly. The projects are written in either R (R Markdown) or Python (Jupyter Notebook). Click on a project to see the full analysis and code.

Please contact me on LinkedIn if you are looking to hire a data scientist.

Projects:

  • Used LSTM to fit five years of Beijing weather data
  • Constructed anomaly scores from the difference between model prediction and real data
  • Investigated time periods with high anomaly scores
  • Results confirm that high anomaly scores correspond to extreme weather (floods, heavy rain, firework celebrations, etc.)
  • Keywords(Anomaly Detection, Time Series, LSTM, Weather, Beijing, Semi-supervised learning)
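
The scoring step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the LSTM is replaced by a ready-made list of predictions, the score is assumed to be a standardised absolute residual, and the function names, data, and threshold are all hypothetical.

```python
import statistics

def anomaly_scores(actual, predicted):
    """Score each time step by how far its prediction error lies from the
    mean error, in units of the error standard deviation (an assumed
    scoring rule, chosen for illustration)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mu = statistics.fmean(errors)
    sigma = statistics.pstdev(errors)
    if sigma == 0:
        return [0.0 for _ in errors]
    return [abs(e - mu) / sigma for e in errors]

def flag_anomalies(scores, threshold):
    """Return the indices of time steps whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Hypothetical readings; `predicted` stands in for the model's output.
actual = [10.0, 11.0, 10.5, 25.0, 10.2, 9.8, 10.1, 10.4]
predicted = [10.1, 10.9, 10.6, 10.5, 10.3, 9.9, 10.0, 10.3]

scores = anomaly_scores(actual, predicted)
flagged = flag_anomalies(scores, threshold=2.0)
print(flagged)
```

Time steps flagged this way are the ones worth inspecting by hand against known events.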

  • Predicted US (2016) election results in real time as the voting results of each region became available.
  • Regressed states with results against polling data and predicted results for the remaining states
  • Used Monte Carlo simulation to simulate the winner of the election.
  • Compared simulated results with exchange-rate fluctuations to see whether the market is efficient.
  • Keywords(Python, Linear Regression, Monte Carlo Simulation)
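
A rough sketch of the regress-then-simulate idea is shown below. It makes several assumptions that are not taken from the project: results are expressed as vote margins, the regression is a one-variable least-squares fit, and each pending state gets independent Gaussian noise (real state errors are correlated). All names and numbers are hypothetical.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / max(len(x) - 2, 1)) ** 0.5
    return a, b, sd

def win_probability(declared, pending, votes_needed, n_sims=10_000, seed=0):
    """declared: (poll_margin, actual_margin, electoral_votes) per state.
    pending: (poll_margin, electoral_votes) per state still counting.
    Returns the share of simulations in which the candidate with the
    positive margin reaches votes_needed."""
    rng = random.Random(seed)
    a, b, sd = fit_line([d[0] for d in declared], [d[1] for d in declared])
    banked = sum(ev for _, actual, ev in declared if actual > 0)
    wins = 0
    for _ in range(n_sims):
        total = banked
        for poll, ev in pending:
            # Simulated margin: regression prediction plus Gaussian noise.
            if rng.gauss(a + b * poll, sd) > 0:
                total += ev
        wins += total >= votes_needed
    return wins / n_sims

# Hypothetical margins (percentage points) and electoral votes.
declared = [(5.0, 3.0, 10), (-4.0, -6.0, 8), (2.0, 0.5, 6), (-8.0, -9.5, 12)]
pending = [(1.0, 15), (-3.0, 9), (6.0, 11)]
p = win_probability(declared, pending, votes_needed=36)
print(f"Estimated win probability: {p:.3f}")
```

Re-running `win_probability` each time a new state declares moves that state from `pending` to `declared`, which both refits the regression and shrinks the simulated part of the map.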


  • Fitted power-law and log-normal distributions to US baby names data since 1960.
  • Used bootstrapping techniques to find a distribution of the power-law parameters
  • Crawled Twitter to find 20,000 random users and fitted a power-law distribution to users' friends count and followers count.
  • Keywords(R, Power-law, Bootstrapping, Log-normal)
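
The fit-and-bootstrap step might look something like the sketch below. The project itself is in R; this Python version is only an illustration. It assumes the continuous maximum-likelihood estimator for the exponent with a fixed, known `xmin`, which is an approximation for count data such as name frequencies or follower counts. The synthetic data and all function names are hypothetical.

```python
import math
import random

def fit_alpha(data, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin)),
    using only observations at or above xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def bootstrap_alpha(data, xmin, n_boot=200, seed=0):
    """Resample the tail with replacement and refit alpha each time."""
    rng = random.Random(seed)
    tail = [x for x in data if x >= xmin]
    return [fit_alpha(rng.choices(tail, k=len(tail)), xmin)
            for _ in range(n_boot)]

def sample_power_law(alpha, xmin, n, rng):
    """Draw synthetic power-law samples by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

# Synthetic data with a known exponent, used only to exercise the functions.
rng = random.Random(42)
data = sample_power_law(alpha=2.5, xmin=1.0, n=5000, rng=rng)

alpha_hat = fit_alpha(data, xmin=1.0)
estimates = sorted(bootstrap_alpha(data, xmin=1.0))
lo = estimates[int(0.025 * len(estimates))]
hi = estimates[int(0.975 * len(estimates)) - 1]
print(f"alpha = {alpha_hat:.3f}, 95% bootstrap interval = ({lo:.3f}, {hi:.3f})")
```

The spread of `estimates` gives a sense of how uncertain the fitted exponent is, which matters when comparing the power-law fit against a log-normal alternative.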


  • Parsed a few GB of tweets to select all English-language tweets from the UK.
  • Used 'qdap' package to analyze the emotion of the Tweets
  • Plotted the emotions over the day and over the week and analysed the interesting results.
  • Keywords(R, Twitter API, Time Series, Sentiment Analysis, ggplot)

  • Downloaded economic indicator data using the World Bank API and cleaned it
  • Downloaded Google search-query data for next year and last year for each country
  • Fitted a linear regression between GDP and future orientation
  • Keywords(R, World Bank API, Google API, Data Cleaning, Linear regression)

  • Predicted UK (2017) election results in real time as the voting results were announced.
  • Retrieved results from tweets announcing them and extracted the time of announcement for each region.
  • Regressed regions with results against polling data and predicted results for the remaining regions
  • Used Monte Carlo simulation to simulate the winner of the election.
  • Keywords(Python, Twitter API, Merging Data)