Data Mining Class Exercise 2 for Olga, Simon and Fabian
- `scripts` includes all R scripts needed to reproduce this project
- `output` contains the outputs generated by the R scripts, including the knitted HTML report
- Running `01 - api request.R` requires your own API key.
- In order to successfully knit `output/report.Rmd`, scripts `01` through `05` have to be run in advance. These scripts download, clean and prepare the data and visualisations required for the report. Warning: Scripts `02` and `05` take about 2 hours each, so if you're strapped for time, `output/report.html` provides the fully knitted final report.
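The run order described above could be scripted roughly as follows. This is only a sketch under assumptions: it supposes the numbered scripts sit in a `scripts/` folder and begin with a two-digit prefix, and the helper names `numbered_scripts` and `run_pipeline` are made up for illustration and are not part of the project.

```r
# Sketch: run the numbered scripts in order, then knit the report.
# Assumption: scripts live in "scripts/" and start with a two-digit prefix
# ("01", "02", ...). Helper names are hypothetical.

# Return the scripts in `dir` whose file names start with two digits,
# sorted by file name so that 01 runs before 02, and so on.
numbered_scripts <- function(dir = "scripts") {
  files <- list.files(dir, pattern = "^[0-9]{2}.*\\.R$", full.names = TRUE)
  files[order(basename(files))]
}

# Source each numbered script in turn, then knit the report.
# Knitting requires the rmarkdown package to be installed.
run_pipeline <- function(dir = "scripts", report = "output/report.Rmd") {
  for (script in numbered_scripts(dir)) {
    message("Running ", script)
    source(script)
  }
  rmarkdown::render(report)
}
```

Calling `run_pipeline()` from the project root would then run every numbered script before knitting. Given the long runtimes noted above, it may be more practical to source the slow scripts once and knit separately afterwards.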
- Set up API: Simon
- Create corpus of Guardian articles on the company Amazon: Fabian
  - The NER classifier is not 100% accurate: the corpus also includes mentions of the rainforest (see word cloud)
- Sentiment analysis of corpus and 2-3 sentences on the analysis: Simon
- Word cloud and / or word frequencies of corpus and 2-3 sentences on the analysis: Olga
- Topic modelling of corpus and 2-3 sentences on the analysis: Fabian
- Create and submit final report: Simon and Fabian
Andrea's Feedback from CE1
You worked reproducibly using advanced features of GitHub (e.g., the todo list!). The substantive part (the idea, the research question...) is usually not considered in this seminar, but in your case it is really well-developed and so it boosted the grade a bit. It could lead to a research paper. If you want to work on it together, I am willing to supervise. Excellent! You missed the 6.0 grade because you did not use issues and had only one PR; ideally each one of you would have made one. Also, you could do a few more commits to practice (it's below average).