Building an aspect-based sentiment analysis system to analyze product reviews using spaCy and Keras.
This workshop was originally presented at Warsaw IT Days 2019 by Stanisław Giziński and Krzysztof Kowalczyk from our Machine Learning Club.
We are also currently implementing full support for the Polish language in spaCy; you can track our progress via our GitHub organization and see the results on our project website.
We will be using the Anaconda distribution of Python to make installation of machine learning libraries easier.
Any other distribution of Python>=3.7 should do fine, but if you want to have exactly the same setup:
- Clone this repository:
git clone https://github.com/knum-mimuw/spacy-workshop
- Download and run the Miniconda Python 3.7 installer; make sure to add the binaries to the PATH variable when prompted at the end of the installation
- Open the terminal (on Windows, use the newly installed Anaconda Prompt instead of CMD / PowerShell)
- Create a conda environment (this may take a while):
conda create -n spacy-wdi python=3.7.1 spacy jupyterlab
- Activate the environment:
source activate spacy-wdi
(on Windows: activate spacy-wdi)
- Download machine learning models:
python -m spacy download en_core_web_sm
- Navigate to the cloned repository folder and start JupyterLab:
cd spacy-workshop
jupyter lab
- Download "The Guardian Articles" dataset and extract it (there is only one CSV file in there); place it in the cloned repository folder.
- Download the "SemEval Aspect-Based Sentiment Analysis" dataset. Unfortunately, you have to create an account there, because we are not allowed to redistribute this dataset directly due to licensing issues.
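To confirm that the extracted Guardian CSV actually landed in the repository folder, a small check along the following lines may help. This is only a sketch: the helper name `find_csv_files` is our own illustration, and the actual CSV file name depends on the downloaded archive.

```python
from pathlib import Path


def find_csv_files(folder="."):
    """Return the sorted names of CSV files directly inside `folder`."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


if __name__ == "__main__":
    # Run this from inside the cloned spacy-workshop folder.
    print(find_csv_files() or "No CSV file found in this folder")
```

If the list comes back empty, the CSV was most likely extracted somewhere else and needs to be moved into the repository folder.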
To check if the setup process was completed, go to localhost:8888, select "new console: Python 3" and type the following lines into the console:
import spacy
nlp = spacy.load("en_core_web_sm")
If the code runs without raising an error, everything was installed correctly.
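If you would rather get a list of what is missing than a traceback, the check can also be sketched as a short script. The helper name `find_missing` is our own illustration; it relies on the fact that spaCy models installed with the download command are regular Python packages, so they can be looked up by name.

```python
import importlib.util


def find_missing(packages):
    """Return the names in `packages` that cannot be found for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Only checks that the packages are discoverable; it does not load the model.
    missing = find_missing(["spacy", "en_core_web_sm"])
    print("Setup looks complete" if not missing else f"Missing: {missing}")
```

An empty result only means the packages are discoverable; loading the model with `spacy.load` remains the definitive test.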
During the workshops, we will be:
- learning about NLP from notebooks: 1-NLP-Introduction.ipynb and 2-NLP-Glossary.ipynb
- going through the spaCy library: objects, methods, attributes, etc.
- (optional) building an aspect-based sentiment analysis pipeline on the SemEval dataset (for a reference implementation, see utils/SemEval.ipynb)
- (optional) building a standard sentiment analysis pipeline for Guardian articles (for a reference implementation, see utils/TheGuardian.ipynb)
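To give a feel for what "aspect-based" means before the workshop, here is a deliberately tiny, pure-Python sketch. It is not the spaCy/Keras pipeline from the notebooks: the word lists, the `aspect_sentiment` function and the fixed word window are all hypothetical simplifications chosen for illustration.

```python
# Toy illustration only: hypothetical lexicon, not the workshop's spaCy/Keras pipeline.
POSITIVE = {"great", "good", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "slow", "terrible"}


def aspect_sentiment(review, aspects, window=2):
    """Score each aspect by counting sentiment words within `window` tokens of it."""
    tokens = [t.strip(".,!?").lower() for t in review.split()]
    scores = {}
    for i, token in enumerate(tokens):
        if token in aspects:
            nearby = tokens[max(0, i - window): i + window + 1]
            score = sum(w in POSITIVE for w in nearby) - sum(w in NEGATIVE for w in nearby)
            scores[token] = scores.get(token, 0) + score
    return scores


review = "The battery is great but the screen is terrible."
print(aspect_sentiment(review, {"battery", "screen"}))
# prints {'battery': 1, 'screen': -1}
```

Unlike standard sentiment analysis, which would assign one label to the whole review, the result keeps a separate score per aspect, so the same review can be positive about the battery and negative about the screen.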