This repository covers how to extract the Named Entities in the SemEval-2010 Task 8 dataset. You can look up the dataset here. Also, the SemEval-2010 Task 8 statement can be read here.
For the implementation, the task has been divided into two parts:
- Creation of the Corpus Reader class, which parses the data from the dataset into a DataFrame for the model.
  - Some common NLP tasks, such as POS tagging, dependency parsing, full syntactic parsing, and hypernym, holonym, meronym, and hyponym extraction.
- Creation of the Decision Tree model to perform relation classification and identification.
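The repository's own Corpus Reader class is not reproduced here. As a minimal sketch, assuming the standard SemEval-2010 Task 8 training-file layout (a numbered, quoted sentence with <e1>/<e2> markers, then the relation label, then a "Comment:" line and a blank line), the parsing step could look like this. The function name read_semeval and the column names are hypothetical, and the two records are illustrative.

```python
import re
from typing import Dict, List

# Hypothetical reader for the SemEval-2010 Task 8 training-file layout.
SENTENCE_RE = re.compile(r'^(\d+)\t"(.*)"$')   # id, tab, quoted sentence
E1_RE = re.compile(r"<e1>(.*?)</e1>")          # first marked nominal
E2_RE = re.compile(r"<e2>(.*?)</e2>")          # second marked nominal
TAG_RE = re.compile(r"</?e[12]>")              # any entity marker tag


def read_semeval(text: str) -> List[Dict[str, str]]:
    """Turn training-file text into one dict per sentence (DataFrame-ready rows)."""
    rows = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = SENTENCE_RE.match(lines[i].strip())
        if not match:
            # Comment lines, blank separators, and stray text are skipped.
            i += 1
            continue
        sent_id, tagged = match.groups()
        # The relation label sits on the line right after the sentence.
        relation = lines[i + 1].strip() if i + 1 < len(lines) else ""
        e1 = E1_RE.search(tagged)
        e2 = E2_RE.search(tagged)
        rows.append({
            "id": sent_id,
            "sentence": TAG_RE.sub("", tagged),
            "e1": e1.group(1) if e1 else "",
            "e2": e2.group(1) if e2 else "",
            "relation": relation,
        })
        i += 2  # skip past the sentence and its relation line
    return rows


# Two records written in the training-file layout (illustrative).
SAMPLE = '''1\t"The system as described above has its greatest application in an arrayed <e1>configuration</e1> of antenna <e2>elements</e2>."
Component-Whole(e2,e1)
Comment: Not a collection: there is structure here, organisation.

2\t"The <e1>child</e1> was carefully wrapped and bound into the <e2>cradle</e2> by means of a cord."
Other
Comment:
'''

rows = read_semeval(SAMPLE)
```

Passing the returned list to pandas.DataFrame gives one row per sentence with id, sentence, e1, e2, and relation columns; pandas is left out here so the sketch runs with the standard library alone.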
- Download the submission. The code file for Task 1 and Task 2 is called Task1_2Demo.py.
- Create a file called "test_sentence.txt" containing the test sentences for which you want to run Task 1 and Task 2.
- Save this file in the same directory as the code, i.e., /Code.
- Run the following commands to download all packages and dependencies:
    pip install -r requirements.txt
    python -m spacy download en_core_web_sm
- Run the code as
    python Task1_2Demo.py
  All outputs are printed to the console or to the IDE you are using.
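If Task1_2Demo.py stops on an import error, a quick way to confirm that the install commands worked is to ask Python whether the packages can be found. This is a small sketch using the standard importlib.util.find_spec; it only checks spaCy and en_core_web_sm, since those are the only packages named in these steps (the full contents of requirements.txt are not listed here). The helper name missing_packages is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# spaCy installs en_core_web_sm as an importable package, so find_spec can see it.
missing = missing_packages(["spacy", "en_core_web_sm"])
if missing:
    print("Missing:", ", ".join(missing), "(rerun the install commands)")
else:
    print("spaCy and en_core_web_sm are installed.")
```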
- This code can also be run on Google Colab:
  a) Open Google Colab and upload test_sentence.txt.
  b) Import Task1_2Demo.ipynb into Colab. You should then be able to run all the tasks.
  c) The notebook link is clickable here:
- To run Task 3, open the IDE of your choice and run the command
    python -m spacy download en_core_web_sm
- Download the SemEval dataset available on E-Learning, put it in the same directory, i.e., /Code, and name it semeval_train.txt.
- Put all your test sentences in a file called semeval_test.txt. To run on the entire test set instead, download the test file available on E-Learning and rename it semeval_test.txt.
- Once you have copied the test sentences, make sure that you have run the pip install step as before.
- Run the code as
    python Task3Demo.py
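Because the Task 3 steps expect semeval_train.txt and semeval_test.txt next to the code, a short pre-flight check can catch a misnamed or misplaced file before Task3Demo.py is started. This is a sketch under the assumption that it is run from the /Code directory; the helper name missing_task3_files is hypothetical and not part of the submission.

```python
from pathlib import Path
from typing import List

# File names come from the Task 3 steps; code_dir is assumed to be the
# folder that holds Task3Demo.py (the /Code directory).
REQUIRED_FILES = ["semeval_train.txt", "semeval_test.txt"]


def missing_task3_files(code_dir: str = ".") -> List[str]:
    """Return the required Task 3 input files that are not present in code_dir."""
    base = Path(code_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


missing = missing_task3_files(".")
if missing:
    print("Missing in the code directory:", ", ".join(missing))
else:
    print("semeval_train.txt and semeval_test.txt are in place.")
```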