The purpose of this project is to create a chatbot for Concordia University that answers questions about courses, topics, and the students enrolled at the university. Courses are added to the knowledge graph automatically from Concordia University's website.
- Download the files below into the project's main directory (/COMP474_A1_W20/):
  - DBPedia Spotlight (190 MB)
  - DBPedia Spotlight Model (2016-10/en) (1.8 GB)
- Navigate to the project directory and run `server_1_extract_en.sh` to extract the model.
- Run `server_2_init.sh` to start the server.
- (Optional) When the server has been initialized, run `server_3_test.sh` to test that the server is running. It should return annotations.
- Clone the repository
- Navigate to conu-knowledge-graph\venv\pyvenv.cfg
- Set `home` to the directory of your Python installation (Ex: `home=C:\Program Files\Python38`)
- Open the conu-knowledge-graph folder as a PyCharm project
- Add an existing interpreter: File -> Settings -> Project: conu-knowledge-graph -> Project Interpreter -> Top right cog -> Add -> Virtualenv Environment -> Existing Environment -> [...] -> [...]\conu-knowledge-graph\venv\Scripts\python.exe
The courseExtraction module gets all the HTML pages from the Concordia website (`CourseDataCollector.py`). It then parses the HTML to extract the courses from each web page (`CourseExtractionMain.py`). The courses can then be loaded as a list of `Course` objects with the script `CourseExtractorFromTxt.py`.
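As a rough illustration of the parsing step, the sketch below pulls course codes and titles out of an HTML fragment using only the standard library. The sample markup, the `extract_courses` helper, the fields of this `Course` class, and the heading pattern are all assumptions made for illustration; they are not taken from `CourseExtractionMain.py`.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Course:
    # Hypothetical fields; the project's own Course class may differ.
    subject: str  # e.g. "COMP"
    number: str   # e.g. "474"
    title: str    # e.g. "Intelligent Systems"


class _TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Assumed heading shape: "COMP 474 Intelligent Systems (4 credits)"
COURSE_RE = re.compile(
    r"^([A-Z]{4})\s+(\d{3,4})\s+(.+?)\s*\(\d+(?:\.\d+)?\s+credits?\)"
)


def extract_courses(html):
    """Return a Course for every text node that looks like a course heading."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    courses = []
    for chunk in collector.chunks:
        match = COURSE_RE.match(chunk)
        if match:
            courses.append(Course(*match.groups()))
    return courses
```

Text nodes that do not match the assumed heading shape, such as prerequisite notes, are skipped.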
The spotlightAnnotations module provides a basic method that returns every term in a block of text that DBpedia has linked to its database, together with the link for each term. Two other methods attempt to write all of those terms and links to a file, but they could not be completed because of API limitations.
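A minimal sketch of such a term/link lookup against DBpedia Spotlight's `annotate` endpoint is shown below. It is not the module's actual code. The server address is an assumption and should be adjusted to whatever `server_2_init.sh` starts, and the helper names `build_request`, `parse_annotations`, and `annotate` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the local Spotlight server; adjust to match your setup.
SPOTLIGHT_URL = "http://localhost:2222/rest/annotate"


def build_request(text, confidence=0.5, url=SPOTLIGHT_URL):
    """Build a GET request for the annotate endpoint that asks for JSON."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{url}?{query}", headers={"Accept": "application/json"}
    )


def parse_annotations(payload):
    """Return (term, link) pairs from a decoded Spotlight JSON response."""
    # "Resources" is absent when nothing in the text was linked.
    return [(r["@surfaceForm"], r["@URI"]) for r in payload.get("Resources", [])]


def annotate(text, confidence=0.5, url=SPOTLIGHT_URL):
    """Send the text to Spotlight and return its (term, link) pairs."""
    with urllib.request.urlopen(build_request(text, confidence, url)) as response:
        return parse_annotations(json.load(response))
```

Keeping the request building and the response parsing separate from the network call makes both halves easy to check without a running server.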
The studentCreation module creates the Student objects along with all the information related to the courses they have taken.
The rdfPopulator module creates the RDF schema in the Turtle format. The script `ttlCreator.py` contains all the necessary information. The RDF schema can be found in `rdfPopulator/output.ttl`.
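To give a feel for how student data might be written out as Turtle, here is a small standard-library sketch. The `ex:` namespace, the `ex:Student` class, the `ex:tookCourse` property, the `Student` fields, and the student ID in the usage note are all hypothetical; the vocabulary actually used by the project is the one in rdfPopulator/output.ttl.

```python
from dataclasses import dataclass, field

# Hypothetical prefixes; prepend these once at the top of the output file.
PREFIXES = (
    "@prefix ex: <http://example.org/conu/> .\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
)


@dataclass
class Student:
    student_id: str
    name: str
    courses: list = field(default_factory=list)  # course codes such as "COMP474"


def student_to_turtle(student):
    """Serialize one student and the courses they took as a Turtle statement."""
    # Note: names containing quotes or backslashes would need escaping first.
    lines = [
        f"ex:{student.student_id} a ex:Student ;",
        f'    foaf:name "{student.name}"',
    ]
    if student.courses:
        taken = ", ".join(f"ex:{code}" for code in student.courses)
        lines[-1] += " ;"
        lines.append(f"    ex:tookCourse {taken}")
    lines[-1] += " ."
    return "\n".join(lines)
```

For example, `student_to_turtle(Student("s1001", "Bianca Patry", ["COMP474"]))` yields a three-line statement ending in `ex:tookCourse ex:COMP474 .`.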
- Once the project has been properly set up (from Part I), run `ChatbotMain.py`
- Enter a question for ConU ChatBot to answer.
  The following questions are currently supported:
  - What's COMP 474 about?
  - Which courses did Bianca Patry take?
  - Which courses cover Natural Language Processing?
  - Who is familiar with Education?
- The results are printed after the knowledge graph has been queried
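One way the supported questions could be routed to the knowledge graph is to match each question against a pattern and fill in a SPARQL template. The sketch below illustrates the idea only; the patterns, the `question_to_sparql` helper, and every `ex:` property are hypothetical and are not taken from `ChatbotMain.py`.

```python
import re

# Each entry pairs a question pattern with a SPARQL template.
# The ex: vocabulary is hypothetical; prefix declarations are omitted for brevity.
QUESTION_TEMPLATES = [
    (re.compile(r"^what(?:'s| is) ([A-Z]{4}) ?(\d{3,4}) about\?*$", re.I),
     'SELECT ?description WHERE {{ ?c ex:subject "{0}" ; ex:number "{1}" ; '
     'ex:description ?description }}'),
    (re.compile(r"^which courses did (.+?) take\?*$", re.I),
     'SELECT ?course WHERE {{ ?s foaf:name "{0}" ; ex:tookCourse ?course }}'),
    (re.compile(r"^which courses cover (.+?)\?*$", re.I),
     'SELECT ?course WHERE {{ ?course ex:coversTopic ?t . ?t rdfs:label "{0}" }}'),
    (re.compile(r"^who is familiar with (.+?)\?*$", re.I),
     'SELECT ?student WHERE {{ ?student ex:familiarWith ?t . ?t rdfs:label "{0}" }}'),
]


def question_to_sparql(question):
    """Return a SPARQL query for a supported question, or None if none matches."""
    # Note: captured text is inserted as-is; real code should escape quotes.
    for pattern, template in QUESTION_TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return template.format(*match.groups())
    return None
```

Returning `None` for unmatched input lets the caller print a "question not supported" message instead of sending a malformed query.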