Multi-stage Informational Retrieval & Ranking System developed as part of Statistical Natural Language Processing coursework

akshayjoshii/Statistical-NLP-Information-Retrieval-Project

Ankit Agrawal 2581532

Akshay Joshi 2581346

Abstract: In this final project, the task is to develop and evaluate a two-stage information retrieval model that, given a query, returns the n most relevant documents and then ranks the sentences within the documents. For the first part, you should implement a baseline document retriever with tf-idf features. To get full credit, in the second part you should improve over the baseline of the document retriever with an advanced approach of your choice. The third part extends the model to return the ranked sentences. The answer to the query should be found in one of the top-ranked sentences. In addition to the source code, you should submit a 4-6 page report that describes the problem and why it is interesting/challenging in your own words, the preprocessing steps, the models you have developed, your evaluation results and an analysis of the results.
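
To make the first stage concrete, here is a minimal sketch of a tf-idf document retriever using only the standard library. It is not the code in TF-IDF.py: the whitespace tokenizer, the smoothed idf variant `log(N / df) + 1`, the cosine ranking, and the function names `build_tfidf_index` and `retrieve` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    return text.lower().split()


def build_tfidf_index(docs):
    """Return (idf, vectors): an idf table and one sparse tf-idf dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    idf = {t: math.log(n_docs / df_t) + 1.0 for t, df_t in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return idf, vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, idf, vectors, n=2):
    """Return the indices of the n documents most similar to the query."""
    q_tokens = [t for t in tokenize(query) if t in idf]  # drop unseen terms
    q_tf = Counter(q_tokens)
    q_vec = {t: (c / len(q_tokens)) * idf[t] for t, c in q_tf.items()}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [i for _, i in scored[:n]]


docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]
idf, vectors = build_tfidf_index(docs)
top = retrieve("cat mat", idf, vectors, n=2)
```

Here `top[0]` is the index of the first document, since it is the only one containing the query terms. A stronger second-stage retriever would replace the scoring function while keeping the same retrieve-then-rank shape.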

Instructions:

  1. Install dependencies from requirements.txt
  2. Run Extract.py to parse the XML for documents and queries and generate the preprocessed documents. (Execution time: 3-4 mins)
  3. Run TF-IDF.py to get the results of all three tasks, i.e. TF-IDF (baseline), BM25Plus, and MRR for sentences. (Execution time: 6-8 mins)
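
For readers unfamiliar with the two names in step 3, the following is a rough sketch of BM25+ scoring and of mean reciprocal rank (MRR). It is an illustration only, not the repository's implementation: the parameter defaults (`k1=1.5`, `b=0.75`, `delta=1.0`), the idf form `log((N + 1) / df)`, and the function names are assumptions, and the input is assumed to be already tokenized.

```python
import math
from collections import Counter


def bm25plus_scores(query_tokens, docs_tokens, k1=1.5, b=0.75, delta=1.0):
    """Score every document against the query with BM25+.

    Only query terms that occur in a document contribute to its score.
    """
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs  # average document length
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))  # document frequency per term
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        length_norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs + 1) / df[t])
            # delta lower-bounds the term-frequency component, so long
            # documents that do contain the term are not over-penalized.
            score += idf * ((k1 + 1) * tf[t] / (k1 * length_norm + tf[t]) + delta)
        scores.append(score)
    return scores


def mean_reciprocal_rank(relevance_per_query):
    """MRR over queries; each entry is a list of relevance flags in rank order."""
    total = 0.0
    for flags in relevance_per_query:
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(relevance_per_query)


docs_tokens = [["cat", "sat", "mat"], ["dog", "park"], ["cat", "cat", "dog"]]
scores = bm25plus_scores(["cat"], docs_tokens)
mrr = mean_reciprocal_rank([[False, True], [True], [False, False]])
```

With these toy inputs the third document scores highest for the query `cat`, the second scores zero, and `mrr` is 0.5 (reciprocal ranks 1/2, 1, and 0 averaged over three queries). One way to apply the same scorer to the sentence-ranking stage would be to treat each sentence of a retrieved document as its own "document", though whether the project does this is not stated here.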

GitHub link: https://github.com/akshayjoshii/Statistical-NLP-Information-Retrieval-Project