akshayjoshii/Statistical-NLP-Information-Retrieval-Project: Multi-stage Information Retrieval & Ranking System developed as part of Statistical Natural Language Processing coursework

Ankit Agrawal 2581532

Akshay Joshi 2581346

Abstract: In this final project, the task is to develop and evaluate a two-stage information retrieval model that, given a query, returns the n most relevant documents and then ranks the sentences within the documents. For the first part, you should implement a baseline document retriever with tf-idf features. To get full credit, in the second part you should improve over the baseline of the document retriever with an advanced approach of your choice. The third part extends the model to return the ranked sentences. The answer to the query should be found in one of the top-ranked sentences. In addition to the source code, you should submit a 4-6 page report that describes the problem and why it is interesting/challenging in your own words, the preprocessing steps, the models you have developed, your evaluation results and an analysis of the results.
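
To make the first stage concrete, the following is a minimal sketch of a tf-idf document retriever that returns the indices of the n highest-scoring documents for a query. It is not the code in TF-IDF.py: the function names (`tokenize`, `build_tfidf`, `cosine`, `retrieve`), the regex tokenizer, the smoothed idf variant and the use of cosine similarity are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tfidf(docs):
    """Return (idf table, list of sparse tf-idf vectors) for raw document strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (assumed variant) so that very common terms keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, n=2):
    """Return the indices of the n documents most similar to the query."""
    idf, vectors = build_tfidf(docs)
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    # Highest score first; ties are broken by document index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:n]]
```

Calling `retrieve(query, docs, n=50)` on a list of document strings would return up to 50 document indices in descending order of similarity; the value 50 is only an example.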

Instructions:

Install the dependencies listed in requirements.txt.
Run Extract.py to parse the XML for documents and queries and to generate the preprocessed documents. (Execution time: 3-4 mins)
Run TF-IDF.py to get the results of all three tasks, i.e. TF-IDF (baseline), BM25Plus, and MRR for sentences. (Execution time: 6-8 mins)
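
For readers unfamiliar with the two measures named in the last step, here is a minimal sketch of BM25+ scoring and of mean reciprocal rank (MRR). It is an illustration under stated assumptions, not the code in TF-IDF.py or bm25_ranking.py: the function names, the `is_relevant` callback, and the parameter values `k1=1.5`, `b=0.75`, `delta=1.0` are hypothetical choices (commonly used defaults), and the idf form `log((N + 1) / df)` follows the usual BM25+ formulation.

```python
import math
from collections import Counter


def bm25plus_scores(query_tokens, docs_tokens, k1=1.5, b=0.75, delta=1.0):
    """Score every tokenized document against a tokenized query with BM25+.

    docs_tokens must be a non-empty list of token lists.
    """
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if tf[term] == 0:
                continue  # only terms shared by query and document contribute
            idf = math.log((n_docs + 1) / df[term])
            # Length-normalized term frequency, as in BM25.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            # delta lower-bounds the contribution of any matching term (the "+").
            score += idf * (tf[term] * (k1 + 1) / norm + delta)
        scores.append(score)
    return scores


def mean_reciprocal_rank(ranked_lists, is_relevant):
    """Average of 1/rank of the first relevant item per query (0 if none is found).

    ranked_lists[i] is the ranked output for query i; is_relevant(i, item)
    is a caller-supplied (hypothetical) relevance check.
    """
    total = 0.0
    for query_id, ranked in enumerate(ranked_lists):
        for rank, item in enumerate(ranked, start=1):
            if is_relevant(query_id, item):
                total += 1.0 / rank
                break
    return total / len(ranked_lists) if ranked_lists else 0.0
```

The same scoring function can be applied to sentences instead of documents by passing tokenized sentences as `docs_tokens`; MRR then rewards rankings in which a sentence containing the answer appears near the top.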

Github link: https://github.com/akshayjoshii/Statistical-NLP-Information-Retrieval-Project

Repository files:

.vscode
Extracted Docs
Unprocessed_Docs
.gitignore
Extract.py
FinalProject_SNLP2020.pdf
README.md
SNLP Project Report.pdf
TF-IDF.py
bm25_ranking.py
extracted_test_questions.txt
patterns.txt
regex_tokenize.py
requirements.txt
test_questions.txt
trec_documents.xml
