GitHub - Krutash/Vector-Space-IR-model: We designed an information retrieval system based on the vector space model in Python. We also implemented bigram indices for phrasal query search, and champion list retrieval. Our project report compares the total retrieval times.
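
For readers new to the three techniques named above, here is a minimal sketch of how they are commonly put together. It is not the code from "corpusProcess.py" or "test_queries.py": the function names, the whitespace tokenizer, the log-scaled tf-idf weighting, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; the real preprocessing may differ.
    return text.lower().split()


def build_index(docs):
    """Return {term: {doc_id: term_frequency}} for a {doc_id: text} mapping."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def build_bigram_index(docs):
    """Return {(w1, w2): set of doc_ids} for every pair of adjacent words."""
    bigrams = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for pair in zip(tokens, tokens[1:]):
            bigrams[pair].add(doc_id)
    return bigrams


def champion_lists(index, r):
    """Keep, for each term, only the r postings with the highest term frequency."""
    return {
        term: dict(sorted(postings.items(), key=lambda kv: (-kv[1], kv[0]))[:r])
        for term, postings in index.items()
    }


def doc_norms(index, n_docs):
    """Euclidean length of each document's tf-idf vector."""
    squared = defaultdict(float)
    for postings in index.values():
        idf = math.log10(n_docs / len(postings))
        for doc_id, tf in postings.items():
            squared[doc_id] += ((1 + math.log10(tf)) * idf) ** 2
    return {doc_id: math.sqrt(value) for doc_id, value in squared.items()}


def rank(query, postings, index, norms, n_docs):
    """Score documents by cosine similarity, up to the constant query length.

    `postings` supplies the candidate documents (the full index or the
    champion lists); document frequencies always come from the full `index`.
    """
    scores = defaultdict(float)
    for term, qtf in Counter(tokenize(query)).items():
        if term not in index:
            continue
        idf = math.log10(n_docs / len(index[term]))
        query_weight = (1 + math.log10(qtf)) * idf
        for doc_id, tf in postings[term].items():
            scores[doc_id] += query_weight * (1 + math.log10(tf)) * idf
    ranked = [(d, s / norms[d]) for d, s in scores.items() if norms[d] > 0]
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))


def phrase_search(phrase, bigram_index):
    """Documents containing every adjacent word pair of the phrase."""
    tokens = tokenize(phrase)
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return set()
    result = set(bigram_index.get(pairs[0], set()))
    for pair in pairs[1:]:
        result &= bigram_index.get(pair, set())
    return result


# Toy corpus, invented for this example.
docs = {
    1: "the vector space model ranks documents",
    2: "a champion list keeps top documents",
    3: "space travel and vector graphics",
}
index = build_index(docs)
norms = doc_norms(index, len(docs))
print(rank("vector space model", index, index, norms, len(docs)))
print(rank("vector space model", champion_lists(index, 1), index, norms, len(docs)))
print(phrase_search("vector space", build_bigram_index(docs)))
```

Passing the champion lists instead of the full index to rank() shortens the postings that are scanned, which trades some ranking accuracy for speed. Note that a bigram index confirms each adjacent pair independently, so for phrases longer than two words it can return documents where the pairs occur in different places.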

Author: UTKARSH KUMAR

Please refer to "requirements.txt" for the libraries required to run the code.
Run "corpusProcess.py" first to generate the corpus files.
The default corpus is "wiki_56", but a different corpus, or a list of corpora, can be passed as command-line arguments when running "corpusProcess.py".
To test queries, put your query in "query.txt" and run "test_queries.py".
By default, "test_queries.py" uses the files generated by "corpusProcess.py".
"test_queries.py" also accepts command-line arguments, with the file names in this order:
1. Query file
2. Index file
3. Bigram_Index file
4. Document IDs file
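
The argument order above can be pictured with the following sketch of positional arguments that fall back to defaults. It is only an illustration, not the actual code in "test_queries.py": resolve_paths is a hypothetical helper, every default file name except "query.txt" is a placeholder (the names written by "corpusProcess.py" are not listed here), and it assumes arguments may be omitted from the right.

```python
import sys

# Order matches the list above. Only "query.txt" is a real name from this
# README; the other three names are placeholders for illustration.
DEFAULTS = ["query.txt", "index.txt", "bigram_index.txt", "doc_ids.txt"]


def resolve_paths(argv, defaults=DEFAULTS):
    """Fill positional arguments left to right, using defaults for the rest."""
    given = list(argv[1:len(defaults) + 1])
    return given + list(defaults[len(given):])


if __name__ == "__main__":
    query_file, index_file, bigram_file, doc_ids_file = resolve_paths(sys.argv)
    print(query_file, index_file, bigram_file, doc_ids_file)
```

Under these assumptions, running the script with a single argument replaces only the query file and leaves the other three paths at their defaults.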
Please allow 5-10 minutes for each script to preprocess the data, perform file I/O, and build the required data structures.

If you want to experiment with how the model performs on other corpora, some corpus files are available at: https://drive.google.com/drive/folders/1ZsnuEm7_N6aUwhjFpv-TZXFt4DiYex4t?usp=sharing

Files in the repository:
Report_Group_17.pdf
corpusProcess.py
query.txt
readme.md
requirements.txt
test_queries.py
wiki_56