Author: UTKARSH KUMAR
-
See "requirements.txt" for the libraries required to run the code.
-
Run "corpusProcess.py" first to generate corpus files.
-
The default corpus is "wiki_56", but a different corpus or a list of corpora can be given as command-line arguments when running "corpusProcess.py".
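As a rough illustration of the default-versus-argument behaviour described above, a script could choose its corpus list as in the sketch below. This is not the actual code of "corpusProcess.py": the function name select_corpora is hypothetical, and it assumes each corpus is passed as a separate command-line argument.

```python
DEFAULT_CORPUS = "wiki_56"  # default corpus named in this README


def select_corpora(argv):
    """Return the corpus names from the command line, or the default.

    Hypothetical helper: assumes one corpus name per argument.
    """
    return list(argv) if argv else [DEFAULT_CORPUS]
```

Under that assumption, the script would call select_corpora(sys.argv[1:]) after importing sys, so that running it with no arguments falls back to "wiki_56".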
-
To test queries, please provide your query in "query.txt" and run "test_queries.py".
-
By default, "test_queries.py" uses the files generated by "corpusProcess.py".
-
"test_queries.py" also accepts command-line arguments, with the file names given in this order:
- Query file
- Index file
- Bigram_Index file
- Document IDs file
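The argument order above can be pictured with a minimal sketch of positional arguments overlaid on defaults. It is only an assumption about how such a script might work, not the actual implementation of "test_queries.py". Apart from "query.txt", the default file names below are hypothetical placeholders, since the names written by "corpusProcess.py" are not listed here.

```python
# Order: query file, index file, bigram index file, document IDs file.
# Only "query.txt" comes from this README; the other names are placeholders.
DEFAULTS = ["query.txt", "index_file", "bigram_index_file", "doc_ids_file"]


def resolve_paths(argv, defaults=DEFAULTS):
    """Overlay positional arguments on the defaults, keeping the listed order.

    Arguments beyond the four expected files are ignored.
    """
    given = list(argv[:len(defaults)])
    return given + list(defaults[len(given):])
```

With this approach, passing only a query file replaces the first entry and leaves the remaining three at their defaults, so a file later in the list can be overridden only by also supplying every file before it.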
-
Please allow 5-10 minutes for each script to preprocess the data, perform file I/O, and construct the required data structures.
To explore how the model performs on other corpora, some corpus files are available here: https://drive.google.com/drive/folders/1ZsnuEm7_N6aUwhjFpv-TZXFt4DiYex4t?usp=sharing