- Tokenizing documents
- Removing stop words
- Stemming
- A dictionary containing all the words in our dataset
- Store the frequency of each word across the dataset
- Store, for each word, every document it appears in, together with its frequency and positions in that document
- Store the tf-idf weight of each term, to avoid recomputing it at query time
- Select candidate documents from the inverted index
- Apply query operations (NOT and phrase queries)
- Score documents with respect to:
  - How many of the query words appear in the document
  - How many times each query word occurs in the document
- Show the top-5 results together with the relevant sentences from those documents
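The preprocessing steps at the top of the list (tokenizing, stop-word removal, stemming) can be sketched as follows. The stop-word list and the suffix-stripping stemmer here are illustrative stand-ins, not the project's actual choices; a real pipeline would typically use a full stop-word list and a proper stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "of", "and", "to", "it"}

def tokenize(doc):
    """Lowercase the document and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", doc.lower())

def remove_stop_words(tokens):
    """Drop tokens that carry little retrieval value."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    """Very naive suffix-stripping stemmer (stand-in for e.g. Porter)."""
    for suffix in ("ing", "ies", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(doc):
    """Full preprocessing pipeline: tokenize, filter, stem."""
    return [stem(t) for t in remove_stop_words(tokenize(doc))]
```

For example, `preprocess("The cats are running in the garden")` yields `["cat", "runn", "garden"]` with this toy stemmer.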
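The index-building steps (per-document frequencies, positions, and precomputed tf-idf) can be sketched like this. The exact tf and idf formulas are assumptions; the project may use a different weighting variant.

```python
import math
from collections import defaultdict

def build_index(docs):
    """Build an inverted index mapping each term to
    {doc_id: {"freq": count, "positions": [...]}}, plus tf-idf weights."""
    index = defaultdict(dict)
    for doc_id, tokens in enumerate(docs):
        for pos, term in enumerate(tokens):
            entry = index[term].setdefault(doc_id, {"freq": 0, "positions": []})
            entry["freq"] += 1
            entry["positions"].append(pos)

    # Precompute tf-idf so it is not recomputed for every query.
    n_docs = len(docs)
    tfidf = defaultdict(dict)
    for term, postings in index.items():
        idf = math.log(n_docs / len(postings))
        for doc_id, entry in postings.items():
            tf = entry["freq"] / len(docs[doc_id])
            tfidf[term][doc_id] = tf * idf
    return index, tfidf

# Toy corpus of already-preprocessed documents.
docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"], ["dog", "run"]]
index, tfidf = build_index(docs)
```

Here `index["cat"][1]` is `{"freq": 2, "positions": [0, 1]}`: "cat" occurs twice in document 1, at positions 0 and 1.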
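The query-side steps (candidate selection, NOT and phrase operations, scoring, top-k) can be sketched as below, over a small hand-written index in the shape described above. The tf-idf weights here are made-up illustrative numbers, and the scoring rule (distinct-term count first, tf-idf sum as tie-break) is one plausible reading of the two criteria in the list, not necessarily the project's exact formula.

```python
# Toy inverted index and tf-idf table; weights are illustrative only.
index = {
    "cat": {0: {"freq": 1, "positions": [0]}, 1: {"freq": 2, "positions": [0, 1]}},
    "sat": {0: {"freq": 1, "positions": [1]}},
    "dog": {1: {"freq": 1, "positions": [2]}, 2: {"freq": 1, "positions": [0]}},
}
tfidf = {"cat": {0: 0.1, 1: 0.2}, "sat": {0: 0.3}, "dog": {1: 0.1, 2: 0.2}}

def candidates(terms):
    """Select every document containing at least one query term."""
    found = set()
    for t in terms:
        found |= set(index.get(t, {}))
    return found

def apply_not(doc_ids, excluded):
    """NOT operation: drop documents containing any excluded term."""
    for t in excluded:
        doc_ids -= set(index.get(t, {}))
    return doc_ids

def matches_phrase(doc_id, phrase):
    """Phrase operation: all terms at consecutive positions in the document."""
    first = index.get(phrase[0], {}).get(doc_id)
    if first is None:
        return False
    return any(
        all(start + i in index.get(t, {}).get(doc_id, {"positions": []})["positions"]
            for i, t in enumerate(phrase[1:], start=1))
        for start in first["positions"]
    )

def score(doc_id, terms):
    """Primary key: number of distinct query terms present in the document;
    tie-break: total tf-idf weight of the matched terms."""
    matched = [t for t in terms if doc_id in index.get(t, {})]
    return (len(matched), sum(tfidf[t][doc_id] for t in matched))

def top_k(terms, k=5):
    """Rank candidate documents and return the k best doc ids."""
    ranked = sorted(candidates(terms), key=lambda d: score(d, terms), reverse=True)
    return ranked[:k]
```

The final display step would then use the stored term positions of each top-ranked document to pull out the matching sentences from the original text.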