
Information retrieval with Lucene and the CISI dataset: index documents and search them with IB, DFR, BM25, TF-IDF, Boolean, Axiomatic, or LM-Dirichlet similarity, then calculate Recall, Precision, MAP (Mean Average Precision), and F-Measure.

Berozain/LuceneCISI


Lucene with CISI dataset

Information Retrieval with Lucene and CISI dataset

This is an example of how to use Lucene for information retrieval with the CISI dataset. It indexes the documents and searches them with IB, DFR, BM25, TF-IDF, Boolean, Axiomatic, or LM-Dirichlet similarity. You can enable or disable the stemmer and set custom stop words. The project uses Lucene 9.5.0. Before running it, change the file paths in the code to match your machine. You can open and run the project in Eclipse.
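To illustrate what one of these similarity options computes, here is a minimal sketch of the textbook Okapi BM25 formula in plain Java. It is not Lucene's BM25Similarity class and is not taken from this project; the class and method names (Bm25Sketch, idf, termScore, score) are hypothetical, and documents are assumed to be already tokenized (after any stemming or stop-word removal). The constants k1 = 1.2 and b = 0.75 match Lucene's BM25 defaults, but Lucene's internal formula may differ in details.

```java
import java.util.List;

// Hypothetical helper: textbook Okapi BM25 scoring, for illustration only.
// This is NOT Lucene's BM25Similarity; Lucene's exact formula may differ in details.
final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (Lucene's default value)
    static final double B = 0.75;  // strength of document-length normalization (Lucene's default value)

    /** idf = ln(1 + (N - n + 0.5) / (n + 0.5)), where n = number of docs containing the term. */
    static double idf(int docCount, int docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 contribution of a single query term to one document. */
    static double termScore(int freq, int docLength, double avgDocLength, int docCount, int docFreq) {
        // Longer-than-average documents get a larger denominator, i.e. a lower score.
        double norm = K1 * (1.0 - B + B * docLength / avgDocLength);
        return idf(docCount, docFreq) * freq * (K1 + 1.0) / (freq + norm);
    }

    /** Sums the term scores of all query terms; each document is a list of tokens. */
    static double score(List<String> query, List<String> doc, List<List<String>> corpus) {
        double avgLen = corpus.stream().mapToInt(List::size).average().orElse(0.0);
        double total = 0.0;
        for (String term : query) {
            int freq = (int) doc.stream().filter(term::equals).count();
            if (freq == 0) {
                continue; // a term absent from the document contributes nothing
            }
            int docFreq = (int) corpus.stream().filter(d -> d.contains(term)).count();
            total += termScore(freq, doc.size(), avgLen, corpus.size(), docFreq);
        }
        return total;
    }
}
```

In Lucene itself you would not score by hand; you would pass a Similarity such as new BM25Similarity() to both IndexWriterConfig.setSimilarity and IndexSearcher.setSimilarity so that indexing and searching use the same model.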

Query

You can write a simple query such as Lending book, or, for advanced search, use the format docTitle="" docContent="" docAuthors="" to get more precise results.
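The advanced query format pairs each field name with its search text. Below is a minimal sketch of how such input could be split into field/text pairs, assuming a plain query should be searched in all three fields and empty values such as docAuthors="" are ignored. The field names come from this README; the class FieldQueryParser and its parsing rules are hypothetical and not necessarily how this project handles queries.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns docTitle="..." docContent="..." docAuthors="..."
// into a field -> text map. The parsing rules are an assumption for illustration.
final class FieldQueryParser {
    static final String[] FIELDS = {"docTitle", "docContent", "docAuthors"};

    // Matches name="value" for the three known fields; value may be empty.
    private static final Pattern PAIR =
            Pattern.compile("(docTitle|docContent|docAuthors)=\"([^\"]*)\"");

    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            String value = m.group(2).trim();
            if (!value.isEmpty()) { // skip empty fields such as docAuthors=""
                result.put(m.group(1), value);
            }
        }
        if (result.isEmpty() && !input.contains("=\"")) {
            // Simple query such as: Lending book -> search the same text in every field.
            for (String field : FIELDS) {
                result.put(field, input.trim());
            }
        }
        return result;
    }
}
```

Each resulting field/text pair could then be handed to a Lucene QueryParser created for that field, with the resulting clauses combined; how this project combines them is not described here.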

Evaluation

The CISI dataset provides 111 queries, each with its most relevant documents listed in order of relevance. In the evaluation step, we compare our retrieved results with these reference results and calculate Recall, Precision, MAP (Mean Average Precision), and F-Measure for every query.
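The metrics named above follow standard definitions. Here is a minimal sketch of them in plain Java, assuming each query yields a ranked list of retrieved document IDs and a set of relevant IDs taken from the CISI relevance judgments. The class and method names (IrMetrics, averagePrecision, meanAveragePrecision, and so on) are hypothetical, not taken from this project's code.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper: standard IR metrics for ranked results.
// 'retrieved' is the ranked list of doc IDs; 'relevant' holds the judged relevant doc IDs.
final class IrMetrics {
    /** Fraction of retrieved documents that are relevant. */
    static double precision(List<Integer> retrieved, Set<Integer> relevant) {
        if (retrieved.isEmpty()) return 0.0;
        long hits = retrieved.stream().filter(relevant::contains).count();
        return (double) hits / retrieved.size();
    }

    /** Fraction of relevant documents that were retrieved. */
    static double recall(List<Integer> retrieved, Set<Integer> relevant) {
        if (relevant.isEmpty()) return 0.0;
        long hits = retrieved.stream().filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    /** Balanced F-measure (F1): harmonic mean of precision and recall. */
    static double fMeasure(double precision, double recall) {
        return (precision + recall == 0.0) ? 0.0 : 2 * precision * recall / (precision + recall);
    }

    /** Average precision: sum of precision@k at each rank k holding a relevant doc, over |relevant|. */
    static double averagePrecision(List<Integer> retrieved, Set<Integer> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int k = 0; k < retrieved.size(); k++) {
            if (relevant.contains(retrieved.get(k))) {
                hits++;
                sum += (double) hits / (k + 1); // precision at rank k + 1
            }
        }
        return sum / relevant.size();
    }

    /** MAP: mean of average precision over all queries (runs and judgments are parallel lists). */
    static double meanAveragePrecision(List<List<Integer>> runs, List<Set<Integer>> judgments) {
        if (runs.isEmpty()) return 0.0;
        double total = 0.0;
        for (int q = 0; q < runs.size(); q++) {
            total += averagePrecision(runs.get(q), judgments.get(q));
        }
        return total / runs.size();
    }
}
```

F-Measure is shown here as the balanced F1 score; the README does not state which weighting the project uses.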

Resources

  1. Lucene
  2. CISI dataset

Developed by

  1. Behrouz Amoushahi
  2. Mehdi Jabalameli
