The RI-indexer is a Python-based tool for indexing and searching text documents.
- Supports two tokenization methods: Split and the NLTK tokenizer.
- Supports two stemming algorithms: Porter and Lancaster.
- Generates descriptors and inverted indices for documents.
- Searches indexed documents by their indexed terms.
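
To illustrate how descriptors and an inverted index relate, here is a minimal sketch using only the standard library. It is not the project's actual code: the names `split_tokenize` and `build_index` are hypothetical, and it assumes a descriptor is a per-document term-frequency map. The `stem` parameter accepts any callable, so a stemmer such as NLTK's `PorterStemmer().stem` or `LancasterStemmer().stem` could be passed in.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple


def split_tokenize(text: str) -> List[str]:
    """Split on whitespace, lowercase, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?()[]\"'").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def build_index(
    docs: Dict[str, str],
    stem: Callable[[str], str] = lambda term: term,  # identity: no stemming
    tokenize: Callable[[str], List[str]] = split_tokenize,
) -> Tuple[Dict[str, Dict[str, int]], Dict[str, Dict[str, int]]]:
    """Return (descriptors, inverted_index) for a {doc_id: text} mapping.

    Assumed layout (hypothetical, for illustration only):
      descriptors:    doc_id -> {term: frequency}
      inverted_index: term   -> {doc_id: frequency}
    """
    descriptors: Dict[str, Dict[str, int]] = {}
    inverted: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        freqs = Counter(stem(tok) for tok in tokenize(text))
        descriptors[doc_id] = dict(freqs)
        for term, freq in freqs.items():
            inverted[term][doc_id] = freq
    return descriptors, dict(inverted)


docs = {
    "d1": "Indexing documents, searching documents.",
    "d2": "Searching text",
}
descriptors, inverted = build_index(docs)
print(descriptors["d1"])
print(inverted["searching"])
```

The descriptor answers "which terms does this document contain?", while the inverted index answers the reverse question, which is what makes term lookups fast at search time.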
This project provides an interface to:
- Tokenize text documents
- Perform stemming
- Generate document descriptors and inverted indices
- Search through indexed documents
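
As a rough illustration of the search step, the sketch below looks up query terms in an inverted index and ranks documents by summed term frequency. The function name `search`, the index layout (`term -> {doc_id: frequency}`), and the ranking rule are all assumptions made for this example; the project's actual interface and scoring may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def search(
    inverted: Dict[str, Dict[str, int]],
    query: str,
    stem: Callable[[str], str] = lambda term: term,  # identity: no stemming
) -> List[Tuple[str, int]]:
    """Return (doc_id, score) pairs, highest score first.

    The score is the sum of the frequencies of the query terms in each
    document (an assumed ranking rule, chosen for simplicity).
    """
    scores: Dict[str, int] = defaultdict(int)
    for term in query.lower().split():
        # Apply the same stemming that was used when building the index.
        for doc_id, freq in inverted.get(stem(term), {}).items():
            scores[doc_id] += freq
    # Sort by descending score, then by doc_id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical index: term -> {doc_id: frequency}
inverted = {
    "documents": {"d1": 2},
    "searching": {"d1": 1, "d2": 1},
    "text": {"d2": 1},
}
results = search(inverted, "searching documents")
print(results)
```

Queries must go through the same tokenization and stemming as the indexed documents; otherwise a stemmed index term would not match its unstemmed form in the query.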
Contributions are welcome! If you want to contribute to this project:
- Fork the repository
- Create a new branch
- Make your changes and submit a pull request