This project is a simple implementation of a next-word prediction system based on an N-gram model. Given a sequence of words, the system predicts the most likely next word using probabilities estimated from a text corpus.
- Supports prediction for unigrams, bigrams, and trigrams.
- Allows customization of N-gram size.
- Provides an evaluation module for assessing prediction accuracy.
- User-friendly interface for inputting text and viewing predictions.
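To make the approach described above concrete, here is a minimal, self-contained sketch (not the project's actual code; the function names and toy corpus are purely illustrative) of how N-gram probabilities can be estimated from counts and then used to rank candidate next words:

```python
from collections import Counter, defaultdict

def ngram_probabilities(tokens, n=2):
    """Estimate P(next word | previous n-1 words) by relative frequency of counts."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context, next_word = tuple(tokens[i:i + n - 1]), tokens[i + n - 1]
        counts[context][next_word] += 1
    # Normalize the raw counts into conditional probabilities for each context.
    return {
        ctx: {w: c / sum(followers.values()) for w, c in followers.items()}
        for ctx, followers in counts.items()
    }

def predict_next(probs, context):
    """Return the highest-probability next word for a context, or None if unseen."""
    candidates = probs.get(tuple(context))
    return max(candidates, key=candidates.get) if candidates else None

# Toy example: a bigram model (n=2) built from a one-line corpus.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
probs = ngram_probabilities(tokens, n=2)
print(probs[("the",)])               # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
print(predict_next(probs, ["sat"]))  # 'on'
```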
- Clone the repository.
- Install the required dependencies.
1. Prepare your text corpus and make sure it is in a format the preprocessing script can read (for example, plain text).
2. Run the preprocessing script to tokenize and clean the corpus.
3. Train the model, specifying the N-gram size and the cleaned corpus file.
4. Use the trained model to predict the next word for a given word sequence (see the sketch after this list).
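The exact script names and command-line options are not spelled out above, so the following is a hypothetical sketch of what steps 2-4 could look like as a small Python API; the file path, cleaning rules, and class/function names are all placeholders:

```python
import re
from collections import Counter, defaultdict

def preprocess(path):
    """Lowercase the raw corpus, drop punctuation, and split it into word tokens."""
    with open(path, encoding="utf-8") as f:
        text = f.read().lower()
    return re.findall(r"[a-z']+", text)

class NGramModel:
    """Counts (n-1)-word contexts and their continuations; predicts the most frequent one."""
    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)

    def fit(self, tokens):
        for i in range(len(tokens) - self.n + 1):
            context = tuple(tokens[i:i + self.n - 1])
            self.counts[context][tokens[i + self.n - 1]] += 1

    def predict(self, words):
        """Predict the next word from the last n-1 input words; None if the context is unseen."""
        context = tuple(words[-(self.n - 1):]) if self.n > 1 else ()
        followers = self.counts.get(context)
        return followers.most_common(1)[0][0] if followers else None

# Hypothetical usage: "corpus.txt" and the N-gram size are placeholders.
tokens = preprocess("corpus.txt")
model = NGramModel(n=3)
model.fit(tokens)
print(model.predict("once upon a".split()))
```

In the actual project these steps are likely exposed as separate scripts rather than a single module; the sketch is only meant to show the data flow from raw text to a prediction.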
To evaluate prediction accuracy, run the evaluation script.
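The evaluation script's interface is likewise not documented here, so this is a hypothetical sketch of one common way to measure prediction accuracy: predict the next word at each position of a held-out token sequence and report the fraction of exact matches (top-1 accuracy). The train/test token lists below are illustrative.

```python
from collections import Counter, defaultdict

def train_counts(tokens, n=3):
    """Count continuations for each (n-1)-word context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        counts[tuple(tokens[i:i + n - 1])][tokens[i + n - 1]] += 1
    return counts

def top1_accuracy(counts, test_tokens, n=3):
    """Fraction of positions where the most frequent continuation equals the true next word."""
    hits = total = 0
    for i in range(len(test_tokens) - n + 1):
        context = tuple(test_tokens[i:i + n - 1])
        followers = counts.get(context)
        if not followers:
            continue  # unseen context: skipped here (could also be counted as a miss)
        total += 1
        hits += followers.most_common(1)[0][0] == test_tokens[i + n - 1]
    return hits / total if total else 0.0

# Hypothetical split: train on one portion of the corpus, evaluate on another.
train_tokens = "the cat sat on the mat and the cat sat on the rug".split()
test_tokens = "the cat sat on the mat".split()
counts = train_counts(train_tokens, n=3)
print(f"top-1 accuracy: {top1_accuracy(counts, test_tokens, n=3):.2f}")
```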
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or create a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.