code = perceive

Assignment 5: Open-ended IR Technique

Westmont College Fall 2023

CS 128 Information Retrieval and Big Data

Assistant Professor Mike Ryu (mryu@westmont.edu)

Author Information

Guide

MarkovModel

MarkovModel is a Python class that implements a query likelihood language model based on Markov chains. You can use it to build language models that generate text and that estimate the probability of a query given a document. The model supports both character-based and word-based representations, with a configurable order n.
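To make the idea concrete, here is a minimal sketch of an order-n Markov chain over words or characters. It is not the code in src/models.py: the function names (tokenize, build_transitions, generate) and the seed parameter are hypothetical, and whitespace splitting is assumed for word mode.

```python
import random
from collections import Counter, defaultdict


def tokenize(text, mode="word"):
    """Split text into words (on whitespace) or into single characters."""
    return text.split() if mode == "word" else list(text)


def build_transitions(docs, mode="word", n=2):
    """Map each n-token context to a Counter of the tokens that follow it."""
    transitions = defaultdict(Counter)
    for doc in docs:
        tokens = tokenize(doc, mode)
        for i in range(len(tokens) - n):
            context = tuple(tokens[i:i + n])
            transitions[context][tokens[i + n]] += 1
    return transitions


def generate(transitions, start, mode="word", n=2, max_len=20, seed=None):
    """Extend start one token at a time until max_len tokens or a dead end."""
    rng = random.Random(seed)
    out = tokenize(start, mode)
    while len(out) < max_len:
        # Look up the last n tokens; .get avoids creating empty entries.
        followers = transitions.get(tuple(out[-n:]))
        if not followers:
            break  # context never seen in training
        tokens, counts = zip(*followers.items())
        # Sample the next token in proportion to how often it followed.
        out.append(rng.choices(tokens, weights=counts, k=1)[0])
    return (" " if mode == "word" else "").join(out)
```

In this sketch, max_len counts tokens, so it means words in word mode and characters in char mode. The real class may define it differently.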

Project Structure

  • src/models.py: Contains the implementation of the MarkovModel class, along with a main() function that demonstrates example usage.
  • data/: An empty directory where you can store your training data.
  • test/: An empty directory intended for future testing.

Installation

The package itself needs no installation, but you do need the libraries listed in requirements.txt (for example, via pip install -r requirements.txt). After that, import the MarkovModel class into your project.

Usage

  1. Import the MarkovModel class:
    from src.models import MarkovModel
  2. Create an instance of the MarkovModel class by providing the mode ('char' or 'word'), the training text, and the order n:
    text_data = [...]  # List of training documents
    markov_model = MarkovModel(mode='word', text=text_data, n=3)
  3. Train the model with the training data:
    markov_model.train(text=text_data)
  4. Generate text using the trained model:
    generated_text = markov_model.generate(start='The quick brown fox', max_len=200)
    print(generated_text)
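  5. Estimate the probability of a query given a document. Note that _most_probable_doc has a leading underscore, which by Python convention marks it as an internal method: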
    query = 'natural language processing'
    result = markov_model._most_probable_doc(query=query, l=0.7, corpus_percentage=1.0)
    print(result)
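For readers unfamiliar with query likelihood, the sketch below shows one common way to score a query against each document. It makes several assumptions that the README does not confirm: a unigram model rather than an order-n model, linear interpolation (Jelinek-Mercer) smoothing, and that the l argument in step 5 plays the role of the interpolation weight (called lam here). The function names are hypothetical and do not mirror the class's internals.

```python
import math
from collections import Counter


def query_log_likelihood(query_tokens, doc_tokens, corpus_counts, corpus_len, lam=0.7):
    """Log P(query | doc) under a unigram model with linear interpolation."""
    doc_counts = Counter(doc_tokens)
    score = 0.0
    for tok in query_tokens:
        p_doc = doc_counts[tok] / len(doc_tokens) if doc_tokens else 0.0
        p_corpus = corpus_counts[tok] / corpus_len if corpus_len else 0.0
        # Assumption: lam weights the document model, (1 - lam) the corpus model.
        p = lam * p_doc + (1 - lam) * p_corpus
        if p == 0.0:
            return float("-inf")  # token unseen anywhere in the corpus
        score += math.log(p)
    return score


def most_probable_doc(query, docs, lam=0.7):
    """Return (index, log-likelihood) of the best-scoring document."""
    tokenized = [d.lower().split() for d in docs]
    corpus_counts = Counter(tok for toks in tokenized for tok in toks)
    corpus_len = sum(corpus_counts.values())
    q = query.lower().split()
    scores = [
        query_log_likelihood(q, toks, corpus_counts, corpus_len, lam)
        for toks in tokenized
    ]
    best = max(range(len(docs)), key=scores.__getitem__)
    return best, scores[best]
```

Working in log space avoids numeric underflow on long queries, and mixing in the corpus model keeps a single missing word from zeroing out a document's score.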

Data and Testing

The data/ directory is intended for storing your training data. Feel free to populate this directory with text documents to train your model!
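One possible way to turn the files in data/ into the list of training documents used in the Usage steps is sketched below. The load_documents helper is hypothetical and not part of the repository, and it assumes UTF-8 plain-text files with a .txt extension.

```python
from pathlib import Path


def load_documents(data_dir="data", pattern="*.txt"):
    """Read every matching file in data_dir into a list of strings."""
    # Sort the paths so the document order is stable across runs.
    paths = sorted(Path(data_dir).glob(pattern))
    return [p.read_text(encoding="utf-8") for p in paths]
```

The returned list could then be passed as the text argument when constructing the model.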

The test/ directory is currently empty. More thorough testing is still needed to ensure the reliability of the MarkovModel class.

Contributions are welcome, whether that means adding test cases or adapting the model to your own use case.

Acknowledgements and Sources

While working on this assignment, I used the following resources: