Skip to content

Latest commit

 

History

History
175 lines (131 loc) · 6.52 KB

README.md

File metadata and controls

175 lines (131 loc) · 6.52 KB

⛓ Markov

A Crystal library for building Markov Chains and running Markov Processes.

Build Status Docs GitHub release

What is a Markov Chain?

A Markov Chain is essentially a mechanism for guessing probable future events based on a sample of past events. For a great explanation, watch this Khan Academy video.

Visit the API Documentation for a more in-depth look at the library's functionality.

Installation

Add this to your application's shard.yml:

dependencies:
  markov:
    github: mccallofthewild/markov

In your terminal, install Crystal dependencies with:

$ shards install

or

$ crystal deps

Usage

Begin by requiring the Markov module:

require "markov"

Basic -- Hello Markov

A classic Markov text generator. This example will work well for small (array-sized) data sets.

NOTE: Markov::Chain is a generic type which contains, receives and generates elements of LinkType.

We'll start with the sample text:

example_string = "how much wood would a woodchuck chuck if a woodchuck could chuck wood"

There are several Markov::Chain constructors to choose from. The simplest one takes in a LinkType array of elements as sample and a seed of LinkType. seed is the element in sample you want to start the chain with. If not provided, a random element will be chosen.

example_arr = example_string.split(" ") #=> ["how","much","wood","would","a","woodchuck","chuck","if","a","woodchuck","could","chuck","wood"]
seed = example_arr[0] #=> "how"

example_chain = Markov::Chain(String).new sample: example_arr, seed: seed

Finally, we'll generate a probable sequence of elements with the Markov::Chain#generate method:

puts example_chain.generate(10)

Output:

["much", "wood", "would", "a", "woodchuck", "could", "chuck", "if", "a", "woodchuck"]

That's it!

If we wanted to get the elements one at a time, we could use the Markov::Chain#next method instead:

puts example_chain.next #=> "much"
puts example_chain.next #=> "wood"
puts example_chain.next #=> "would"

Advanced

This implementation was built for larger data sets, with asynchronous input in mind.

In this example, we will create a Markov::Chain which can generate realistic movie titles.

To begin, we instantiate a Markov::TransitionTable. A TransitionTable is a mechanism for training and implementing Markov processes.

example_table = Markov::TransitionTable(String).new

Markov::TransitionTable#add

Now we'll add a movie title using the Markov::TransitionTable#add method:

movie_one = %w(the great gatsby) # shortcut syntax for ["the","great","gatsby"]

movie_one.each do |word|
  example_table.add(word)
end

Markov::TransitionTable#add adds elements one at a time. At a deeper level, it's adding each new word to the previous word's Transition Matrix (Markov::TransitionMatrix).

Markov::TransitionTable#fill

For syntactic sugar, if we have an array of elements, we can avoid looping through and #add-ing them by using the Markov::TransitionTable#fill method instead:

movie_one = %w(the great gatsby) # shortcut syntax for ["the","great","gatsby"]

example_table.fill table_with: movie_one

Markov::TransitionTable#reset

A problem arises at this point:

movie_two = %w(great expectations)
example_table.fill table_with: movie_two

The above code sequentially adds each word to the TransitionTable. But The Great Gatsby and Great Expectations are two separate movie titles; the "Great" at the beginning of Great Expectations is not a probable transition from the "Gatsby" at the end of The Great Gatsby.

To solve this, use Markov::TransitionTable#reset. #reset clears the TransitionTable's last added key, allowing us to separate titles like so:

movie_one = %w(the great gatsby)
example_table.fill table_with: movie_one

example_table.reset
movie_two = %w(great expectations)
example_table.fill table_with: movie_two

example_table.reset
movie_three = %w(the great escape)
example_table.fill table_with: movie_three

Implementing the TransitionTable with a Markov::Chain

Finally, we can put the TransitionTable to use by passing it to a Markov::Chain constructor as transition_table:

example_chain = Markov::Chain(String).new transition_table: example_table, seed: "great"

Handling Dead Ends

With small and/or unique data sets, Markov chains are fallible to reaching dead ends. That is, they can often reach a point where there is nothing to transition to.

When this happens in the Markov module, Markov::Exceptions::EmptyTransitionMatrixException is raised.

For example:

dead_end_array = %w(some say the world will end in fire)
dead_end_chain = Markov::Chain(String).new sample: dead_end_array, seed: "fire"
# nothing comes after "fire", so the chain is at a dead end.
dead_end_chain.next # raises `EmptyTransitionMatrixException`

To prevent this, use the Markov::Chain#on_dead_end exception handler.

This method takes in a callback block with arguments of: the Markov::Chain's @transition_table, the Markov::Chain instance, and the EmptyTransitionMatrixException raised.

The block's return value of LinkType fills in as the next item in the chain.

dead_end_array = %w(some say the world will end in fire)
dead_end_chain = Markov::Chain(String).new sample: dead_end_array, seed: "fire"

dead_end_chain.on_dead_end do |transition_table, chain, exception|
  "some"
end

dead_end_chain.next #=> "some"
dead_end_chain.next #=> "say"
dead_end_chain.next #=> "the"

Contributing

  1. Fork it ( https://github.com/mccallofthewild/markov/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors