CS 216 - Data Structures
Homework #5


Due: Wednesday 3/6 at the beginning of class

NOTE: This work is to be done in groups of 2-3.

Hash Tables: the Markov Word Processor

(Mind you, this word processor will process words as a food processor processes food!)

1.  Implement a chained hash table using Java generics according to the ChainedHashTable interface.  You may, but are not required to, make use of your linked list implementation.
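
As a rough illustration of chaining, here is a minimal sketch. The ChainedHashTable interface is not reproduced in this handout, so the class name SketchChainedTable and the methods put, get, and size are hypothetical and will not match the required interface exactly. The sketch uses a fixed number of buckets, never resizes, and does not support null keys.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch only; the course's ChainedHashTable interface may differ.
class SketchChainedTable<K, V> {
    // One key/value pair stored in a bucket's chain.
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<LinkedList<Entry<K, V>>> buckets;
    private int size = 0;

    SketchChainedTable(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Map a key to a bucket index; floorMod keeps the index non-negative
    // even when hashCode() is negative.
    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    // Insert or overwrite; returns the previous value, or null if there was none.
    V put(K key, V value) {
        LinkedList<Entry<K, V>> chain = buckets.get(indexFor(key));
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        chain.add(new Entry<>(key, value));
        size++;
        return null;
    }

    // Walk the chain for the key's bucket; returns null if the key is absent.
    V get(K key) {
        for (Entry<K, V> e : buckets.get(indexFor(key))) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

Keys that hash to the same bucket simply share a chain, so lookups stay correct even with a single bucket; they just get slower as the chain grows.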

2.  Using your hash table, implement a word-level order-1 Markov text generator according to section 15.3 of Programming Pearls, 2nd ed. by Jon Bentley, and chapter 3 of The Practice of Programming by Brian Kernighan and Rob Pike.  The order-2 Markov chain algorithm is described by Kernighan and Pike as follows:

set w1 and w2 to the first two words in the text
print w1 and w2
loop:
    randomly choose w3, one of the successors of prefix w1 w2 in the text
    print w3
    replace w1 and w2 by w2 and w3
    repeat loop

You may implement this order-2 algorithm if you wish, but the order-1 algorithm is given as follows:

set w1 to the first word in the text
print w1
loop:
    randomly choose w2, one of the successors of prefix w1 in the text
    print w2
    replace w1 by w2
    repeat loop

To implement this algorithm simply, let "word" here be interpreted as "token".  Read all words of a text from the standard input and build a hash table that associates strings with lists of strings.  For an order-1 implementation, each word is associated with a list of the words that follow it in the text (including repeats).  For an order-2 implementation, a string holding a space-separated pair of words is associated with a list of the words that follow that pair in the text (including repeats).  Once this table is built, it may be used to look up a list of successors for the random generation.
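
The table construction described above might be sketched as follows. This is only an illustration: it uses java.util.HashMap as a stand-in for your own chained hash table, and the class name PrefixTableSketch and the method build are hypothetical. Passing order 1 gives single-word prefixes; passing order 2 gives space-separated pairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical sketch; java.util.HashMap stands in for the assignment's own table.
class PrefixTableSketch {
    // Associates each space-joined prefix of `order` words with every word
    // that follows it in the text, repeats included.
    static Map<String, List<String>> build(List<String> tokens, int order) {
        Map<String, List<String>> table = new HashMap<>();
        for (int i = 0; i + order < tokens.size(); i++) {
            String prefix = String.join(" ", tokens.subList(i, i + order));
            String successor = tokens.get(i + order);
            table.computeIfAbsent(prefix, k -> new ArrayList<>()).add(successor);
        }
        return table;
    }

    public static void main(String[] args) {
        // Scanner.next() splits on whitespace, so each "word" is a token.
        List<String> tokens = new ArrayList<>();
        Scanner in = new Scanner(System.in);
        while (in.hasNext()) {
            tokens.add(in.next());
        }
        System.out.println(build(tokens, 1));
    }
}
```

Keeping repeats in each successor list means that a uniformly random pick from the list chooses each successor in proportion to how often it follows the prefix in the text.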

Hint: The pseudocode loops above never terminate.  When building your prefix-successor table before generation, add an extra pseudo-word "*END*" after reading the text.  This is not part of the input text.  Rather, consider it a pretend successor of the last n words for an order-n algorithm.  When you reach the pseudo-word "*END*" during generation, do not print it.  Instead, terminate the loop.
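
One way the hint could play out for the order-1 case is sketched below. Again, java.util.HashMap stands in for your own hash table, and the names Order1GeneratorSketch, build, and generate are hypothetical. The maxWords cap is an extra safeguard that is not part of the assignment, and the sketch assumes the input text does not itself contain the token "*END*".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch; java.util.HashMap stands in for the assignment's own table.
class Order1GeneratorSketch {
    static final String END = "*END*";

    // Build the order-1 table, treating END as the successor of the last word.
    static Map<String, List<String>> build(List<String> tokens) {
        Map<String, List<String>> table = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String next = (i + 1 < tokens.size()) ? tokens.get(i + 1) : END;
            table.computeIfAbsent(tokens.get(i), k -> new ArrayList<>()).add(next);
        }
        return table;
    }

    // Follow the order-1 pseudocode, stopping at END (or at the maxWords cap,
    // which is an assumption added here for safety).
    static List<String> generate(List<String> tokens, Random rng, int maxWords) {
        List<String> out = new ArrayList<>();
        if (tokens.isEmpty() || maxWords <= 0) {
            return out;
        }
        Map<String, List<String>> table = build(tokens);
        String w1 = tokens.get(0);
        out.add(w1);
        while (out.size() < maxWords) {
            List<String> successors = table.get(w1);
            String w2 = successors.get(rng.nextInt(successors.size()));
            if (w2.equals(END)) {
                break; // never print the pseudo-word
            }
            out.add(w2);
            w1 = w2;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "cat", "and", "the", "dog");
        System.out.println(String.join(" ", generate(tokens, new Random(), 1000)));
    }
}
```

Because every generated word comes from the text and every word in the text has at least one recorded successor (possibly "*END*"), the lookup inside the loop always finds a non-empty list.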

For example, consider the possible order 1 output (1, 2, 3) for Alice's Adventures in Wonderland by Lewis Carroll (Charles Lutwidge Dodgson).  Or consider order 1, order 2, and order 3 output of Fox in Socks by Dr. Seuss (Theodor Seuss Geisel).

Image created with http://www.wordle.net/