Gettysburg College

CS 216
Data Structures and Algorithms

Fall 2024

Assignment 7

Due: Mon, Oct 21, by 11:59pm
  • method stubs: HNode|HNodeComparator|HuffmanTree|HuffmanZip.java
  • junit tests: HuffmanTreeTest.java
  • coverage-HuffmanTree.pdf
  • api-HuffmanTree.pdf
  • run-hzip.png screenshot of Terminal as shown here:
    1. in main() of HuffmanZip print "hello, world"
    2. ensure code (stubs) compile without errors
    3. run HuffmanZip inside Eclipse until Eclipse shows "hello, world"
    4. make sure to do step 3. before step 5.
    5. follow instructions below on how to create and run executable jar
Due: Thu, Oct 24, by 11:59pm
  • HNode|HNodeComparator|HuffmanTree|HuffmanZip.java
  • HuffmanTreeTest.java
  • coverage-HuffmanTree.pdf
  • run-jar.png screenshot of Terminal as shown here:

Description

Implement the Huffman Tree Data Structure to support the Huffman Encoding Algorithm. A discussion of the main ideas is given in Section 5.2 of Algorithms by Dasgupta et al. This data structure and algorithm will be used to encode (compress) large text files. The Huffman Tree is a binary tree where each node stores a subset of characters and their cumulative frequency.

Preliminaries

HNode

HNode is similar to the node class in a binary search tree. Create class HNode with the following methods:

HNode(char c, int f)

Creates a leaf node representing the given character and its frequency.
HNode(HNode left, HNode right)

Creates a node with the given left and right children.
boolean isLeaf()

Returns true if the node is a leaf.
boolean contains(char ch)

Returns true if the node contains the given character (no loops; use the relevant String method(s)).
char getSymbol()

Returns the symbol stored in the node. If the node is not a leaf, returns the null character '\0'.
String toString()

Returns a string representation of the node in the format  symbols:frequency . For example  a:20  or  cdah:90 .
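
The methods above could fit together roughly as in the sketch below. It is not a required design: the field names (symbols, frequency, left, right) and the merge rule (concatenate the children's symbols, add their frequencies) are assumptions made for illustration.

```java
// Sketch only: the field names and the merge rule are assumptions.
class HNodeDemo {
    public static void main(String[] args) {
        HNode a = new HNode('a', 20);
        HNode ab = new HNode(a, new HNode('b', 5));
        System.out.println(a);                       // a:20
        System.out.println(ab);                      // ab:25
        System.out.println(ab.contains('b'));        // true
        System.out.println(ab.getSymbol() == '\0');  // true (not a leaf)
    }
}

class HNode {
    final String symbols;   // all characters stored in this subtree
    final int frequency;    // cumulative frequency of those characters
    final HNode left;
    final HNode right;

    HNode(char c, int f) {
        symbols = String.valueOf(c);
        frequency = f;
        left = null;
        right = null;
    }

    HNode(HNode left, HNode right) {
        symbols = left.symbols + right.symbols;
        frequency = left.frequency + right.frequency;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() {
        return left == null && right == null;
    }

    boolean contains(char ch) {
        return symbols.indexOf(ch) >= 0;   // String method, no explicit loop
    }

    char getSymbol() {
        return isLeaf() ? symbols.charAt(0) : '\0';
    }

    @Override
    public String toString() {
        return symbols + ":" + frequency;
    }
}
```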

HNodeComparator

Create class HNodeComparator that compares two HNode objects based on their frequencies. When the frequencies are the same, compare the symbol sets lexicographically (i.e. dictionary order; use method compareTo of String class).

This comparator is used for constructing a Priority Queue as part of the algorithm for building a HuffmanTree. This is similar to the Binary Search Tree which also needed a comparator in the constructor.
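
A minimal sketch of the ordering described above, used with java.util.PriorityQueue. The HNode shown here is a stripped-down stand-in with only the two fields the comparator needs; the field names are assumptions.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class ComparatorDemo {
    public static void main(String[] args) {
        PriorityQueue<HNode> queue = new PriorityQueue<>(new HNodeComparator());
        queue.add(new HNode("b", 5));
        queue.add(new HNode("c", 2));
        queue.add(new HNode("a", 5));

        // Nodes leave the queue in comparator order.
        StringBuilder order = new StringBuilder();
        while (!queue.isEmpty()) {
            order.append(queue.poll()).append(' ');
        }
        System.out.println(order.toString().trim());   // c:2 a:5 b:5
    }
}

// Stand-in for the real HNode (assumed field names).
class HNode {
    final String symbols;
    final int frequency;

    HNode(String symbols, int frequency) {
        this.symbols = symbols;
        this.frequency = frequency;
    }

    @Override
    public String toString() {
        return symbols + ":" + frequency;
    }
}

class HNodeComparator implements Comparator<HNode> {
    @Override
    public int compare(HNode x, HNode y) {
        if (x.frequency != y.frequency) {
            return Integer.compare(x.frequency, y.frequency);
        }
        return x.symbols.compareTo(y.symbols);   // tie: dictionary order
    }
}
```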

HuffmanTree

Create a class HuffmanTree with the following methods:

data members

The Huffman Tree has only one data member, which is the root of the tree.
HuffmanTree(TreeMap<Character, Integer> frequencies)

Builds a Huffman Tree from the given characters and their corresponding frequencies. Look for a relevant method of the map that lets you get an iterable collection.

We are using TreeMap here, which is a sorted map (not a hash map) that offers a consistent (sorted) traversal of its keys/entries, which in turn ensures that we always get the same Huffman Tree.

Building the tree works as follows: Create an HNode for each Entry and store it in a Priority Queue. Repeatedly pop two HNodes, merge them into a new HNode, and put the new node in the queue. Stop when the queue has only one node; that node is the root of the tree.
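
The building loop can be sketched as below. Node is a simplified stand-in for HNode. Two things here are assumptions: the first node popped becomes the left child, and a merged node concatenates its children's symbols.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

class BuildDemo {
    // Simplified stand-in for HNode (assumed fields and merge rule).
    static class Node {
        final String symbols;
        final int frequency;
        final Node left;
        final Node right;

        Node(char c, int f) {
            symbols = String.valueOf(c);
            frequency = f;
            left = null;
            right = null;
        }

        Node(Node left, Node right) {
            symbols = left.symbols + right.symbols;
            frequency = left.frequency + right.frequency;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return symbols + ":" + frequency;
        }
    }

    static Node build(TreeMap<Character, Integer> frequencies) {
        // Order by frequency, then by symbols in dictionary order.
        Comparator<Node> order = (x, y) -> x.frequency != y.frequency
                ? Integer.compare(x.frequency, y.frequency)
                : x.symbols.compareTo(y.symbols);
        PriorityQueue<Node> queue = new PriorityQueue<>(order);
        for (Map.Entry<Character, Integer> entry : frequencies.entrySet()) {
            queue.add(new Node(entry.getKey(), entry.getValue()));
        }
        while (queue.size() > 1) {
            Node first = queue.poll();
            Node second = queue.poll();
            queue.add(new Node(first, second));
        }
        return queue.poll();   // the root (null if the map was empty)
    }

    public static void main(String[] args) {
        TreeMap<Character, Integer> frequencies = new TreeMap<>();
        frequencies.put('a', 5);
        frequencies.put('b', 2);
        frequencies.put('c', 1);
        // c:1 and b:2 merge into cb:3, then cb:3 and a:5 merge into the root.
        System.out.println(build(frequencies));   // cba:8
    }
}
```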

String encodeLoop(char symbol)

Returns the binary encoding of the given symbol as a string of '0' and '1' characters (it is assumed that the symbol is in the tree).
String encode(char symbol)

Returns the binary encoding of the given symbol as a string of '0' and '1' characters (it is assumed that the symbol is in the tree).

See method encode(char,HNode).

String encode(char symbol, HNode curr)

(recursive) Returns the binary encoding of the given symbol as a string of '0' and '1' characters starting at the given node.

It is assumed that the symbol is in the tree. What is a good/convenient return value for a leaf node?

char decode(String code)

Returns the symbol that corresponds to the given code (or the null character '\0' if this is not a valid code).
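
One possible shape for the recursive encode and for decode, shown on a small hand-built tree. The convention that a left edge is '0' and a right edge is '1' is an assumption (the text above does not fix it). Node is again a simplified stand-in for HNode, and the methods are static only to keep the sketch self-contained.

```java
class CodecDemo {
    // Simplified stand-in for HNode (assumed fields).
    static class Node {
        final String symbols;
        final Node left;
        final Node right;

        Node(char c) {
            symbols = String.valueOf(c);
            left = null;
            right = null;
        }

        Node(Node left, Node right) {
            symbols = left.symbols + right.symbols;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Recursive encoding; assumes the symbol is somewhere under curr.
    static String encode(char symbol, Node curr) {
        if (curr.isLeaf()) {
            return "";   // no more edges to follow
        }
        if (curr.left.symbols.indexOf(symbol) >= 0) {
            return "0" + encode(symbol, curr.left);    // assumed: left = '0'
        }
        return "1" + encode(symbol, curr.right);       // assumed: right = '1'
    }

    // Follows the bits from the root; '\0' signals an invalid code.
    static char decode(String code, Node root) {
        Node curr = root;
        for (int i = 0; i < code.length(); i++) {
            if (curr.isLeaf()) {
                return '\0';   // code continues past a leaf
            }
            char bit = code.charAt(i);
            if (bit == '0') {
                curr = curr.left;
            } else if (bit == '1') {
                curr = curr.right;
            } else {
                return '\0';   // not a binary digit
            }
        }
        return curr.isLeaf() ? curr.symbols.charAt(0) : '\0';
    }

    public static void main(String[] args) {
        // Tree: root(cba) -> left(cb) -> leaves c, b; right leaf a.
        Node root = new Node(new Node(new Node('c'), new Node('b')), new Node('a'));
        System.out.println(encode('b', root));           // 01
        System.out.println(decode("00", root));          // c
        System.out.println(decode("0", root) == '\0');   // true (stops at an inner node)
    }
}
```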
void writeCode(char symbol, BitOutputStream stream) throws IOException

Writes the individual bits of the binary encoding of the given symbol to the given stream (it is assumed that the symbol is in the tree).

This is similar to the method encodeLoop(...) but here the bits 0/1 are written to the given stream, instead of being appended to a String.

Do not try/catch file-related exceptions, since doing so clutters the code. This applies to all relevant methods below.

char readCode(BitInputStream stream) throws IOException

Reads from the given stream the individual bits of the binary encoding of the next symbol and returns the corresponding character (or the null character '\0' if the bits in the stream did not produce a symbol).

This is similar to the method decode(String) but here the bits 0/1 come from the given stream, not from a String.

JUnit Tests

Create class HuffmanTreeTest that shows evidence of thorough testing with the following methods:

Do not test class HuffmanZip with JUnit. This will be done in the terminal by actually running the application to compress a large file (see Section "Running from Command Line").

HuffmanZip

Create class HuffmanZip that allows the user to encode and decode a text file using the Huffman Encoding Algorithm. The data structures to consider in your implementation are:

Class HuffmanZip must have only static members. Below are the required methods for this class, but consider adding additional (private) methods - the guiding principle is one loop per method:

void encode(String filename) throws IOException

Encodes the text file with the given name using the Huffman Encoding Algorithm.

Append the .hz extension to the name of the encoded/compressed file. For example:

war-and-peace.txt    becomes   war-and-peace.txt.hz

Given the name of a text file the method produces as output a binary file as follows:

  1. read the given text file one character at a time to build a map of character frequencies
  2. build the Huffman Tree
  3. save the map of frequencies to the binary file
  4. again read the given text file one character at a time and use the Huffman Tree to write the binary code of each character to the binary file
  5. (see below for reading/writing regular and binary files)
For example:
wap.txt:              The Project Gutenberg EBook of War and Peace...    [the  text  input  file]
wap.txt.hz:  *********01010010010101010101010100010001010101111101...    [the binary output file]
             |the map||the binary codes of T,h,e, ,P,r,o,j,e,c,t, ...
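
Step 1 of encode (counting character frequencies) might look like the sketch below. The helper name countFrequencies is hypothetical. The sketch accepts any java.io.Reader, and a StringReader stands in for the file reader so that the example runs without a file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.TreeMap;

class FrequencyDemo {
    // Hypothetical helper: one loop, one character per iteration.
    static TreeMap<Character, Integer> countFrequencies(Reader in) throws IOException {
        TreeMap<Character, Integer> frequencies = new TreeMap<>();
        int next;
        while ((next = in.read()) != -1) {   // read() returns -1 at end of input
            frequencies.merge((char) next, 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the reader automatically.
        try (Reader in = new StringReader("hello")) {
            System.out.println(countFrequencies(in));   // {e=1, h=1, l=2, o=1}
        }
    }
}
```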
void decode(String filename) throws IOException

Decodes the compressed (binary) file with the given name using the Huffman Encoding Algorithm.

Replace the .hz extension with .huz in the name of the decoded/text file. For example:

war-and-peace.txt.hz    becomes   war-and-peace.txt.huz

Given the name of a binary file the method produces as output the original text file as follows:

  1. read the map from the binary file and build the Huffman Tree
  2. use the Huffman Tree to extract each character from the binary file and immediately write the character to the text file
  3. (see below for reading/writing regular and binary files)
For example:
             |the map||the binary codes of T,h,e, ,P,r,o,j,e,c,t, ...
wap.txt.hz:  *********01010010010101010101010100010001010101111101...    [the binary input file]
wap.txt.huz:          The Project Gutenberg EBook of War and Peace...    [the  text output file]
the standard main method

This is the standard main method. See section Test Files for the files to download, the download location, and how to check the file sizes.

Initially, inside main simply run the relevant method you want to test/execute with a fixed file name. For example:

encode("tlc-logic.txt");       // encode/compress it
decode("tlc-logic.txt.hz");    // decode/decompress it

Make sure to run HuffmanZip at least once to ensure that it works with the hardcoded values.

Then change the main to use its command-line parameters (the bolded words below) which are stored in the String[] parameter args of the main method:

  • the first cell of args will contain either the string "-encode" or the string "-decode"
  • the second cell of args will contain the name of the file

Eventually, it should be possible to run your program from the command line as shown below in Section Executable JAR.

Reading/Writing Text Files

Read and write the regular/uncompressed text files (.txt, .huz) one byte/character at a time. There are a number of ways to accomplish this, but for this assignment use the following Java classes: Don't forget to close the streams.

Reading/Writing Compressed Files

Read and write the compressed text files (.hz) one bit at a time. Download the following files into your project:

BitOutputStream.java , BitInputStream.java

Here is the API:

These classes allow you to read from or write to the stream a data structure as one whole object using methods:

readObject(), writeObject(Object)

Use these methods to read/write the TreeMap from/to the binary input/output files.
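
The API of BitOutputStream/BitInputStream is not reproduced here, so the sketch below uses the standard java.io.ObjectOutputStream and java.io.ObjectInputStream as stand-ins to show a TreeMap making a round trip through writeObject and readObject. It assumes the course classes behave similarly for these two methods. Byte-array streams replace the files so that the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.TreeMap;

class MapRoundTripDemo {
    // Writes the whole map as one object.
    static byte[] save(TreeMap<Character, Integer> frequencies) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(frequencies);
        }
        return bytes.toByteArray();
    }

    // Reads the map back; readObject returns Object, so a cast is needed.
    @SuppressWarnings("unchecked")
    static TreeMap<Character, Integer> load(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (TreeMap<Character, Integer>) in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        TreeMap<Character, Integer> frequencies = new TreeMap<>();
        frequencies.put('a', 5);
        frequencies.put('b', 2);
        System.out.println(load(save(frequencies)));   // {a=5, b=2}
    }
}
```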

Test Files

To test your code, download the following files into the HuffmanZip/ project folder (not in src/, not in bin/).

In the left panel in Eclipse click on the project name (HuffmanZip) and hit F5, i.e. refresh the project - the .txt files should show up as part of the project.

To check your work, open a terminal, go to the main project folder (HuffmanZip), and list the folder contents. The 5th column shows the file size in bytes:


cd   Desktop/cs216/HuffmanZip            (make sure you are in project folder)             

ls  -l  (MacOS)
dir     (WinOS)

---------- x xxxxxxx xxxx     140 xxx xx xx:xx tlc-logic.txt
---------- x xxxxxxx xxxx     668 xxx xx xx:xx tlc-logic.txt.hz
---------- x xxxxxxx xxxx 3288707 xxx xx xx:xx war-and-peace.txt
---------- x xxxxxxx xxxx 1881432 xxx xx xx:xx war-and-peace.txt.hz
                             ^
                             |
                         size/bytes

Executable JAR

Create an executable JAR file that can be run as a standalone program. Follow steps 1-4 described here:

Create Executable JAR in Eclipse

Make sure to run HuffmanZip at least one time before the next step. It is enough to print a simple message in the main method. Without this step there won't be a Launch Configuration.

In Step 4 choose HuffmanZip under Launch Configuration and Browse to the project's main folder (HuffmanZip/) and save the jar under the name hzip.jar .

The program can now be run in the terminal as follows (copy the full line):

cd   path/to/project/HuffmanZip    (go to correct project folder)
java  -jar  hzip.jar  -encode  war-and-peace.txt        (produces war-and-peace.txt.hz)

java  -jar  hzip.jar  -decode  war-and-peace.txt.hz     (produces war-and-peace.txt.huz)
If this doesn't work, try replacing java with its full path, which varies based on your installation. Try the following:

but replace javaw with java
if path is too long, hover over it with the mouse
Check the file sizes as shown above. Here is a sample session:
Turn in the same screenshot of your Terminal.

What to turn in

Upload the .java, .pdf, and .png files in the Moodle dropbox. (Do NOT upload .html files.)