Description
Implement the Huffman Tree Data Structure to support the Huffman Encoding Algorithm. A discussion of the main ideas is given in Section 5.2 of Algorithms by Dasgupta et al. This data structure and algorithm will be used to encode (compress) large text files. The Huffman Tree is a binary tree where each node stores a subset of characters and their cumulative frequency.
Preliminaries
- Create project HuffmanZip
- All classes for this assignment should be put in separate files
- See Section JUnit Tests below for details on how to test
- reading: Ch. 5, pp. 152-155 | example: PDF
HNode
HNode is similar to a node in a binary search tree and has the following data members:
- pointers to the left and right children of this node
- (single data member) the symbols that are stored in the leaves of the subtree represented by this node
- the (cumulative) frequency of the symbols stored in this node
- [ this will not be a generic class, since the type of data items is fixed ]
Create class
HNode
with the following methods:
HNode(char c, int f)
Creates a leaf node representing the given character and its frequency.
|
HNode(HNode left, HNode right)
Creates a node with the given left and right children.
|
boolean isLeaf()
Returns true if the node is a leaf.
|
boolean contains(char ch)
Returns true if the node contains the given character (no loops; use the relevant String method(s)).
|
char getSymbol()
Returns the symbol stored in the node. If the node is not a leaf, returns the null character '\0' .
|
String toString()
Returns a string representation of the node in the format symbols:frequency . For example a:20 or cdah:90 .
|
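The method table above can be sketched roughly as follows. This is only one possible layout, not the required one: the field names (`left`, `right`, `symbols`, `frequency`) and the choice to concatenate the children's symbols in left-then-right order are assumptions made for illustration.

```java
// Sketch of HNode. Field names and the left+right concatenation order
// are assumptions; the assignment only fixes the method signatures.
class HNode {
    HNode left;       // left child (null for a leaf)
    HNode right;      // right child (null for a leaf)
    String symbols;   // symbols stored in the leaves of this subtree
    int frequency;    // cumulative frequency of those symbols

    // Leaf node for a single character.
    HNode(char c, int f) {
        symbols = String.valueOf(c);
        frequency = f;
    }

    // Internal node that merges two existing subtrees.
    HNode(HNode left, HNode right) {
        this.left = left;
        this.right = right;
        symbols = left.symbols + right.symbols;
        frequency = left.frequency + right.frequency;
    }

    boolean isLeaf() {
        return left == null && right == null;
    }

    // No loop needed: String.indexOf does the search.
    boolean contains(char ch) {
        return symbols.indexOf(ch) >= 0;
    }

    char getSymbol() {
        return isLeaf() ? symbols.charAt(0) : '\0';
    }

    @Override
    public String toString() {
        return symbols + ":" + frequency;
    }
}
```

With this sketch, `new HNode('a', 20).toString()` yields `a:20`, matching the format in the table.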
HNodeComparator
Create class HNodeComparator that compares two HNode objects based on their frequencies. When the frequencies are the same, compare the symbol sets lexicographically (i.e. dictionary order; use method compareTo of the String class).
This comparator is used for constructing a Priority Queue as part of the algorithm for building a HuffmanTree. This is similar to the Binary Search Tree, which also needed a comparator in the constructor.
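A minimal sketch of such a comparator is shown below. The `HNode` here is a cut-down stand-in with assumed field names (`symbols`, `frequency`), included only so the example compiles on its own; your real `HNode` may expose this data differently.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Cut-down stand-in for HNode (assumed fields), just enough for the comparator.
class HNode {
    String symbols;
    int frequency;

    HNode(char c, int f) {
        symbols = String.valueOf(c);
        frequency = f;
    }

    @Override
    public String toString() {
        return symbols + ":" + frequency;
    }
}

// Orders nodes by frequency first, then by symbol set in dictionary order.
class HNodeComparator implements Comparator<HNode> {
    @Override
    public int compare(HNode a, HNode b) {
        if (a.frequency != b.frequency) {
            return Integer.compare(a.frequency, b.frequency);
        }
        return a.symbols.compareTo(b.symbols);
    }
}
```

The comparator is handed to the queue at construction time, for example `new PriorityQueue<HNode>(new HNodeComparator())`, after which `poll()` returns the node with the smallest frequency.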
HuffmanTree
Create a class
HuffmanTree
with the following methods:
data members
The Huffman Tree has only one data member, which is the root of the tree.
|
HuffmanTree(TreeMap<Character, Integer> frequencies)
Builds a Huffman Tree from the given characters and their corresponding frequencies. Look for a relevant method of the map that lets you get an iterable collection.
We are using TreeMap here, which is a sorted map (backed by a red-black tree rather than a hash table) that offers a consistent (sorted) traversal of its keys/entries, which in turn ensures that we always get the same Huffman Tree.
Building the tree works as follows: Create an HNode for each Entry and store it in a Priority Queue. Repeatedly pop two HNodes, merge them into a new HNode and put the new node in the queue. Stop when the queue has only one node -- that node is the root of the tree.
|
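The build procedure for the constructor can be sketched as below. The class name `HuffmanTreeSketch`, the compact `HNode` stand-in, and the decision to make the first popped node the left child are all assumptions for illustration; the comparator is written inline as a lambda only to keep the example short.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Compact stand-in for HNode (assumed field names).
class HNode {
    HNode left, right;
    String symbols;
    int frequency;

    HNode(char c, int f) {
        symbols = String.valueOf(c);
        frequency = f;
    }

    HNode(HNode left, HNode right) {
        this.left = left;
        this.right = right;
        symbols = left.symbols + right.symbols;
        frequency = left.frequency + right.frequency;
    }
}

// Hypothetical name; shows only the constructor logic.
class HuffmanTreeSketch {
    HNode root;

    HuffmanTreeSketch(TreeMap<Character, Integer> frequencies) {
        // Same ordering as HNodeComparator: frequency, then symbols.
        PriorityQueue<HNode> queue = new PriorityQueue<>(
            (a, b) -> a.frequency != b.frequency
                ? Integer.compare(a.frequency, b.frequency)
                : a.symbols.compareTo(b.symbols));

        // entrySet() is the iterable view of the map's entries.
        for (Map.Entry<Character, Integer> e : frequencies.entrySet()) {
            queue.add(new HNode(e.getKey(), e.getValue()));
        }

        // Merge the two smallest nodes until a single root remains.
        while (queue.size() > 1) {
            HNode first = queue.poll();
            HNode second = queue.poll();
            queue.add(new HNode(first, second)); // assumption: first becomes left
        }
        root = queue.poll(); // null if the map was empty
    }
}
```

For the frequencies `a=5, b=2, c=1`, this sketch first merges `c` and `b` into a node with frequency 3, then merges that node with `a` to form the root with frequency 8.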
String encodeLoop(char symbol)
Returns the binary encoding of the given symbol as a string of '0' and '1' characters (it is assumed that the symbol is in the tree).
|
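One way to approach the loop version is to walk down from the root and use `contains` to decide which child to follow. The sketch below assumes that a left step is written as '0' and a right step as '1' (the assignment text does not fix this convention here), and it takes the root as a parameter only so that the example is self-contained. `HNode` is again a compact stand-in with assumed field names.

```java
// Compact stand-in for HNode (assumed field names).
class HNode {
    HNode left, right;
    String symbols;
    int frequency;

    HNode(char c, int f) {
        symbols = String.valueOf(c);
        frequency = f;
    }

    HNode(HNode left, HNode right) {
        this.left = left;
        this.right = right;
        symbols = left.symbols + right.symbols;
        frequency = left.frequency + right.frequency;
    }

    boolean isLeaf() {
        return left == null && right == null;
    }

    boolean contains(char ch) {
        return symbols.indexOf(ch) >= 0;
    }
}

// Hypothetical helper class; in the assignment this logic lives in HuffmanTree.
class EncodeLoopSketch {
    // Assumes the symbol is in the tree; 0 = left, 1 = right (assumed convention).
    static String encodeLoop(HNode root, char symbol) {
        StringBuilder code = new StringBuilder();
        HNode curr = root;
        while (!curr.isLeaf()) {
            if (curr.left.contains(symbol)) {
                code.append('0');
                curr = curr.left;
            } else {
                code.append('1');
                curr = curr.right;
            }
        }
        return code.toString();
    }
}
```

The single loop walks exactly one root-to-leaf path, so the length of the returned code equals the depth of the symbol's leaf.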
String encode(char symbol)
Returns the binary encoding of the given symbol as a string of '0' and '1' characters (it is assumed that the symbol is in the tree).
See method encode(char,HNode) .
|
String encode(char symbol, HNode curr)
(recursive) Returns the binary encoding of the given symbol as a string of '0' and '1' characters starting at the given node.
It is assumed that the symbol is in the tree. What is a good/convenient return value for a leaf node?
|
char decode(String code)
Returns the symbol that corresponds to the given code (or the null character '\0' if this is not a valid code).
|
void writeCode(char symbol, BitOutputStream stream) throws IOException
Writes the individual bits of the binary encoding of the given symbol to the given stream (it is assumed that the symbol is in the tree).
This is similar to the method encodeLoop(...) but here the bits 0/1 are written to the given stream , instead of being appended to a String .
Do not use try/catch for file-related exceptions, since it clutters the code. This applies to all the relevant methods below.
|
char readCode(BitInputStream stream) throws IOException
Reads from the given stream the individual bits of the binary encoding of the next symbol and returns the corresponding character; (or the null character '\0' if the bits in the stream did not produce a symbol).
This is similar to the method decode(String) but here the bits 0/1 come from the given stream , not from a String .
|
JUnit Tests
Create class
HuffmanTreeTest
that shows evidence of thorough testing with the following methods:
- the Tester will not have any data members and will have only two
@Test
methods
- make sure to put @Test in front of each method of the Tester
- create a method
test_HuffmanTree()
:
- create a
TreeMap
and fill it with test data for at least 5 characters and corresponding frequencies (similar to the class example or the book, but create your own tree with your own frequencies)
- create the tree
- test methods
encode(char)
, encodeLoop(char)
, decode(String)
for each character in the tree, and include test cases for invalid inputs
- test methods
writeCode
and readCode
:
create appropriate stream, write the code for a symbol, nothing to assert, then close the stream
create appropriate stream, read a symbol, assert (how many things?), then close the stream
repeat the above but write two symbols, assert (how many things?)
- create method
test_HNode()
and test all HNode
methods:
- create a couple of leaf nodes and test all methods for one of the nodes; merge the nodes into a new parent node and test all methods for the parent
Do not test class
HuffmanZip
with JUnit. This will be done in the terminal by actually running the application to compress a large file (see Section Executable JAR).
HuffmanZip
Create class
HuffmanZip
that allows the user to encode and decode a text file using the
Huffman Encoding Algorithm. The data structures to consider in your implementation are:
- TreeMap: a sorted map for counting the character frequencies
- PriorityQueue: for building the Huffman Tree; you could try to make the code work with your own Priority Queue data structure,
Binary Search Tree
, by implementing method removeMin()
Class
HuffmanZip
must have only static members. Below are the required methods for this class, but consider adding additional (private) methods - the guiding principle is
one loop per method:
void encode(String filename) throws IOException
Encodes the text file with the given name using the Huffman Encoding Algorithm.
Put the .hz extension to the name of the encoded/compressed file. For example:
war-and-peace.txt becomes war-and-peace.txt.hz
Given the name of a text file the method produces as output a binary file as follows:
- read the given text file one character at a time to build a map of character frequencies
- build the Huffman Tree
- save the map of frequencies to the binary file
- again read the given text file one character at a time and use the Huffman Tree to write the binary code of each character to the binary file
- (see below for reading/writing regular and binary files)
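The first step of the list above (building the frequency map) might look like the sketch below. The class and method names are hypothetical, and the method takes a general `Reader` rather than a file name so that the same code works with a `FileReader` in the application and a `StringReader` in a quick experiment.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.TreeMap;

// Hypothetical helper; in the assignment this would be a private static
// method of HuffmanZip.
class FrequencyCounter {
    // Reads one character at a time until end of input (read() returns -1).
    static TreeMap<Character, Integer> countFrequencies(Reader in) throws IOException {
        TreeMap<Character, Integer> freq = new TreeMap<>();
        int next;
        while ((next = in.read()) != -1) {
            // read() returns an int, so a cast to char is needed here.
            freq.merge((char) next, 1, Integer::sum);
        }
        return freq;
    }
}
```

Calling `countFrequencies(new java.io.StringReader("abca"))` produces the map `{a=2, b=1, c=1}`; in `encode` the argument would instead be a `FileReader` opened on the input file.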
For example:
wap.txt: The Project Gutenberg EBook of War and Peace... [the text input file]
wap.txt.hz: *********01010010010101010101010100010001010101111101... [the binary output file]
|the map||the binary codes of T,h,e, ,P,r,o,j,e,c,t, ...
|
void decode(String filename) throws IOException
Decodes the compressed file with the given name using the Huffman Encoding Algorithm.
Put the .huz extension to the name of the decoded/text file. For example:
war-and-peace.txt.hz becomes war-and-peace.txt.huz
Given the name of a binary file the method produces as output the original text file as follows:
- read the map from the binary file and build the Huffman Tree
- use the Huffman Tree to extract each character from the binary file and immediately write the character to the text file
- (see below for reading/writing regular and binary files)
For example:
|the map||the binary codes of T,h,e, ,P,r,o,j,e,c,t, ...
wap.txt.hz: *********01010010010101010101010100010001010101111101... [the binary input file]
wap.txt.huz: The Project Gutenberg EBook of War and Peace... [the text output file]
|
the standard main method
This is the standard main method. See section Test Files for the files to download, the download location, and how to check the file sizes.
Initially, inside main simply run the relevant method you want to test/execute with a fixed file name. For example:
encode("tlc-logic.txt"); // encode/compress it
decode("tlc-logic.txt.hz"); // decode/decompress it
Make sure to run HuffmanZip at least once to ensure that it works with the hardcoded values.
Then change the main to use its command-line parameters (the bolded words below) which are stored in the String[] parameter args of the main method:
- the first cell of
args will contain either the string "-encode" or the string "-decode"
- the second cell of
args will contain the name of the file
Eventually, it should be possible to run your program from the command line as shown below in Section Executable JAR.
|
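The command-line handling can be sketched as follows. The class name `HuffmanZipArgsSketch` and the helper `outputName` are hypothetical; the sketch only prints which output file would be produced, and in the real program `main` would call `encode` or `decode` instead.

```java
// Hypothetical sketch of argument handling; not the required HuffmanZip class.
class HuffmanZipArgsSketch {
    // Maps a mode and input file name to the output file name:
    // "-encode" appends .hz, "-decode" replaces the trailing .hz with .huz.
    static String outputName(String mode, String filename) {
        if (mode.equals("-encode")) {
            return filename + ".hz";
        }
        if (mode.equals("-decode") && filename.endsWith(".hz")) {
            String base = filename.substring(0, filename.length() - ".hz".length());
            return base + ".huz";
        }
        throw new IllegalArgumentException("usage: (-encode | -decode) <file>");
    }

    public static void main(String[] args) {
        // args[0] is the mode, args[1] is the file name.
        if (args.length != 2) {
            System.err.println("usage: (-encode | -decode) <file>");
            return;
        }
        System.out.println(args[0] + " " + args[1] + " -> " + outputName(args[0], args[1]));
    }
}
```

Checking `args.length` before indexing avoids an `ArrayIndexOutOfBoundsException` when the program is started without arguments.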
Reading/Writing Text Files
Read and write the regular/uncompressed text files (
.txt,
.huz)
one byte/character at a time. There are a number of ways to accomplish this, but for this assignment use the following Java classes:
- FileReader: use this class for reading character by character the text file to compress; see method
int read()
but note the need to typecast to char
- FileWriter: use this class for writing character by character the decoded text file; see method
void write(int b)
and note that there is no need to typecast to int
Don't forget to close the streams.
Reading/Writing Compressed Files
Read and write the compressed text files (
.hz)
one bit at a time. Download in your project the following files:
BitOutputStream.java ,
BitInputStream.java
Here is the API:
- BitOutputStream: use this class to write the frequency table and the individual bits to the encoded/compressed file; see methods
writeBit(int)
, writeObject(Object)
- BitInputStream: use this class to read the frequency table and the individual bits from the encoded/compressed file; see methods
readBit()
, readObject()
These classes allow you to read from or write to the stream a data structure as one whole object using methods:
readObject()
, writeObject(Object)
Use these methods to read/write the TreeMap from/to the binary input/output files.
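The source of the provided `BitOutputStream` and `BitInputStream` classes is not shown here, so the sketch below uses the standard `java.io.ObjectOutputStream` and `ObjectInputStream` as stand-ins to illustrate the idea of writing and reading a whole `TreeMap` as one object. The class name `MapRoundTrip` and its methods are hypothetical, and an in-memory byte array replaces the file. Note that reading an object back returns a plain `Object`, so a cast is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.TreeMap;

// Stand-in illustration using standard Java serialization, not the
// course-provided bit streams.
class MapRoundTrip {
    // Serializes the whole map as a single object.
    static byte[] save(TreeMap<Character, Integer> map) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(map);
        }
        return bytes.toByteArray();
    }

    // Reads the map back; readObject() returns Object, hence the cast.
    @SuppressWarnings("unchecked")
    static TreeMap<Character, Integer> load(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (TreeMap<Character, Integer>) in.readObject();
        }
    }
}
```

The try-with-resources statements here only close the streams; they do not catch any exceptions, which still propagate to the caller.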
Test Files
To test your code download the following files in the
HuffmanZip/
project folder (not in
src/
, not in
bin/
).
In the left panel in Eclipse click on the project name (
HuffmanZip
) and hit F5, i.e. refresh the project - the
.txt
files should show up as part of the project.
To check your work, open a terminal and go to the main project folder (
HuffmanZip
) and list the folder contents - the 5th column shows the file size in bytes:
cd Desktop/cs216/HuffmanZip (make sure you are in project folder)
ls -l (MacOS)
dir (WinOS)
---------- x xxxxxxx xxxx 140 xxx xx xx:xx tlc-logic.txt
---------- x xxxxxxx xxxx 668 xxx xx xx:xx tlc-logic.txt.hz
---------- x xxxxxxx xxxx 3288707 xxx xx xx:xx war-and-peace.txt
---------- x xxxxxxx xxxx 1881432 xxx xx xx:xx war-and-peace.txt.hz
^
|
size/bytes
Executable JAR
Create an executable JAR file that can be run as a standalone program. Follow steps 1-4 described here:
Make sure to run
HuffmanZip
at least one time before the next step. It is enough to print a simple message in the
main
method. Without this step there won't be a
Launch Configuration
.
In Step 4 choose
HuffmanZip
under
Launch Configuration
and
Browse
to the project's main folder (
HuffmanZip/
) and save the jar under the name
hzip.jar
.
The program can now be run in the terminal as follows (copy the full line):
cd path/to/project/HuffmanZip (go to correct project folder)
java -jar hzip.jar -encode war-and-peace.txt (produces war-and-peace.txt.hz)
java -jar hzip.jar -decode war-and-peace.txt.hz (produces war-and-peace.txt.huz)
If this doesn't work try replacing
java
with the full path
which varies based on your installation. Try the following:
|
but replace javaw with java
if path is too long, hover over it with the mouse
|
Check the file sizes as shown above. Here is a sample session:
Turn in a similar screenshot of your Terminal.
What to turn in
Upload the .java, .pdf, and .png files in the Moodle dropbox. (Do NOT upload .html files.)