Gettysburg College

CS 216
Data Structures and Algorithms

Fall 2024

Assignment 4 Example

Suppose that you are developing an application for Gettysburg College and you need a data structure to store information about each student. It would be convenient to use the student ID numbers as indices in the data structure, but since the ID numbers consist of 8 digits, you would need to reserve a capacity of 10^8 (100,000,000), even though you only need a capacity of about 2700.

With a HashMap you get the convenience of using the student ID numbers as indices, while keeping the storage requirements reasonable. The idea is that the indices that the user provides are transformed/mapped internally by the data structure to indices that fit within the desired capacity.

Insertions in HashMap

Consider a HashMap with capacity m = 2700. Here is one way to ensure that student IDs map to valid cells in the map:

HashMap<Integer, Student> map = new HashMap<>(2700);    // create a map of capacity 2700

//      Key       Value
map.put(76544632, mickey);     // internally mickey is stored at index   76544632 % 2700 = 2332
map.put(67587658, minnie);     // internally minnie is stored at index   67587658 % 2700 = 1258
map.put(14742300, donald);     // internally donald is stored at index   14742300 % 2700 = 300
map.put(87648487, calvin);     // internally calvin is stored at index   87648487 % 2700 = 1087
...
map.put(14742300, garfield);   // donald is replaced by garfield (same key 14742300, so same index 300)
Essentially, each key, k, supplied by the user is transformed to an integer hash code, h, which is then mapped to a bucket index, i, in the HashMap as i = h % capacity. This ensures that every key given by the user maps to a valid index internally in the HashMap data structure. (This is only the simplest choice of hash function/mapping; there are many other possibilities.)
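The key-to-index mapping described above can be sketched in a few lines of Java. This is only an illustration, not the implementation required by the assignment: the class name BucketIndexDemo and the method bucketIndex are hypothetical, and Math.floorMod is used in place of a plain % so that a negative hash code would still give a valid index (for the non-negative student IDs used here the two give the same result).

```java
public class BucketIndexDemo {

    // Hypothetical helper: maps a key to a bucket index in the range [0, capacity).
    static int bucketIndex(int key, int capacity) {
        int h = Integer.hashCode(key);        // for an int key, the hash code is the value itself
        return Math.floorMod(h, capacity);    // same as h % capacity whenever h >= 0
    }

    public static void main(String[] args) {
        int capacity = 2700;
        int[] ids = {76544632, 67587658, 14742300, 87648487};
        for (int id : ids) {
            // Prints each student ID next to the internal index it maps to.
            System.out.println(id + " -> " + bucketIndex(id, capacity));
        }
    }
}
```

Running main prints the same indices as in the example above (2332, 1258, 300, and 1087).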

Note that from the user's point of view mickey is stored at index 76544632, minnie is stored at index 67587658, etc. However, internally the map only has 2700 cells, so mickey, minnie, etc. are stored at completely different locations.

Handling Collisions on Insert

Unfortunately, it is possible for the hash function to map two different keys to the same cell within the map, in which case we say that a collision occurs. For example:

//      Key       Value
map.put(76544632, mickey);     // internally mickey is stored at index   76544632 % 2700 = 2332
map.put(67587658, minnie);     // internally minnie is stored at index   67587658 % 2700 = 1258
...
map.put(34049332, jerry);      // internally jerry is stored at index   34049332 % 2700 = 2332 <- same as mickey
map.put(22571632, tom);        // internally tom is stored at index     22571632 % 2700 = 2332 <- same as mickey, jerry
In the example above, mickey, jerry, and tom have different keys/indices from the user's point of view, but they are forced to share cell 2332 inside the hash map.
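To see which keys collide, one can group the keys by the cell they map to. The sketch below does this for the keys in the example. The class name CollisionDemo and the method groupByBucket are hypothetical, and the code only detects collisions; it does not resolve them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CollisionDemo {

    // Hypothetical helper: groups keys by the bucket index key % capacity.
    // Assumes all keys are non-negative, as student IDs are.
    static Map<Integer, List<Integer>> groupByBucket(int[] keys, int capacity) {
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int key : keys) {
            int index = key % capacity;
            buckets.computeIfAbsent(index, i -> new ArrayList<>()).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] keys = {76544632, 67587658, 34049332, 22571632};
        for (Map.Entry<Integer, List<Integer>> e : groupByBucket(keys, 2700).entrySet()) {
            // A bucket holding more than one distinct key is a collision.
            String note = e.getValue().size() > 1 ? "  <- collision" : "";
            System.out.println(e.getKey() + ": " + e.getValue() + note);
        }
    }
}
```

Here cell 1258 holds a single key, while cell 2332 holds three different keys and is therefore flagged as a collision.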

Note that the following is not a collision:

map.put(67587658, donald);     // used exactly the same key (67587658) as minnie
                               // so donald replaces minnie at index 67587658
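The difference between a replacement and a collision can be checked with Java's standard java.util.HashMap, whose put method overwrites the value when the key is already present. This is a sketch under assumptions: the class name ReplaceDemo is hypothetical, strings stand in for the Student objects, and the library class is not necessarily the HashMap you are asked to implement in this assignment.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaceDemo {

    // Inserts two values under the same key; the second put overwrites the first.
    static Map<Integer, String> buildMap() {
        Map<Integer, String> map = new HashMap<>();
        map.put(67587658, "minnie");
        map.put(67587658, "donald");   // same key, so this is a replacement, not a collision
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = buildMap();
        System.out.println(map.size());          // number of entries stored
        System.out.println(map.get(67587658));   // value currently stored under the key
    }
}
```

The map ends up with a single entry, because both calls used exactly the same key.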
Here are possible ways to handle collisions.