Assignments for CS 216, Fall 2024

Suppose that you are developing an application for Gettysburg College and you need a data structure to store information about each student. It would be convenient to use student ID numbers as indices into the data structure, but since ID numbers consist of 8 digits, you would need to reserve a capacity of 10⁸, even though you would only need a capacity of about 2700.

With a HashMap you get the convenience of using student ID numbers as indices while keeping the storage requirements reasonable. The idea is that the indices (keys) the user provides are mapped internally by the data structure to indices that fit within the desired capacity.

Insertions in HashMap

Consider a HashMap with capacity m = 2700. Here is one way to ensure that student IDs map to valid cells in the map:

HashMap<Integer, Student> map(2700);    // create a map of capacity 2700

           Key     Value
map.put(76544632, mickey);     // internally mickey is stored at index   76544632 % 2700 = 2332
map.put(67587658, minnie);     // internally minnie is stored at index   67587658 % 2700 = 1258
map.put(14742300, donald);     // internally donald is stored at index   14742300 % 2700 = 300
map.put(87648487, calvin);     // internally calvin is stored at index   87648487 % 2700 = 1087
...
map.put(14742300, garfield);   // donald is replaced by garfield (same key 14742300)

Essentially, each key, k, supplied by the user is transformed into an integer hash code, h, which is then mapped to a bucket index, i, in the HashMap as i = h % capacity. This ensures that every key given by the user maps to a valid internal index of the HashMap. In the examples above the key is already an integer, so the hash code is simply the key itself (h = k). (This is only the simplest choice of hash function and mapping; there are many other possibilities.)
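The mapping from key to bucket index can be sketched in C++ as follows. This is a minimal sketch, assuming (as in the examples above) that an integer key is its own hash code, h = k; the function name `bucketIndex` is hypothetical and not part of the assignment's `HashMap`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: maps a user-supplied integer key to a bucket index.
// Assumes the simplest hash code, h = k (the key itself).
std::uint64_t bucketIndex(std::uint64_t key, std::uint64_t capacity) {
    assert(capacity > 0);           // a zero capacity would make % undefined
    std::uint64_t h = key;          // hash code (assumption: identity hash)
    return h % capacity;            // always in the range [0, capacity)
}
```

For example, `bucketIndex(76544632, 2700)` returns 2332 and `bucketIndex(14742300, 2700)` returns 300, matching the comments in the `put` calls above.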

Note that from the user's point of view mickey is stored at index 76544632, minnie is stored at index 67587658, etc. However, internally the map only has 2700 cells, so mickey, minnie, etc. are stored at completely different locations.

Handling Collisions on Insert

Unfortunately, it is possible for the hash function to map two different keys to the same cell within the map, in which case we say that a collision occurs. For example:

           Key     Value
map.put(76544632, mickey);     // internally mickey is stored at index   76544632 % 2700 = 2332
map.put(67587658, minnie);     // internally minnie is stored at index   67587658 % 2700 = 1258
...
map.put(34049332, jerry);      // internally jerry is stored at index   34049332 % 2700 = 2332 <- same as mickey
map.put(22571632, tom);        // internally tom is stored at index     22571632 % 2700 = 2332 <- same as mickey, jerry

In the example above, mickey, jerry, and tom have different keys (user-visible indices) but are forced to share cell 2332 inside the hash map.

Note that the following is not a collision:

map.put(67587658, donald);     // uses exactly the same key (67587658) as minnie
                               // so donald replaces minnie at user index 67587658
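The distinction between a collision and a replacement can be captured in a small predicate. This sketch again assumes the identity hash h = k; the name `collides` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when two keys are different but share a bucket.
// Assumes the identity hash h = k and the mapping i = h % capacity.
bool collides(std::uint64_t key1, std::uint64_t key2, std::uint64_t capacity) {
    assert(capacity > 0);
    if (key1 == key2) {
        return false;               // same key: the new value replaces the old one
    }
    return key1 % capacity == key2 % capacity;  // different keys, same cell
}
```

With capacity 2700, `collides(76544632, 34049332, 2700)` (mickey and jerry) is true, while `collides(67587658, 67587658, 2700)` (minnie and donald) is false, because reusing a key is a replacement, not a collision.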

Here are two common ways to handle collisions:

Chaining -- each cell in the map (each bucket) holds a linked list, and all entries whose keys map to the same index are stored in that bucket's list; in the above example, mickey, jerry, and tom are all stored in the list at cell 2332
```
2330 |*-|-->//
2331 |*-|-->//
2332 |*-|-->|76544632:mickey|-->|34049332:jerry|-->|22571632:tom|-->//
2333 |*-|-->//
2334 |*-|-->//
```
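Note that we need to keep a whole entry with both the original key and the corresponding value, i.e. |76544632:mickey|, |34049332:jerry|, etc., because several keys can share a bucket and the stored key is the only way to tell which entry a lookup is asking for. The user of the hash map has no knowledge of index 2332 -- this is internal to the hash map.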
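A minimal chaining sketch in C++17 (for `std::optional`) follows. The class name `ChainedMap`, its `chainLength` helper, and the use of `std::string` values in place of `Student` objects are assumptions made to keep the example self-contained; keys are hashed with h = k as above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical separate-chaining map (not the assignment's HashMap class).
// Each bucket is a linked list of whole (key, value) entries.
class ChainedMap {
public:
    explicit ChainedMap(std::size_t capacity) : buckets_(capacity) {
        assert(capacity > 0);
    }

    // Insert or replace. The key is stored with the value so that entries
    // sharing a bucket can still be told apart.
    void put(std::uint64_t key, const std::string& value) {
        auto& chain = buckets_[index(key)];
        for (auto& entry : chain) {
            if (entry.first == key) {   // same key: replace, not a collision
                entry.second = value;
                return;
            }
        }
        chain.emplace_back(key, value); // new key: append to this bucket's list
    }

    // Look up a key by scanning only the chain in its bucket.
    std::optional<std::string> get(std::uint64_t key) const {
        for (const auto& entry : buckets_[index(key)]) {
            if (entry.first == key) {
                return entry.second;
            }
        }
        return std::nullopt;
    }

    // Number of entries stored in bucket i (for illustration only).
    std::size_t chainLength(std::size_t i) const { return buckets_[i].size(); }

private:
    // Internal bucket index, assuming the identity hash h = key.
    std::size_t index(std::uint64_t key) const {
        return static_cast<std::size_t>(key % buckets_.size());
    }

    std::vector<std::list<std::pair<std::uint64_t, std::string>>> buckets_;
};
```

Putting mickey, jerry, and tom into `ChainedMap m(2700)` leaves a chain of length 3 in bucket 2332; putting a new value under one of those keys replaces that entry without lengthening the chain.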
Linear Probing -- each cell in the map (each bucket) stores at most one entry; when a collision occurs, we look for the first empty cell, starting at the index computed from the hash code, h, and moving forward one cell at a time, i.e. we try

i = h, h+1, h+2, ..., h+capacity-1 (each taken % capacity to wrap around)
```
2330 |    -empty-      |
2331 |    -empty-      |
2332 | 76544632:mickey |
2333 | 34049332:jerry  |
2334 | 22571632:tom    |
2335 |    -empty-      |
```
NOTE: in Linear Probing we use the special value EMPTY to indicate that the cell is available
NOTE: if the load factor (number of entries / capacity) exceeds its limit, we expand the HashMap and rehash all existing entries into the larger table
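Linear probing, together with the EMPTY marker and rehashing described in the notes above, might be sketched as follows. This is not the assignment's required design: the class name `ProbingMap`, the use of an empty `std::optional` as EMPTY, the maximum load factor of 0.5, and doubling the capacity on resize are all assumptions (C++17).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical linear-probing map, not the assignment's HashMap class.
// An empty std::optional stands in for EMPTY; max load factor 0.5 and
// doubling the capacity on resize are assumptions.
class ProbingMap {
public:
    explicit ProbingMap(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    void put(std::uint64_t key, const std::string& value) {
        std::size_t i = findSlot(key);
        if (slots_[i]) {                    // same key already present: replace
            slots_[i]->second = value;
            return;
        }
        // A new entry is needed: grow first if it would exceed the load factor.
        if (static_cast<double>(size_ + 1) / slots_.size() > kMaxLoad) {
            rehash(2 * slots_.size());
            i = findSlot(key);              // indices change with the capacity
        }
        slots_[i] = Entry(key, value);      // claim the EMPTY cell
        ++size_;
    }

    std::optional<std::string> get(std::uint64_t key) const {
        const auto& slot = slots_[findSlot(key)];
        if (slot) {
            return slot->second;
        }
        return std::nullopt;                // reached EMPTY: key not present
    }

    std::size_t capacity() const { return slots_.size(); }

    // Key stored at internal index i, if any (for illustration only).
    std::optional<std::uint64_t> keyAt(std::size_t i) const {
        if (slots_[i]) {
            return slots_[i]->first;
        }
        return std::nullopt;
    }

private:
    using Entry = std::pair<std::uint64_t, std::string>;
    static constexpr double kMaxLoad = 0.5;

    // Probe h, h+1, h+2, ... (mod capacity), assuming h = key, until we find
    // the key itself or an EMPTY cell. Keeping the load factor at or below
    // 0.5 guarantees an EMPTY cell exists, so the loop terminates.
    std::size_t findSlot(std::uint64_t key) const {
        const std::size_t m = slots_.size();
        std::size_t i = static_cast<std::size_t>(key % m);
        while (slots_[i] && slots_[i]->first != key) {
            i = (i + 1) % m;
        }
        return i;
    }

    // Indices depend on the capacity, so every entry is re-inserted into
    // the larger table rather than copied to the same position.
    void rehash(std::size_t newCapacity) {
        std::vector<std::optional<Entry>> old = std::move(slots_);
        slots_.assign(newCapacity, std::optional<Entry>{});
        size_ = 0;
        for (const auto& slot : old) {
            if (slot) {
                put(slot->first, slot->second);
            }
        }
    }

    std::vector<std::optional<Entry>> slots_;
    std::size_t size_ = 0;
};
```

Inserting mickey, jerry, and tom into `ProbingMap m(2700)` places them in cells 2332, 2333, and 2334, as in the diagram above.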

CS 216
Data Structures and Algorithms

Assignment 4 Example