With a HashMap you get the convenience of using the student ID numbers as indices, while keeping the storage requirements reasonable. The idea is that the indices that the user provides are transformed/mapped internally by the data structure to indices that fit within the desired capacity.
Consider a HashMap with capacity m = 2700. Here is one way to ensure that student IDs map to valid cells in the map:

    HashMap<Integer, Student> map(2700);  // create a map of capacity 2700

    //      Key       Value
    map.put(76544632, mickey);    // internally mickey is stored at index 76544632 % 2700 = 2332
    map.put(67587658, minnie);    // internally minnie is stored at index 67587658 % 2700 = 1258
    map.put(14742300, donald);    // internally donald is stored at index 14742300 % 2700 = 300
    map.put(87648487, calvin);    // internally calvin is stored at index 87648487 % 2700 = 1087
    ...
    map.put(14742300, garfield);  // donald is replaced by garfield (used the same key 14742300)

Essentially, each key, k, supplied by the user is transformed to an integer hash code, h, which is then mapped to a bucket index, i, into the HashMap as i = h % capacity, which ensures that the key given by the user maps into a valid index internally in the HashMap data structure. (This is only the simplest choice of hash function/mapping; there are many other possibilities.)
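The mapping i = h % capacity can be sketched in Java as follows. This is only a minimal illustration: the class name BucketIndexSketch and the method bucketIndex are hypothetical names, and Math.floorMod is used in place of a bare % on the assumption that hash codes can be negative in general.

```java
// Hypothetical helper (not part of the notes) illustrating i = h % capacity.
class BucketIndexSketch {
    // Maps an integer hash code h to a bucket index in [0, capacity).
    // Math.floorMod keeps the result non-negative even when h is negative.
    static int bucketIndex(int h, int capacity) {
        return Math.floorMod(h, capacity);
    }

    public static void main(String[] args) {
        int capacity = 2700;
        int[] keys = {76544632, 67587658, 14742300, 87648487};
        for (int k : keys) {
            // For an Integer key, Java's hash code is the int value itself.
            int h = Integer.hashCode(k);
            System.out.println(k + " -> " + bucketIndex(h, capacity));
        }
    }
}
```

For the keys above, bucketIndex gives the same cells as in the example: 2332, 1258, 300, and 1087.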
Note that from the user's point of view mickey is stored at index 76544632, minnie is stored at index 67587658, etc. However, internally the map only has 2700 cells, so mickey, minnie, etc. are stored at completely different locations.
Unfortunately, it is possible for the hash function to map two different keys to the same cell within the map, in which case we say that a collision occurs. For example:
    //      Key       Value
    map.put(76544632, mickey);    // internally mickey is stored at index 76544632 % 2700 = 2332
    map.put(67587658, minnie);    // internally minnie is stored at index 67587658 % 2700 = 1258
    ...
    map.put(34049332, jerry);     // internally jerry is stored at index 34049332 % 2700 = 2332 <- same as mickey
    map.put(22571632, tom);       // internally tom is stored at index 22571632 % 2700 = 2332 <- same as mickey, jerry

In the example above mickey, jerry, and tom have different keys/indices from the user's point of view, but are forced to share cell 2332 inside the hash map.
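One way to see which keys collide is to group them by the cell they map to. The sketch below does this for the keys from the example; CollisionSketch and groupByBucket are hypothetical names used only for illustration, and the mapping is assumed to be the simple i = key % capacity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of the notes) that exposes collisions.
class CollisionSketch {
    // Groups keys by the internal cell index i = key % capacity.
    static Map<Integer, List<Integer>> groupByBucket(int[] keys, int capacity) {
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int k : keys) {
            int i = Math.floorMod(k, capacity);
            buckets.computeIfAbsent(i, unused -> new ArrayList<>()).add(k);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] keys = {76544632, 67587658, 34049332, 22571632};
        // Any cell that receives more than one distinct key is a collision.
        groupByBucket(keys, 2700).forEach((i, ks) -> {
            if (ks.size() > 1) {
                System.out.println("collision at cell " + i + ": " + ks);
            }
        });
    }
}
```

With these four keys, cell 2332 receives three keys (mickey, jerry, tom) while cell 1258 receives only minnie's key.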
Note that the following is not a collision:
    map.put(67587658, donald);    // used exactly the same key (67587658) as minnie
                                  // so donald replaces minnie at index 67587658

Here are possible ways to handle collisions.
Chaining: mickey, jerry, and tom will all be stored in the same cell, as a linked list of entries:

    2330 |*-|-->//
    2331 |*-|-->//
    2332 |*-|-->|76544632:mickey|-->|34049332:jerry|-->|22571632:tom|-->//
    2333 |*-|-->//
    2334 |*-|-->//

Note that we need to keep a whole entry with both the original key and the corresponding value, i.e. |76544632:mickey|, |34049332:jerry|, etc. The user of the hash map has no knowledge of index 2332; this is internal to the hash map.
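A minimal sketch of a map that stores colliding entries together in one cell might look like this. It is not how java.util.HashMap is implemented; ChainedMapSketch, Entry, and chainLength are hypothetical names, keys are assumed to be plain ints, and there is no resizing or removal.

```java
import java.util.LinkedList;

// Hypothetical chained hash map with int keys; a sketch only.
class ChainedMapSketch<V> {
    // Each entry keeps the original key together with its value.
    private static final class Entry<V> {
        final int key;
        V value;
        Entry(int key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<V>>[] cells;

    @SuppressWarnings("unchecked")
    ChainedMapSketch(int capacity) {
        cells = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            cells[i] = new LinkedList<>();
        }
    }

    // Internal cell index; never visible to the user of the map.
    private int indexOf(int key) {
        return Math.floorMod(key, cells.length);
    }

    // Same key: replace the value. Different key, same cell: extend the chain.
    void put(int key, V value) {
        for (Entry<V> e : cells[indexOf(key)]) {
            if (e.key == key) { e.value = value; return; }
        }
        cells[indexOf(key)].add(new Entry<>(key, value));
    }

    // Returns the value stored under key, or null if the key is absent.
    V get(int key) {
        for (Entry<V> e : cells[indexOf(key)]) {
            if (e.key == key) return e.value;
        }
        return null;
    }

    // Number of entries sharing the cell that key maps to.
    int chainLength(int key) {
        return cells[indexOf(key)].size();
    }
}
```

The stored key is what lets get tell mickey, jerry, and tom apart even though all three live in cell 2332, and it is also what lets put recognise a repeated key as a replacement rather than a collision.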
Linear probing: the entry is stored in the first available cell among i = h, h+1, h+2, ..., h+capacity-1 (with % capacity for wrap-around):
    2330 | -empty-         |
    2331 | -empty-         |
    2332 | 76544632:mickey |
    2333 | 34049332:jerry  |
    2334 | 22571632:tom    |
    2335 | -empty-         |
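The layout in the diagram can be reproduced with a short sketch that tries cells h, h+1, h+2, ... with wrap-around. The names LinearProbingSketch and used are hypothetical; the sketch assumes int keys and String values, and it deliberately leaves out removal and resizing.

```java
// Hypothetical linear-probing map; a sketch without removal or resizing.
class LinearProbingSketch {
    private final int[] keys;
    private final String[] values;
    private final boolean[] used;   // false means the cell is empty/available

    LinearProbingSketch(int capacity) {
        keys = new int[capacity];
        values = new String[capacity];
        used = new boolean[capacity];
    }

    // Stores the entry and returns the cell used, or -1 if the table is full.
    int put(int key, String value) {
        int capacity = keys.length;
        int h = Math.floorMod(key, capacity);
        for (int step = 0; step < capacity; step++) {
            int i = (h + step) % capacity;          // wrap around at the end
            if (!used[i] || keys[i] == key) {       // free cell, or same key
                used[i] = true;
                keys[i] = key;
                values[i] = value;
                return i;
            }
        }
        return -1;
    }

    // Returns the value stored under key, or null if the key is absent.
    String get(int key) {
        int capacity = keys.length;
        int h = Math.floorMod(key, capacity);
        for (int step = 0; step < capacity; step++) {
            int i = (h + step) % capacity;
            if (!used[i]) return null;              // empty cell ends the search
            if (keys[i] == key) return values[i];
        }
        return null;
    }
}
```

Inserting mickey, jerry, and tom in that order into a table of capacity 2700 places them in cells 2332, 2333, and 2334, matching the diagram.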
A special value, EMPTY, is used to indicate that the cell is available.
HashMap