When using hashmaps, should hash collisions be avoided or encouraged?

Question

If I am writing a Employee class which will be saved in some hash based collection like hashmap/hashset. Should the Employee object int hashcode() implementation avoid or encourage hash collisions? Especially for performace w.r.t insertion and retrieval

Avoid them, more collision you have, the slower the use will be. In fact , like the `equals` you may use all fields, so there is no 'try to do or not colision', just just all fields — azro, Dec 12 '20 at 13:20
Unless your assignment specifically requires you to write your own hashCode algorithm, you’re better off using existing ones. For instance, Employee.hashCode might simply do `return this.employeeID.hashCode();` or `return Objects.hash(firstName, lastName, jobTitle);`. Just remember that it needs to be based on the same attributes that `equals(Object)` checks. — VGR, Dec 12 '20 at 13:48
You can compute the hash on a subset of the attributes that `equals` uses; this just means possible hash collisions on objects that are not equal. This is perfectly legitimate. Generally it's not a good idea, but in some cases there could be reasons (e.g., performance, in cases where you judge the collisions due to omitted fields unlikely). — a guest, Dec 12 '20 at 13:51
Does this answer your question? [What Exactly is Hash Collision](https://stackoverflow.com/questions/45795637/what-exactly-is-hash-collision) — geocodezip, Dec 12 '20 at 14:38

a guest · Answer 1 · 2020-12-12T14:24:24.970

Avoided. The point of a good hashing algorithm is to distribute the hashed objects uniformly over the available hash buckets.

Consider the degenerate case:

   int hashCode() { return 0; }

This fulfills all technical requirements for a hashCode implementation. It also absolutely ensures collisions. The result is that everything goes into the same bucket, and (in typical implementations) your hash map has the same performance as an array list.

On the other hand, 'a few' collisions is not going to be noticeable in most cases. You just don't want to have 'too many' entries in any one bucket.

In your particular case, an Employee record probably has a unique 'employee id'. You could use that as the sole content of the hashCode. Even better if the id is already an integer. You'd get collisions in the map (not the hashCode result) for cases where the id modulo map size was the same, but that's unavoidable anyway.

When using hashmaps, should hash collisions be avoided or encouraged?

1 Answers1