4

I understand that the order of elements in a HashSet is supposed to be arbitrary. But out of curiosity, could anyone tell me exactly how the order is determined?

I noticed that when I insert two elements (say A and B), the order would come out A, B, then re-executing the same code again would give me B, A, then re-excecuting it the third time would give me A, B.

I mean, that's kind of un-deterministic, and a bit weird.

Ed Marty
  • 39,590
  • 19
  • 103
  • 156
One Two Three
  • 22,327
  • 24
  • 73
  • 114
  • 1
    The "order" is based on(as the name suggests) ```hashCode()``` of the objects. In two executions, same object can have different hashcodes. – NeplatnyUdaj Jan 21 '14 at 15:28
  • http://stackoverflow.com/questions/9364134/what-hashing-function-does-java-use-to-implement-hashtable-class – C.B. Jan 21 '14 at 15:28
  • 1
    the code for the class in question is available, read it, then come back with a specific question about how it works. –  Jan 21 '14 at 15:28
  • 1
    The order can also change as the HashSet grows. Suppose A and B are initially in the same bucket, and their order depends on the order in which they were added. Later, the HashSet gets big enough to need to reorganize itself, and in the new organization, with more buckets, A and B are in different buckets. Now their order is controlled by their hash codes. – Patricia Shanahan Jan 21 '14 at 15:33

4 Answers4

4

The order is determined by the Hashing algorithm used within the Hash Map/Set, the exact settings of that Map and the Hashcodes of the objects.

If your objects have consistent hashcodes over multiple runs (Strings for example) and are placed in the same order into a map with the same settings then in general they would come out in the same order each time. If they don't then they won't.

The source for HashMap can be seen here: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java

In fact an interesting quote from that source is:

This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

So not only may the order be different each time your program runs, but in fact the API itself makes no guarantee that the order will remain constant even inside one run of the program!

"Un-deterministic and a bit weird" is a good description of ordering of a HashMap - and is actually pretty much what the docs say. If you want ordering use either LinkedHashMap or TreeMap. If you don't want ordering then don't worry about it, by having the ordering being effectively random HashMap is giving you extremely fast responses from the methods who's behavior it does guarantee!

Tim B
  • 40,716
  • 16
  • 83
  • 128
  • I would highly recommend anyone seriously wondering about this to just take a peek at the source code. HashMap and HashSet are somewhat complicated, there are multiple reasons the order is inconsistent. Looking at the code is the best way to understand it better. – Radiodef Jan 21 '14 at 15:39
3

In principle there are two contributing factors:

  1. Hashcode of your keys might be non-deterministic, this will be the case when you use default hashCode implementation, which relies on memory location

  2. HashSet itself can be non deterministic, take a look at HashMap.initHashSeedAsNeeded (HashSet uses HashMap in standard Oracle SDK as underlying datastructure), depending on some factors it can use sun.misc.Hashing.randomHashSeed(this) to initialize hashSeed field which is then used when computing hashCode of a key

Randomization can be important to achieve probabilistic performance guaranties. This is what javadoc says for hashSeed:

/** * A randomizing value associated with this instance that is applied to
* hash code of keys to make hash collisions harder to find. If 0 then
* alternative hashing is disabled. */

hgrey
  • 3,033
  • 17
  • 21
1

The order will not change (in practice) unless you add / remove something to your HashSet.

The order is based on the internal hashtable buckets. And that depends on both the hashCode() of an object and the size of the hashtable.

Simplified example:

A's hashcode is 10, B's hashCode is 11. The hastable has size 2. The mapping from hash code to position in hashtable would be purely based on the last bit, i.e. even hashcodes go into table[0], odd ones into table[1].

table[0] = { A }
table[1] = { B }

Iterating over those values would most likely be A, B now. And that result should be reproducible each time as long as table size stays the same.

Adding a third element C with hashCode 12 would (when not resizing the table) add it to bucket #0 as well.

table[0] = { A, C }
table[1] = { B }

So your iteration would be A, C, B. Or depending in whether you inserted A before C: C, A, B

Adding elements will in practice resize the table and re-hash using an adjusted mapping. E.g. table size would be doubled and the last 2 bits could be used to determine the bucket

table[0] = { C }
table[1] = {   }
table[2] = { A }
table[3] = { B }

And the order would have changed completely by adding just 1 element.

zapl
  • 63,179
  • 10
  • 123
  • 154
0

Only HashSet keeps and garatuees no order, even no arbitrary order (Why can hashCode() return the same value for different objects in Java?)! Dont force an order there! Serialize and Deserialize them and the original order will be destroyed.

Use LinkedHashSet instead of HashSet.

Community
  • 1
  • 1
Grim
  • 1,938
  • 10
  • 56
  • 123
  • 1
    The OP didn't ask how to maintain order. The OP asked why it is. – Radiodef Jan 21 '14 at 15:41
  • The OP asked how to `exactly determine` order. First paragraph, last line. – Grim Jan 21 '14 at 15:43
  • 1
    No, the OP asked how the order of a HashSet is determined. The OP already knows HashSet has arbitrary ordering. The first sentence says so. The OP wants to know why it changes. – Radiodef Jan 21 '14 at 15:45
  • 1
    And i say: Respect the Documentation more than the implementation. Spec before Impl. Maybe the "why" changes, maybe the "why" does not change. Makes no difference. Trust only in spec and dont ask why. Theoretically the same hashcode can be given twice, in this case there is no arbitrary order! – Grim Jan 21 '14 at 15:53