2

I have a question, what algorithm can I use to generate a set of 2^21 random unique numbers in Java? is there another library in java that generates random numbers aside math.random?

Thanks in advance!

Ricardo
  • 169
  • 2
  • 2
  • 11
  • from what sized sample space? Greater than 2^21 obviously... – Mitch Wheat May 01 '11 at 07:59
  • If you need the set materialised and unique, then a shuffle of the numbers from 1 to 2^21 will work (gonna need plenty of memory!) – Mitch Wheat May 01 '11 at 08:03
  • 3
    If they're unique, they aren't really random, now are they? As you go through your list, you are able to predict the next number with increasing accuracy. If your range is limited by the size of the set, on the last one, you have 100% possible accuracy prediction, which REALLY isn't random! – corsiKa May 01 '11 at 08:07
  • 1
    Cross post http://forums.oracle.com/forums/thread.jspa?threadID=2215602&tstart=0 The simplest solution is to shuffle 2^21 values which will ensure uniqueness. See my solution posted on the Oracle forums. – Peter Lawrey May 01 '11 at 08:17
  • @Mitch Wheat, you need 8 MB, just over 1 cents worth. ;) – Peter Lawrey May 01 '11 at 08:20
  • @glowcoder, I would describe them as unique numbers in a random order. – Peter Lawrey May 01 '11 at 08:24
  • @glowcoder Although uniqueness leeks information, if the numbers are from a space greater than 0-2^21 than you still can have randomness... – Philip JF May 01 '11 at 10:29

3 Answers3

3

The key question is what do you meen by "numbers"?

Generally though, this problem can be solved by 'generate a list of numbers, put that into random order, take the first 2^21 of them' The first part is trivial The second part can be solved by the fisher yates algorithm The real problem is if you want to use a very large space of numbers. Then you need a lazy solution

Here is what I would do: Use a data structure to represent the list that outwardly looks like an array, but internally is represented using a hashtable based sparse array representation. Furthermore, if when attempting to read from a cell, if you don't hit something in the hash, simply return the index for that cell.

Your modified fisher yates stops at 2^21 for the index variable and uses a random variable j between index and the "length" of the array (number of integers)

This lazy approach generates a random non-repeating list of any kind of number in O(n) time and O(n) space where n is the length of the array you are trying to generate. That is the best you can do.

For an explanation of Fisher-Yates http://en.wikipedia.org/wiki/Fisher-Yates_shuffle

Philip JF
  • 28,199
  • 5
  • 70
  • 77
2

Take a look at Apache Commons Math API's random pacakge http://commons.apache.org/math/userguide/random.html

gouki
  • 4,382
  • 3
  • 20
  • 20
0

You could use Format-Preserving Encryption to encrypt a counter. Your counter just goes from 0 upwards, and the encryption uses a key of your choice to turn it into a seemingly random value of whatever radix and width you want.

Block ciphers normally have a fixed block size of e.g. 64 or 128 bits. But Format-Preserving Encryption allows you to take a standard cipher like AES and make a smaller-width cipher, of whatever radix and width you want (e.g. radix 2, width 21 for the parameters of the question), with an algorithm which is still cryptographically robust.

It is guaranteed to never have collisions (because cryptographic algorithms create a 1:1 mapping). It is also reversible (a 2-way mapping), so you can take the resulting number and get back to the counter value you started with.

AES-FFX is one proposed standard method to achieve this.

Craig McQueen
  • 41,871
  • 30
  • 130
  • 181