8

Random.NextDouble() (a Double from the range [0.0, 1.0)) is sometimes multiplied by a large Int64 (say, Int64 big = 9000000000L) and the result floored to obtain a random Int64 larger than what can be obtained from Random.Next() (an Int32 from the range [0, Int32.MaxValue)).

Random r = new Random();
long big = 9000000000L;
long answer = (long) (r.NextDouble() * big);

It seems to me that the total number of unique values for a Double in the range [0.0, 1.0) provides an upper-bound for the number of unique Int64 it can possibly generate. A loose upper-bound, in fact, as many different Doubles will map to the same Int64.

Hence, I would like to know: what is the total number of unique values for a double in the range [0.0, 1.0)?

Even better if you can tell me what is the largest value "big" can take so that "answer" can be a value from the range [0,big), and whether the distribution of values of "answer" is uniform, assuming that Random.NextDouble() is uniform.

Edit: Double (double) here refers to IEEE 754 floating-point double, while Int64 (long) and Int32 (int) refer to 64-bit and 32-bit signed 2's complement respectively.


Inspired by this question: Generating 10 digits unique random number in java

While I used C#, this question is language-agnostic and is more about discrete mathematics than programming. It bothers me not mainly out of mathematical curiosity, but as a programmer who wants to use a formula only if it does what it is supposed to do, and from a security viewpoint.

blizpasta

6 Answers

7

IEEE-754 has 11 bits of exponent and 52 of mantissa. Assuming the sign bit is 0 (positive): if the exponent ranges from 0x001 to 0x3FE, the value is a normal floating-point number between 0 and 1, and the mantissa is interpreted with a leading 1 that is not stored. For each of these 0x3FE exponent values there are 2^52 mantissa values. In addition, if the exponent is 0x000, the mantissa is interpreted without that leading 1, but as if the exponent were 0x001 (the subnormals, including zero), for a total of 0x3FF = 1023 exponents where all mantissas are valid. This is a total of 1023 * 2^52 values. (Negative zero could arguably count as one more.)

If random doubles were generated uniformly from all representable values, then this would indeed produce a bias when multiplying in order to generate an Int64. However, any reasonable random library will approximate a uniform distribution on [0, 1), and that will not be biased when turned into an Int64. The largest value of "big" that will allow all integers in [0, big) to be produced is 2^53 -- the resolution of the 2^52 numbers between 1/2 and 1 is 2^(-53). However, these numbers are often produced by dividing a random integer by the integer range (usually Int32), meaning you can't actually produce more distinct numbers than this source provides. Consider directly combining two Int32s instead, e.g. by shifting one by 32 bits and combining them into an Int64. (Though be wary -- the state space for the generator might only be 32 bits.)

wnoise
  • 1
    Nice. Some random number generators have a 48-bit internal seed value to assure that all 32 bits which are provided are truly random. Combining two 32-bit random numbers has to be done carefully to avoid making some of the bits dependent on other bits within the 64-bit result. – S.Lott Mar 18 '11 at 12:06
  • @S.Lott: Can you expand on what care needs to be taken? Off the top of my head I'd assume that just concatenating two 32 bit numbers to form a 64 bit would be all that is needed... – Chris Mar 18 '11 at 12:50
  • 2
    @Chris: The issue is one of "independence" of pseudo-random numbers. An unexpected issue is that *n* different sequences with *n* different seeds will not yield perfectly independent *n*-tuples of numbers from each sequence. You may not even want to use adjacent pairs from a single sequence; you have to use two values offset from each other within the sequence. That's why good RNGs have a "jumpahead" function, so you can pair a value with one from further ahead in the sequence. – S.Lott Mar 18 '11 at 14:00
  • @S.Lott: ah, yes. I did wonder about something along those lines but hadn't really thought about it in full. Cheers. – Chris Mar 18 '11 at 14:24
5

As a corollary to your question, I'll tell you that the C# Random generator internally uses a generator that gives it numbers between 0...Int32.MaxValue - 1. It then divides that number by Int32.MaxValue (technically, it multiplies by the inverse of that number) to return a double. So in C#, there are only Int32.MaxValue possible doubles returned (0...Int32.MaxValue - 1).

xanatos
  • Gabe (first) and you have made my question moot (even though I stated language agnostic, the code was clearly C#) because I had made a wrong assumption about the implementation of Random.NextDouble. I could "unmoot" it by adding an assumption but I think it's better to ask that in another question and let this question serve as a reminder about always validating your assumptions before using them to build other things with. – blizpasta Mar 18 '11 at 10:12
  • @Gabe :-) Your .NET-Fu is obviously stronger than mine :-) – xanatos Mar 18 '11 at 16:42
3

IEEE 754 is pretty clear on the precision of doubles:

http://en.wikipedia.org/wiki/IEEE_754-2008

You have 52 bits of precision plus an additional assumed bit.

You have exponents from -1022 to 1023, stored in an 11-bit biased field.

The 64th bit is the overall sign for the number.

We'll ignore subnormal numbers.

You're asking about exponents between -1022 and 0. This means you have about 10 of the available 11 bits of exponent available to you.

You have 52+1 bits of mantissa available.

This is about 62 bits of usable precision, enough to represent roughly 2**62 distinct values in the range [0.0, 1.0).

S.Lott
  • Nice. It could be a problem that they aren't evenly distributed. There are a lot more distinct values between 0 and 0.1 than between 0.9 and 1.0. – Jonas Elfström Mar 18 '11 at 11:58
  • @Jonas Elfström: "they aren't evenly distributed". Correct. It's not a "problem". It's the way floating-point works. – S.Lott Mar 18 '11 at 12:02
  • I was thinking in terms of picking one value from the discrete set of possible floats between 0 and 1.0 as a random number. If you somehow create an algorithm that randomly sets the exponent between -1022 and 0 and the mantissa between 1 and 2^52 you will get a random number that is exponentially more probable to be very low than to be over 0.1. I'm not arguing with you, just rambling along. – Jonas Elfström Mar 18 '11 at 12:22
1

@wnoise pretty much nailed it, but here's my two cents.

IEEE floats can be compared and incremented as integers, with some restrictions; see this question for details. So, if we reinterpret the bits of +0.0 and 1.0 as 64-bit integers, we get the number of steps between zero and one:

#include <cstring>
#include <iostream>

int main()
{
        double zero = 0.0;
        double one = 1.0;
        unsigned long long z, o;
        // memcpy avoids the strict-aliasing violation of a pointer reinterpret_cast
        std::memcpy(&z, &zero, sizeof z);
        std::memcpy(&o, &one, sizeof o);
        std::cout << z << std::endl;
        std::cout << o << std::endl;
}

This gives me 0 and 4607182418800017408, respectively, i.e. there are 4607182418800017408 unique double values in the range [0.0, 1.0).

kbjorklu
0

The total number of unique values for a double in the range [0.0, 1.0) depends on the representation of double in the particular environment.

One of the most common representations is the one specified by IEEE 754. That format is e.g. mandated by Java and C# (see 1.3 Types and variables for the latter).

Oswald
  • 1
    *One* of the most common? What architecture developed in the last 30 years has had a `double` that *wasn't* IEEE 754? – Gabe Mar 18 '11 at 09:53
  • Sorry for not specifying IEEE 754 floating-point double value, especially when I stated that it is language-agnostic. Will add it to the question. – blizpasta Mar 18 '11 at 09:57
  • 1
    @Gabe I don't know of any. But I do not know all architectures. – Oswald Mar 18 '11 at 09:58
0

That depends on the implementation of double. There are implementations that do not allow denormalized values and leave out the leading one; determining the number of possible values here is easy:

  • there are a few "special" values (0, +0, -0, +∞, -∞, quiet NaN, signaling NaN) that typically cost you one possible exponent
  • there is no way that shifting the mantissa and modifying the exponent gives an equivalent number

If your implementation allows denormalized values, determining this number becomes a bit more difficult, but I'd start by mapping the possible values in this representation to the equivalent representation with the fixed leading one (which will use one bit less in the mantissa). If you've found an appropriate mapping, it will be injective, and you have reduced the problem to a simpler one.

Simon Richter