4

I need a stored procedure to generate @n records, each with a unique random 8 digit number. This number must not be incremental and must not exist already in the table.

CREATE TABLE Codes
(
    ID UNIQUEIDENTIFIER PRIMARY KEY,
    Code INT,
    CONSTRAINT UQ_Code UNIQUE(Code) 
);

I can generate random numbers:

DECLARE @min int = 0,
        @max int = 99999999,
        @n INT = 100;

SELECT TOP (@n) FLOOR(CAST(CRYPT_GEN_RANDOM(4) AS BIGINT) / 4294967296 * ((@max - @min) + 1)) + @min
FROM   sys.all_objects s1 
              CROSS JOIN sys.all_objects s2;

But what I'm struggling to figure out is how to atomically generate and insert @n numbers into the [Codes] table whilst making provision to avoid collisions. Can this be done without a loop?

Update
By "must not be incremental" I simply meant that for each call to the SP, I don't want it to return "1, 2, 3, 4" or any other common pattern. I need to be able to consume all values so ultimately incremental values will exist but will be generated at different points in time rather than sequentially.

Giorgi Nakeuri
  • 35,155
  • 8
  • 47
  • 75
Paul Fleming
  • 24,238
  • 8
  • 76
  • 113
  • why UNIQUEIDENTIFIER PRIMARY KEY? seems like an int identity would be fine here – KM. Apr 22 '15 at 17:06
  • 1
    This is a classic "I want random numbers that aren't random" question. In a truly random sequence the number 42 could arrse a few hundred times in a row. A random sequence of _unique_ values is, presumably, a question of pulling numbers at random from a hat. You can't pull the same number twice, though you might happen to get all of the numbers in descending order. What do you _think_ "must not be incremental" means? What happens if your unique identifier key causes 13 to be inserted just after a preexisting 12 in the table? – HABO Apr 22 '15 at 17:13
  • @HABO I don't need true random numbers and incremental values are fine, but I don't want the set that I generate to be incremental. These numbers are assigned to individuals and I don't want a group of individuals to get 1, 2, 3, 4, etc. – Paul Fleming Apr 23 '15 at 07:54
  • @KM The PK type doesn't really matter her. It's just what we use to aid distribution. Identifiers are generated by the caller, not the DB. – Paul Fleming Apr 23 '15 at 07:54

3 Answers3

1

You can use cte with calculated codes, distinct and check if the Code already exists in your table:

;with cte_stream as (
    select
        floor(cast(crypt_gen_random(4) as bigint) / 4294967296 * ((@max - @min) + 1)) + @min as Code
    from sys.all_objects as s1 
        cross join sys.all_objects as s2;
)
insert into [Codes]
select distinct top (@n) s.Code
from cte_stream as s
where not exists (select * from [Codes] as c where c.Code = s.Code)

So distinct helps you to avoid collision between new codes and exists help you to avoid collisions with already existing codes in the [Codes] table, and order by newid() helps you to get random values from new codes

Roman Pekar
  • 107,110
  • 28
  • 195
  • 197
  • How do you handle the "must not be incremental" requirement stated by the OP? I assume that means that you would have to unrandomize a sequence that generates ..., 42, 43, ... . I'm not sure what it means with respect to the unique identifier used to order the values. – HABO Apr 22 '15 at 16:58
  • @HAMO this requirement also not really clear for me. If the numbers are really "random" they sholdn't be incremental. Of course I could add `order by newid()` to the query – Roman Pekar Apr 22 '15 at 17:05
  • @RomanPekar This appears to work but it takes an age (12s) to execute. I suspect this is due to the non-determinate nature of `CRYPT_GEN_RANDOM` and the unrestricted CTE. I need this to be efficient. – Paul Fleming Apr 23 '15 at 08:05
  • It's hard to optimize this without having your function to test. HAve you tried a loop solution, just to check how long it takes? – Roman Pekar Apr 23 '15 at 08:15
  • @RomanPekar I haven't tried a loop but duration isn't the only factory. I need to ensure the if this is happening concurrently, that there will be no collisions. – Paul Fleming Apr 23 '15 at 12:01
  • I'd suggest just to check the loop solution to see how fast you generating function works. Or at least try to create codes for all sys.objects * sys.objects to see how fast this would be – Roman Pekar Apr 24 '15 at 07:02
0

You can try this :

    DECLARE @min int = 0,
        @max int = 99999999,
        @n INT = 100;

Insert Into Codes(ID, Code)
SELECT G, RandomNumber
From (
    Select TOP (@n) NEWID() As G, FLOOR(CAST(CRYPT_GEN_RANDOM(4) AS BIGINT) / 4294967296 * ((@max - @min) + 1)) + @min As RandomNumber
    FROM   sys.all_objects s1 
                  CROSS JOIN sys.all_objects s2) AS Ran
Where RandomNumber Not IN (Select Code From Codes);

Edit: I added the where condition to avoid existing codes.

Majdi
  • 176
  • 10
0

You can try the following:

DECLARE @c INT = 10

;WITH cteDigits AS (SELECT d FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(d)),
cteCombinations AS (SELECT @c + 100 AS rn
UNION ALL
SELECT rn + 1 FROM cteCombinations WHERE rn < @c + 99 + @c
)
SELECT 99999999 - ca1.d*10000000 
                - ca2.d*1000000 
                - ca3.d*100000 
                - ca4.d*10000 
                - ca5.d*1000
                - ca6.d*100 
                - ca7.d*10 
                - ca8.d AS Code
FROM cteCombinations c
CROSS APPLY(SELECT TOP 1 d FROM cteDigits d WHERE d < 9 AND d.d <> c.rn ORDER BY NEWID()) ca1
CROSS APPLY(SELECT TOP 1 d FROM cteDigits d WHERE d.d <> c.rn ORDER BY NEWID()) ca2
CROSS APPLY(SELECT TOP 1 d FROM cteDigits d WHERE d.d <> c.rn ORDER BY NEWID()) ca3
CROSS APPLY(SELECT TOP 1 d FROM cteDigits d WHERE d.d <> c.rn ORDER BY NEWID()) ca4
CROSS APPLY(SELECT TOP 1 d FROM cteDigits d WHERE d.d <> c.rn ORDER BY NEWID()) ca5
CROSS APPLY(SELECT TOP 1 d FROM cteDigits d WHERE d.d <> c.rn ORDER BY NEWID()) ca6
CROSS APPLY(SELECT TOP 1 d FROM cteDigits d WHERE d.d <> c.rn ORDER BY NEWID()) ca7
CROSS APPLY(SELECT TOP 1 d FROM cteDigits d WHERE d.d <> c.rn ORDER BY NEWID()) ca8
OPTION(MAXRECURSION 0)

Output:

Code
82520154
41164702
16701568
23744767
34570681
18158118
17548441
57261417
18272038
16576412

The idea is to generate random digits from 1 to 9 eight times

1, 4, 6, 2, 8, 9, 4, 3
5, 8, 1, 1, 5, 7, 5, 1
....

then just subtract formula from 99999999.

EDIT:

To avoid collisions you can do a left join:

Insert Into Codes
Select Top(@n) t.Code
From(
SELECT FLOOR(CAST(CRYPT_GEN_RANDOM(4) AS BIGINT) / 4294967296 * ((@max - @min) + 1)) + @min AS Code
FROM   sys.all_objects s1 
              CROSS JOIN sys.all_objects s2) t
Left Join Codes c on t.Code = c.Code
Where c.Code Is NULL

EDIT2:

I have created table where those codes are already randomly inserted:

CREATE TABLE Codes(ID BIGINT NOT NULL IDENTITY(1, 1) PRIMARY KEY, Code BIGINT NOT NULL, IsUsed BIT NOT NULL)
GO

;WITH t AS(
SELECT 10000000 + ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rn 
FROM 
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t1(n)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t2(n)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t3(n)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t4(n)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t5(n)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t6(n)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t7(n)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t8(n)
)

INSERT INTO Codes
SELECT rn, 0 FROM t WHERE rn <= 99999999
ORDER BY NEWID()

CREATE CLUSTERED INDEX someindex ON codes(code)
GO

CREATE INDEX someindex2 ON codes(IsUsed)
GO

This step takes about 10 minutes. Then just use update statement with output:

;WITH    cte
          AS ( SELECT TOP 1000
                        *
               FROM     dbo.Codes
               WHERE    IsUsed = 0
               ORDER BY id
             )
    UPDATE  cte
    SET     IsUsed = 1
    OUTPUT  Inserted.Code

It updates bit and returns updated codes. It uses index seek and thus is super fast and updates and returns 100.000 rows within 1 sec.

Giorgi Nakeuri
  • 35,155
  • 8
  • 47
  • 75
  • Thanks for the answer but it only suggests how to generate a random number. I already have that part. My question is about avoiding collisions. – Paul Fleming Apr 23 '15 at 12:02
  • 1
    That exhibits the same issue as Roman's answer. It generates a random code for every record in `sys.all_objects s1 CROSS JOIN sys.all_objects s2) t`. On my powerful local machine, this takes 12 seconds. Not good... – Paul Fleming Apr 23 '15 at 15:27
  • Am I right to assume that this solution requires that ALL codes exist up front, then I pick from them at random? That's an interesting approach. Unfortunately this won't work for me because it's a multi-tenant database and the codes are unique only per tenant. I expect to have 25,000 tenants on day one. This would mean I'd be seeding the table with 2,500,000,000,000 records (2.5 trillion). – Paul Fleming Apr 24 '15 at 11:12