3

I am creating normal GUIDs and then storing these as strings in a table. They are very long and I don't need to have them that long as my table only will ever have a maximum of 10,000 rows and if it so happens that a value is created twice I can arrange for a second attempt at an insert.

I am creating the column for that holds the random value and its default like this:

ALTER TABLE [dbo].[Question] ADD GlobalId VARCHAR (50) DEFAULT(NEWID());

Is there some way I can shorten a GUID and still retain some randomness for example 1 in a million repeat values? Note that insert time of a row is not a big concern. What is most important for me is that every time there's an insert I would like to have a random string placed into GlobalId.

Update:

It's been suggested that I use a 4 byte INT and I would be okay with that but I am not sure how to generate a number that would fit in a 4 byte INT as a default in SQL Server.

Samantha J T Star
  • 30,952
  • 84
  • 245
  • 427
  • 4
    Why do you use a GUID at all then? Why don't you use just a simple `int` instead? It's only 4 bytes vs. 16 bytes of raw GUID. If you shorten the GUID value, the chance of duplicates will start to increase. – xxbbcc Dec 26 '13 at 16:41
  • 1
    I don't need to use a GUID. With SQL Server is there a simple way I could generate a random that could be stored in those 4 bytes ? Also if I used 4 bytes then what's the maximum number that could be stored? – Samantha J T Star Dec 26 '13 at 16:54
  • Even if you used HEX values... that's 16 characters per byte. With 4 bytes, you're talking 16^4 = 65536 different values. – sam yi Dec 26 '13 at 17:15
  • you can also use CHECKSUM to get back a 4 bytes integer. For example : `SELECT BINARY_CHECKSUM(NEWID())`, but there will be duplicates (collisions) from time to time – rudi bruchez Dec 26 '13 at 17:21
  • Just as a side note : if your globalId is indexed, beware of index fragmentation, you'll need to rebuild the index regularly – rudi bruchez Dec 26 '13 at 17:22

3 Answers3

4

I'm guessing you're just using GUID as a quick random generator...

If you'd like quid to be a bit shorter.. just do

LEFT(NEWID(),10) -- You don't have to use 10.. i've just decided to test with 10.

You can use this to test it. I've tested it with first 1 million inserts and there were 6 duplicates.

set nocount on;

if object_id('tempdb..#tmp') is not null
    drop table #tmp
create table #tmp (val varchar(10) primary key)

declare @i int, @new varchar(10)
set @i = 0

while @i < 1000000
begin

    select @new = left(newid(),10)
    begin try 
        insert into #tmp (val) values (@new)
    end try
    begin catch
        print @i
    end catch

    set @i = @i + 1
end

Obviously, if you use longer strings, you'll get less hits... and if you removed "-", you'll get more "random"

If you have some time.. you should look up random number generator instead and implement it instead. Here is a quick link. How do I generate random number for each row in a TSQL Select?

Community
  • 1
  • 1
sam yi
  • 4,806
  • 1
  • 29
  • 40
  • The reason I would like it to be as short as possible is because I have code that takes a number of rows. For example 50. It then takes the IDs from these rows and put those IDs into another row as a JSON string. So row length is quite important for me. I like your idea very much. I think it meets my needs. – Samantha J T Star Dec 26 '13 at 17:11
  • @SamanthaJ If you're worried about length, don't use strings for your values - use `int`-s and send them as hexadecimal values in your JSON data. On the average you'll have shorter string values that way. – xxbbcc Dec 26 '13 at 17:13
  • I think it's more random on the "right" side. Do right(newid(),10) instead. – sam yi Dec 26 '13 at 17:14
  • I agree with xxbbcc. Per your comment above, you can have 65k unique values in 4 bytes using hex. If you open that up to other characters... even just alpha numeric... 36 (26+10)... you're looking at 1.67mm uniquely representable values. – sam yi Dec 26 '13 at 17:16
4

First off, if you store the field as a uniqueidentifier instead of a string, the storage size will be only 16 bytes. If you convert it to a varchar as you are doing, the storage size will be 36 bytes per record (32 characters and 4 dashes). So varchar(50) in your example also takes up 36 bytes of space per row. If you just want to shorten the string, you can do this:

select left(convert(varchar(36), newid()), 8)

That will give you the first 8 characters of the NEWID, which will yield 2^32 or over 4 billion possibilities. I'd guess the probability of a collision with less than 10k tries is unlikely, but you should probably handle the scenario just in case.

TTT
  • 22,611
  • 8
  • 63
  • 69
  • oh wow- sam beat me to it. I like his better since you don't actually need the convert part. I was going to add that using right instead of left gets you up to 12 chars before hitting a dash, but sam just added that too. Seems like I'm always a minute behind. – TTT Dec 26 '13 at 17:17
1

This is not specifically an answer to your question but a suggestion to use a different approach (if you can).

If you don't truly need the uniqueness of a GUID, then don't use a GUID - while the storage overhead may not be big (only ~10K rows according to your question) there's no point in storing something that doesn't need to be stored.

If you only need random values stored with your rows you can try using the RAND TSQL function to generate a random number. It works for int-s. If you only have a few thousand rows the chance of repeat numbers is not very high but you may still want to check to make sure you don't get a duplicate (assuming you need a unique number). Here's the int reference about the available range of values.

If you do, however, need unique values, I'd just use a unique constraint on an auto-generated number column instead. Or keep using GUIDs (stored as uniqueidentifier or binary(16) instead of varchar) - 10K rows is really not that many and a GUID is really, really unique (if generated right).

xxbbcc
  • 16,930
  • 5
  • 50
  • 83