Guid.NewGuid() VS a random string generator from Random.Next()

Solution 1

I am looking for a more in depth reason as to why the cooked up method may be more likely to generate collisions given the same degrees of freedom as a Guid.

First, as others have noted, Random is not thread-safe; using it from multiple threads can cause it to corrupt its internal data structures so that it always produces the same sequence.

Second, Random is seeded based on the current time. Two instances of Random created within the same millisecond (recall that a millisecond is several million processor cycles on modern hardware) will have the same seed, and therefore will produce the same sequence.

Third, I lied. Random is not seeded based on the current time; it is seeded based on the amount of time the machine has been active. The seed is a 32-bit number, and since the granularity is in milliseconds, it wraps around after about 49.7 days. But that's not the problem; the problem is that the time period in which you create that instance of Random is highly likely to be within a few minutes of the machine booting up. Every time you power-cycle a machine, or bring a new machine online in a cluster, there is a small window in which instances of Random are created, and the more that happens, the greater the odds are that you'll get a seed that you had before.

(UPDATE: Newer versions of the .NET framework have mitigated some of these problems; in those versions you no longer have every Random created within the same millisecond have the same seed. However there are still many problems with Random; always remember that it is only pseudo-random, not crypto-strength random. Random is actually very predictable, so if you are relying on unpredictability, it is not suitable.)
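
The effect of a repeated seed is easy to reproduce by passing the seed explicitly. The following is a minimal sketch; SameSeedDemo and its methods are hypothetical names chosen for illustration, and the explicit seed stands in for two instances that happened to pick up the same time-based seed.

```csharp
using System;
using System.Linq;

static class SameSeedDemo
{
    // Draws `count` values from a Random created with the given seed.
    public static int[] Draw(int seed, int count)
    {
        var rng = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = rng.Next();
        }
        return values;
    }

    // True when two independently created instances with the same seed
    // produce exactly the same sequence.
    public static bool SameSequence(int seed, int count)
    {
        return Draw(seed, count).SequenceEqual(Draw(seed, count));
    }
}
```

SameSeedDemo.SameSequence(12345, 100) returns true for any seed and any length: once the seed repeats, every "random" string built from that instance repeats too.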

As others have said: if you want a primary key for your database then have the database generate you a primary key; let the database do its job. If you want a globally unique identifier then use a guid; that's what they're for.

And finally, if you are interested in learning more about the uses and abuses of guids then you might want to read my "guid guide" series; part one is here:

http://blogs.msdn.com/b/ericlippert/archive/2012/04/24/guid-guide-part-one.aspx

Solution 2

As written in other answers, my implementation had a few severe problems:

  • Thread safety: Random is not thread safe.
  • Predictability: the method couldn't be used for security critical identifiers like session tokens due to the nature of the Random class.
  • Collisions: Even though the method drew 20 'random' numbers, the number of distinct strings it can produce is not (number of possible chars)^20, because the seed is only 31 bits and comes from a bad source. Given the same seed, a sequence of any length will be the same, so there are at most 2^31 distinct sequences.
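
To put rough numbers on the collision point, here is a small sketch using the standard birthday approximation. BirthdayBound is a hypothetical helper, and the calculation assumes every value in the space is equally likely, which is optimistic for a seed derived from machine uptime.

```csharp
using System;

static class BirthdayBound
{
    // Approximate number of uniformly random draws from `space` equally
    // likely values at which the chance of at least one collision reaches
    // 50%: n ≈ sqrt(2 * space * ln 2).
    public static double DrawsForEvenOdds(double space)
    {
        return Math.Sqrt(2.0 * space * Math.Log(2.0));
    }
}
```

BirthdayBound.DrawsForEvenOdds(Math.Pow(2, 31)) comes out at roughly 5.5 × 10^4, while BirthdayBound.DrawsForEvenOdds(Math.Pow(36, 20)) is roughly 4.3 × 10^15. In other words, IDs that each start from a fresh 31-bit seed reach even odds of a repeat after tens of thousands of draws, not the quadrillions that 20 alphanumeric characters would suggest.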

Guid.NewGuid() would be fine, except we don't want to use ugly GUIDs in URLs, and .NET's NewGuid() algorithm is not known to be cryptographically secure for use in session tokens: it might give predictable results if a little information is known.

Here is the code we're using now. It is secure and flexible, and as far as I know it is very unlikely to create collisions given enough length and a large enough character set:

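using System;
using System.Security.Cryptography;
using System.Text;

class RandomStringGenerator
{
    // Cryptographically strong random number generator.
    readonly RNGCryptoServiceProvider rand = new RNGCryptoServiceProvider();

    public string GetRandomString(int length, params char[] chars)
    {
        // StringBuilder avoids allocating a new string on every iteration.
        var s = new StringBuilder(length);
        byte[] intBytes = new byte[4];
        for (int i = 0; i < length; i++)
        {
            rand.GetBytes(intBytes);
            uint randomInt = BitConverter.ToUInt32(intBytes, 0);
            // Note: the modulo slightly favours the first few characters
            // unless chars.Length divides 2^32 evenly.
            s.Append(chars[randomInt % chars.Length]);
        }
        return s.ToString();
    }
}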
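
The modulo step above maps a 32-bit value onto chars.Length slots, which is slightly uneven whenever chars.Length does not divide 2^32. One common remedy is rejection sampling. The following is only a sketch of that idea, not the code the answer's author used; UnbiasedRandomString, NextIndex and Generate are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class UnbiasedRandomString
{
    // Returns a uniformly distributed index in [0, count) by discarding
    // raw values that fall into the incomplete final bucket.
    public static int NextIndex(RandomNumberGenerator rng, int count)
    {
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));

        // Largest multiple of count that fits in the 2^32 possible values.
        ulong limit = (1UL << 32) / (ulong)count * (ulong)count;
        byte[] buffer = new byte[4];
        uint value;
        do
        {
            rng.GetBytes(buffer);
            value = BitConverter.ToUInt32(buffer, 0);
        } while (value >= limit);

        return (int)(value % (uint)count);
    }

    // Builds a string of `length` characters drawn uniformly from `alphabet`.
    public static string Generate(int length, string alphabet)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (string.IsNullOrEmpty(alphabet)) throw new ArgumentException("Alphabet must not be empty.", nameof(alphabet));

        var sb = new StringBuilder(length);
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            for (int i = 0; i < length; i++)
            {
                sb.Append(alphabet[NextIndex(rng, alphabet.Length)]);
            }
        }
        return sb.ToString();
    }
}
```

For a 36-character alphabet the bias being removed is tiny, so whether the extra loop is worth it depends on how strict the uniformity requirement is.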

Solution 3

"Auto generating user ids and post ids for identification in the database"...why not use a database sequence or identity to generate keys?

To me your question is really, "What is the best way to generate a primary key in my database?" If that is the case, you should use the conventional tool of the database which will either be a sequence or identity. These have benefits over generated strings.

  1. Sequences/identity index better. There are numerous articles and blog posts that explain why GUIDs and so forth make poor indexes.
  2. They are guaranteed to be unique within the table
  3. They can be safely generated by concurrent inserts without collision
  4. They are simple to implement

I guess my next question is, for what reasons are you considering GUIDs or generated strings? Will you be integrating across distributed databases? If not, you should ask yourself if you are solving a problem that doesn't exist.

Solution 4

Your custom method has two problems:

  1. It uses a global instance of Random, but doesn't use locking. => Multi-threaded access can corrupt its state, after which the output will suck even more than it already does.
  2. It uses a predictable 31 bit seed. This has two consequences:
    • You can't use it for anything security related where unguessability is important
    • The small seed (31 bits) can reduce the quality of your numbers. For example, if you create multiple instances of Random at the same time (measured since system startup), they'll probably create the same sequence of random numbers.

This means you cannot rely on the output of Random being unique, no matter how long it is.
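
If Random has to stay for some reason, the first problem can at least be contained with a lock. This is a minimal sketch with a hypothetical wrapper name (LockedRandom); it does nothing about the weak, predictable seed described in the second problem.

```csharp
using System;

sealed class LockedRandom
{
    private readonly Random _random = new Random();
    private readonly object _gate = new object();

    // Serialises access so that concurrent callers cannot corrupt the
    // internal state of the shared Random instance.
    public int Next(int maxExclusive)
    {
        lock (_gate)
        {
            return _random.Next(maxExclusive);
        }
    }
}
```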

I recommend using a CSPRNG (RNGCryptoServiceProvider) even if you don't need security. Its performance is still acceptable for most uses, and I'd trust the quality of its random numbers over Random. If you want uniqueness, I recommend getting numbers with around 128 bits.

To generate random strings using RNGCryptoServiceProvider you can take a look at my answer to How can I generate random 8 character, alphanumeric strings in C#?.
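
As a rough illustration of the 128-bit recommendation, the sketch below pulls 16 bytes from the platform CSPRNG and hex-encodes them. RandomId and NewHexId are hypothetical names, and the hex encoding is just one possible choice of text form.

```csharp
using System;
using System.Security.Cryptography;

static class RandomId
{
    // Returns 128 bits from the platform CSPRNG as 32 lowercase hex characters.
    public static string NewHexId()
    {
        byte[] bytes = new byte[16];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        // BitConverter.ToString yields "AB-CD-..."; strip the dashes.
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```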


Nowadays GUIDs returned by Guid.NewGuid() are version 4 GUIDs. They are generated from a PRNG, so they have pretty similar properties to generating a random 122 bit number (the remaining 6 bits are fixed). Its entropy source has much higher quality than what Random uses, but it's not guaranteed to be cryptographically secure.

But the generation algorithm can change at any time, so you can't rely on that. For example in the past the Windows GUID generation algorithm changed from v1 (based on MAC + timestamp) to v4 (random).
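
For the curious, the version and variant of a given GUID can be read straight from its text form. This is a small sketch, assuming the canonical 8-4-4-4-12 layout produced by ToString("D"); GuidInspector and its methods are hypothetical names. It only reports what a particular GUID contains and says nothing about what a future runtime will generate.

```csharp
using System;

static class GuidInspector
{
    // In the 8-4-4-4-12 ("D") text form, the first hex digit of the third
    // group is the version number.
    public static char VersionDigit(Guid g)
    {
        return g.ToString("D")[14];
    }

    // The first hex digit of the fourth group carries the variant bits.
    public static char VariantDigit(Guid g)
    {
        return g.ToString("D")[19];
    }
}
```

For a version 4 GUID, VersionDigit returns '4' and VariantDigit returns one of '8', '9', 'a' or 'b'. Those are the 6 fixed bits; the other 122 are the random part.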

Solution 5

Use System.Guid as it:

...can be used across all computers and networks wherever a unique identifier is required.

Note that Random is a pseudo-random number generator. It is neither truly random nor unique. It has only 32 bits of value to work with, compared to the 128-bit GUID.

However, even GUIDs can have collisions (although the chances are really slim), so you should use the database's own features to give you a unique identifier (e.g. the autoincrement ID column). Also, you cannot easily turn a GUID into a 4 or 20 (alpha)numeric number.

Author: George Powell

Software Engineer. Interested in web technologies, web security, web scale. Enjoys a challenge and learning new stuff.

Updated on July 17, 2020

Comments

  • George Powell
    George Powell almost 4 years

    My colleague and I are debating which of these methods to use for auto generating user IDs and post IDs for identification in the database:

    One option uses a single instance of Random, and takes some useful parameters so it can be reused for all sorts of string-gen cases (i.e. from 4 digit numeric pins to 20 digit alphanumeric ids). Here's the code:

    // This is created once for the lifetime of the server instance
    class RandomStringGenerator
    {
        public const string ALPHANUMERIC_CAPS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
        public const string ALPHA_CAPS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        public const string NUMERIC = "1234567890";
    
        Random rand = new Random();
        public string GetRandomString(int length, params char[] chars)
        {
            string s = "";
            for (int i = 0; i < length; i++)
                s += chars[rand.Next() % chars.Length];
    
            return s;
        }
    }
    

    and the other option is simply to use:

    Guid.NewGuid();
    

    see Guid.NewGuid on MSDN

    We're both aware that Guid.NewGuid() would work for our needs, but I would rather use the custom method. It does the same thing but with more control.

    My colleague thinks that because the custom method has been cooked up ourselves, it's more likely to generate collisions. I'll admit I'm not fully aware of the implementation of Random, but I presume it is just as random as Guid.NewGuid(). A typical usage of the custom method might be:

    RandomStringGenerator stringGen = new RandomStringGenerator();
    string id = stringGen.GetRandomString(20, RandomStringGenerator.ALPHANUMERIC_CAPS.ToCharArray());
    

    Edit 1:

    • We are using Azure Tables which doesn't have an auto increment (or similar) feature for generating keys.
    • Some answers here just tell me to use NewGuid() "because that's what it's made for". I'm looking for a more in depth reason as to why the cooked up method may be more likely to generate collisions given the same degrees of freedom as a Guid.

    Edit 2:

    We were also using the cooked up method to generate post IDs which, unlike session tokens, need to look pretty for display in the URL of our website (like http://mywebsite.com/14983336), so guids are not an option here. Collisions are still to be avoided, however.

    • GalacticCowboy
      GalacticCowboy about 11 years
      Random makes NO guarantee of uniqueness. It is perfectly valid for a random sequence to contain the same result multiple times.
    • StarPilot
      StarPilot about 11 years
      If you cook it up yourself, it is less likely to be UNIQUE than a GUID. A GUID makes use of certain hardware factors and the current time to generate a particular GUID. You can research the details yourself. A GUID won't be unique if you hit the same GUID generator too often in a short segment of time. Again, you can look up the details yourself.
    • StarPilot
      StarPilot about 11 years
      If you want to generate unique user IDs and post IDs, you should use auto-incrementing numbers from a database. Hit the database, get back the next number in that sequence. Guaranteed unique.
    • JerKimball
      JerKimball about 11 years
      As @GalacticCowboy said, random should be read as deterministically random, not unique; if you are aiming for uniqueness, use NewGuid
    • erikkallen
      erikkallen about 11 years
      @StarPilot: No it doesn't. Only type 1 GUIDs do, and Guid.NewGuid() returns a type 4 GUID.
    • Eric Lippert
      Eric Lippert about 11 years
      "I'll admit I'm not fully aware of the implementation of Random, but I presume it is just as random as Guid.NewGuid()." Your assumption is completely incorrect, in two ways. (1) GUIDs are not guaranteed to be random at all; they are guaranteed to be unique. (2) Version 4 GUIDs are random (in most, but not all of their bits) and their source of entropy is considerably less prone to collision than the weak source of entropy used by Random.
    • Ryan Gates
      Ryan Gates about 11 years
      You should rewrite the question to explicitly ask which method is more likely to generate collisions if that is all that you are after. The question seems to ask what the best option is.
    • Eric Lippert
      Eric Lippert about 11 years
      @erikkallen: where in the documentation does it state that NewGuid always returns a version 4 guid?
    • Eric Lippert
      Eric Lippert about 11 years
      @GeorgePowell: And while we're looking at your code, IT_IS_NOT_1970_ANYMORE; FormatYourConstantsLikeThis in C#.
    • George Powell
      George Powell about 11 years
      @EricLippert I'll format my code how I like thanks! Capital letters look more stubborn and harder to budge, i.e. constant.
    • Brian
      Brian about 11 years
      You may find the second half of Raymond Chen's article on shortening a GUID to be of interest. blogs.msdn.com/b/oldnewthing/archive/2008/06/27/8659071.aspx
    • erikkallen
      erikkallen about 11 years
      @EricLippert. You're right, it doesn't say that in the documentation. It happens in practice, though.
    • CodesInChaos
      CodesInChaos about 11 years
      @EricLippert If you relied only on documented behavior, 90% of the .net framework would be unusable.
    • Admin
      Admin over 10 years
      @GeorgePowell: Re: formatting: It's about the person after you. The semantics of code formatting are highly subjective. To wit: the original convention of all caps had nothing to do with how "stubborn" it looked - you added that meaning, and I would not read it that way. Decades of maintenance through multiple teams has made many people prefer to agree on a convention rather than trying to interpret it on a per-line basis. In C# official guidance exists, and developers are expected to generally adhere to it. But it is off-topic, I agree.
    • thesaint
      thesaint over 10 years
      @GeorgePowell: Totally agree with you. While I write standard C# formatting everywhere else I still use C++ macro-style for constant just because it is immediately obvious that this thing is never going to change at runtime...
  • GalacticCowboy
    GalacticCowboy about 11 years
    As you note, GUIDs seem to be most useful in distributed or disconnected-edit scenarios. Otherwise, you're mostly just making more work for yourself without any real benefit.
  • CodesInChaos
    CodesInChaos about 11 years
    Being pseudo-random isn't a problem per se (v4 GUIDs are just pseudo-random numbers as well), but Random uses a really bad seed.
  • George Powell
    George Powell about 11 years
    The edit said that I'm not using SQL or any of its GUID features. I'm using azure tables which needs a string as a row key.
  • George Powell
    George Powell about 11 years
    "It does support uniqueidentifier" are you talking about SQL azure or Azure tables? I'm not using SQL azure and I'm not sure what you mean in terms of Azure tables. edit: you deleted your comment...
  • GalacticCowboy
    GalacticCowboy about 11 years
    Yes, in reading further I decided that my comment (and in fact this entire answer) really doesn't apply to your situation. In a general case, the question title and details are really about two different things.
  • CodesInChaos
    CodesInChaos about 11 years
    I don't see 2, 3 and 4 as big advantages over GUIDs or manually generated random strings (using a good PRNG). The collision chance is negligible. Point 1 is the big disadvantage of random IDs. Even with distributed systems machine-id + counter is often preferable.
  • George Powell
    George Powell about 11 years
    We're using Azure tables and accessing them from an Azure web role. I'm no database expert but I think GUIDs/random strings are the only option here?
  • Jordan Parmer
    Jordan Parmer about 11 years
    @GeorgePowell - Ah, that changes things. I see you updated your question now to reflect that. In that case, I defer to Eric's good answer.
  • George Powell
    George Powell about 11 years
    Thanks for your answer, the guid guide was useful and your speculations in part 3 led us to generate our own IDs and session tokens using RNGCryptoServiceProvider rather than NewGuid(). See my own answer to this question below for details.
  • ygoe
    ygoe over 9 years
    Just curious: Isn't there a slight preference of certain items in chars when uint.MaxValue is not a multiple of chars.Length, because you are using the modulo operator here? Not sure if that's relevant though.
  • julealgon
    julealgon almost 3 years
    Hey Eric, can you update the link to the Guid discussion (assuming it is still hosted somewhere)?