How to generate Unique Integer Only IDs like Facebook Twitter

11,637

Solution 1

The Flickr comment up above was very useful. We use sharding as well. We have an bigint (int64) locator field. It is generated by combining an int (int32) database id and an int (int32) identity field.

If you know you will have an int16 number of database max (quite likely), you could combine an int16 (smallint) database id and an int32 (int) user id and an int16 (smallint) action id. I don't know reasonable numbers for your application. But reserve some part for the database id, even if it's just tinyint, so you know you're future safe if you add more databases.

Solution 2

Suppose your IDs are all numeric. Delimit them by a character A (since it surely does not appear in the original IDs) and do a base conversion from base-11 to base-10.

For the example you did we now get different results:

echo base_convert("15A211", 11, 10); //247820
echo base_convert("152A11", 11, 10); //238140

Solution 3

Actually, if you look at (for example) the IDs of users on your Friends (on Facebook), you'd notice that they are sequential among all users, exactly like an AUTO_INCREMENT database field. However, they probably don't start at 1. My friends list, for example, has some numbers in the millions, then suddenly jump to 1 trillion and something, so my guess is that the auto_increment value was bumped up - this may be done to "hide" exactly how many users there are.

Anyway, to generate unique IDs, just create them sequentially with that AUTO_INCREMENT field. Optionally, set the initial value to something high.

Share:
11,637

Related videos on Youtube

stwhite
Author by

stwhite

Updated on June 04, 2022

Comments

  • stwhite
    stwhite almost 2 years

    After searching SO and other sites, I've failed to come up with conclusive evidence to how Facebook, Twitter and Pinterest generate their ID's. The reason this is needed is to avoid url collisions. Moving to an entirely different ID will prevent this because there wont be quadrillions of records.

    • Facebook.com/username/posts/362095193814294
    • Pinterest.com/pin/62487513549577588
    • Twitter.com/#!/username/status/17994686627061761

    If you look at Pinterest as an example, the first few digits relate to the user id, and the last 6 or so digits represent the save id which possibly could be an auto increment.

    To create a similar ID, but not unique I was able to use: base_convert(user_id.save_id, 16, 10). The problem here is that it's not unique, ex: base_convert(15.211, 16, 10) vs. base_convert(152.11, 16, 10). These two are the same. Simply just merging two unique sets of numbers will still produce duplicate results. Throwing uniqid() into the mix will essentially fix the duplicates, but this doesn't seem like a great practice.

    Update: Twitter appears to use this: https://github.com/twitter/snowflake

    Any suggestions on generating a unique ID like the above examples?

  • stwhite
    stwhite about 12 years
    thanks for your response. This is an option but not ideal as you mentioned. It allows for someone to find out how many users are on the site or how many posts are on the site.
  • Niet the Dark Absol
    Niet the Dark Absol about 12 years
    That's why you set the initial AUTO_INCREMENT value. Provided nobody knows for certain what the "first" one was, it's no problem.
  • Marcus Adams
    Marcus Adams about 12 years
    To set the autoincrement: ALTER TABLE tbl AUTO_INCREMENT = 100. You can also set it in the CREATE TABLE.
  • stwhite
    stwhite about 12 years
    It's somewhat obvious when you reset auto incremented values. Going from 1 to 10000000 or other large jumps are apparent that the increment was changed. Unless you start with values such as 11139239. Also, another indication is when you get thousands of values that only change by + 1. Ex: 25665312866, 25665312867, 25665312868, 25665312869, etc.
  • Niet the Dark Absol
    Niet the Dark Absol about 12 years
    Thanks @MarcusAdams - I knew this was possible and had seen the command somewhere, but I've never used it myself and I forgot it :D
  • Niet the Dark Absol
    Niet the Dark Absol about 12 years
    @stwhite: On Facebook, how likely is it that one person (or group of people) will see every single ID, or even enough of them to deterimine much of a pattern?
  • user1212517
    user1212517 about 12 years
    *"we now get different results", sorry :)
  • Marcus Adams
    Marcus Adams about 12 years
    Also, don't forget about auto_increment_increment system variable.
  • stwhite
    stwhite about 12 years
    Is there any chance for duplicate results here? I don't see a chance at all considering on both sides of the delimiter are integers only...
  • user1212517
    user1212517 about 12 years
    1. Those numbers are different, no matter what is the base used to encode them. So if they are different in base 11, they will be different in base 10, too. 2. There are no two different integers that share representation in the same base. All in all this means, the result is immune to duplicates.
  • stwhite
    stwhite over 5 years
    For anyone coming to this answer, it's worth checking out this article to further explain about reserving bits in your IDs to future-proof your ID structure. engineering.pinterest.com/blog/…
  • Brian White
    Brian White over 5 years
    The reason I'm not a fan of int IDs is the easiness of url hacking to change the user id, etc