How to generate Unique Integer Only IDs like Facebook Twitter
Solution 1
The Flickr comment up above was very useful. We use sharding as well. We have an bigint (int64) locator field. It is generated by combining an int (int32) database id and an int (int32) identity field.
If you know you will have an int16 number of database max (quite likely), you could combine an int16 (smallint) database id and an int32 (int) user id and an int16 (smallint) action id. I don't know reasonable numbers for your application. But reserve some part for the database id, even if it's just tinyint, so you know you're future safe if you add more databases.
Solution 2
Suppose your IDs are all numeric. Delimit them by a character A
(since it surely does not appear in the original IDs) and do a base conversion from base-11 to base-10.
For the example you did we now get different results:
echo base_convert("15A211", 11, 10); //247820
echo base_convert("152A11", 11, 10); //238140
Solution 3
Actually, if you look at (for example) the IDs of users on your Friends (on Facebook), you'd notice that they are sequential among all users, exactly like an AUTO_INCREMENT database field. However, they probably don't start at 1. My friends list, for example, has some numbers in the millions, then suddenly jump to 1 trillion and something, so my guess is that the auto_increment value was bumped up - this may be done to "hide" exactly how many users there are.
Anyway, to generate unique IDs, just create them sequentially with that AUTO_INCREMENT field. Optionally, set the initial value to something high.
Related videos on Youtube
stwhite
Updated on June 04, 2022Comments
-
stwhite almost 2 years
After searching SO and other sites, I've failed to come up with conclusive evidence to how Facebook, Twitter and Pinterest generate their ID's. The reason this is needed is to avoid url collisions. Moving to an entirely different ID will prevent this because there wont be quadrillions of records.
- Facebook.com/username/posts/362095193814294
- Pinterest.com/pin/62487513549577588
- Twitter.com/#!/username/status/17994686627061761
If you look at Pinterest as an example, the first few digits relate to the user id, and the last 6 or so digits represent the save id which possibly could be an auto increment.
To create a similar ID, but not unique I was able to use:
base_convert(user_id.save_id, 16, 10)
. The problem here is that it's not unique, ex:base_convert(15.211, 16, 10)
vs.base_convert(152.11, 16, 10)
. These two are the same. Simply just merging two unique sets of numbers will still produce duplicate results. Throwinguniqid()
into the mix will essentially fix the duplicates, but this doesn't seem like a great practice.Update: Twitter appears to use this: https://github.com/twitter/snowflake
Any suggestions on generating a unique ID like the above examples?
-
Michael Mior about 12 yearsHere's how Flickr does this.
-
stwhite about 12 yearsthanks for your response. This is an option but not ideal as you mentioned. It allows for someone to find out how many users are on the site or how many posts are on the site.
-
Niet the Dark Absol about 12 yearsThat's why you set the initial AUTO_INCREMENT value. Provided nobody knows for certain what the "first" one was, it's no problem.
-
Marcus Adams about 12 yearsTo set the autoincrement:
ALTER TABLE tbl AUTO_INCREMENT = 100
. You can also set it in theCREATE TABLE
. -
stwhite about 12 yearsIt's somewhat obvious when you reset auto incremented values. Going from 1 to 10000000 or other large jumps are apparent that the increment was changed. Unless you start with values such as 11139239. Also, another indication is when you get thousands of values that only change by + 1. Ex: 25665312866, 25665312867, 25665312868, 25665312869, etc.
-
Niet the Dark Absol about 12 yearsThanks @MarcusAdams - I knew this was possible and had seen the command somewhere, but I've never used it myself and I forgot it :D
-
Niet the Dark Absol about 12 years@stwhite: On Facebook, how likely is it that one person (or group of people) will see every single ID, or even enough of them to deterimine much of a pattern?
-
user1212517 about 12 years*"we now get different results", sorry :)
-
Marcus Adams about 12 yearsAlso, don't forget about auto_increment_increment system variable.
-
stwhite about 12 yearsIs there any chance for duplicate results here? I don't see a chance at all considering on both sides of the delimiter are integers only...
-
user1212517 about 12 years1. Those numbers are different, no matter what is the base used to encode them. So if they are different in base 11, they will be different in base 10, too. 2. There are no two different integers that share representation in the same base. All in all this means, the result is immune to duplicates.
-
stwhite over 5 yearsFor anyone coming to this answer, it's worth checking out this article to further explain about reserving bits in your IDs to future-proof your ID structure. engineering.pinterest.com/blog/…
-
Brian White over 5 yearsThe reason I'm not a fan of int IDs is the easiness of url hacking to change the user id, etc