Collection ID length in MongoDB


Solution 1

Why is the default _id a 24 character hex string?

The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12 byte binary value which is often represented as a 24 character hex string, and one of the standard field types supported by the MongoDB BSON specification.
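The 12-byte to 24-character relationship follows from hex encoding, where each byte is written as two hex digits. A minimal sketch in plain JavaScript, assuming a hypothetical helper named hexToBytes (it is not part of any MongoDB driver) and using a sample ObjectId string:

```javascript
// Hypothetical helper for illustration: converts the 24-character hex form
// of an ObjectId into its 12 raw bytes (two hex digits per byte).
function hexToBytes(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected exactly 24 hex characters");
  }
  const bytes = new Uint8Array(12);
  for (let i = 0; i < 12; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

const sampleBytes = hexToBytes("5fdedb7c25ab1352eef88f60");
console.log(sampleBytes.length); // 12
```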

The 12 bytes of an ObjectId are constructed using:

  • a 4 byte value representing the seconds since the Unix epoch
  • a 3 byte machine identifier
  • a 2 byte process id
  • a 3 byte counter (starting with a random value)
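The layout above can be sketched as slices of the 24-character hex string (8, 6, 4, and 6 hex digits for the 4, 3, 2, and 3 byte fields). This is an illustrative sketch only: splitLegacyObjectId is a hypothetical helper, not a driver API, and the sample value is used purely as 12 example bytes, so the machine and process fields it yields are not meaningful.

```javascript
// Hypothetical helper: splits a 24-character hex ObjectId string according
// to the 4 / 3 / 2 / 3 byte layout described above.
function splitLegacyObjectId(hex) {
  return {
    timestampSeconds: parseInt(hex.slice(0, 8), 16), // 4 bytes
    machineId: hex.slice(8, 14),                     // 3 bytes, kept as hex
    processId: parseInt(hex.slice(14, 18), 16),      // 2 bytes
    counter: parseInt(hex.slice(18, 24), 16),        // 3 bytes
  };
}

console.log(splitLegacyObjectId("5fdedb7c25ab1352eef88f60"));
```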

What is the importance of an ObjectId?

ObjectIds (or similar identifiers generated according to a GUID formula) allow unique identifiers to be independently generated in a distributed system.

The ability to independently generate a unique ID becomes very important as you scale up to multiple application servers (or perhaps multiple database nodes in a sharded cluster). You do not want a central coordination bottleneck like a sequence counter (e.g. as you might have for an auto-incrementing primary key), and you will want to insert new documents without the risk that a new identifier turns out to be a duplicate.

An ObjectId is typically generated by your MongoDB client driver, but can also be generated on the MongoDB server if your client driver or application code hasn't already added an _id field.

Do I have to use the default ObjectId?

No. If you have a more suitable unique identifier to use, you can always provide your own value for _id. This can either be a single value or a composite value using multiple fields.

The main constraints on _id values are that they have to be unique for a collection and you cannot update or remove the _id for an existing document.

Solution 2

The current MongoDB version is now 4.2. An ObjectId is still 12 bytes, but it now consists of three parts.

ObjectIds are small, likely unique, fast to generate, and ordered. ObjectId values are 12 bytes in length, consisting of:

  • a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch

  • a 5-byte random value

  • a 3-byte incrementing counter, initialized to a random value
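As a rough illustration of how these three parts fit together, here is a minimal sketch of a generator. It assumes Node.js (for Buffer and the built-in crypto module), and it assumes the 5-byte random value is chosen once per process. The name generateObjectIdHex is hypothetical; this is not the code any MongoDB driver actually uses.

```javascript
const { randomBytes } = require("node:crypto");

// Assumption: the 5-byte random value is picked once per process.
const processRandom = randomBytes(5);

// 3-byte counter, initialized to a random value.
let counter = randomBytes(3).readUIntBE(0, 3);

// Hypothetical generator: returns a 24-character hex string laid out as
// 4-byte timestamp + 5-byte random value + 3-byte counter.
function generateObjectIdHex(nowMs = Date.now()) {
  const id = Buffer.alloc(12);
  // Bytes 0-3: seconds since the Unix epoch, big-endian.
  id.writeUInt32BE(Math.floor(nowMs / 1000) >>> 0, 0);
  // Bytes 4-8: the per-process random value.
  processRandom.copy(id, 4);
  // Bytes 9-11: the counter, which wraps around after 2^24 values.
  id.writeUIntBE(counter, 9, 3);
  counter = (counter + 1) % 0x1000000;
  return id.toString("hex");
}

console.log(generateObjectIdHex());
```

Because the timestamp occupies the leading bytes, identifiers generated in later seconds sort after earlier ones, which is where the "ordered" property comes from.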

Create ObjectId and get timestamp from it

> x = ObjectId()
ObjectId("5fdedb7c25ab1352eef88f60")
> x.getTimestamp()
ISODate("2020-12-20T05:05:00Z")
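The same timestamp can be recovered without the shell helper, since it is just the first 4 bytes (8 hex digits) read as seconds since the Unix epoch. A minimal sketch in plain JavaScript, where objectIdTimestamp is a hypothetical helper name:

```javascript
// Hypothetical helper: reads the leading 4 bytes of a 24-character hex
// ObjectId string as seconds since the Unix epoch and returns a Date.
function objectIdTimestamp(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("5fdedb7c25ab1352eef88f60").toISOString());
// 2020-12-20T05:05:00.000Z
```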

Reference

Read MongoDB official doc


Author: ashish bandiwar

Updated on July 09, 2022

Comments

  • ashish bandiwar
    ashish bandiwar almost 2 years

    I am new to MongoDB and Stack Overflow.

    I want to know why the collection ID in MongoDB is 24 hex characters long. What is the importance of that?

    • Neil Lunn
      Neil Lunn almost 10 years
      The official documentation is a good place to start: ObjectId
    • Stennie
      Stennie almost 10 years
      The default unique identifier generated for a primary key (_id) is an ObjectId. This is a 12-byte binary value which is often represented as a 24 character hex string. If you have a more suitable unique identifier to use, you can provide your own value for _id. The importance of an ObjectId is that unique values can be generated in a distributed system (typically by the client driver). This is similar to GUIDs, although more compact.
  • Kenny Worden
    Kenny Worden almost 9 years
    Is that 4 byte value unsigned? If it's not, MongoDB will have to do an overhaul in about 22 years...
  • Stennie
    Stennie almost 9 years
    @KennyWorden ObjectIds currently use a signed 32-bit int (i.e. unixtime), so you're correct that the time component will roll over eventually (see also: What will happen to ObjectIDs in year 2038?). Generated ObjectIds should continue to be unique (byte wise) for a while after rollover but certain assumptions (such as ordering by a monotonically increasing time prefix) would no longer hold true. I assume there will be a replacement ObjectId subtype introduced before then :).
  • Stennie
    Stennie almost 9 years
    I believe the unixtime component was originally included for uniqueness and a rough ordering of generated ObjectIds, and not to embed a timestamp in default _ids (although certainly developers have made assumptions about the timestamp aspect since then). There have been several ObjectId variations already, as implemented by different legacy drivers (see the "subtypes" in the BSON spec or as written up in UUID Support in Robomongo).
  • Kenny Worden
    Kenny Worden almost 9 years
    SIGNED? Why? They don't need to track time before the epoch!
  • Stennie
    Stennie almost 9 years
    @KennyWorden See also: unix time: "Unix time is a single signed integer number which increments every second". I presume that "seemed like a good idea at the time" to the Unix kernel devs (similar to the choice of 1-Jan-1970 as the unix epoch). The usage in ObjectId generation is following the established convention for convenience. The overall ObjectId wants to be uniquely generated, but as noted I don't think it necessarily has to embed a timestamp. You're also free to use your own unique identifiers for _id rather than the default ObjectId.
  • Kenny Worden
    Kenny Worden almost 9 years
    Thanks for your answers! I'll definitely look into this some more. :)