Why does direction of index matter in MongoDB?

28,175

Solution 1

MongoDB concatenates the compound key in some way and uses it as the key in a BTree.

When finding single items - The order of the nodes in the tree is irrelevant.

If you are returning a range of nodes - The elements close to each other will be down the same branches of the tree. The closer the nodes are in the range the quicker they can be retrieved.

With a single field index - The order won't matter. If they are close together in ascending order they will also be close together in descending order.

When you have a compound key - The order starts to matter.

For example, if the key is A ascending B ascending the index might look something like this:

Row   A B
1     1 1
2     2 6
3     2 7 
4     3 4
5     3 5
6     3 6
7     5 1

A query for A ascending B descending will need to jump around the index out of order to return the rows and will be slower. For example it will return Row 1, 3, 2, 6, 5, 4, 7

A ranged query in the same order as the index will simply return the rows sequentially in the correct order.

Finding a record in a BTree takes O(Log(n)) time. Finding a range of records in order is only OLog(n) + k where k is the number of records to return.

If the records are out of order, the cost could be as high as OLog(n) * k

Solution 2

The simple answer that you are looking for is that the direction only matters when you are sorting on two or more fields.

If you are sorting on {a : 1, b : -1}:

Index {a : 1, b : 1} will be slower than index {a : 1, b : -1}

Solution 3

Why indexes

Understand two key points.

  1. While an index is better than no index, the correct index is much better than either.
  2. MongoDB will only use one index per query, making compound indexes with proper field ordering what you probably want to use.

Indexes aren't free. They take memory, and impose a performance penalty when doing inserts, updates and deletes. Normally the performance hit is negligible (especially compared to gains in read performance), but that doesn't mean that we can't be smart about creating our indexes.

How Indexes

Identifying what group of fields should be indexed together is about understanding the queries that you are running. The order of the fields used to create your index is critical. The good news is that, if you get the order wrong, the index won't be used at all, so it'll be easy to spot with explain.

Why Sorting

Your queries might need Sorting. But sorting can be an expensive operation, so it's important to treat the fields that you are sorting on just like a field that you are querying. So it will be faster if it has index. There is one important difference though, the field that you are sorting must be the last field in your index. The only exception to this rule is if the field is also part of your query, then the must-be-last-rule doesn't apply.

How Sorting

You can specify a sort on all the keys of the index or on a subset; however, the sort keys must be listed in the same order as they appear in the index. For example, an index key pattern { a: 1, b: 1 } can support a sort on { a: 1, b: 1 } but not on { b: 1, a: 1 }.

The sort must specify the same sort direction (i.e. ascending/descending) for all its keys as the index key pattern or specify the reverse sort direction for all its keys as the index key pattern. For example, an index key pattern { a: 1, b: 1 } can support a sort on { a: 1, b: 1 } and { a: -1, b: -1 } but not on { a: -1, b: 1 }.

Suppose there are these indexes:

{ a: 1 }
{ a: 1, b: 1 }
{ a: 1, b: 1, c: 1 }

Example                                                    Index Used
db.data.find().sort( { a: 1 } )                            { a: 1 }
db.data.find().sort( { a: -1 } )                           { a: 1 }
db.data.find().sort( { a: 1, b: 1 } )                      { a: 1, b: 1 }
db.data.find().sort( { a: -1, b: -1 } )                    { a: 1, b: 1 }
db.data.find().sort( { a: 1, b: 1, c: 1 } )                { a: 1, b: 1, c: 1 }
db.data.find( { a: { $gt: 4 } } ).sort( { a: 1, b: 1 } )   { a: 1, b: 1 }
Share:
28,175
johndodo
Author by

johndodo

Updated on March 16, 2020

Comments

  • johndodo
    johndodo about 4 years

    To quote the docs:

    When creating an index, the number associated with a key specifies the direction of the index, so it should always be 1 (ascending) or -1 (descending). Direction doesn't matter for single key indexes or for random access retrieval but is important if you are doing sorts or range queries on compound indexes.

    However, I see no reason why direction of the index should matter on compound indexes. Can someone please provide a further explanation (or an example)?

  • johndodo
    johndodo about 12 years
    The resulting row should probably be 1, 3, 2, 6, 5, 4, 7?
  • johndodo
    johndodo about 12 years
    I still see no reason for it being slower. Only the algorithm should be different (for each group of values in A it should jump to the end of the group and process it in reverse order), but since MongoDB indexes are in memory that should have no noticeable effect on speed. Also, RDBMS know nothing about direction with indexes and the situation there is quite similar afaik?
  • Jared Kells
    Jared Kells about 12 years
    The reason it is a performance hit is because it's not just a sequential list in memory like the simplified example. It's actually a weighted tree. Jumping out of order will involve traversing the tree again. RDMS definitively have order to indexes.
  • Jared Kells
    Jared Kells about 12 years
    Fetching nodes from a BTree in order is as simple as moving along each leaf until you run out and then going up a level and down the next branch. It's O(n) Out of order it's much more CPU intensive.
  • johndodo
    johndodo about 12 years
    Thanks for further clarification. I checked the docs for MySQL indexes - it really is possible to specify index direction, but the setting is ignored.
  • Ashish
    Ashish about 11 years
    @Jared: Good post but I didn't understand one thing. You specified that while finding single items, the order doean't matter. Consider we have a posts collection which has a timestamp field based on which we want to get the latest posts of the user. In this case, doing a descending(-1) index seems logically better than ascending(1) index. Wouldn't -1 index be faster in this case ?
  • Jared Kells
    Jared Kells about 11 years
    @Ashish I think the performance difference of reading records backwards vs forwards would be insignificant. The real cost is when the next record to read isn't adjacent to the current record.
  • Sammaye
    Sammaye over 8 years
    @MarkPieszak because the entire sort would have to be done in memory making the index useless
  • Zaid Masud
    Zaid Masud over 8 years
    @Sammaye I think that's the right idea, although I'm not sure that it's the entire sort. I'd have to look at the implementation to know how it really works, but I would think that the results could be pulled back sorted by a alone, and then the additional b sort would need to be done in memory.
  • Sammaye
    Sammaye over 8 years
    hmm, weird last time I checked the code it dropped partial sorts due to how the sorting was but meh, maybe it's changed
  • Hussain
    Hussain about 7 years
    What if I am sorting on {a: -1, b: -1}, should I have {a: -1, b: -1} index or will {a: 1, b: 1} will be enough.
  • Zaid Masud
    Zaid Masud over 6 years
    @Hussain in your example the {a: 1, b: 1} index should be sufficient as inverting an index completely is fine. e.g. Index on {a: 1} can be used for a sort on {a: -1}
  • Lukas Liesis
    Lukas Liesis over 6 years
    I understand that's an example but if there is index { a: 1, b: 1, c: 1 } do you really need indexes { a: 1} and { a: 1, b: 1} or index { a: 1, b: 1, c: 1 } covers all cases? If queries always use same sort: 1 no sorts in query with -1
  • Somnath Muluk
    Somnath Muluk over 6 years
    If there are many queries which are working on only property 'a', it is faster to search with index with property 'a' for database engine, than searching by index with 3 properties 'a', 'b', 'c'. Because index size will increase and count also increases. ex. If there are 20 chapters in book. So it is faster to go to chapter 3 and then specific page. @LukasLiesis
  • Cyril Duchon-Doris
    Cyril Duchon-Doris almost 5 years
    Is this still a problem for indexing/sorting on a boolean field ? If I want to get only the "active" items of a user, should I create an index { user_id: 1, active: 1 } or { user_id: 1, active: -1 } or does it matter ? (assuming active can be true/false and there are no null values in the DB)
  • Rajan
    Rajan about 3 years
    should I create { date: -1, _id: 1 } & { date: 1, _id: -1 } both or just one?
  • Rajan
    Rajan about 3 years
    should I create index for { date: -1, _id: 1 } & { date: 1, _id: -1 } both or just one of them?
  • Rajan
    Rajan about 3 years
    I found out that one of them will be enough. see here docs.mongodb.com/manual/core/index-compound/#sort-order
  • Rajan
    Rajan about 3 years
    I found out that one of them will be enough. see here docs.mongodb.com/manual/core/index-compound/#sort-order