MongoDB regular expression with indexed field

10,653

The reason for the performance differential here is likely that, with the index enabled, your query must traverse the index (load into memory), then load the matching documents to be returned into memory also. Since you are not using the prefix query all values in the index will be scanned and tested against the regular expression. Not very efficient.

When you remove the index you are just doing a table scan and matching the regex there - essentially you simplified things from the first one slightly.

You might be able to make the indexed version quicker if it were a covered index query, it would also likely be faster if this were a compound index and you needed to combine it with the criteria for another field.

When you use a prefix query, it's not that it only uses an index then, but you use the index efficiently, which is key, and hence you see the real performance gains.

Share:
10,653

Related videos on Youtube

Eramir
Author by

Eramir

C#, .NET, WinForms, ASP.NET, ADO.NET, WCF dev

Updated on July 30, 2020

Comments

  • Eramir
    Eramir almost 4 years

    I was creating my first app using MongoDB. Created index for a field, and tried a find query with $regex param, launched in a shell

    > db.foo.find({A:{$regex:'BLABLA!25500[0-9]'}}).explain()
    {
            "cursor" : "BtreeCursor A_1 multi",
            "nscanned" : 500001,
            "nscannedObjects" : 10,
            "n" : 10,
            "millis" : 956,
            "nYields" : 0,
            "nChunkSkips" : 0,
            "isMultiKey" : false,
            "indexOnly" : false,
            "indexBounds" : {
                    "A" : [
                            [
                                    "",
                                    {
    
                                    }
                            ],
                            [
                                    /BLABLA!25500[0-9]/,
                                    /BLABLA!25500[0-9]/
                            ]
                    ]
            }
    }
    

    It's very strange, because when i'm launching the same query, but with no index in collection, the performance is much better.

    > db.foo.find({A:{$regex:'BLABLA!25500[0-9]'}}).explain()
    {
            "cursor" : "BasicCursor",
            "nscanned" : 500002,
            "nscannedObjects" : 500002,
            "n" : 10,
            "millis" : 531,
            "nYields" : 0,
            "nChunkSkips" : 0,
            "isMultiKey" : false,
            "indexOnly" : false,
            "indexBounds" : {
    
            }
    }
    

    Obviously, searching a field with index without regex is working much faster(i.e. searching document with constant field) , but i'm really interested in reason of such behavior.

    • Lycha
      Lycha over 12 years
      Don't know what is causing the slowdown with the index, but if you want regex to utilize index you could try ^BLABLA!25500[0-9] to let mongodb know what are the first characters (if it suits your usecase).
    • Eramir
      Eramir over 12 years
      It Works! How could I forgot ... now it takes only 49 millis. Thanks !But i still don't know why it's slower with index, if mongo understands that it can't use index and process full search in collection ... How does it cares about index ? ...
    • Shivan Dragon
      Shivan Dragon about 12 years
      @Lycha: that's just an assumption you make, and is wrong: doing a regex on an indexed field will make use of that index (will not scan each document in the collection), even if the regex expression doesn't say anything about the prefix (even if the regex is something like /.*whatever.*/)
    • Lycha
      Lycha about 12 years
      @AndreiBodnarescu Where did you read that? Based on the performance my recommendation seems to be correct. This is quote from mongodb website "For simple prefix queries (also called rooted regexps) like /^prefix/, the database will use an index when available and appropriate ". Based on this it will only use index for prefixed regexps.
    • Shivan Dragon
      Shivan Dragon about 12 years
      I must've missed that in their docs, however doing some basic tests shows that for a basic (like, text) indexed field, doing a regex search will always use indexes, even if you specify a prefix or not. This is also true for embedded arrays fields as long as you use dot notation (not $elemMatch). So basically, as far as index usage in searches is concerned, it's moot if you specify a regex with a prefix or one without
    • Lycha
      Lycha about 12 years
      @AndreiBodnarescu At least my mongodb doesn't leverage index any useful way when using regex without prefix. In fact if field is indexed it just slows down the non-prefixed regex query (as OP reported and I also tested). I'm running MongoDB 1.8.2. So using prefixed regex gives huge speed boost in my test and also reported by OP (definately not moot point to use it or not).