MongoDB Full and Partial Text Search

Solution 1

As of MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content with language-specific rules for stopwords and stemming. Stemming rules for supported languages are based on standard algorithms which generally handle common verbs and nouns but are unaware of proper nouns.

There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to work as such. For example, "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.

Your matching results are all variations on the same word "LEONEL", and vary only by case and diacritic. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only kinds of variation that will match.

If you want to do efficient partial matches you'll need to take a different approach. For some helpful ideas see:

There is a relevant improvement request you can watch/upvote in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.

Solution 2

As Mongo currently does not support partial search by default...

I created a simple static method.

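import mongoose from 'mongoose'

const PostSchema = new mongoose.Schema({
    title: { type: String, default: '', trim: true },
    body: { type: String, default: '', trim: true },
});

// Text index over both fields; a match in the title weighs more than one in the body.
PostSchema.index({ title: "text", body: "text" },
    { weights: { title: 5, body: 3 } })

// Note: this uses the callback style of find(), which Mongoose 7 and later no longer support.
PostSchema.statics = {
    // Partial match: case-insensitive regex on both fields (does not use the text index).
    // q is interpreted as a regular expression, so escape it first if it comes from user input.
    // The global flag is not needed for a match filter, so only "i" is passed.
    searchPartial: function(q, callback) {
        return this.find({
            $or: [
                { "title": new RegExp(q, "i") },
                { "body": new RegExp(q, "i") },
            ]
        }, callback);
    },

    // Whole-word match using the text index.
    searchFull: function (q, callback) {
        return this.find({
            $text: { $search: q, $caseSensitive: false }
        }, callback)
    },

    // Try the text index first; fall back to the regex search when it finds nothing.
    search: function(q, callback) {
        this.searchFull(q, (err, data) => {
            if (err) return callback(err, data);
            if (data.length) return callback(null, data);
            return this.searchPartial(q, callback);
        });
    },
}

export default mongoose.models.Post || mongoose.model('Post', PostSchema)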

How to use:

import Post from '../models/post'

Post.search('Firs', function(err, data) {
   console.log(data);
})
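
For readers who prefer to await the result instead of passing a callback, the same fallback idea could be sketched roughly as follows. This is only an illustration under assumptions: `model` is anything whose `find(filter)` returns a promise or thenable (as Mongoose queries do), and `escapeRegExp` and `searchWithFallback` are hypothetical helper names, not part of Mongoose. The sketch also escapes the search string so that characters such as `.` or `(` are matched literally.

```javascript
// Escape regex metacharacters so the search string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: try the text index first, then fall back to a regex search.
// `model` only needs a find(filter) method that returns a promise or thenable.
async function searchWithFallback(model, q, fields = ['title', 'body']) {
  const full = await model.find({ $text: { $search: q } });
  if (full.length > 0) return full;

  // No whole-word match: build a case-insensitive partial match on each field.
  const pattern = new RegExp(escapeRegExp(q), 'i');
  return model.find({ $or: fields.map((field) => ({ [field]: pattern })) });
}
```

Inside an async function this could then be called as `const posts = await searchWithFallback(Post, 'Firs')`.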

Solution 3

Without creating an index, we could simply use:

db.users.find({ name: /<full_or_partial_text>/i}) (case insensitive)
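
If only prefix matches are needed (autocomplete, for example), a case-sensitive regex anchored with `^` can make use of an ordinary index on the field, whereas case-insensitive regex queries generally cannot use indexes effectively. The following is a minimal sketch of building such a filter; `buildPrefixFilter` and `escapeRegExp` are hypothetical helper names, and `name` is the field from the query above.

```javascript
// Escape regex metacharacters so the prefix is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds a filter matching values that start with `prefix`.
// The match is case-sensitive because no "i" option is set.
function buildPrefixFilter(field, prefix) {
  return { [field]: { $regex: '^' + escapeRegExp(prefix) } };
}

const filter = buildPrefixFilter('name', 'LEO');
console.log(JSON.stringify(filter)); // {"name":{"$regex":"^LEO"}}

// In the shell, the filter could then be passed to find():
// db.users.find(filter)
```

Because the match is case-sensitive, this only helps when the stored values have a predictable case, for instance by querying a separately stored lowercased copy of the field.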

Solution 4

If you want to use all the benefits of MongoDB's full-text search AND want partial matches (maybe for auto-complete), the n-gram based approach mentioned by Shrikant Prabhu was the right solution for me. Obviously your mileage may vary, and this might not be practical when indexing huge documents.

In my case I mainly needed the partial matches to work for just the title field (and a few other short fields) of my documents.

I used an edge n-gram approach. What does that mean? In short, you turn a string like "Mississippi River" into a string like "Mis Miss Missi Missis Mississ Mississi Mississip Mississipp Mississippi Riv Rive River".

Inspired by this code by Liu Gen, I came up with this method:

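function createEdgeNGrams(str) {
    // Strings of three characters or fewer are returned unchanged.
    if (str && str.length > 3) {
        const minGram = 3
        const maxGram = str.length

        return str.split(" ").reduce((ngrams, token) => {
            if (token.length > minGram) {
                // Emit every prefix of the token, from minGram characters up to the whole token.
                for (let i = minGram; i <= maxGram && i <= token.length; ++i) {
                    ngrams = [...ngrams, token.slice(0, i)]
                }
            } else {
                // Tokens of minGram characters or fewer are kept whole.
                ngrams = [...ngrams, token]
            }
            return ngrams
        }, []).join(" ")
    }

    return str
}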

let res = createEdgeNGrams("Mississippi River")
console.log(res)

Now to make use of this in Mongo, I add a searchTitle field to my documents and set its value by converting the actual title field into edge n-grams with the above function. I also create a "text" index for the searchTitle field.
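
That preparation step might look roughly like the sketch below, assuming a plain document object with a `title` field. Here `edgeNGrams` is a compact stand-in for the function above and `withSearchTitle` is a hypothetical helper name; the index creation is shown only as a comment because it needs a live database.

```javascript
// Compact edge n-gram generator: every prefix of at least minGram characters per token.
function edgeNGrams(str, minGram = 3) {
  return str
    .split(/\s+/)
    .filter(Boolean)
    .flatMap((token) => {
      if (token.length <= minGram) return [token];
      const grams = [];
      for (let i = minGram; i <= token.length; i++) grams.push(token.slice(0, i));
      return grams;
    })
    .join(' ');
}

// Hypothetical helper: returns a copy of the document with searchTitle filled in.
function withSearchTitle(doc) {
  return { ...doc, searchTitle: edgeNGrams(doc.title || '') };
}

const doc = withSearchTitle({ title: 'Mississippi River' });
console.log(doc.searchTitle);

// The text index on the derived field could then be created in the shell with:
// db.getCollection('my-collection').createIndex({ searchTitle: 'text' })
```

The derived field has to be regenerated whenever the title changes, so a helper like this would typically run on every insert and update.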

I then exclude the searchTitle field from my search results by using a projection:

db.collection('my-collection')
  .find({ $text: { $search: mySearchTerm } }, { projection: { searchTitle: 0 } })

Solution 5

I wrapped @Ricardo Canelas' answer in a mongoose plugin here on npm

Two changes made:

- Uses promises
- Searches on any field with type String

Here's the important source code:

// mongoose-partial-full-search

module.exports = exports = function addPartialFullSearch(schema, options) {
  schema.statics = {
    ...schema.statics,
    makePartialSearchQueries: function (q) {
      if (!q) return {};
      const $or = Object.entries(this.schema.paths).reduce((queries, [path, val]) => {
        val.instance == "String" &&
          queries.push({
            [path]: new RegExp(q, "gi")
          });
        return queries;
      }, []);
      return { $or }
    },
    searchPartial: function (q, opts) {
      return this.find(this.makePartialSearchQueries(q), opts);
    },

    searchFull: function (q, opts) {
      return this.find({
        $text: {
          $search: q
        }
      }, opts);
    },

    search: function (q, opts) {
      return this.searchFull(q, opts).then(data => {
        return data.length ? data : this.searchPartial(q, opts);
      });
    }
  }
}

exports.version = require('../package').version;

Usage

// PostSchema.js
import addPartialFullSearch from 'mongoose-partial-full-search';
PostSchema.plugin(addPartialFullSearch);

// some other file.js
import Post from '../wherever/models/post'

Post.search('Firs').then(data => console.log(data));

Author: Leonel

Updated on March 23, 2021

Comments

  • Leonel
    Leonel about 3 years

    Env:

    • MongoDB (3.2.0) with Mongoose

    Collection:

    • users

    Text Index creation:

      BasicDBObject keys = new BasicDBObject();
      keys.put("name","text");
    
      BasicDBObject options = new BasicDBObject();
      options.put("name", "userTextSearch");
      options.put("unique", Boolean.FALSE);
      options.put("background", Boolean.TRUE);
      
      userCollection.createIndex(keys, options); // using MongoTemplate
    

    Document:

    • {"name":"LEONEL"}

    Queries:

    • db.users.find( { "$text" : { "$search" : "LEONEL" } } ) => FOUND
    • db.users.find( { "$text" : { "$search" : "leonel" } } ) => FOUND (search caseSensitive is false)
    • db.users.find( { "$text" : { "$search" : "LEONÉL" } } ) => FOUND (search with diacriticSensitive is false)
    • db.users.find( { "$text" : { "$search" : "LEONE" } } ) => FOUND (Partial search)
    • db.users.find( { "$text" : { "$search" : "LEO" } } ) => NOT FOUND (Partial search)
    • db.users.find( { "$text" : { "$search" : "L" } } ) => NOT FOUND (Partial search)

    Any idea why I get 0 results using as query "LEO" or "L"?

    Regex with Text Index Search is not allowed.

    db.getCollection('users')
         .find( { "$text" : { "$search" : "/LEO/i", 
                              "$caseSensitive": false, 
                              "$diacriticSensitive": false }} )
         .count() // 0 results
    
    db.getCollection('users')
         .find( { "$text" : { "$search" : "LEO", 
                              "$caseSensitive": false, 
                              "$diacriticSensitive": false }} )
    .count() // 0 results
    

    MongoDB Documentation:

    • BrTkCa
      BrTkCa almost 7 years
    • Leonel
      Leonel almost 7 years
      This question is about partial search using a text index, not about case-sensitive search. @LucasCosta please don't flag this question as a duplicate.
    • BrTkCa
      BrTkCa almost 7 years
      It is a possible duplicate; it needs at least 5 votes, @Leonel
    • BrTkCa
      BrTkCa almost 7 years
      Did you try /LEO/i? You can use a regex as the search value in MongoDB
    • Leonel
      Leonel almost 7 years
      @LucasCosta text index search does not allow regex.
    • TomoMiha
      TomoMiha almost 5 years
      Search without index: stackoverflow.com/a/48250561/557432
  • WhatsThePoint
    WhatsThePoint over 5 years
    Code-only answers aren't encouraged, as they don't provide much information for future readers. Please provide some explanation of what you have written.
  • Levente Orbán
    Levente Orbán about 5 years
    How can I return the data from Post.search()?
  • flash
    flash over 4 years
    @LeventeOrbán promises! I'll put an answer below.
  • Gaëtan Boyals
    Gaëtan Boyals over 4 years
    Upvoted, and it works with aqp. Thanks !
  • Dominus Vilicus
    Dominus Vilicus over 4 years
    new RegExp(string, 'i') for anyone who needs dynamic string search
  • Ricardo Canelas
    Ricardo Canelas about 4 years
  • tony Macias
    tony Macias about 4 years
    how do i set a variable in there?
  • vigviswa
    vigviswa about 4 years
    I am using Monk and collection is just the db.get() function, to connect to the database
  • Meir
    Meir over 3 years
    @RicardoCanelas is there a way to add an index on a subdocument field? also what if a field is an array?
  • Exis Zhang
    Exis Zhang over 3 years
    Great answer! Was wondering if this works with async/await function as well. Just tried it, and it didn't work for me.
  • imekinox
    imekinox about 3 years
    Be aware that this is not efficient and scalable as the search is not over an indexed field, for large tables this will be slow.
  • Nice-Guy
    Nice-Guy over 2 years
    Now there is a better way. Check out Atlas Search in the free tier to improve efficiency: docs.atlas.mongodb.com/atlas-search
  • Stunner
    Stunner over 2 years
    Let's talk about performance.. what is the time taken if I have 1 million records.
  • lucaswxp
    lucaswxp over 2 years
    In my opinion this is the best solution so far, a shame mongo doesn't have ngram out of the box.
  • Djb
    Djb over 2 years
    Thanks! How do I get the score weight?
  • Oliver Dixon
    Oliver Dixon about 2 years
    It returns inaccurate results though.. For example, "Large" would include: "Grayson Lars". Honestly Mongo is completely useless for text search, might as well use a side database like ElasticSearch.
  • Johannes Fahrenkrug
    Johannes Fahrenkrug about 2 years
    @OliverDixon Really? That's crazy. Why would large include Grayson Lars? Is it considering that a phonetic match?
  • Oliver Dixon
    Oliver Dixon about 2 years
    @JohannesFahrenkrug I have no idea, but that's what our test data was showing based on the n-grams approach :-/
  • Alex Totolici
    Alex Totolici about 2 years
    great answer, but how do you handle diacritics for the partial search? :))