When should you use full-text indexing?

50,651

Solution 1

It will depend upon your DBMS. I believe that most systems will not take advantage of the full-text index unless you use the full-text functions. (e.g. MATCH/AGAINST in mySQL or FREETEXT/CONTAINS in MS SQL)

Here is two good articles on when, why, and how to use full-text indexing in SQL Server:

  1. How To Use SQL Server Full-Text Searching
  2. Solving Complex SQL Problems with Full-Text Indexing

Solution 2

FTS can help in this scenario, the question is whether it is worth it or not.

To begin with, let's look at why LIKE may not be the most effective search. When you use LIKE, especially when you are searching with a % at the beginning of your comparison, SQL Server needs to perform both a table scan of every single row and a byte by byte check of the column you are checking.

FTS has some better algorithms for matching data as does some better statistics on variations of names. Therefore FTS can provide better performance for matching Smith, Smythe, Smithers, etc when you look for Smith.

It is, however, a bit more complex to use FTS, as you'll need to master CONTAINS vs FREETEXT and the arcane format of the search. However, if you want to do a search where either FName or LName match, you can do that with one statement instead of an OR.

To determine if FTS is going to be effective, determine how much data you have. I use FTS on a database of several hundred million rows and that's a real benefit over searching with LIKE, but I don't use it on every table.

If your table size is more reasonable, less than a few million, you can get similar speed by creating an index for each column that you're going to be searching on and SQL Server should perform an index scan rather than a table scan.

Solution 3

According to my test scenario:

  • SQL Server 2008
  • 10.000.000 rows each with a string like "wordA wordB wordC..." (varies between 1 and 30 words)
  • selecting count(*) with CONTAINS(column, "wordB")
  • result size several hundred thousands
  • catalog size approx 1.8GB

Full-text index was in range of 2s whereas like '% wordB %' was in range of 1-2 minutes.

But this counts only if you don't use any additional selection criteria! E.g. if I used some "like 'prefix%'" on a primary key column additionally, the performance was worse since the operation of going into the full-text index costs more than doing a string search in some fields (as long those are not too much).

So I would recommend full-text index only in cases where you have to do a "free string search" or use some of the special features of it...

Solution 4

To answer the question specifically for MSSQL, full-text indexing will NOT help in your scenario.

In order to improve that query you could do one of the following:

  1. Configure a full-text catalog on the column and use the CONTAINS() function.
  2. If you were primarily searching with a prefix (i.e. matching from the start of the name), you could change the predicate to the following and create an index over the column.

    where fname like 'prefix%'

(1) is probably overkill for this, unless the performance of the query is a big problem.

Share:
50,651
Eduardo Scoz
Author by

Eduardo Scoz

Code makes me happy.

Updated on July 09, 2022

Comments

  • Eduardo Scoz
    Eduardo Scoz almost 2 years

    We have a whole bunch of queries that "search" for clients, customers, etc. You can search by first name, email, etc. We're using LIKE statements in the following manner:

    SELECT * 
    FROM customer 
    WHERE fname LIKE '%someName%'
    

    Does full-text indexing help in the scenario? We're using SQL Server 2005.

  • Brannon
    Brannon over 15 years
    To clarify, FTS would not help without changing the existing query.
  • Josef
    Josef over 15 years
    FTS would not help without changing the existing query. But, if your dataset is small enough, you can use LIKE without a problem for some time.