SQL Wildcard Search - Efficiency?

22,332

Solution 1

Having the wildcard at the end of the string, like 'abc%', would help if that column were indexed, as it would be able to seek directly to the records which start with 'abc' and ignore everything else. Having the wild card at the beginning means it has to look at every row, regardless of indexing.

Good article here with more explanation.

Solution 2

Only wildcards at the end of a Like character string will use an index.

You should look at using FTS Contains if you want to improve speed of wildcards at the front and back of a character string. Also see this related SO post regarding Contains versus Like.

Solution 3

From Microsoft it is more efficient to leave the closing wildcard because it can, if one exists, use an index rather than performing a scan. Think about how the search might work, if you have no idea what's before it then you have to scan everything, but if you are only searching the tail end then you can order the rows and even possible (depending on what you're looking for) do a quasi-binary search.

Some operators in joins or predicates tend to produce resource-intensive operations. The LIKE operator with a value enclosed in wildcards ("%a value%") almost always causes a table scan. This type of table scan is a very expensive operation because of the preceding wildcard. LIKE operators with only the closing wildcard can use an index because the index is part of a B+ tree, and the index is traversed by matching the string value from left to right.

So, the above quote also explains why there was a huge processor spike when running two wildcards. It completed faster only by happenstance because there is enough horsepower to cover up the inefficiency. When trying to determine performance on a query you want to look at the execution of the query rather than the resources of the server because those can be misleading. If I have a server with enough horsepower to serve a weather vain and I'm running queries on tables as small as 500,000 rows the results are going to be misleading.

Less the fact that Microsoft quoted your answer, when doing performance analysis, consider taking the dive into learning how to read the execution plan. It's an investment and very dry, but it will be worth it in the long run.

In short though, whoever was indicating that the trailing wildcard only is more efficient, is correct.

Share:
22,332
WhoaItsAFactorial
Author by

WhoaItsAFactorial

iOS, PHP, Javscript, Ruby developer. Hockey watcher. Jack Daniels and Guiness (never together) drinker. All around badass.

Updated on March 29, 2020

Comments

  • WhoaItsAFactorial
    WhoaItsAFactorial about 4 years

    There has been a debate at work recently at the most efficient way to search a MS SQL database using LIKE and wildcards. We are comparing using %abc%, %abc, and abc%. One person has said that you should always have the wildcard at the end of the term (abc%). So, according to them, if we wanted to find something that ended in "abc" it'd be most efficient to use `reverse(column) LIKE reverse('%abc').

    I set up a test using SQL Server 2008 (R2) to compare each of the following statements:

    select * from CLMASTER where ADDRESS like '%STREET'
    select * from CLMASTER where ADDRESS like '%STREET%'   
    select * from CLMASTER where ADDRESS like reverse('TEERTS%')  
    select * from CLMASTER where reverse(ADDRESS) like reverse('%STREET')
    

    CLMASTER holds about 500,000 records, there are about 7,400 addresses that end "Street", and about 8,500 addresses that have "Street" in it, but not necessarily at the end. Each test run took 2 seconds and they all returned the same amount of rows except for %STREET%, which found an extra 900 or so results because it picked up addresses that had an apartment number on the end.

    Since the SQL Server test didn't show any difference in execution time I moved into PHP where I used the following code, switching in each statement, to run multiple tests quickly:

    <?php
    
        require_once("config.php");
        $connection = odbc_connect( $connection_string, $U, $P );
    
        for ($i = 0; $i < 500; $i++) {
        $m_time = explode(" ",microtime());
        $m_time = $m_time[0] + $m_time[1];
    
        $starttime = $m_time;
    
        $Message=odbc_exec($connection,"select * from CLMASTER where ADDRESS like '%STREET%'");
        $Message=odbc_result($Message,1);
    
        $m_time = explode(" ",microtime());
        $m_time = $m_time[0] + $m_time[1];
    
        $endtime = $m_time;
    
        $totaltime[] = ($endtime - $starttime);
    
    }
    
    odbc_close($connection);
    
    echo "<b>Test took and average of:</b> ".round(array_sum($totaltime)/count($totaltime),8)." seconds per run.<br>";
    echo "<b>Test took a total of:</b> ".round(array_sum($totaltime),8)." seconds to run.<br>";
    
    ?>
    

    The results of this test was about as ambiguous as the results when testing in SQL Server.

    %STREET completed in 166.5823 seconds (.3331 average per query), and averaged 500 results found in .0228.

    %STREET% completed in 149.4500 seconds (.2989 average per query), and averaged 500 results found in .0177. (Faster time per result because it finds more results than the others, in similar time.)

    reverse(ADDRESS) like reverse('%STREET') completed in 134.0115 seconds (.2680 average per query), and averaged 500 results found in .0183 seconds.

    reverse('TREETS%') completed in 167.6960 seconds (.3354 average per query), and averaged 500 results found in .0229.

    We expected this test to show that %STREET% would be the slowest overall, while it was actually the fastest to run, and had the best average time to return 500 results. While the suggested reverse('%STREET') was the fastest to run overall, but was a little slower in time to return 500 results.

    Extra fun: A coworker ran profiler on the server while we were running the tests and found that the use of the double wildcard produced a significant increase CPU usage, while the other tests were within 1-2% of each other.

    Are there any SQL Efficiency experts out that that can explain why having the wildcard at the end of the search string would be better practice than the beginning, and perhaps why searching with wildcards at the beginning and end of the string was faster than having the wildcard just at the beginning?

  • Adam Robinson
    Adam Robinson almost 12 years
    Which, by extension, means that doing something like reverse(col) like 'abc%' is a bad idea.
  • Bort
    Bort almost 12 years
    Yes, REVERSE or any other computation that changes the indexed column means you lose sargability.
  • Mike Perrenoud
    Mike Perrenoud almost 12 years
    @Jeremy1026 - I've updated my answer with a bit more clarification regarding the results of the servers performance usage.
  • WhoaItsAFactorial
    WhoaItsAFactorial almost 12 years
    Thanks for the answer/comments provided
  • WhoaItsAFactorial
    WhoaItsAFactorial almost 12 years
    Thanks for the answer you provided.
  • WhoaItsAFactorial
    WhoaItsAFactorial almost 12 years
    Thank you for the answer provided, unfortunately switching to Contains isn't a viable solution for us since we would need to full text index quite a few (into the hundreds) tables to make it a viable solution. And we often search for specific substrings and other items.
  • luky
    luky about 4 years
    @AdamRobinson no reverse isnt bad idea, i use it too, just store the reversed values in new column
  • Matěj Štágl
    Matěj Štágl over 2 years
    @SliverNinja wildcard at the front is not supported by contains