SQL String comparison speed 'like' vs 'patindex'

13,538

Solution 1

That kind of repeatable difference in performance is most likely due to a difference in the execution plans for the two queries.

Have SQL Server return the actual execution plan when each query is run, and compare the execution plans.

Also, run each query twice, and throw out the timing for the first run, when you compare the performance of the two queries. (The first query run may include a lot of heavy lifting (statement parsing and database i/o). The second run will give you an elapsed time that is more validly compared to the other query.

Can anyone explain why LIKE is so much slower than PATINDEX?

The execution plan for each query will likely explain the difference.

Is it simply a matter of how efficiently the two functions have been written?

It's not really a matter of how efficiently the functions are written. What really matters is the generated execution plan. What matters is if the predicates are sargable and whether the optimizer chooses to use available indexes.


[EDIT]

In the quick test I ran, I see a difference in the execution plans. With the LIKE operator in the join predicate, the plan includes a "Table Spool (Lazy Spool)" operation on table2 after the "Computer Scalar" operation. With the PATINDEX function, I don't see a "Table Spool" operation in the plan. But the plans I'm getting may be significantly different than the plans you get, given differences in the queries, tables, indexes and statistics.

[EDIT]

The only difference I see in the execution plan output for the two queries (aside from expression placeholder names) is the calls to the three internal functions (LikeRangeStart, LikeRangeEnd, and LikeRangeInfo in place of one call to the PATINDEX function. These functions appear to be called for each row in a result set, and the resulting expression are used for scan of the inner table in a nested loop.

So, it does look as if the three function calls for the LIKE operator could be more expensive (elapsed time wise) than the single call to the PATINDEX function. (The explain plan shows those functions being called for each row in the outer resultset of a nested loop join; for a large number of rows, even a slight difference in the elapsed time could be multiplied enough times to exhibit a significant performance difference.)


After running some test cases on my system, I'm still baffled at the results you are seeing.

Maybe it is an issue with the performance of the calls to the PATINDEX function vs. the calls to the three internal functions (LikeRangeStart, LikeRangeEnd, LikeRangeInfo.)

It's possible that with those performed on a "large" enough result set, a small difference in elapsed time could be multiplied into a significant difference.

But I actually find it to be somewhat surprising that a query using the LIKE operator would take significantly longer to execute than an equivalent query using the PATINDEX function.

Solution 2

I'm not at all convinced by the thesis that it is the extra overhead of the LikeRangeStart, LikeRangeEnd, LikeRangeInfo functions that is responsible for the time discrepancy.

It is simply not reproducible (at least in my test, default collation etc). When I try the following

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

DECLARE @T TABLE (name sysname )
INSERT INTO @T
SELECT TOP 2500 name + '...' + 
   CAST(ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS VARCHAR)
FROM sys.all_columns

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT '***'
SELECT     COUNT(*)
FROM       @T AS a
INNER JOIN @T AS b ON (a.name LIKE '%' + b.name + '%')

PRINT '***'
SELECT     COUNT(*)
FROM       @T AS a
INNER JOIN @T AS b ON (PATINDEX('%' + b.name + '%', a.name) > 0)

Which gives essentially the same plan for both but also contains these various internal functions I get the following.

LIKE

Table '#5DB5E0CB'. Scan count 2, logical reads 40016
CPU time = 26953 ms,  elapsed time = 28083 ms.

PATINDEX

Table '#5DB5E0CB'. Scan count 2, logical reads 40016
CPU time = 28329 ms,  elapsed time = 29458 ms.

I do notice however that if I substitute a #temp table instead of the table variable the estimated number of rows going into the stream aggregate is significantly different.

The LIKE version has an estimated 330,596 and PATINDEX an estimated 1,875,000.

I notice you also have a hash join in your plan. Possibly because the PATINDEX version seems to estimate a greater number of rows than LIKE this query gets a larger memory grant so doesn't have to spill the hash operation to disc. Try tracing the hash warnings in Profiler to see if this is the case.

Solution 3

Perhaps this is a question of DB Caching...

Try out reset cache before running each query using DBCC helpers:

Share:
13,538
David Roberts
Author by

David Roberts

Updated on June 10, 2022

Comments

  • David Roberts
    David Roberts almost 2 years

    I had a query as follows (simplified)...

    SELECT     *
    FROM       table1 AS a
    INNER JOIN table2 AS b ON (a.name LIKE '%' + b.name + '%')
    

    For my dataset this was taking around 90 seconds to execute, so I have been looking for ways of speeding it up. For no good reason, I thought I'd try PATINDEX instead of LIKE...

    SELECT     *
    FROM       table1 AS a
    INNER JOIN table2 AS b ON (PATINDEX('%' + b.name + '%', a.name) > 0)
    

    On the same dataset this executes in the blink of an eye and returns the same results.

    Can anyone explain why LIKE is so much slower than PATINDEX? Given that LIKE is just returning a BOOLEAN whereas PATINDEX is returning the actual location I would have expected the latter to be slower if anything, or is it simply a matter of how efficiently the two functions have been written?

    Ok, here is each query in full, followed by its execution plan. "#StakeholderNames" is just a temp table of likely names which I am matching against.

    I have pulled back the live data and run each query several times. The first is taking about 17 seconds (so somewhat less than the original 90 seconds on the live database) and the second under 1 second...

    SELECT              sh.StakeholderID,
                        sh.HoldingID,
                        i.AgencyCommissionImportID,
                        1
    
        FROM            AgencyCommissionImport AS i
        INNER JOIN      #StakeholderNames AS sn ON REPLACE(REPLACE(i.ClientName,' ',''), ',','') LIKE '%' + sn.Name + '%'
        INNER JOIN      Holding AS h ON (h.ProviderName = i.Provider) AND (h.HoldingReference = i.PlanNumber)
        INNER JOIN      StakeholderHolding AS sh ON (sn.StakeholderID = sh.StakeholderID) AND (h.HoldingID = sh.HoldingID)
        WHERE           i.AgencyCommissionFileID = @AgencyCommissionFileID
                    AND (i.MatchTypeID = 0)
                    AND ((i.MatchedHoldingID IS NULL)
                        OR (i.MatchedStakeholderID IS NULL))
    
       |--Table Insert(OBJECT:([tempdb].[dbo].[#Results]), SET:([#Results].[StakeholderID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[StakeholderID] as [sh].[StakeholderID],[#Results].[HoldingID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID],[#Results].[AgencyCommissionImportID] = [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionImportID] as [i].[AgencyCommissionImportID],[#Results].[MatchTypeID] = [Expr1014],[#Results].[indx] = [Expr1013]))
            |--Compute Scalar(DEFINE:([Expr1014]=(1)))
                 |--Compute Scalar(DEFINE:([Expr1013]=getidentity((1835869607),(2),N'#Results')))
                      |--Top(ROWCOUNT est 0)
                           |--Hash Match(Inner Join, HASH:([h].[ProviderName], [h].[HoldingReference])=([i].[Provider], [i].[PlanNumber]), RESIDUAL:([AttivoGroup_copy].[dbo].[Holding].[ProviderName] as [h].[ProviderName]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[Provider] as [i].[Provider] AND [AttivoGroup_copy].[dbo].[Holding].[HoldingReference] as [h].[HoldingReference]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PlanNumber] as [i].[PlanNumber] AND [Expr1015] like [Expr1016]))
                                |--Nested Loops(Inner Join, OUTER REFERENCES:([sh].[HoldingID]))
                                |    |--Nested Loops(Inner Join, OUTER REFERENCES:([sn].[StakeholderID]))
                                |    |    |--Compute Scalar(DEFINE:([Expr1016]=('%'+#StakeholderNames.[Name] as [sn].[Name])+'%', [Expr1017]=LikeRangeStart(('%'+#StakeholderNames.[Name] as [sn].[Name])+'%'), [Expr1018]=LikeRangeEnd(('%'+#StakeholderNames.[Name] as [sn].[Name])+'%'), [Expr1019]=LikeRangeInfo(('%'+#StakeholderNames.[Name] as [sn].[Name])+'%')))
                                |    |    |    |--Table Scan(OBJECT:([tempdb].[dbo].[#StakeholderNames] AS [sn]))
                                |    |    |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[StakeholderHolding].[PK_StakeholderHolding] AS [sh]), SEEK:([sh].[StakeholderID]=#StakeholderNames.[StakeholderID] as [sn].[StakeholderID]) ORDERED FORWARD)
                                |    |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[Holding].[PK_Holding] AS [h]), SEEK:([h].[HoldingID]=[AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID]) ORDERED FORWARD)
                                |--Compute Scalar(DEFINE:([Expr1015]=replace(replace([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[ClientName] as [i].[ClientName],' ',''),',','')))
                                     |--Clustered Index Scan(OBJECT:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PK_AgencyCommissionImport] AS [i]), WHERE:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionFileID] as [i].[AgencyCommissionFileID]=[@AgencyCommissionFileID] AND [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchTypeID] as [i].[MatchTypeID]=(0) AND ([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedHoldingID] as [i].[MatchedHoldingID] IS NULL OR [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedStakeholderID] as [i].[MatchedStakeholderID] IS NULL)))
    
    
    SELECT              sh.StakeholderID,
                        sh.HoldingID,
                        i.AgencyCommissionImportID,
                        1
    
        FROM            AgencyCommissionImport AS i
        INNER JOIN      #StakeholderNames AS sn ON (PATINDEX('%' + sn.Name + '%', REPLACE(REPLACE(i.ClientName,' ',''), ',','')) > 0)
        INNER JOIN      Holding AS h ON (h.ProviderName = i.Provider) AND (h.HoldingReference = i.PlanNumber)
        INNER JOIN      StakeholderHolding AS sh ON (sn.StakeholderID = sh.StakeholderID) AND (h.HoldingID = sh.HoldingID)
        WHERE           i.AgencyCommissionFileID = @AgencyCommissionFileID
                    AND (i.MatchTypeID = 0)
                    AND ((i.MatchedHoldingID IS NULL)
                        OR (i.MatchedStakeholderID IS NULL))
    
       |--Table Insert(OBJECT:([tempdb].[dbo].[#Results]), SET:([#Results].[StakeholderID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[StakeholderID] as [sh].[StakeholderID],[#Results].[HoldingID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID],[#Results].[AgencyCommissionImportID] = [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionImportID] as [i].[AgencyCommissionImportID],[#Results].[MatchTypeID] = [Expr1014],[#Results].[indx] = [Expr1013]))
            |--Compute Scalar(DEFINE:([Expr1014]=(1)))
                 |--Compute Scalar(DEFINE:([Expr1013]=getidentity((1867869721),(2),N'#Results')))
                      |--Top(ROWCOUNT est 0)
                           |--Hash Match(Inner Join, HASH:([h].[ProviderName], [h].[HoldingReference])=([i].[Provider], [i].[PlanNumber]), RESIDUAL:([AttivoGroup_copy].[dbo].[Holding].[ProviderName] as [h].[ProviderName]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[Provider] as [i].[Provider] AND [AttivoGroup_copy].[dbo].[Holding].[HoldingReference] as [h].[HoldingReference]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PlanNumber] as [i].[PlanNumber] AND patindex([Expr1015],[Expr1016])>(0)))
                                |--Nested Loops(Inner Join, OUTER REFERENCES:([sh].[HoldingID]))
                                |    |--Nested Loops(Inner Join, OUTER REFERENCES:([sn].[StakeholderID]))
                                |    |    |--Compute Scalar(DEFINE:([Expr1015]=('%'+#StakeholderNames.[Name] as [sn].[Name])+'%'))
                                |    |    |    |--Table Scan(OBJECT:([tempdb].[dbo].[#StakeholderNames] AS [sn]))
                                |    |    |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[StakeholderHolding].[PK_StakeholderHolding] AS [sh]), SEEK:([sh].[StakeholderID]=#StakeholderNames.[StakeholderID] as [sn].[StakeholderID]) ORDERED FORWARD)
                                |    |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[Holding].[PK_Holding] AS [h]), SEEK:([h].[HoldingID]=[AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID]) ORDERED FORWARD)
                                |--Compute Scalar(DEFINE:([Expr1016]=replace(replace([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[ClientName] as [i].[ClientName],' ',''),',','')))
                                     |--Clustered Index Scan(OBJECT:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PK_AgencyCommissionImport] AS [i]), WHERE:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionFileID] as [i].[AgencyCommissionFileID]=[@AgencyCommissionFileID] AND [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchTypeID] as [i].[MatchTypeID]=(0) AND ([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedHoldingID] as [i].[MatchedHoldingID] IS NULL OR [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedStakeholderID] as [i].[MatchedStakeholderID] IS NULL)))
    
  • David Roberts
    David Roberts over 12 years
    Hmm! I've just re-run the queries with a similar size (but different) data set on our existing test database and there is nowhere near as great a differnec. So, it looks like it might be dependent on the actual data. As I said above I'll pull back the live data as soon as I can to replicate exactly what I did before and see what the execution plans look like.
  • David Roberts
    David Roberts over 12 years
    By the way, I did run the queries several time each with the same results (when I had the original data to play with!).
  • Martin Smith
    Martin Smith over 12 years
    I don't agree with this analysis. Can you provide a repro where the presence of these internal functions with otherwise similar plans makes any significant difference whatsoever?