Performance of SQL comparison using substring vs like with wildcard

25,206

Solution 1

I found this reference in an IBM redbook related to SQL performance. It sounds like the SUBSTR scalar function can be handled in an optimized manner by an iSeries.

If you search for the first character and want to use the SQE instead of the CQE, you can use the scalar function substring on the left sign of the equal sign. If you have to search for additional characters in the string, you can additionally use the scalar function POSSTR. By splitting the LIKE predicate into several scalar function, you can affect the query optimizer to use the SQE.

http://publib-b.boulder.ibm.com/abstracts/sg246654.html?Open

Solution 2

I ran the following in the SQL Advisor in IBM Data Studio on one of the tables in my DB2 LUW 10.1 database:

SELECT *
FROM PDM.DB30
WHERE DB30_SYSTEM_ID = 'XXX'
    AND DB30_VERSION_ID = 'YYY'
    AND SUBSTR(DB30_REL_TABLE_NM, 1, 4) = 'ZZZZ'

and

SELECT * 
FROM PDM.DB30 
WHERE DB30_SYSTEM_ID = 'XXX' 
    AND DB30_VERSION_ID = 'YYY' 
    AND DB30_REL_TABLE_NM LIKE 'ZZZZ%' 

They both had the exact same access path utilizing the same index, the same estimated IO cost and the same estimated cardinality, the only difference being the estimated total CPU cost for the LIKE was 178,343.75 while the SUBSTR was 197,518.48 (~10% difference).

The cumulative total cost for both were the same though, so this difference is negligible as per the advisor.

Solution 3

Yes, Method 2 would be faster. LIKE is not as efficient a function.

To compare performance of various techniques, try using Visual Explain. You will find it buried in System i Navigator. Under your system connection, expand databases, then click onyour RDB name. In the lower right pane you can then click on the option to Run an SQL Script. Enter in your SELECT statement, and choose the menu option for Visual Explain or Run and Explain. Visual explain will break down the execution plan for your statement and show you the cost for each part as estimated on your tables with the indexes available.

Solution 4

You can actually run with real examples in your database.

LIKE is always better at my run.

select count(*) from u_log where log_text like 'AUT%';
1 row(s) returned : 90ms taken

select count(*) from u_log where substr(log_text,1,3)='AUT';
1 row(s) returned : 493ms taken
Share:
25,206
Swoop
Author by

Swoop

Web Programmer, currently using ASP.Net with C#.

Updated on September 25, 2020

Comments

  • Swoop
    Swoop over 3 years

    I am working on a join condition between 2 tables where one of the columns to match on is a concatentation of values. I need to join columnA from tableA to the first 2 characters of columnB from tableB.

    I have developed 2 different statements to handle this and I have tried to analyze the performance of each method.

    Method 1:

    ON tB.columnB   like  tA.columnA || '%'
    

    Method 2:

    ON substr(tB.columnB,1,2) = tA.columnA
    

    The query execution plan has a lot less steps using Method 1 compared to Method 2, however, it looks like Method 2 executes much faster. Also, the execution plan shows a recommended index for Method 2 that could improve its performance.

    I am running this on an IBM iSeries, though would be interested in answers in a general sense to learn more about sql query optimization.

    Does it make sense that Method 2 would execute faster?

    This SO question is similar, but it looks like no one provided any concrete answers to the performance difference of these approaches: T-SQL speed comparison between LEFT() vs. LIKE operator.

    PS: The table design that requires this type of join is not something that I can get changed at this time. I realize having the fields separated which hold different types of data would be preferrable.

  • Swoop
    Swoop over 12 years
    I have been using Visual Explain to help optimize my queries, but I am still trying to learn how to make the most of this tool. Do you know of any advanced documentation for it? My google searches thus far have only found basic likes, like how to load Visual Explain.
  • Larry Lustig
    Larry Lustig over 12 years
    LIKE can be quite efficient if the wildcard is at the end of the comparison string and the engine understands to use an available index for the comparison.
  • WarrenT
    WarrenT over 12 years
    @Larry are you saying under some circumstances that the optimizer would understand a wildcard at the end to be equivalent to LEFT()? Can you provide any example where it would more efficient?
  • Larry Lustig
    Larry Lustig over 12 years
    Actually, it would surprise me if any major SQL engine did not understand that any non-wildcard characters at the start of a LIKE comparator could use an indexed search if the LIKE column was indexed. That's a pretty basic optimization. I would normally expect the LIKE to be optimized and the LEFT() to be missed by the optimizer (I don't know if optimizers in general understand the effects of non-SQL functions in queries).