T-SQL - How can I make a SELECT query with multiple LIKE clauses quicker?

15,880

Solution 1

Since you get multiple hits on multiple words, you can use the selected ID's as a filter for the actual selection of the articles:

Select TOP 5 * 
from NewsArticles 
where ID in (SELECT ID
    FROM NewsArticles as n
    JOIN #SearchItem as s
    ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR 
       n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
)
ORDER BY publishDate DESC

It should still be reasonably fast (compared to the original query) and duplicate-free.

(as in Rawheiser's response, there is an assumption that an ID field actually exists :))

Solution 2

If you are doing these types of searches, you should use full text search. You need to read up in BOL about how to set this up as it is complicated. However when you have a wildcard as the first character, then SQL server cannot use indexes which is why this is slow.

Solution 3

If there is an news Article key you can use a query to that joins back to itself such as:

select top 5 *
from NewsArticles as na
join 
    ( 
        SELECT distinct idNo , publishDate 
        FROM NewsArticles as n
        JOIN #SearchItem as s
        ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR 
        n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
    ) as sk
        on sk.idNo = na.idNo
ORDER BY sk.publishDate DESC

Solution 4

You could also try Full-Text Search. Something like

SELECT TOP 5 * FROM NewsArticles 
WHERE CONTAINS((headline,body), 'FORMSOF(INFLECTIONAL, sustainable) OR conservation OR FORMSOF(INFLECTIONAL, environmental)')

But, as HLGEM says, it will be important to read up on setting up FTS. Just creating an index for those two columns may be enough since you are searching for single words, but once you add phrases you want to start editing stop words and breakers and noise words.

Share:
15,880
sr28
Author by

sr28

Updated on June 09, 2022

Comments

  • sr28
    sr28 about 2 years

    I've got a search function for news articles that looks like this (contains more than 5 search items):

    SELECT TOP 5 *
    FROM NewsArticles
    WHERE (headline LIKE '% sustainable %'OR
    headline LIKE '% sustainability %' OR
    headline LIKE '% conservation %' OR
    headline LIKE '% environmental % OR
    headline LIKE '% environmentally %') 
    OR
    (body LIKE '% sustainable %'OR
    body LIKE '% sustainability %' OR
    body LIKE '% conservation %' OR
    body LIKE '% environmental % OR
    body LIKE '% environmentally %')
    ORDER BY publishDate DESC 
    

    This query is designed to pull out the top 5 news stories relating to sustainability and sits on my main sustainability homepage. However, it takes a while to run and the page is slow to load. So I'm looking up ways to speed this up. Having so many LIKE clauses seems cumbersome so I've tried something with a JOIN like this:

    CREATE TABLE #SearchItem (Search varchar(255))
    
    INSERT INTO #SearchItem VALUES
    ('sustainable'),
    ('sustainability'),
    ('conservation'),
    ('environmental'),
    ('environmentally')
    
    SELECT TOP 5 *
    FROM NewsArticles as n
    JOIN #SearchItem as s
    ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR 
    n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
    ORDER BY n.publishDate DESC
    

    This seems to work very well for performance, but seems to sometimes bring back duplicate articles where one of the search words appears in both the body and the headline (which is often the case). I've tried using the word using 'SELECT DISTINCT TOP 5 *' but this gives me an error saying 'The ntext data type cannot be selected as DISTINCT because it is not comparable'. Is there away of stopping this from bringing back duplicates without doing 2 separate searches and using UNION?

  • Tulains Córdova
    Tulains Córdova almost 11 years
    What's BOL and what's upas ?
  • dburges
    dburges almost 11 years
    BOL is Books online which is the SQL Server help. Upas was a typo for up as
  • sr28
    sr28 almost 11 years
    Thanks but whilst the query worked, it took far too long to run (23 seconds). Still, it was faster than my original :)
  • Rawheiser
    Rawheiser almost 11 years
    As always is the case with advice - "your mileage may vary".
  • WonderWorker
    WonderWorker about 8 years
    BOL for us means Boots opticians