T-SQL - How can I make a SELECT query with multiple LIKE clauses quicker?
Solution 1
Since you get multiple hits on multiple words, you can use the selected ID's as a filter for the actual selection of the articles:
Select TOP 5 *
from NewsArticles
where ID in (SELECT ID
FROM NewsArticles as n
JOIN #SearchItem as s
ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR
n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
)
ORDER BY publishDate DESC
It should still be reasonably fast (compared to the original query) and duplicate-free.
(as in Rawheiser's response, there is an assumption that an ID field actually exists :))
Solution 2
If you are doing these types of searches, you should use full text search. You need to read up in BOL about how to set this up as it is complicated. However when you have a wildcard as the first character, then SQL server cannot use indexes which is why this is slow.
Solution 3
If there is an news Article key you can use a query to that joins back to itself such as:
select top 5 *
from NewsArticles as na
join
(
SELECT distinct idNo , publishDate
FROM NewsArticles as n
JOIN #SearchItem as s
ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR
n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
) as sk
on sk.idNo = na.idNo
ORDER BY sk.publishDate DESC
Solution 4
You could also try Full-Text Search. Something like
SELECT TOP 5 * FROM NewsArticles
WHERE CONTAINS((headline,body), 'FORMSOF(INFLECTIONAL, sustainable) OR conservation OR FORMSOF(INFLECTIONAL, environmental)')
But, as HLGEM says, it will be important to read up on setting up FTS. Just creating an index for those two columns may be enough since you are searching for single words, but once you add phrases you want to start editing stop words and breakers and noise words.
sr28
Updated on June 09, 2022Comments
-
sr28 about 2 years
I've got a search function for news articles that looks like this (contains more than 5 search items):
SELECT TOP 5 * FROM NewsArticles WHERE (headline LIKE '% sustainable %'OR headline LIKE '% sustainability %' OR headline LIKE '% conservation %' OR headline LIKE '% environmental % OR headline LIKE '% environmentally %') OR (body LIKE '% sustainable %'OR body LIKE '% sustainability %' OR body LIKE '% conservation %' OR body LIKE '% environmental % OR body LIKE '% environmentally %') ORDER BY publishDate DESC
This query is designed to pull out the top 5 news stories relating to sustainability and sits on my main sustainability homepage. However, it takes a while to run and the page is slow to load. So I'm looking up ways to speed this up. Having so many LIKE clauses seems cumbersome so I've tried something with a JOIN like this:
CREATE TABLE #SearchItem (Search varchar(255)) INSERT INTO #SearchItem VALUES ('sustainable'), ('sustainability'), ('conservation'), ('environmental'), ('environmentally') SELECT TOP 5 * FROM NewsArticles as n JOIN #SearchItem as s ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' ORDER BY n.publishDate DESC
This seems to work very well for performance, but seems to sometimes bring back duplicate articles where one of the search words appears in both the body and the headline (which is often the case). I've tried using the word using 'SELECT DISTINCT TOP 5 *' but this gives me an error saying 'The ntext data type cannot be selected as DISTINCT because it is not comparable'. Is there away of stopping this from bringing back duplicates without doing 2 separate searches and using UNION?
-
Tulains Córdova almost 11 yearsWhat's BOL and what's upas ?
-
dburges almost 11 yearsBOL is Books online which is the SQL Server help. Upas was a typo for up as
-
sr28 almost 11 yearsThanks but whilst the query worked, it took far too long to run (23 seconds). Still, it was faster than my original :)
-
Rawheiser almost 11 yearsAs always is the case with advice - "your mileage may vary".
-
WonderWorker about 8 yearsBOL for us means Boots opticians