Match comma separated values in column

24,869

Solution 1

Disclaimer: The commentators are right... CSVs in a single field are a horrible design, and should be re-done.

With that said, here's how you can work around your problem:

Pad Categories with leading and trailing ,, that way you can include them in your wildcard search:

WHERE (',' + Categories + ',') LIKE '%,science,%'

Solution 2

Use FIND_IN_SET(,)

SQL:

SELECT name FROM orders,company
WHERE orderID = 1 
AND 
FIND_IN_SET(companyID, attachedCompanyIDs)

or can check this link FIND_IN_SET() vs IN()

Solution 3

This question is visible on google and has many views, so I want to share my approach to this problem. I had to deal with such a poor design as comma-separated values stored as strings too. I came across this issue while tweaking a CMS's plugin responsible for tags.

Yeah, tags related to a site article were stored like this: "tag1,tag2,...,tagN". So, getting the exact match wasn't as trivial as it might have initially appeared: using simple LIKE, with articles tagged "ball" I also got ones tagged "football" and "ballroom". Not critical, but rather annoying.

FIND_IN_SET function seemed awesome at first but then it turned out that it doesn't use index and doesn't work properly if the first argument contains a comma character.

I had no desire to alter the plugin itself or deeper CMS core functionality which that plugin had been built upon.

Also it is worth noting that needed tag (substring) can be the first, the last element in the string or can be somewhere in the middle, so this piece of code WHERE (',' + Categories + ',') LIKE '%,science,%' doesn't cover all three cases.

Finally, I ended up with very simple solution. It worked for me like this:

... WHERE tags LIKE 'ball,%' OR tags LIKE '%,ball,%' OR tags LIKE '%,ball'

All theree cases covered; commas used as delimiters. Hope it helps others who came across similar pitfall.

PS. I am not a MySQL/DB expert at all and I would love to read about potential drawbacks of this approach especially on really huge tables (which wasn't my case, btw). I just shared the results of my small research and what I did to solve this problem with minimal efforts.

Solution 4

I propose a 4x WHERE that can match any of the possible cases: value alone, value at the start, middle or end of the csv:

WHERE Categories = 'science'     /* CSV containing only the one value */
OR Categories LIKE 'science,%'   /* value at start of CSV */
OR Categories LIKE '%,science,%' /* value somewhere in the middle */
OR Categories LIKE '%,science'   /* value at the end of CSV */

This way all 'science' rows should be selected but none of the 'poo_science' rows.

Solution 5

I've made some assumptions about your data layout. Try this - using SQL Server 2K8+ this should work:

DECLARE @SearchString NVarChar(100) = 'maths';

SELECT 1 SomeId, 'science,maths,english' Categories
INTO #TestTable;

WITH R AS (
  SELECT
    X.SomeId,
    C.value('@value', 'NVarChar(100)') SomeTagValue
  FROM (SELECT SomeId,
          CONVERT(XML, '<tag value = "' + REPLACE(Categories, ',', '" /><tag value = "') + '" />') XMLValue
        FROM #TestTable) X CROSS APPLY X.XMLValue.nodes('//tag') T(C)
)
SELECT *
FROM R
WHERE SomeTagValue = @SearchString;

DROP TABLE #TestTable;

It's definitely not going to be super-efficient or very scalable, but then working against denormalized data tends to inherently have those issues.

Share:
24,869
Alex Guerin
Author by

Alex Guerin

Updated on January 09, 2021

Comments

  • Alex Guerin
    Alex Guerin over 3 years

    If I have a column called 'Categories' with say science,maths,english in the row comma-separated as shown, how would I match all rows with the category containing maths?

    I've tried a simple LIKE but it is not quite accurate as there may be 'poo_science' which when searching for '%science%' would match both.

    I've looked around StackOverflow and there are plenty of similar questions but all seem to want to return data as a comma separated list or something - not quite what I'm after.

    I'd prefer not to use a stored procedure and cannot use full-text searching. I have a stored procedure I used which added another character ('$') around each value and then would search for '$value$'... is this too nasty? I'm after a little more simple method.