How to compare if two strings contain the same words in T-SQL for SQL Server 2008?

Solution 1

I don't think there is a simple solution for what you are trying to do in SQL Server. My first thought would be to create a CLR UDF that:

  1. Accepts two strings
  2. Breaks them into two arrays using the split function on " "
  3. Compares the contents of the two arrays, returning true if they contain the same elements.

If this is a route you'd like to go, take a look at this article to get started on creating CLR UDFs.

Solution 2

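Try this... The StringSorter function breaks a string on the separator character (a space in the examples below), sorts the words, and puts the string back together in sorted word order. Two strings that contain the same words then produce the same sorted string, so they can be compared with =.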

CREATE FUNCTION dbo.StringSorter(@sep char(1), @s varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @ResultVar varchar(8000);

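    -- Walk the string recursively, recording each separator position (pos)
    -- together with the previous one (lastPos); pos = 0 marks the last word.
    -- Note: the default recursion limit of 100 caps the number of words.
    WITH sorter_cte AS (
      SELECT CHARINDEX(@sep, @s) as pos, 0 as lastPos
      UNION ALL
      SELECT CHARINDEX(@sep, @s, pos + 1), pos
      FROM sorter_cte
      WHERE pos > 0
    )
    -- Cut out the word between each pair of positions.
    , step2_cte AS (
    SELECT SUBSTRING(@s, lastPos + 1,
             case when pos = 0 then 8000
             else pos - lastPos -1 end) as chunk
    FROM sorter_cte
    )
    -- Concatenate the words in sorted order, each prefixed with a space.
    SELECT @ResultVar = (select ' ' + chunk 
                                     from step2_cte 
                                     order by chunk 
                                     FOR XML PATH(''));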
    RETURN @ResultVar;
END
GO

Here is a test case just trying out the function:

SELECT dbo.StringSorter(' ', 'the quick brown dog jumped over the lazy fox');

which produced these results:

  brown dog fox jumped lazy over quick the the

Then, to run it from a SELECT statement using your strings:

SELECT case when dbo.StringSorter(' ', 'my word') = 
                     dbo.StringSorter(' ', 'word my') 
               then 'Equal' else 'Not Equal' end as ResultCheck;
SELECT case when dbo.StringSorter(' ', 'my word') = 
                     dbo.StringSorter(' ', 'aaamy word') 
               then 'Equal' else 'Not Equal' end as ResultCheck;

The first query returns 'Equal' and the second returns 'Not Equal'.

This should do exactly what you are looking for with a simple function utilizing a recursive CTE to sort your string.

Enjoy!

Solution 3

There is no simple way to do this. You are advised to write a function or stored procedure that does the processing involved with this requirement.

Your function can use other functions that split the strings into parts, sort them by word, etc.

Here's how you can split the strings:

T-SQL: Opposite to string concatenation - how to split string into multiple records
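
As a rough illustration of the splitting step, here is a minimal sketch of a split function that runs on SQL Server 2008, which has no built-in STRING_SPLIT. The name dbo.SplitWords and its parameters are hypothetical and are not taken from the linked question; the loop-based approach is just one of several possible ways to split.

```sql
-- Hypothetical helper: returns one row per non-empty word in @s,
-- using @sep as the separator.
CREATE FUNCTION dbo.SplitWords (@s varchar(8000), @sep char(1))
RETURNS @words TABLE (word varchar(8000) NOT NULL)
AS
BEGIN
    DECLARE @start int, @pos int, @len int;
    SET @start = 1;
    SET @len = DATALENGTH(@s);   -- DATALENGTH counts trailing spaces, LEN does not

    WHILE @start <= @len
    BEGIN
        SET @pos = CHARINDEX(@sep, @s, @start);
        IF @pos = 0 SET @pos = @len + 1;   -- no more separators: take the rest

        -- Skip empty chunks produced by leading or repeated separators.
        IF @pos > @start
            INSERT INTO @words (word)
            VALUES (SUBSTRING(@s, @start, @pos - @start));

        SET @start = @pos + 1;
    END;

    RETURN;
END
GO

-- Usage: one row per word, sorted.
SELECT word
FROM dbo.SplitWords('the quick brown dog', ' ')
ORDER BY word;
```

A NULL input yields an empty result, because the loop condition is never true.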

Solution 4

The scenario is as follows: use a table-valued function (TVF) to split the first and second strings on spaces, then FULL JOIN the two resulting tables on the word values. If any row has a NULL on the left or right side, the strings are not equal; otherwise they are equal.
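
The FULL JOIN idea can be sketched as follows. To keep the example self-contained, it splits the strings with an XML cast instead of a TVF; that substitution is an assumption of this sketch, and it only works when the strings contain no XML special characters such as & or <. The variable and table names are made up for illustration.

```sql
DECLARE @a varchar(8000), @b varchar(8000);
SET @a = 'my word';
SET @b = 'word my';

DECLARE @wa TABLE (word varchar(8000));
DECLARE @wb TABLE (word varchar(8000));

-- Turn 'my word' into <w>my</w><w>word</w> and shred it into rows.
INSERT INTO @wa (word)
SELECT t.n.value('.', 'varchar(8000)')
FROM (SELECT CAST('<w>' + REPLACE(@a, ' ', '</w><w>') + '</w>' AS xml) AS x) AS src
CROSS APPLY src.x.nodes('/w') AS t(n)
WHERE t.n.value('.', 'varchar(8000)') <> '';   -- drop empty words from double spaces

INSERT INTO @wb (word)
SELECT t.n.value('.', 'varchar(8000)')
FROM (SELECT CAST('<w>' + REPLACE(@b, ' ', '</w><w>') + '</w>' AS xml) AS x) AS src
CROSS APPLY src.x.nodes('/w') AS t(n)
WHERE t.n.value('.', 'varchar(8000)') <> '';

-- A NULL on either side of the FULL JOIN means a word is missing from one string.
SELECT CASE WHEN EXISTS (
           SELECT 1
           FROM @wa AS a
           FULL OUTER JOIN @wb AS b ON a.word = b.word
           WHERE a.word IS NULL OR b.word IS NULL
       ) THEN 'Not Equal' ELSE 'Equal' END AS ResultCheck;
```

With the sample values this returns 'Equal'. Note that the join treats the strings as sets of words, so a repeated word on one side does not make them unequal; the sorted-string approach in Solution 2 does count repetitions.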

Solution 5

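A very simple way to do this... JC65100

The two functions below compare the strings character by character: ITS_CompareStrs returns the positions at which the characters differ, and ITS_GetDifCharCount counts those positions, so a result of 0 means no differences were found. Note that this is a position-by-position comparison, so strings with the same words in a different order are reported as different.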

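-- Returns one row per character position at which the two strings differ.
-- Created first, because ITS_GetDifCharCount depends on it.
CREATE FUNCTION [dbo].[ITS_CompareStrs]
(
    @str1 VARCHAR(MAX)
   ,@str2 VARCHAR(MAX)
)
RETURNS 
@Result TABLE (ind INT, c1 CHAR(1), c2 CHAR(1))
AS
BEGIN
    DECLARE @i AS INT
           ,@c1 CHAR(1)
           ,@c2 CHAR(1)

    SET @i = 1

    -- Walk both strings until the longer one is exhausted.
    WHILE LEN(@str1) > @i-1 OR LEN(@str2) > @i-1
    BEGIN

      IF LEN(@str1) > @i-1
        SET @c1 = SUBSTRING(@str1, @i, 1)

      IF LEN(@str2) > @i-1
        SET @c2 = SUBSTRING(@str2, @i, 1)

      INSERT INTO @Result ([ind], c1, c2)
      SELECT @i, @c1, @c2

      -- Reset so positions past the end of the shorter string stay NULL.
      SELECT @i = @i+1
            ,@c1 = NULL
            ,@c2 = NULL

    END

    -- Keep only the mismatches; a NULL never equals anything, so it stays.
    DELETE FROM @Result
    WHERE c1 = c2

    RETURN 
END
GO

-- Counts the differing character positions.
CREATE FUNCTION [dbo].[ITS_GetDifCharCount] 
(
    @str1 VARCHAR(MAX)
   ,@str2 VARCHAR(MAX)
)
RETURNS INT
AS
BEGIN
    DECLARE @result INT

    SELECT @result = COUNT(*)
    FROM dbo.ITS_CompareStrs(@str1, @str2)

    RETURN @result
END
GO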
Author: KentZhou

Updated on July 05, 2022

Comments

  • KentZhou
    KentZhou almost 2 years

    When I compare two strings in SQL Server, there are a couple of simple ways to do it, with = or LIKE.

    I want to redefine equality as:

    If two strings contain the same words - no matter in what order - they are equal, otherwise they are not.

    For example:

    • 'my word' and 'word my' are equal
    • 'my word' and 'aaamy word' are not

    What's the best simple solution for this problem?