SQL Server full-text search for phrase containing a hyphen doesn't return expected results

11,940

Solution 1

http://support.microsoft.com/default.aspx?scid=kb;en-us;200043

"Where non-alphanumeric character must be used in the search critera (primarily the dash '-' character), use the Transact-SQL LIKE clause instead of the FULLTEXT or CONTAINS predicates."

Solution 2

In cases like these where you can't anticipate the behavior of the word-breaker it's always a good idea to run sys.dm_fts_parser on your strings to get an idea of how the words are going to be split and stored in the internal index.

For instance, running sys.dm_fts_parser on '"one two-three-four five"' results in the following -

select * from sys.dm_fts_parser('"one two-three-four five"', 1033, NULL, 0)
--edited--
1   0   1   Exact Match one
1   0   2   Exact Match two-three-four
1   0   2   Exact Match two
1   0   3   Exact Match three
1   0   4   Exact Match four
1   0   5   Exact Match five

As you can see from the returned results, the word-breaker parses the string and outputs six forms which may explain the results you see when running your CONTAINS query.

Solution 3

A full-text search considers a word to be a string of characters without spaces or punctuation. The occurrence of a non-alphanumeric character can "break" a word during a search. Because the SQL Server full-text search is a word-based engine, punctuation generally is not considered and is ignored when searching the index. Therefore, a CONTAINS clause like 'CONTAINS(testing, "computer-failure")' would match a row with the value, "The failure to find my computer would be expensive.".

Please, follow the link for WHY:https://support.microsoft.com/en-us/kb/200043

Share:
11,940
Laviak
Author by

Laviak

Updated on June 05, 2022

Comments

  • Laviak
    Laviak about 2 years

    We have an application that using a SQL Server 2008 database, and full-text search. I'm trying to understand why the following searches behave differently:

    First, a phrase containing a hyphenated word, like this:

    contains(column_name, '"one two-three-four five"')
    

    And second, an identical phrase, where the hyphens are replaced by spaces:

    contains(column_name, '"one two three four five"')
    

    The full-text index uses the ENGLISH (1033) locale, and the default system stoplist.

    From my observations of other full-text searches containing hyphenated words, the first one should allow for matches on either one two three four five or one twothreefour five. Instead, it only matches one twothreefour five (and not one two-three-four five).


    Test Case

    Setup:

    create table ftTest 
    (
        Id int identity(1,1) not null, 
        Value nvarchar(100) not null, 
        constraint PK_ftTest primary key (Id)
    );
    
    insert ftTest (Value) values ('one two-three-four five');
    insert ftTest (Value) values ('one twothreefour five');
    
    create fulltext catalog ftTest_catalog;
    create fulltext index on ftTest (Value language 1033)
        key index PK_ftTest on ftTest_catalog;
    GO
    

    Queries:

    --returns one match
    select * from ftTest where contains(Value, '"one two-three-four five"')
    
    --returns two matches
    select * from ftTest where contains(Value, '"one two three four five"')
    select * from ftTest where contains(Value, 'one and "two-three-four five"')
    select * from ftTest where contains(Value, '"one two-three-four" and five')
    GO
    

    Cleanup:

    drop fulltext index on ftTest
    drop fulltext catalog ftTest_catalog;
    drop table ftTest;
    
  • Laviak
    Laviak almost 12 years
    The question is more about why SQL server exhibits different behaviour for matching. Working around it is certainly doable, but it simply doesn't make sense to me that "two-three-four five" will return both rows, however "one two-three-four five" will not. Ditto for "one two-three-four". Is this really expected behaviour? and if so, why?