How to get number of chars in string in Transact SQL, the "other way"

16,965

Solution 1

I can't leave a comment so I will have to leave an answer (or shutup).

My vote would be for the addAndMinusBehaviour

I haven't got a good third alternative, there maybe some obscure whitespace rules to fiddle with in the options / SET / Collation assignment but don't know more detail off the top of my head.

but really addAndMinusBehaviour is probably the eaiest to implement, fastest to execute and if you document it, farily maintainable as well.

Solution 2

CREATE FUNCTION [dbo].[ufn_CountChar] ( @pInput VARCHAR(1000), @pSearchChar CHAR(1) )
RETURNS INT
BEGIN

RETURN (LEN(@pInput) - LEN(REPLACE(@pInput, @pSearchChar, '')))

END
GO

Solution 3

My understanding is that DATALENGTH(@txt)/2 should always give you the number of characters. SQL Server stores Unicode characters in UCS-2 which does not support surrogate pairs.

http://msdn.microsoft.com/en-us/library/ms186939.aspx

http://en.wikipedia.org/wiki/UCS2

Share:
16,965
Alexey Shcherbak
Author by

Alexey Shcherbak

Updated on July 19, 2022

Comments

  • Alexey Shcherbak
    Alexey Shcherbak almost 2 years

    We faced a very strange issue (really strange for such mature product): how to get number of characters in Unicode string using Transact-SQL statements. The key problem of this issue that the len() TSQL function returns number of chars, excluding trailing blanks. The other variant is to use datalength (which return number of bytes) and divide by 2, so get numbers of Unicode chars. But Unicode chars can be surrogate pairs so it won't work either.

    We have 2 variants of solution: the first is to use len(replace()) and the second is add a single symbol and then subtract 1 from result. But IMO both variants are rather ugly.

    declare @txt nvarchar(10)
    set @txt = 'stack ' 
    select @txt as variable, 
     len(@txt) as lenBehaviour, 
     DATALENGTH(@txt)/2 as datalengthBehaviour, 
     len(replace(@txt,' ','O')) as ReplaceBehaviour, 
     len(@txt+'.')-1 as addAndMinusBehaviour
    

    Any other ideas how to count chars in string with trailing spaces?

  • Alexey Shcherbak
    Alexey Shcherbak almost 15 years
    After short discussion with our team lead we figured that UTF-16 text can be stored in UCS-2 chars subset. UTF-16 single symbol with surrogate will be transformed in 2 symbols of UCS-2 and thus broke correct len() result. The other bad side of locking on UTF-16 UCS-2 subset that you can't be sure about future versions of SQLServer. I cant found information about exactly SQL2008 nvarchar store mode, looks like it's similar to 2005's one you pointed. But as far we building long term supported system - we cant lock on that. BTW - good point about UCS-2, never noticed that before.
  • Ali Mst
    Ali Mst almost 15 years
    Vote up to give some points so you can comment in the future. (I hated not being able to comment too.)