Split words with a capital letter in sql

12,706

Solution 1

Here is a function I created that is similar to the "removing non-alphabetic characters". How to strip all non-alphabetic characters from string in SQL Server?

This one uses a case sensitive collation which actively seeks out a non-space/capital letter combination and then uses the STUFF function to insert the space. This IS a scalar UDF, so some folks will immediately say that it will be slower than other solutions. To that notion, I say, please test it. This function does not use any table data and only loops as many times as necessary, so it will likely give you very good performance.

Create Function dbo.Split_On_Upper_Case(@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^ ][A-Z]%'
    While PatIndex(@KeepValues collate Latin1_General_Bin, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues collate Latin1_General_Bin, @Temp) + 1, 0, ' ')

    Return @Temp
End

Call it like this:

Select dbo.Split_On_Upper_Case('OneTwoThreeFour')
Select dbo.Split_On_Upper_Case('OneTwoThreeFour')
Select dbo.Split_On_Upper_Case('One')
Select dbo.Split_On_Upper_Case('OneTwoThree')
Select dbo.Split_On_Upper_Case('stackOverFlow')
Select dbo.Split_On_Upper_Case('StackOverFlow')

Solution 2

Here is a function I have just created.

FUNCTION

CREATE FUNCTION dbo.Split_On_Upper_Case
 (
     @String VARCHAR(4000)
 )
RETURNS VARCHAR(4000)
AS
BEGIN

DECLARE @Char CHAR(1);
DECLARE @i    INT = 0;
DECLARE @OutString VARCHAR(4000) = '';


WHILE (@i <= LEN(@String))
BEGIN
    SELECT @Char = SUBSTRING(@String, @i,1)

    IF (@Char = UPPER(@Char) Collate Latin1_General_CS_AI) 
       SET @OutString = @OutString + ' ' + @Char;
    ELSE 
       SET @OutString = @OutString +  @Char;

     SET @i += 1;
END

 SET @OutString =  LTRIM(@OutString);

 RETURN @OutString;

END 

Test Data

DECLARE @TABLE TABLE (Strings VARCHAR(1000))
INSERT INTO @TABLE 
VALUES ('OneTwoThree')   ,
       ('FourFiveSix')   ,
       ('SevenEightNine')

Query

SELECT dbo.Split_On_Upper_Case(Strings) AS Vals
FROM @TABLE

Result Set

╔══════════════════╗
║       Vals       ║
╠══════════════════╣
║ One Two Three    ║
║ Four Five Six    ║
║ Seven Eight Nine ║
╚══════════════════╝

Solution 3

Build a Numbers table. There are some excellent posts on SO to show you how to do this. Populate it with values up the maximum length of your input string. Select the values from 1 through the actual length of the current input string. Cross join this list of numbers to the input string. Use the result to SUBSTRING() each character. Then you can either compare the resulting list of one-charachter values to a pre-populated table-valued variable or convert each character to an integer using ASCII() and choose only those between 65 ('A') and 90 ('Z'). At this point you have a list which is the position of each upper-case character in your input string. UNION the maximum length of your input string onto the end of this list. You'll see why in just a second. Now you can SUBSTRING() your input variable, starting at the Number given by row N and taking a length of (the Number given by row N+1) - (The number given by row N). This is why you have to UNION the extra Number on the end. Finally concatenate all these substring together, space-separated, using the algorithm of your choice.

Sorry, don't have an instance in front of me to try out code. Sounds like a fun task. I think doing it with nested SELECT statements will get convoluted and un-maintainable; better to lay it out as CTEs, IMHO.

Solution 4

If a single query is needed 26 REPLACE can be used to check every upper case letter like

SELECT @var1 col1, REPLACE(
                    REPLACE(
                     REPLACE(
                      ...
                       REPLACE(@var1, 'A', ' A')
                    , ... 
                  , 'X', ' X')
                , 'Y', ' Y')
              , 'Z', ' Z') col2

Not the most beautiful thing but it'll work.

EDIT
Just to add another function to do the same thing in a different way of the other answers

CREATE FUNCTION splitCapital (@param Varchar(MAX))
RETURNS Varchar(MAX)
BEGIN
  Declare @ret Varchar(MAX) = '';
  declare @len int = len(@param);

  WITH Base10(N) AS (
              SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 
    UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 
    UNION ALL SELECT 8 UNION ALL SELECT 9
  ), Chars(N) As (
    Select TOP(@len)
           nthChar 
         = substring(@param, u.N + t.N*10 + h.N*100 + th.N*1000 + 1, 1) 
           Collate Latin1_General_CS_AI
    FROM   Base10 u
           CROSS JOIN Base10 t
           CROSS JOIN Base10 h
           CROSS JOIN Base10 th
    WHERE  u.N + t.N*10 + h.N*100 + th.N*1000 < @len
    ORDER BY u.N + t.N*10 + h.N*100 + th.N*1000
  )
  SELECT @ret += Case nthChar 
                      When UPPER(nthChar) Then ' ' 
                      Else '' 
                 End + nthChar
  FROM   Chars

  RETURN @ret;
END

This one uses the possibility of TSQL to concatenate string variable, I had to use the TOP N trick to force the Chars CTE rows in the right order

Share:
12,706
jackCaller
Author by

jackCaller

Updated on July 16, 2022

Comments

  • jackCaller
    jackCaller almost 2 years

    Does anyone know how to split words starting with capital letters from a string?

    Example:

        DECLARE @var1 varchar(100) = 'OneTwoThreeFour'
        DECLARE @var2 varchar(100) = 'OneTwoThreeFourFive'
        DECLARE @var3 varchar(100) = 'One'
    
        SELECT @var1 as Col1, <?> as Col2
        SELECT @var2 as Col1, <?> as Col2
        SELECT @var3 as Col1, <?> as Col2
    

    expected result:

        Col1                Col2
        OneTwoThreeFour     One Two three Four
        OneTwoThreeFourFive One Two Three Four Five
        One                 One
    

    If this is not possible (or if too long) an scalar function would be okay as well.

  • M.Ali
    M.Ali about 10 years
    Yes it was fun creating this I have just created one have a look :)
  • Michael Green
    Michael Green about 10 years
    @M.Ali - I was admiring your answer just now. My suggestion to the OP was that he try it as a single SELECT, leveraging the relational characteristics of the query engine and minimising any performance impact of the CROSS APPLY which a function-based solution may incur.
  • t-clausen.dk
    t-clausen.dk about 10 years
    seems ineffective. You are looping through the varchar splitting it and putting it together again character by character. Not good for performance.
  • Serpiton
    Serpiton about 10 years
    @t-clausen.dk Select replace('bBb' COLLATE Latin1_General_CS_AI, 'B', ' B') returns 'b Bb'
  • paparazzo
    paparazzo about 10 years
    Sure did not deserve a down vote. +1 to get you even.
  • t-clausen.dk
    t-clausen.dk about 10 years
    @blam when an answer gives the wrong result, it deserves a downvote
  • paparazzo
    paparazzo about 10 years
    @t-clausen.dk Really you did not get that a case sensitive collation was required.
  • Aquaphor
    Aquaphor over 2 years
    This works perfectly. But only if you have 1 uppercase letter. Like "EmailTitle" becomes "Email Title" Which is perfect. But what about if you have more than 1 uppercase letter per word. "EmailGUID" becomes "Email G U I D". Whereas I want "Email GUID" Replace the @KeepValues with: Set @KeepValues = '%[^ A-Z][A-Z]%'