Removing nonnumerical data out of a number + SQL

18,639

Solution 1

If you're using SQL Server 2005 or newer then your best option is to create a user-defined CLR function and use a regular expression to remove all non-numeric characters.

If you don't want to use a CLR function then you could create a standard user-defined function. This will do the job although it won't be as efficient:

CREATE FUNCTION dbo.RemoveNonNumerics(@in VARCHAR(255))
RETURNS VARCHAR(255)
AS
BEGIN
    DECLARE @out VARCHAR(255)

    IF (@in IS NOT NULL)
    BEGIN
        SET @out = ''

        WHILE (@in <> '')
        BEGIN
            IF (@in LIKE '[0-9]%')
                SET @out = @out + SUBSTRING(@in, 1, 1)

            SET @in = SUBSTRING(@in, 2, LEN(@in) - 1)
        END
    END

    RETURN(@out)
END

And then select from your table like so:

SELECT dbo.RemoveNonNumerics(your_column) AS your_tidy_column
FROM your_table

Solution 2

Have a look at this post (it's the 8th post down - the first LONG one) which details how to use regular expressions in SQL Server. It's not the fastest (that would be do it before you get to SQL) but it provides a decent way to do it.

Share:
18,639
Admin
Author by

Admin

Updated on June 04, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm trying find the best way to remove nonnumerical data from a varchar in SQL e.g.

    '(082) 000-0000' to '0820000000' or
    '+2782 000 0000' to '0820000000'
    

    The difficulty is i'm not always sure what number formats are coming in, as shown above, so I'd like like everything that is not a number removed essentially.

    Update:
    From what you guys have said this is a little spike done:

    declare @Num varchar(20)
    
    set @Num = ' + (82) 468 6152 '
    
    --strip nonnumrical data out of @num
    
    print @Num
    
    set @Num = replace(@Num, ' ', '')
    set @Num = replace(@Num, '+', '')
    set @Num = replace(@Num, '-', '')
    set @Num = replace(@Num, '(', '')
    set @Num = replace(@Num, ')', '')
    
    print @Num
    

    Couldn't get the replace [^0-9] expression right though.