Remove Unicode characters in a String


Solution 1

Would a RegEx solution be of interest to you?

There are plenty of examples in different languages on this site; here's a C# one: How can you strip non-ASCII characters from a string? (in C#).

Try this for VBA:

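Private Function GetStrippedText(txt As String) As String
    Dim regEx As Object

    ' Late-bound VBScript regular expression engine (no project reference needed)
    Set regEx = CreateObject("vbscript.regexp")
    ' Match any character outside the 7-bit ASCII range U+0000 to U+007F
    regEx.Pattern = "[^\u0000-\u007F]"
    ' Without Global = True, Replace removes only the first match
    regEx.Global = True
    GetStrippedText = regEx.Replace(txt, "")

End Function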
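If you call this for many cells, Ambie's suggestion from the comments (define the pattern once and only call .Replace inside the loop) might look like the sketch below. `mNonAscii`, `StripNonAsciiCached` and `CleanUsedRange` are hypothetical names; the sketch assumes Excel VBA in a standard module, keeps `.Global = True` so every match is removed, and skips formula cells so formulas are not overwritten with plain values. The thread reports only a small speed-up from this, so treat it as an optional refinement.

```vba
' Hypothetical module-level cache: the RegExp object (and its pattern)
' is created once and reused on every call
Private mNonAscii As Object

Private Function StripNonAsciiCached(ByVal txt As String) As String
    If mNonAscii Is Nothing Then
        Set mNonAscii = CreateObject("VBScript.RegExp")
        mNonAscii.Pattern = "[^\u0000-\u007F]"  ' anything outside 7-bit ASCII
        mNonAscii.Global = True                 ' remove every match, not just the first
    End If
    StripNonAsciiCached = mNonAscii.Replace(txt, "")
End Function

' Hypothetical usage: clean every text constant in the active sheet's used range
Sub CleanUsedRange()
    Dim c As Range
    For Each c In ActiveSheet.UsedRange.Cells
        ' Skip formulas so they are not replaced by their current values
        If Not c.HasFormula Then
            If VarType(c.Value) = vbString Then
                c.Value = StripNonAsciiCached(c.Value)
            End If
        End If
    Next c
End Sub
```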

Solution 2

You don't need to loop over each character.

This may be late, but maybe it helps someone:

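Public Function StripNonAsciiChars(ByVal InputString As String) As String
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True        ' replace every match, not only the first
        .MultiLine = True
        .IgnoreCase = True
        ' Any character outside U+0000 to U+007F (7-bit ASCII)
        .Pattern = "[^\u0000-\u007F]"
        ' Replace each one with a space, then let Excel's TRIM strip leading and
        ' trailing spaces and collapse runs of spaces to one (Excel only; this
        ' also collapses spaces that were already in the input)
        StripNonAsciiChars = Application.WorksheetFunction.Trim(.Replace(InputString, " "))
    End With
End Function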
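Some commenters report that Application.WorksheetFunction.Trim raises an error for them, and it also collapses repeated spaces. A sketch that keeps this answer's replace-with-space behaviour but reproduces the trimming in plain VBA, so it should not depend on Excel, could look like this. `StripNonAsciiNoExcel` is a hypothetical name; like Excel's TRIM it only treats the ordinary space (code 32) as whitespace, and if you want to keep existing spacing you can drop step 2 and the final Trim$.

```vba
' Hypothetical Excel-independent variant of StripNonAsciiChars
Public Function StripNonAsciiNoExcel(ByVal InputString As String) As String
    Dim re As Object
    Dim s As String
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True

    ' Step 1: turn every non-ASCII character into a space
    re.Pattern = "[^\u0000-\u007F]"
    s = re.Replace(InputString, " ")

    ' Step 2: mimic WorksheetFunction.Trim by collapsing runs of
    ' spaces to a single space, then trimming both ends
    re.Pattern = " {2,}"
    s = re.Replace(s, " ")
    StripNonAsciiNoExcel = Trim$(s)
End Function
```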

Solution 3

Try the following:

Function ClearUnwantedString(fulltext As String) As String
    Dim output As String
    Dim character As String
    Dim i As Long
    ' Keep only ASCII letters and digits; everything else
    ' (including spaces and punctuation) is dropped
    For i = 1 To Len(fulltext)
        character = Mid$(fulltext, i, 1)
        If (character >= "a" And character <= "z") Or (character >= "0" And character <= "9") Or (character >= "A" And character <= "Z") Then
            output = output & character
        End If
    Next i
    ClearUnwantedString = output
End Function

Sub test()
    Dim a As String
    a = ClearUnwantedString("dfjŒœŠdskl")
    Debug.Print a   ' dfjdskl (with the default Option Compare Binary)
End Sub
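The loop above keeps only letters and digits, so spaces and ASCII punctuation disappear too, and one comment asks to keep everything in the code range 0 to 255 instead. A looser loop-based sketch compares each character's Unicode code using AscW; `KeepCodeRange` and `MaxCode` are hypothetical names. Pass 127 for strict ASCII or 255 to also keep the Latin-1 range. Using AscW rather than Asc matters: Asc returns the code in the system's ANSI code page (159 for Ÿ under Windows-1252), so a test like `Asc(ch) <= 255` would keep Ÿ, Œ, œ, Š, š and ƒ, whose Unicode codes (376, 338, 339, 352, 353, 402) are all above 255.

```vba
' Hypothetical helper: keep only characters whose UTF-16 code unit
' is at most MaxCode (127 = strict ASCII, 255 = Latin-1 range)
Public Function KeepCodeRange(ByVal s As String, ByVal MaxCode As Long) As String
    Dim i As Long
    Dim ch As String
    Dim result As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        ' AscW returns a negative Integer for code units above &H7FFF,
        ' so mask to the range 0 to 65535 before comparing
        If (AscW(ch) And &HFFFF&) <= MaxCode Then
            result = result & ch
        End If
    Next i
    KeepCodeRange = result
End Function
```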

Solution 4

What do you get when you write the following in the immediate window?

?Replace("ŸŸŸŸ", ChrW(376), "ale")

I get: alealealeale
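Several comments below are about a character that shows up as ? or is invisible in the sheet. The VBA editor, including the Immediate window, can only display characters from the system's ANSI code page and shows others as ?, so printing each character's Unicode code can reveal what a cell really contains. `CharCodes` is a hypothetical helper, and `Range("A1")` in the usage line is just an example cell.

```vba
' Hypothetical diagnostic helper: returns the Unicode code of every
' character in s, separated by spaces
Public Function CharCodes(ByVal s As String) As String
    Dim i As Long
    Dim out As String
    For i = 1 To Len(s)
        ' Mask with &HFFFF& because AscW returns a negative Integer
        ' for code units above &H7FFF
        out = out & (AscW(Mid$(s, i, 1)) And &HFFFF&) & " "
    Next i
    CharCodes = RTrim$(out)
End Function
```

In the Immediate window, `?CharCodes(Range("A1").Value)` would print `376 76 80 65 73 70` for a cell containing ŸLPAIF, making the leading Ÿ (code 376) visible even if the sheet does not show it.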

Author: Jeevan

Updated on July 09, 2022

Comments

  • Jeevan (almost 2 years ago)

    How do I remove all special characters that don't fall within the ASCII range in VBA?

    These are some of the symbols that appear in my string:

    Œ œ Š š Ÿ ƒ

    There are many more such characters.

    They are not part of ASCII, as you can see here: http://www.ascii.cl/htmlcodes.htm

    I tried something like this:

    strName = Replace(strName, ChrW(376), " ")
    
  • Jeevan (about 8 years ago)
    Can you please give me an example? I can't find any Application.Clean() in the VBA autocomplete list.
  • Jeevan (about 8 years ago)
    Please take a look at this: pasteboard.co/FtkoMrB.png. I have also tried strName = Clean(strName), and neither worked.
  • Vityata (about 8 years ago)
    Try Application.WorksheetFunction.Clean("üäöaŠsd"). It won't help you, though: Clean only removes non-printable characters (codes 0 to 31), and yours are printable.
  • Jeevan (about 8 years ago)
    Yes, it prints the same for me. My string contains Ÿ as a special character, but when I pass it to Replace, as in strName = Replace(strName, ChrW(376), " "), where strName is initially ŸLPAIF, the result is ?LPAIF. This string then goes to a write-to-file method, where the code crashes with "Run-time error '5': Invalid procedure call or argument". I think the newly produced ? is not an ordinary question mark but some other special character. I want to replace it in my dynamic string, not in a static one like the "ŸŸŸŸ" you showed.
  • Jeevan (about 8 years ago)
    I am using something similar, but that's not the goal of my code. It should keep special characters that are ASCII and only remove special characters outside the code range 0 to 255.
  • Jeevan (about 8 years ago)
    Hi, I think it works in Debug.Print, but not when I use the actual value from a cell where Ÿ is present.
  • Jeevan (about 8 years ago)
    Hi, I tried this code, and it replaced Ÿ with a ? in the output, which still doesn't solve the issue.
  • Jeevan (about 8 years ago)
    I think Excel turns Ÿ into ? when the value is read. In any case, Ÿ is invisible in the Excel sheet.
  • Axel Richter (about 8 years ago)
    As you can see in my picture, "This Œ is œ a Š testš . Ÿ ƒ Blubb." is the actual content of my sheet, and with it my code works exactly as I described. Where do you see the "Ÿ" that is then not visible in the sheet?
  • Jeevan (about 8 years ago)
    Hey, thanks a lot, it works. I'm not sure about all conditions and all special Unicode characters, but for now it works well. It just takes a long time to process.
  • Jeevan (about 8 years ago)
    My string looks like LPAIF in the Excel sheet, but there is a Ÿ hidden in front of it that Excel doesn't display, so the string is really ŸLPAIF. Some of the solutions given above worked when I passed "ŸLPAIF" as a static string, but not when the value was read from Excel directly. Anyway, we have a solution now; thanks for trying to help.
  • Jeevan (about 8 years ago)
    I see Ÿ when I copy and paste that string from the Excel cell into Notepad. Thanks for trying to help. We have a solution now, Ambie. Thanks for your efforts.
  • Ambie (about 8 years ago)
    If you're converting many strings, e.g. in a loop, try defining the pattern once and just calling the .Replace function inside your loop. That might help with speed.
  • Jeevan (about 8 years ago)
    Thank you :) That didn't reduce the time much, but it did help a little.
  • Rosetta (about 8 years ago)
    @Vityata You are right. This was just a test to see if you were paying attention xD :P But @Jeevan, .Clean exists under WorksheetFunction, and it can also be called directly under Application. I'll just leave the answer here for reference.
  • Vityata (about 8 years ago)
    I didn't quite get what you mean. Application.Clean does not show up in IntelliSense, but Application.WorksheetFunction.Clean does.
  • Jeevan (about 8 years ago)
    Does this cover all of Unicode or just 0 to 7F? Will it work for all boundary conditions? Please let me know.
  • Ambie (about 8 years ago)
    It will replace any character not in the Unicode range 0 to 127.
  • rustyBucketBay (over 4 years ago)
    I tried hard to remove strange characters from my string in VBA, and this code was very helpful.
  • Mabaega (over 3 years ago)
    Wow... thanks. It works for me without Application.WorksheetFunction.Trim; the worksheet function gives me an error.
  • Sorin GFS (over 3 years ago)
    @Mabaega Of course. You can also adjust the Replace call, e.g. insert dashes or leave the spaces, depending on your needs. Lately, since the Click-to-Run approach of MS Office, the basic Excel worksheet functions give a 1004 error in VBA...
  • Frank (over 3 years ago)
    You need to add this option: regEx.Global = True
  • joehua (over 3 years ago)
    This only removes one Unicode character, not all of them in a string.
  • joehua (over 3 years ago)
    It removes all Unicode characters in a string. However, it also removes spaces.