Remove Unicode characters in a String
Solution 1
Would a RegEx solution be of interest to you?
There are plenty of examples for different languages on this site - here's a C# one: How can you strip non-ASCII characters from a string? (in C#).
Try this for VBA:
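Private Function GetStrippedText(txt As String) As String
    Dim regEx As Object
    Set regEx = CreateObject("vbscript.regexp")
    ' Global = True replaces every match, not just the first one
    regEx.Global = True
    ' Match any character outside the ASCII range U+0000 to U+007F
    regEx.Pattern = "[^\u0000-\u007F]"
    GetStrippedText = regEx.Replace(txt, "")
End Function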
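If the function is called for many strings, for example in a loop over cells, one possible variation is to create the RegExp object once and reuse it on later calls. The following is only a sketch under that assumption: the name GetStrippedTextCached is hypothetical, and the speed gain depends on the workload and may be small.

```vb
' Hypothetical variant of GetStrippedText: the function name is an assumption.
Private Function GetStrippedTextCached(ByVal txt As String) As String
    ' Static keeps the object alive between calls, so it is created only once
    Static regEx As Object
    If regEx Is Nothing Then
        Set regEx = CreateObject("VBScript.RegExp")
        regEx.Global = True                 ' replace every match
        regEx.Pattern = "[^\u0000-\u007F]"  ' anything outside ASCII
    End If
    GetStrippedTextCached = regEx.Replace(txt, "")
End Function
```

It is called exactly like the original function, e.g. strName = GetStrippedTextCached(strName).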
Solution 2
You don't need to loop over each character.
This may be late, but it might help someone:
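Public Function StripNonAsciiChars(ByVal InputString As String) As String
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True      ' replace every match, not just the first one
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = "[^\u0000-\u007F]"   ' any character outside the ASCII range
        ' Replace each match with a space, then let Excel's TRIM tidy up the spaces.
        ' Application.WorksheetFunction is available in Excel only.
        StripNonAsciiChars = Application.WorksheetFunction.Trim(RegEx.Replace(InputString, " "))
    End With
End Function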
Solution 3
Try the following:
Function ClearUnwantedString(fulltext As String) As String
    Dim output As String
    Dim character As String
    Dim i As Long
    For i = 1 To Len(fulltext)
        character = Mid(fulltext, i, 1)
        ' Keep only the letters a-z and A-Z and the digits 0-9; everything else is dropped
        If (character >= "a" And character <= "z") Or (character >= "0" And character <= "9") Or (character >= "A" And character <= "Z") Then
            output = output & character
        End If
    Next
    ClearUnwantedString = output
End Function
Sub test()
    Dim a As String
    a = ClearUnwantedString("dfjŒœŠdskl")
End Sub
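The loop above keeps only letters and digits, so spaces and ASCII punctuation are removed as well. A minimal sketch of a variation that keeps every character whose code point lies in the ASCII range 0 to 127 could look like this. The name KeepAsciiOnly is hypothetical and not from the thread, and the upper bound can be changed if a different cut-off is wanted.

```vb
' Hypothetical helper: the name KeepAsciiOnly is an assumption, not from the thread.
Function KeepAsciiOnly(ByVal fullText As String) As String
    Dim i As Long
    Dim code As Long
    Dim result As String
    For i = 1 To Len(fullText)
        ' AscW returns a signed Integer, so code units above 32767 come back negative
        code = AscW(Mid$(fullText, i, 1))
        ' Keep the character only if it lies in the ASCII range 0 to 127
        If code >= 0 And code <= 127 Then
            result = result & Mid$(fullText, i, 1)
        End If
    Next i
    KeepAsciiOnly = result
End Function
```

Because the test is on the character code and not on letter ranges, spaces and ASCII punctuation survive, while characters such as Œ, œ and Š are dropped.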
Solution 4
What do you get when you write the following in the immediate window?
?Replace("ŸŸŸŸ", ChrW(376), "ale")
I get: alealealeale
Author: Jeevan
Updated on July 09, 2022
Comments
-
Jeevan almost 2 years
How do I remove all special characters that fall outside the ASCII range in VBA?
These are some of the symbols that appear in my string:
Œ œ Š š Ÿ ƒ
There are many more such characters.
These are not ASCII characters, as you can see here: http://www.ascii.cl/htmlcodes.htm
I tried something like this:
strName = Replace(strName, ChrW(376), " ")
-
Jeevan about 8 years
Can you please give me an example of it? I am not able to find any application.clean() in the autotext code in VBA.
-
Jeevan about 8 years
Please take a look at this: pasteboard.co/FtkoMrB.png. I have also tried "strName = Clean(strName)", and it did not work.
-
Vityata about 8 years
Try Application.WorksheetFunction.Clean("üäöaŠsd"). But it will not help you: it only removes non-printable characters, and yours are printable.
-
Jeevan about 8 years
Yes, it prints the same for me. My special character is Ÿ, but when I run strName = Replace(strName, ChrW(376), " "), where strName is initially ŸLPAIF, it becomes ?LPAIF. This string then goes to a write-to-file method, where the code crashes with "Run-time error '5' Invalid procedure call or argument". I think the newly produced ? is not an ordinary question mark either, but some special character. I want to replace it in my dynamic string, not in a static one like the "ŸŸŸŸ" you showed.
-
Jeevan about 8 years
I am using something similar, but that is not the aim of my code. It should allow special characters that are ASCII and remove only the special characters that don't fall under the ASCII codes 0 to 255.
-
Jeevan about 8 years
Hi, I think it works in Debug.Print, but when I use the actual value from a cell where Ÿ is present, it doesn't.
-
Jeevan about 8 years
Hi, I tried this code, and it replaced Ÿ with a ? in the output, which still doesn't solve the issue.
-
Jeevan about 8 years
I think Ÿ is already read as ? by Excel while reading the cell. Ÿ is invisible in the Excel sheet anyway.
-
Axel Richter about 8 years
As you can see in my picture, "This Œ is œ a Š testš . Ÿ ƒ Blubb." is the actual content of my sheet. With this, my code works exactly as I described. Where do you see the "Ÿ" that is then not visible in the sheet?
-
Jeevan about 8 years
Hey, thanks a lot, it works. I'm not sure about all conditions and all special Unicode characters, but for now it works well. It just takes too much time to process.
-
Jeevan about 8 years
My string looks like LPAIF in the Excel sheet, but there is a hidden Ÿ character in front of it that is not visible in Excel, so the string is actually ŸLPAIF. Some of the solutions given above worked when I passed "ŸLPAIF" as a static string, but not when it was read from Excel directly. Anyway, we have a solution now. Thanks for trying to help.
-
Jeevan about 8 years
I see Ÿ when I copy and paste that string from the Excel cell into Notepad. Thanks for trying to help. We have a solution now, Ambie. Thanks for your efforts.
-
Ambie about 8 years
If you're converting many strings, e.g. in a loop, then try defining the pattern once and just calling the .Replace function inside your loop. That might help with speed.
-
Jeevan about 8 years
Thank you :) That did not reduce the time much, but it did help a little.
-
Rosetta about 8 years
@Vityata You are right. This is just a test to see if you are paying attention xD :P But @Jeevan, .clean exists under worksheetfunction, but it can also be called under application. I'll just leave the answer here for reference.
-
Vityata about 8 years
I did not get exactly what you mean. "application.clean" does not show up in IntelliSense, but "application.worksheetfunction.clean" does.
-
Rosetta about 8 years
-
Jeevan about 8 years
Does this cover all of Unicode or just 0 to 7F? Will this work for all boundary conditions? Please let me know.
-
Ambie about 8 years
It will replace any character not in the Unicode range 0 to 127.
-
rustyBucketBay over 4 years
I tried hard to remove strange characters from my string in VBA, and this code was very helpful for me.
-
Mabaega over 3 years
Wow, thanks. It works for me without Application.WorksheetFunction.Trim. The worksheet function gives me an error.
-
Sorin GFS over 3 years
@Mabaega Of course, you may play with the Replace function too, e.g. you may put dashes or leave the spaces; this is up to your needs. Lately, since MS Office moved to the Click-to-Run approach, the base Excel functions in VBA give a 1004 error...
-
Frank over 3 years
You need to add this option: regEx.Global = True
-
joehua over 3 years
It only removes one Unicode character, not all of them in a string.
-
joehua over 3 years
It removes all Unicode characters in a string. However, it also removes spaces.