Character looks like ASCII 63 but isn't so I can't remove it

12,216

Solution 1

It is the Unicode replacement character, U+FFFD, aka ChrW(&HFFFD).

Never use Asc() or Chr(); they are legacy VB6 functions that do not handle Unicode. Passing a character that the current ANSI code page cannot represent to Asc() produces 63, the character code for "?"c, aka "I have no idea what you're saying". It is the exact same idea as "�"c, but using an ASCII code instead. Use AscW() and ChrW() when you need character codes.
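A minimal VB.NET sketch of the difference, assuming the stray character really is U+FFFD as described above. The module and variable names are made up for illustration, and the sample string is built in code rather than read from the asker's file.

```vb
Imports System

Module ReplacementCharDemo
    Sub Main()
        ' Assumption: the text is "Algood" followed by U+FFFD, as in the question.
        Dim replacementChar As String = ChrW(&HFFFD)
        Dim raw As String = "Algood" & replacementChar

        ' AscW() returns the real UTF-16 code unit instead of the "?" fallback.
        Console.WriteLine(AscW(raw(6)))          ' 65533 (&HFFFD)

        ' The string does not contain an actual question mark.
        Console.WriteLine(raw.Contains("?"))     ' False

        ' Remove the character by its real code point.
        Dim cleaned As String = raw.Replace(replacementChar, String.Empty)
        Console.WriteLine(cleaned)               ' Algood
    End Sub
End Module
```

Removing the character this way only hides the symptom; the original byte that failed to decode is already lost by the time the string contains U+FFFD.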

Seeing the Black Diamond of Death back is always bad news: something went wrong when the string was converted from the underlying byte values, because some byte values did not produce a valid character. That conversion is what you really should be looking at, typically the encoding used to read the file. You always want to avoid GIGO: Garbage In, Garbage Out is an ugly data corruption problem that has no winners, only victims. You.
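One way such a character can arise, sketched in VB.NET under a stated assumption: the question does not say which byte is in the file, so the byte &HA0 (a non-breaking space in ISO-8859-1 and Windows-1252) is a guess chosen purely for illustration. Decoding it as UTF-8 yields U+FFFD, while decoding with a matching single-byte encoding keeps the character.

```vb
Imports System
Imports System.Text

Module DecodingDemo
    Sub Main()
        ' Hypothetical file content: "Algood" followed by the single byte &HA0.
        Dim bytes As Byte() = {&H41, &H6C, &H67, &H6F, &H6F, &H64, &HA0}

        ' &HA0 on its own is not valid UTF-8, so the decoder substitutes U+FFFD.
        Dim asUtf8 As String = Encoding.UTF8.GetString(bytes)
        Console.WriteLine(AscW(asUtf8(6)))       ' 65533

        ' ISO-8859-1 maps every byte to a character; &HA0 becomes U+00A0.
        Dim asLatin1 As String = Encoding.GetEncoding("iso-8859-1").GetString(bytes)
        Console.WriteLine(AscW(asLatin1(6)))     ' 160

        ' A strict decoder throws instead of silently inserting U+FFFD.
        Dim strictUtf8 As New UTF8Encoding(False, True)
        Try
            strictUtf8.GetString(bytes)
        Catch ex As DecoderFallbackException
            Console.WriteLine("Invalid UTF-8 byte sequence")
        End Try
    End Sub
End Module
```

With a real file, the same choice is made by passing an Encoding to File.ReadAllText or to the StreamReader constructor. Which encoding is correct depends on how the file was written, which the question does not state.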

Solution 2

I wrote the following function in Excel VBA, which removes the "black diamond" from a single cell.

The hardest part is avoiding a loop over every character in every field to find it. I needed a way to identify the black diamond without checking every character of every field.

I used an ADODB recordset: if the string is not accepted by the recordset, it contains an invalid character. The function then looks for characters where Asc() returns 63 ("?") and trims the cell down to the text without the black diamond.

The reason this works is that, when it loops through each character in the string, Asc() reports the black diamond as 63. A string that contains only real question marks is accepted by the recordset, so it never reaches that loop.

Private Function Correct_Black_Diamond(ByVal First_Address As Variant) As String
    ' Requires a reference to the Microsoft ActiveX Data Objects library (ADODB).
    Dim CheckDigit As Long
    Dim Temp_string As String
    Dim temp_Rs As New ADODB.Recordset

    ' In-memory recordset with a single ANSI character field.
    temp_Rs.Fields.Append "address", adChar, 9999
    temp_Rs.Open

    temp_Rs.AddNew
    ' If ADO rejects the value, jump to the character-by-character cleanup.
    On Error GoTo Further_Address_Check
    temp_Rs!Address = First_Address
    temp_Rs.Update
    On Error GoTo 0

    ' The recordset accepted the value, so return it unchanged.
    Correct_Black_Diamond = First_Address
    Exit Function

Further_Address_Check:
    ' Rebuild the string, skipping every character that Asc() reports as 63.
    ' Note: this also drops genuine question marks from a rejected string.
    For CheckDigit = 1 To Len(First_Address)
        If Asc(Mid(First_Address, CheckDigit, 1)) <> 63 Then
            Temp_string = Temp_string & Mid(First_Address, CheckDigit, 1)
        End If
    Next CheckDigit
    Correct_Black_Diamond = Trim(Temp_string)
End Function
Author by

Lou

I'm a SE user from England. That's about all you need to know.

Updated on June 16, 2022

Comments

  • Lou, about 2 years ago

    I'm reading text from a text file. The first string the program has to read from the file is "Algood "; note the trailing space. In Notepad it looks like there is a space at the end of this string, but it isn't one. When I inspect the 6th (zero-based index) character in Visual Studio's QuickWatch, it appears as:

    "�"c
    

    When I use the Asc function to get the character code, it returns 63, which is a question mark. But when I test whether the string contains a question mark, the test is false. So the string appears to contain character 63, only it doesn't: it contains some other character that Asc reports as 63. This is a problem: I can't remove the character if I don't know what to call it. I could remove the last character, but not every string in the text file contains this character.


    The question is: what is this character, if not a question mark, and how can I uniquely identify it so that I can remove it?