How to convert a Unicode character to its ASCII equivalent

Solution 1

Okay, let's elaborate. Both csgero and bzlm pointed in the right direction.

Because of bzlm's reply I looked up the Windows-1252 page on Wikipedia and found that it is called a code page. The Wikipedia article on code pages states the following:

No formal standard existed for these ‘extended character sets’; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings.

This led me to codepage 437:

In ASCII-compatible code pages, the lower 128 characters maintained their standard US-ASCII values, and different pages (or sets of characters) could be made available in the upper 128 characters. DOS computers built for the North American market, for example, used code page 437, which included accented characters needed for French, German, and a few other European languages, as well as some graphical line-drawing characters.

So code page 437 was what I had been calling 'extended ASCII'. It has ê as character 136; I looked up some other characters as well, and they seem right.

csgero provided the Encoding.GetEncoding() hint, which I used to write the following statement that solves my problem:

byte[] bytes = Encoding.GetEncoding(437).GetBytes("ê");
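
For context, here is a minimal sketch of that statement as a complete program. It assumes modern .NET (.NET Core or .NET 5+), where code page 437 is only available after registering CodePagesEncodingProvider; on .NET Framework the registration line can likely be dropped. The class name Cp437Demo is made up for illustration.

```csharp
using System;
using System.Text;

class Cp437Demo
{
    static void Main()
    {
        // Assumption: modern .NET, where legacy code pages must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding cp437 = Encoding.GetEncoding(437);

        // "\u00EA" is ê written as an escape so the source file encoding does not matter.
        byte[] bytes = cp437.GetBytes("\u00EA");
        Console.WriteLine(bytes[0]); // 136

        // Round trip: decoding the byte with the same code page gives ê back.
        Console.WriteLine(cp437.GetString(bytes) == "\u00EA"); // True
    }
}
```

Characters that code page 437 does not contain are replaced by the encoding's fallback character (typically '?'), so this only works for text that actually fits in the target code page.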

Solution 2

You cannot use the default ASCII encoding (Encoding.ASCII) here, but must create the encoding with the appropriate code page using Encoding.GetEncoding(...). You might try to use code page 1252, which is a superset of ISO 8859-1.

Solution 3

ASCII does not define ê; the number 136 is the code for the circumflex (ˆ, U+02C6) in 8-bit encodings such as Windows-1252.

Can you verify that a small e with a circumflex (ê) is actually what is supposed to be stored in the Access database in this case? Perhaps U+02C6 U+0065 is the result of a conversion error, where the input is actually an e followed by a circumflex, or something else entirely. Perhaps your Access database has corrupt data in the sense that the designated encoding does not match the contents, in which case the .NET client might incorrectly parse the data (using the wrong decoder).

If this error is indeed introduced during the reading from the database, perhaps pasting some code or configuration settings might help.

In Code page 437, character number 136 is an e with a circumflex.
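
The wrong-decoder hypothesis above can be checked by decoding the single byte 136 with both code pages. This is a rough sketch, again assuming modern .NET with the code page provider registered; the class name DecoderCheck is hypothetical.

```csharp
using System;
using System.Text;

class DecoderCheck
{
    static void Main()
    {
        // Assumption: modern .NET, where legacy code pages must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] raw = { 136 };

        // Windows-1252 maps 136 to MODIFIER LETTER CIRCUMFLEX ACCENT.
        string as1252 = Encoding.GetEncoding(1252).GetString(raw);
        Console.WriteLine("1252: U+{0:X4}", (int)as1252[0]); // 1252: U+02C6

        // Code page 437 maps 136 to LATIN SMALL LETTER E WITH CIRCUMFLEX.
        string as437 = Encoding.GetEncoding(437).GetString(raw);
        Console.WriteLine("437:  U+{0:X4}", (int)as437[0]);  // 437:  U+00EA
    }
}
```

If the data was written as code page 437 but read as Windows-1252, a stray U+02C6 in the resulting string is what one would expect to see.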

Author by

Nguyen Minh Binh

Updated on July 18, 2022

Comments

  • Nguyen  Minh Binh
    Nguyen Minh Binh almost 2 years

    Suppose I have 10 lines of code. Any line may throw a NullPointerException. I don't want to care about these exceptions. If a line throws an exception, I want execution to jump to the next line and continue. Can I do this in Java? If yes, please give me some sample code.

    UPDATE: (added sample source to make the question clearer)

    first.setText(prize.first);
    second.setText(prize.second);
    third.setText(prize.third);
    fourth.setText(prize.fourth);
    fifth.setText(prize.fifth);
    sixth.setText(prize.sixth);
    seventh.setText(prize.seventh);
    eighth.setText(prize.eighth);
    

    Suppose I have the 8 lines of code above. What I want is: if line 4 (or 5, 6, ...) throws an exception, all the other lines still work normally. Of course I can use try...catch to catch the exceptions line by line, but that makes my source very complex.

    • Chris Forrence
      Chris Forrence over 11 years
      Why do you want to ignore NPEs instead of checking for a null value?
    • dantuch
      dantuch over 11 years
      It would be something like try / catch on every single line...
    • Steve's a D
      Steve's a D over 11 years
      Exceptions shouldn't be used to hide mistakes which you don't want to find/fix. You should determine why you'd get NullPointerExceptions and fix it, not hide it.
    • Nguyen  Minh Binh
      Nguyen Minh Binh over 11 years
      I have added sample code to my question. Please check it to see why I posted this question. Thanks
    • Natix
      Natix over 11 years
      Why exactly would those lines throw NPEs? For example, if you have first.setText(prize.first);, what variable would be null? first or prize?
    • Boann
      Boann over 11 years
      That sample code looks like a good candidate for being replaced with a loop and an array.
    • Natix
      Natix over 11 years
      @NguyenMinhBinh Those are EditTexts? Why do you let them be null??
    • Nguyen  Minh Binh
      Nguyen Minh Binh over 11 years
      EditText is just an example here. The same applies to any object, such as String.
    • matt freake
      matt freake over 11 years
      Passing nulls around like this can be dangerous. It relies on all your code having appropriate try { } catch block/null-checks and if you miss any, you'll have some nice run-time bugs to deal with. Better to catch them/check for them and then handle appropriately.
    • Chris Forrence
      Chris Forrence over 11 years
      @Tomas - Mentioned! If you wouldn't mind though, if you have further comments on my answer, comment on my answer
  • Konrad Rudolph
    Konrad Rudolph over 15 years
    @OJ, I'm aware of that. However, the code point of a character is the same in all Unicode encodings.
  • OJ.
    OJ. over 15 years
    @Chris: In Konrad's original post he talked about UTF8, not Unicode.
  • Huppie
    Huppie over 15 years
    So, you're pinpointing my problem. The question is how DO I do this; I know the method I tried does not work.
  • Huppie
    Huppie over 15 years
    You're right, it is indeed MODIFIER LETTER CIRCUMFLEX ACCENT, see my edits.
  • Huppie
    Huppie over 15 years
    Thanks! Your tip helped a lot, it was in fact codepage 437 (MS-DOS). Using Encoding.GetEncoding(437) it worked.
  • Huppie
    Huppie over 15 years
    Like so: byte[] bytes = Encoding.GetEncoding(437).GetBytes("ê");
  • Triynko
    Triynko about 13 years
    See the Unicode Normalization topic, specifically the two forms of equivalence, canonical and compatibility: en.wikipedia.org/wiki/Unicode_normalization On a .NET String instance, call the Normalize method, passing either NormalizationForm.FormD or NormalizationForm.FormKD, which correspond to the canonical and compatibility decomposed forms. For example, calling this on a string like "êwś" will produce each base letter followed by its combining accent, roughly "e^ws'". You can also do the reverse, converting a decomposed "e^" back into "ê", by calling Normalize( NormalizationForm.FormC ) or Normalize( NormalizationForm.FormKC ).
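
The normalization approach from the last comment can be sketched as follows. Dropping the combining marks after decomposition is an extra step not described in the comment; it is shown here as one possible way to reach a plain ASCII letter. The class name NormalizeDemo is made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

class NormalizeDemo
{
    static void Main()
    {
        string composed = "\u00EA"; // ê as a single code point

        // Canonical decomposition: 'e' followed by U+0302 COMBINING CIRCUMFLEX ACCENT.
        string decomposed = composed.Normalize(NormalizationForm.FormD);
        foreach (char c in decomposed)
        {
            Console.WriteLine("U+{0:X4} {1}", (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }

        // Extra step (assumption, not from the comment): drop the combining marks
        // so that only the base letters remain.
        string stripped = new string(decomposed
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray());
        Console.WriteLine(stripped); // e

        // Recomposition with FormC restores the original single code point.
        Console.WriteLine(decomposed.Normalize(NormalizationForm.FormC) == composed); // True
    }
}
```

Note that this produces the unaccented letter, which is a different result from the code page 437 byte value 136 discussed in the answers; which one counts as the "ASCII equivalent" depends on what the consuming system expects.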