Inputting extended ascii values

Solution 1

You could use luit, which lets you run your cp850 application (in whatever locale you can find for that encoding) inside a UTF-8 terminal; luit handles the translation to and from UTF-8.

For what it's worth, a screenshot of cp850 with luit:

[Screenshot: cp850 test screen displayed through luit]

The screenshots were set up by a set of scripts that display a test screen for each locale encoding. Not all encodings have corresponding locale information configured. The 761 locales listed by locale -a on my Debian 7 system correspond to only 32 encodings:

  ANSI_X3.4-1968      EUC-TW              ISO-8859-14         ISO-8859-9
  ARMSCII-8           GB18030             ISO-8859-15         KOI8-R
  BIG5                GB2312              ISO-8859-2          KOI8-T
  BIG5-HKSCS          GBK                 ISO-8859-3          KOI8-U
  CP1251              GEORGIAN-PS         ISO-8859-5          RK1048
  CP1255              ISO-8859-1          ISO-8859-6          TCVN5712-1
  EUC-JP              ISO-8859-10         ISO-8859-7          TIS-620
  EUC-KR              ISO-8859-13         ISO-8859-8          UTF-8
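
One way to produce a list like this on your own system is to ask every installed locale for its charmap and keep the unique names. This is only a sketch, not necessarily how the numbers above were obtained; list_encodings is a made-up helper name, and it assumes the POSIX locale utility is available.

```shell
# List the distinct character encodings behind the installed locales.
# "list_encodings" is a hypothetical helper name; assumes the POSIX
# `locale` utility is present.
list_encodings() {
    locale -a | while IFS= read -r loc; do
        # Ask each locale which charmap (encoding) it uses;
        # errors from incomplete locales are discarded.
        LC_ALL="$loc" locale charmap 2>/dev/null
    done | sort -u
}

list_encodings             # one encoding name per line
list_encodings | wc -l     # number of distinct encodings
```

The counts will differ between systems, since they depend on which locales have been generated.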

If you have a recent version (e.g., 2.0 in 2013) of luit, and the locale information installed, running it is simple:

luit -encoding cp850

That runs a shell in which applications use code page 850, while your select/paste and keyboard input are translated to and from the locale encoding of the outer shell (assumed to be UTF-8, since it wouldn't work with just the POSIX locale).
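
If you only want the translation for a single program rather than a whole shell, luit also accepts a command after its options. A minimal sketch, assuming luit is on your PATH and knows about cp850; run_in_cp850 is a hypothetical wrapper name and ./myprog is a placeholder for your own program:

```shell
# Run one program behind luit's cp850 <-> locale-encoding translation.
# "run_in_cp850" is a hypothetical helper, not something luit provides.
run_in_cp850() {
    if ! command -v luit >/dev/null 2>&1; then
        echo "luit is not installed" >&2
        return 127
    fi
    # Everything after "--" is the program to run and its arguments.
    luit -encoding cp850 -- "$@"
}

# Usage (./myprog is a placeholder):
#   run_in_cp850 ./myprog
```

Characters typed in the UTF-8 terminal should then reach the program as single cp850 bytes, provided luit really has the cp850 data (see the verbose output for how to check).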

The -v (verbose) option shows a little detail:

$ luit -encoding cp850 -v -v
getCharsetByName(ASCII)
cachedCharset 'ASCII'
getCharsetByName(<null>)
using unknown 94-charset
getCharsetByName(CP 850)
cachedCharset 'CP 850'
getCharsetByName(<null>)
using unknown 94-charset
Input: G0 is ASCII, G1 is Unknown (94), G2 is CP 850, G3 is Unknown (94).
GL is G0, GR is G2.
Output: G0 is ASCII, G1 is Unknown (94), G2 is CP 850, G3 is Unknown (94).
GL is G0, GR is G2.

Using the older luit doesn't work as well, since it relies upon incomplete locale information. Here's what luit 1.1.1 does:

$ luit -encoding cp850 -v -v
Warning: couldn't find charset data for locale cp850; using ISO 8859-1.
G0 is ASCII, G1 is Unknown (94), G2 is ISO 8859-1, G3 is Unknown (94).
GL is G0, GR is G2.

If you happen to be running OpenSuSE, the distribution provides a package. At the other extreme (e.g., Ubuntu), configuring the locales is a nuisance, but compiling luit from source is relatively simple.

Solution 2

Bytes are not characters and characters are not bytes. The correspondence between characters and bytes depends on the locale. Under a UTF-8 locale, the character with code point 137 (U+0089) is represented by two bytes, \xC2\x89 (194 and 137 in decimal); a bare byte with the value \x89 (137 decimal) is invalid. How to input characters that do not appear on the keyboard depends on the terminal and desktop environment.
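
To see the difference between the byte and the character, you can convert the single byte 0x89 from ISO-8859-1 (where it stands for U+0089) to UTF-8 and dump the result. This is just an illustration, assuming iconv and od are installed; the function name is made up.

```shell
# Take the single byte 0x89, interpret it as ISO-8859-1 (U+0089),
# re-encode it as UTF-8 and print the resulting bytes in hex.
# "latin1_byte_as_utf8_hex" is a hypothetical helper name.
latin1_byte_as_utf8_hex() {
    printf '\211' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

latin1_byte_as_utf8_hex; echo    # prints c289: two bytes, not one
```

One byte goes in and two come out, which is why typing the character in a UTF-8 terminal never hands the program a lone 137.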

If all that you want is to send arbitrary bytes to a program you can use a pipe, for example:

$ echo -ne '\x89' | hexdump -C
00000000  89                                                |.|
00000001
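
Since echo -ne is not understood by every shell, a more portable sketch uses printf with an octal escape. The helper below (emit_byte is a made-up name) takes the byte value in decimal, so you do not have to convert 137 to octal 211 by hand:

```shell
# Emit one raw byte given its decimal value, using only POSIX printf.
# "emit_byte" is a hypothetical helper name.
emit_byte() {
    # Turn the value into three octal digits, then let the outer
    # printf expand the resulting \ooo escape into a single byte.
    printf "\\$(printf '%03o' "$1")"
}

emit_byte 137 | od -An -tx1    # hex dump shows 89
```

You would then pipe it into your own program in the same way, e.g. emit_byte 137 | ./myprog, where ./myprog is a placeholder.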

Author: DrPrItay

Updated on September 18, 2022

Comments

  • DrPrItay
    DrPrItay over 1 year

    Hey, so I'm losing my mind over this. I have a program written in C that reads a string directly from the terminal and then prints the numeric value of each byte in that string. I'm trying to enter extended ASCII values (values greater than 127) and I'm failing to do so. Specifically, I need to enter a character whose byte value is 137 as part of the input string. I've tried nearly everything:

    • Compose key and entering: e + "
    • Unicode value: ctrl + shift + u followed by the hexadecimal value of the code - this enters it as Unicode, so it takes two bytes instead of one byte with the value 137
    • ctrl + d - doesn't support extended ascii values

    Anyway, if someone knows how to solve this, it would be a great help.

    • Mark Perryman
      Mark Perryman over 7 years
      If your terminal supports unicode then I suspect you will find it impossible, as extended ASCII values are not valid.
    • AlexP
      AlexP over 7 years
      Virtual terminal or terminal emulator? In a graphical desktop environment, and if so what DE? By the way, ASCII is 0 to 127. There is no such thing as ASCII value 137.
  • Stéphane Chazelas
    Stéphane Chazelas over 7 years
    See printf '\211' as a portable equivalent of your echo -ne '\x89' (which only works for some shells in some environments).
  • Stéphane Chazelas
    Stéphane Chazelas over 7 years
    Specifically here, 0x89/137 is ë in the IBM850 aka cp850 character set. I don't expect that charset to be in use on any Unix-like system.
  • DrPrItay
    DrPrItay over 7 years
    How do I use luit?