Inputting extended ascii values
Solution 1
You could use luit, which would let you run your cp850 application (in whatever locale you can find for this) in a UTF-8 terminal, and let luit do the translation to/from UTF-8.
For what it's worth, a screenshot of cp850 with luit:
The screenshots were set up by a set of scripts which displayed a test screen for each locale encoding. Not all encodings have corresponding locale information configured. The 761 locales listed on my Debian 7 system using locale -a correspond to only 32 encodings:
ANSI_X3.4-1968 EUC-TW ISO-8859-14 ISO-8859-9
ARMSCII-8 GB18030 ISO-8859-15 KOI8-R
BIG5 GB2312 ISO-8859-2 KOI8-T
BIG5-HKSCS GBK ISO-8859-3 KOI8-U
CP1251 GEORGIAN-PS ISO-8859-5 RK1048
CP1255 ISO-8859-1 ISO-8859-6 TCVN5712-1
EUC-JP ISO-8859-10 ISO-8859-7 TIS-620
EUC-KR ISO-8859-13 ISO-8859-8 UTF-8
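The scripts behind that count are not shown here. As a rough sketch of how one might reproduce it, the loop below asks locale charmap for the codeset of every locale that locale -a reports and removes duplicates. The function name list_encodings is made up for illustration, and the result will differ from system to system.

```shell
# Print the distinct character encodings (codesets) used by the installed locales.
# list_encodings is a hypothetical helper name, not part of any package.
list_encodings() {
    locale -a | while IFS= read -r loc; do
        # "locale charmap" prints the codeset of the locale currently in effect
        LC_ALL="$loc" locale charmap 2>/dev/null
    done | sort -u
}

list_encodings            # one encoding name per line
list_encodings | wc -l    # number of distinct encodings
```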
If you have a recent version (e.g., 2.0 in 2013) of luit, and the locale information installed, running it is simple:
luit -encoding cp850
That runs a shell in which applications use codepage 850, but your select/paste (and keyboard) are translated to/from the locale encoding in the outer shell (assumed to be UTF-8, since it wouldn't work with just the POSIX locale).
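luit can also start a single program instead of a shell; its manual gives the synopsis as luit [options] [--] [program [args]]. A small wrapper along these lines might be convenient. The names run_cp850 and ./myprog are hypothetical, and the sketch assumes a luit build that knows cp850.

```shell
# Run one command under luit with code page 850 translation.
# run_cp850 is a hypothetical wrapper name.
run_cp850() {
    if ! command -v luit >/dev/null 2>&1; then
        echo "luit is not installed" >&2
        return 127
    fi
    # "--" separates luit's own options from the program and its arguments
    luit -encoding cp850 -- "$@"
}

# Usage (./myprog stands in for the cp850 application):
#   run_cp850 ./myprog
```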
The -v (verbose) option shows a little detail:
$ luit -encoding cp850 -v -v
getCharsetByName(ASCII)
cachedCharset 'ASCII'
getCharsetByName(<null>)
using unknown 94-charset
getCharsetByName(CP 850)
cachedCharset 'CP 850'
getCharsetByName(<null>)
using unknown 94-charset
Input: G0 is ASCII, G1 is Unknown (94), G2 is CP 850, G3 is Unknown (94).
GL is G0, GR is G2.
Output: G0 is ASCII, G1 is Unknown (94), G2 is CP 850, G3 is Unknown (94).
GL is G0, GR is G2.
Using the older luit doesn't work as well, since it relies upon incomplete locale information. Here's what luit 1.1.1 does:
$ luit -encoding cp850 -v -v
Warning: couldn't find charset data for locale cp850; using ISO 8859-1.
G0 is ASCII, G1 is Unknown (94), G2 is ISO 8859-1, G3 is Unknown (94).
GL is G0, GR is G2.
If you happen to be running OpenSuSE, it provides a package. At the other extreme (e.g., Ubuntu), configuring the locales is a nuisance, but compiling luit from source is relatively simple.
Solution 2
Bytes are not characters and characters are not bytes. The correspondence between characters and bytes depends on the locale. Under a UTF-8 locale, the character U+0089 would be represented by two bytes, \xC2\x89 (194 and 137 in decimal); a bare byte with the value \x89 (137 decimal) would be invalid. How to input characters that do not appear on the keyboard depends on the terminal and desktop environment.
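To see the difference between a byte and a character, iconv can be used to reinterpret the same byte under different encodings. This is only a sketch: it assumes an iconv that knows the CP850 charset name (as glibc's does) and that exits with a non-zero status on invalid input. The helper names to_utf8_hex and is_valid_utf8 are made up for illustration.

```shell
# Convert stdin from the charset named in $1 to UTF-8 and print the bytes as hex.
# to_utf8_hex and is_valid_utf8 are hypothetical helper names.
to_utf8_hex() {
    iconv -f "$1" -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

# Succeed only if stdin is well-formed UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Byte 137 (octal 211) read as cp850, re-encoded as UTF-8
printf '\211' | to_utf8_hex CP850; echo    # c3ab

# The two-byte UTF-8 form of U+0089 is accepted, the bare byte is not
printf '\302\211' | is_valid_utf8 && echo "C2 89: valid UTF-8"
printf '\211'     | is_valid_utf8 || echo "89: not valid UTF-8"
```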
If all that you want is to send arbitrary bytes to a program you can use a pipe, for example:
$ echo -ne '\x89' | hexdump -C
00000000 89 |.|
00000001
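Since echo -ne is not supported by every shell, a variant built on printf with an octal escape may be more portable. The sketch below wraps that in a small function that takes the byte value in decimal; emit_byte is a hypothetical name, and od is used here in place of hexdump only because POSIX specifies it.

```shell
# Write a single byte to stdout, given its value in decimal (0-255).
# emit_byte is a hypothetical helper name.
emit_byte() {
    # Build a three-digit octal escape, e.g. 137 -> \211, and let printf emit it
    printf "\\$(printf '%03o' "$1")"
}

emit_byte 137 | od -An -tx1    # shows the single byte 89
```

The output of emit_byte can be piped into the program under test in the same way as the echo example.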
DrPrItay
Updated on September 18, 2022
Comments
-
DrPrItay over 1 year
Hey, so I'm losing my mind over this. I have a program written in C that reads a string directly from the terminal and then prints the ASCII value of each byte in the string. I'm trying to enter extended ASCII values (greater than 127) and I'm failing to do so. Specifically, I need to enter a character with the value 137. I've tried nearly everything:
- Compose key and entering: e + "
- Unicode value: ctrl+shift+u followed by the hexadecimal value of the ASCII code. This enters it as Unicode, so it takes two bytes instead of one byte with the value 137.
- ctrl+d: doesn't support extended ASCII values
Anyway, if someone knows how to solve this, it would be helpful for me.
-
Mark Perryman over 7 years
If your terminal supports Unicode then I suspect you will find it impossible, as extended ASCII values are not valid.
-
AlexP over 7 years
Virtual terminal or terminal emulator? In a graphical desktop environment, and if so what DE? By the way, ASCII is 0 to 127. There is no such thing as ASCII value 137.
-
Stéphane Chazelas over 7 years
See printf '\211' as a portable equivalent of your echo -ne '\x89' (which only works for some shells in some environments).
-
Stéphane Chazelas over 7 years
Specifically here, 0x89/137 is ë in the IBM850 aka cp850 character set. I don't expect that charset to be in use on any Unix-like system.
-
DrPrItay over 7 years
How do I use luit?