UTF-8 to EBCDIC in Java


Solution 1

Assuming your target system is an IBM mainframe or midrange, it has full support for all of the EBCDIC encodings built into its JVM as encodings named CPxxxx, corresponding to the IBM CCSIDs (CP stands for code page). You will need to do the translations on the host side, since the client side may not have the necessary encoding support.

Since Unicode goes well beyond any single- or double-byte character set and covers every known character, no single EBCDIC code page can represent all of it. You will likely be targeting multiple EBCDIC encodings, so plan to make the encoding configurable in some way. Try to keep your client Unicode-only (UTF-8, UTF-16, etc.), with the translations being done as data arrives on the host and/or leaves the host system.

Other than needing to do translations host-side, the mechanics are the same as any Java translation; e.g. new String(bytes,encoding) and String.getBytes(encoding), and the various NIO and writer classes. There's really no magic - it's no different than translating between, say, ISO 8859-x and Unicode, or any other SBCS (or limited DBCS).

For example:

byte[] ebcdta="Hello World".getBytes("CP037");  // get bytes for EBCDIC codepage 37
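As a slightly fuller sketch of the same idea, the round trip below encodes a string to code page 37 and decodes it back, checking first that the charset exists on the running JVM. The class and method names are made up for illustration, and "IBM037" is assumed to be the canonical JDK name for what the line above calls CP037.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class EbcdicRoundTrip {
    // Assumption: "IBM037" is the canonical name; "CP037" is normally an alias for it.
    static final String EBCDIC_NAME = "IBM037";

    // Encode a Java string into bytes of the given charset.
    static byte[] toEbcdic(String text, Charset ebcdic) {
        return text.getBytes(ebcdic);
    }

    // Decode bytes of the given charset back into a Java string.
    static String fromEbcdic(byte[] bytes, Charset ebcdic) {
        return new String(bytes, ebcdic);
    }

    // Render bytes as space-separated upper-case hex, e.g. "C8 85".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        if (!Charset.isSupported(EBCDIC_NAME)) {
            System.out.println(EBCDIC_NAME + " is not available on this JVM");
            return;
        }
        Charset ebcdic = Charset.forName(EBCDIC_NAME);
        byte[] bytes = toEbcdic("Hello World", ebcdic);
        System.out.println(hex(bytes));                 // EBCDIC bytes in hex
        System.out.println(fromEbcdic(bytes, ebcdic));  // decoded back to "Hello World"
    }
}
```

Passing a Charset object instead of a charset name avoids the checked UnsupportedEncodingException that String.getBytes(String) declares.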

You can find more information on IBM's documentation website.

Solution 2

You can always make use of the IBM Toolbox for Java (JTOpen), specifically the com.ibm.as400.access.AS400Text class in the jt400.jar.

It goes as follows:

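import com.ibm.as400.access.AS400Text;

int codePageNumber = 420;    // CCSID handed to the AS400Text converter
String codePage = "CP420";   // the same code page under its Java charset name
String sourceUtfText = "أحمد يوسف صالح";

// Convert the Java string to EBCDIC bytes (first argument: length of the text field)
AS400Text converter = new AS400Text(sourceUtfText.length(), codePageNumber);
byte[] bytesData = converter.toBytes(sourceUtfText);

// Decode the EBCDIC bytes back into a Java string to check the conversion;
// bytesData, not this string, is what gets sent to the host
String resultedEbcdicText = new String(bytesData, codePage);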

I used code page 420 and CP420, its corresponding Java encoding name. This code page is for Arabic text, so you should pick a suitable code page for Chinese text instead.

Solution 3

EBCDIC has many 8-bit code pages, and many of them are supported by the VM. Have a look at Charset.availableCharsets().keySet(): the EBCDIC pages are named IBM..., and there are aliases such as cp500 for IBM500, as you can see with Charset.forName("IBM500").aliases().
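A minimal sketch of that inspection follows. The class name and the looksLikeEbcdic check are illustrative assumptions, not part of any library: the check relies on the fact that EBCDIC encodes the space character as 0x40, whereas ASCII-based code pages use 0x20, so it is a heuristic rather than a definitive test.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class ListIbmCharsets {
    // Collect installed charsets whose canonical name starts with "IBM" or "x-IBM".
    static List<Charset> ibmCharsets() {
        List<Charset> result = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            String name = cs.name().toUpperCase(Locale.ROOT);
            if (name.startsWith("IBM") || name.startsWith("X-IBM")) {
                result.add(cs);
            }
        }
        return result;
    }

    // Heuristic (an assumption of this sketch): treat a charset as EBCDIC
    // if it encodes a space as the single byte 0x40.
    static boolean looksLikeEbcdic(Charset cs) {
        if (!cs.canEncode()) {
            return false; // decode-only charsets cannot be probed this way
        }
        byte[] space = " ".getBytes(cs);
        return space.length == 1 && space[0] == 0x40;
    }

    public static void main(String[] args) {
        for (Charset cs : ibmCharsets()) {
            System.out.println(cs.name()
                    + " aliases=" + cs.aliases()
                    + " ebcdic-like=" + looksLikeEbcdic(cs));
        }
    }
}
```

Running this on the target runtime shows which IBM-named charsets are actually installed there and which of them behave like EBCDIC.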

There are two problems:

  1. If your text contains characters that are spread across different EBCDIC code pages, a single charset will not help.
  2. I am not sure whether these charsets are available in any VM outside Windows.

For the first, have a look at this approach. For the second, try it on the desired target runtime ;-)
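Both problems can be probed on the target runtime with a small check like the one below. It is only a sketch: the class name, method name, and candidate list are hypothetical, and the names "x-IBM935" and "x-IBM937" (Simplified and Traditional Chinese host code pages) are assumptions about what the JVM in question provides. Charsets that are not installed are simply skipped.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper class, for illustration only.
final class CodePagePicker {
    // Return the first installed candidate charset that can encode every
    // character of the text, or an empty Optional if none can.
    static Optional<Charset> firstThatCanEncode(String text, List<String> candidates) {
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // not installed on this JVM
            }
            Charset cs = Charset.forName(name);
            if (cs.canEncode() && cs.newEncoder().canEncode(text)) {
                return Optional.of(cs);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Candidate names are assumptions; adjust them to the code pages the host expects.
        List<String> candidates = Arrays.asList("IBM037", "x-IBM935", "x-IBM937");
        String text = "\u4e2d\u6587"; // two Chinese characters
        Optional<Charset> match = firstThatCanEncode(text, candidates);
        System.out.println(match.map(Charset::name).orElse("no candidate can encode the text"));
    }
}
```

If no candidate matches, the text mixes characters that no single configured code page covers, which is the first problem listed above.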

Solution 4

For the midrange AS/400 (IBM i these days), the best bet is to use the IBM Toolbox for Java (jt400.jar), which does all these things transparently (perhaps with a small hint about the encoding).

Please note that inside Java a character is a 16-bit value (a UTF-16 code unit), not UTF-8. UTF-8 is an encoding, which only comes into play when text is converted to or from bytes.
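The difference is easy to see by counting. The sketch below (class and method names are illustrative) compares the number of 16-bit char values in a string with the number of bytes its UTF-8 encoding takes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class CharVsEncoding {
    // Number of 16-bit char values (UTF-16 code units) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // Latin letter
            "\u4e2d",       // a Chinese character
            "\uD83D\uDE00"  // an emoji outside the Basic Multilingual Plane
        };
        for (String s : samples) {
            System.out.println(utf16Units(s) + " char(s), "
                    + codePoints(s) + " code point(s), "
                    + utf8Bytes(s) + " UTF-8 byte(s)");
        }
        // prints:
        // 1 char(s), 1 code point(s), 1 UTF-8 byte(s)
        // 1 char(s), 1 code point(s), 3 UTF-8 byte(s)
        // 2 char(s), 1 code point(s), 4 UTF-8 byte(s)
    }
}
```

So a Java String never "is" UTF-8. UTF-8 input has to be decoded into a String first, and that String is then encoded into the target EBCDIC code page.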

Updated on August 02, 2020

Comments

  • Admin
    Admin over 3 years

    Our requirement is to send EBCDIC text to a mainframe. We have some Chinese characters, hence the UTF-8 format. So, is there a way to convert the UTF-8 characters to EBCDIC?

    Thanks, Raj Mohan

  • lavinio
    lavinio about 12 years
    Not all of the charsets that are named IBM* are EBCDIC. For example, IBM850 is an ASCII-based code page used at the command prompt in western European versions of Windows.