How do I make imp use right charachter set?

40,468

Solution 1

You are importing data into a database using the AL32UTF8 characterset (this is the default). The importer must therefore convert the characters to fit that and some will become multi-byte characters as you have found. There are two ways to deal with this:

1) If you don't need Unicode in the new database recreate it in the same characterset as the old database. Run this code on the old database to get the characterset and use that

SELECT parameter, value
FROM nls_database_parameters
WHERE parameter
LIKE '%CHARACTERSET';

Once the new and old database have the same characterset the import won't have to do the conversion.

2) If you can pre-create the tables, you could use the NLS_LENGTH_SEMANTICS parameter. If you set that to CHAR rather than the default of BYTE, a VARCHAR2(5) will be allocated enough space to store 5 characters in the database character set (potentially up to 20 bytes) rather than 5 bytes (which could allow just 1 character). Or instead you could modify the table creation DDL to add CHAR to every VARCHAR2 column declaration. e.g.

CREATE TABLE xyz (column_x VARCHAR2(10 CHAR) NOT NULL);

This way you can convert the data to Unicode and have a better characterset going forward, and so long as your application can support it this is perhaps the preferred method.

Solution 2

We ended up solving this by changing all the character fields to use CHAR instead of the default BYTE. This is basically option 2 proposed by @BrokenCrust in his answer. Instead of re-creating the tables, we just re-defined them using this SQL:

set head off;
set linesize 1000;
set colsep ";";
set trimspool on;
set pagesize 0;
set verify off;
set feedback off;
set term off;
column dcol new_value mydate noprint;
select to_char(sysdate,'YYYY-MM-DD_HH24MISS') dcol from dual;
select to_char(sysdate,'YYYY-MM-DD') dcol from dual;
spool c:\temp\From_Byte_to_Char_og_VarChar2&mydate;
select '-- '||sysdate from dual;
select 'ALTER TABLE '||TABLE_NAME||
      ' MODIFY '||COLUMN_NAME||' CHAR('||data_length||' CHAR);' 
from 
     user_tab_cols
where 
     DATA_TYPE='CHAR'
;
select 'ALTER TABLE '||TABLE_NAME||
      ' MODIFY '||COLUMN_NAME||' VARCHAR2('||data_length||' CHAR);' 
from 
     user_tab_cols
where 
     DATA_TYPE='VARCHAR2'
;
select 'commit ;' from dual;
select 'exit ;' from dual;
spool off ;
Share:
40,468

Related videos on Youtube

awe
Author by

awe

I am a senior software developer. I am interested in .NET programming (C#, VB), web programming (ASP.NET, Javascript) and GUI design. The chat room I most frequently visit is Musical Practice & Performance.

Updated on September 18, 2022

Comments

  • awe
    awe over 1 year

    I am trying to import a dmp file from one database to another. The problem is that some special danish characters are not imported right because some character map conversion is done during import.

    This is the info when I start the import with imp (see the note about possible character conversion):

    Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    
    Export file created by EXPORT:V10.02.01 via conventional path
    
    import done in WE8ISO8859P1 character set and AL16UTF16 NCHAR character set
    import server uses AL32UTF8 character set (possible charset conversion)
    . importing USERNAME's objects into USERNAME    
    . . importing table                      "TABLE2" 
    IMP-00019: row rejected due to ORACLE error 12899
    IMP-00003: ORACLE error 12899 encountered
    ORA-12899: værdi er for stor for kolonnen "USERNAME"."TABLE2"."NAME" (faktisk: 32, maksimum: 30)
    Column 1 408261
    Column 2 KUBEN FÆLLES MÅLER            
    

    Each special character (typically Æ, Ø and Å) are using 2 charactes instead of 1, and the data are padded with spaces to fill up, so this causes an error that the data contains 31 caharacters for this field, and the field is defined as size 30.

    The export that generated the dmp file was executed like this:

    SET CHARACTERSET=WE8PC850
    SET NLS_LANG=DANISH
    SET NLS_NUMERIC_CHARACTERS=.,
    exp username/password@server1 dumpfile.dmp
    

    The import was executed like this: How do I make this work correctly?

    SET CHARACTERSET=WE8PC850
    SET NLS_LANG=DANISH
    SET NLS_NUMERIC_CHARACTERS=.,
    imp username/password@server2 fromuser=username touser=username commit=y ignore=y file=dumpfile.dmp log=dumpfile.log
    

    What can be done to get this right? It seems like the imp ignores the charset used, as the import server uses AL32UTF8 (which is wrong).

  • awe
    awe over 12 years
    The export server use 'WE8ISO8859P1' for NLS_CHARACTERSET. How do I set that for the import on my other server (which has 'AL32UTF8' for NLS_CHARACTERSET)? I tried to simply update the field value, but got insufficient priveleges error (I was logged on as system).