How do I make imp use right charachter set?
Solution 1
You are importing data into a database using the AL32UTF8 characterset (this is the default). The importer must therefore convert the characters to fit that and some will become multi-byte characters as you have found. There are two ways to deal with this:
1) If you don't need Unicode in the new database recreate it in the same characterset as the old database. Run this code on the old database to get the characterset and use that
SELECT parameter, value
FROM nls_database_parameters
WHERE parameter
LIKE '%CHARACTERSET';
Once the new and old database have the same characterset the import won't have to do the conversion.
2) If you can pre-create the tables, you could use the NLS_LENGTH_SEMANTICS parameter. If you set that to CHAR rather than the default of BYTE, a VARCHAR2(5) will be allocated enough space to store 5 characters in the database character set (potentially up to 20 bytes) rather than 5 bytes (which could allow just 1 character). Or instead you could modify the table creation DDL to add CHAR to every VARCHAR2 column declaration. e.g.
CREATE TABLE xyz (column_x VARCHAR2(10 CHAR) NOT NULL);
This way you can convert the data to Unicode and have a better characterset going forward, and so long as your application can support it this is perhaps the preferred method.
Solution 2
We ended up solving this by changing all the character fields to use CHAR
instead of the default BYTE
. This is basically option 2 proposed by @BrokenCrust in his answer.
Instead of re-creating the tables, we just re-defined them using this SQL:
set head off;
set linesize 1000;
set colsep ";";
set trimspool on;
set pagesize 0;
set verify off;
set feedback off;
set term off;
column dcol new_value mydate noprint;
select to_char(sysdate,'YYYY-MM-DD_HH24MISS') dcol from dual;
select to_char(sysdate,'YYYY-MM-DD') dcol from dual;
spool c:\temp\From_Byte_to_Char_og_VarChar2&mydate;
select '-- '||sysdate from dual;
select 'ALTER TABLE '||TABLE_NAME||
' MODIFY '||COLUMN_NAME||' CHAR('||data_length||' CHAR);'
from
user_tab_cols
where
DATA_TYPE='CHAR'
;
select 'ALTER TABLE '||TABLE_NAME||
' MODIFY '||COLUMN_NAME||' VARCHAR2('||data_length||' CHAR);'
from
user_tab_cols
where
DATA_TYPE='VARCHAR2'
;
select 'commit ;' from dual;
select 'exit ;' from dual;
spool off ;
Related videos on Youtube
awe
I am a senior software developer. I am interested in .NET programming (C#, VB), web programming (ASP.NET, Javascript) and GUI design. The chat room I most frequently visit is Musical Practice & Performance.
Updated on September 18, 2022Comments
-
awe over 1 year
I am trying to import a dmp file from one database to another. The problem is that some special danish characters are not imported right because some character map conversion is done during import.
This is the info when I start the import with imp (see the note about possible character conversion):
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options Export file created by EXPORT:V10.02.01 via conventional path import done in WE8ISO8859P1 character set and AL16UTF16 NCHAR character set import server uses AL32UTF8 character set (possible charset conversion) . importing USERNAME's objects into USERNAME . . importing table "TABLE2" IMP-00019: row rejected due to ORACLE error 12899 IMP-00003: ORACLE error 12899 encountered ORA-12899: værdi er for stor for kolonnen "USERNAME"."TABLE2"."NAME" (faktisk: 32, maksimum: 30) Column 1 408261 Column 2 KUBEN FÆLLES MÅLER
Each special character (typically Æ, Ø and Å) are using 2 charactes instead of 1, and the data are padded with spaces to fill up, so this causes an error that the data contains 31 caharacters for this field, and the field is defined as size 30.
The export that generated the dmp file was executed like this:
SET CHARACTERSET=WE8PC850 SET NLS_LANG=DANISH SET NLS_NUMERIC_CHARACTERS=., exp username/password@server1 dumpfile.dmp
The import was executed like this: How do I make this work correctly?
SET CHARACTERSET=WE8PC850 SET NLS_LANG=DANISH SET NLS_NUMERIC_CHARACTERS=., imp username/password@server2 fromuser=username touser=username commit=y ignore=y file=dumpfile.dmp log=dumpfile.log
What can be done to get this right? It seems like the
imp
ignores the charset used, as the import server uses AL32UTF8 (which is wrong). -
awe over 12 yearsThe export server use 'WE8ISO8859P1' for NLS_CHARACTERSET. How do I set that for the import on my other server (which has 'AL32UTF8' for NLS_CHARACTERSET)? I tried to simply update the field value, but got insufficient priveleges error (I was logged on as system).