Oracle JDBC charset and 4000 char limit
Solution 1
Prior to Oracle 12.1, a VARCHAR2 column is limited to storing 4000 bytes of data in the database character set, even if it is declared VARCHAR2(4000 CHAR). Since every character in your string ("é") requires 2 bytes of storage in the UTF-8 (AL32UTF8) character set, you won't be able to store more than 2000 characters in the column. Of course, that number will change if some of your characters actually require just 1 byte of storage or if some of them require more than 2 bytes. When the database character set is Windows-1252 (WE8MSWIN1252), every character in your string requires only a single byte of storage, so you'll be able to store 4000 characters in the column.
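To make the arithmetic concrete, here is a small Java sketch that counts the bytes the question's test strings would occupy. It assumes Java 11+ (for String.repeat); the class and method names are illustrative only. ISO-8859-1 is an assumed stand-in for WE8MSWIN1252, chosen because every JVM is guaranteed to support it and "é" is a single byte in both.

```java
import java.nio.charset.StandardCharsets;

public class ByteCountDemo {
    // Bytes needed in UTF-8, which is what an AL32UTF8 database stores.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed in a single-byte charset; ISO-8859-1 is an assumed stand-in
    // for WE8MSWIN1252 (both encode 'é' as one byte).
    static int singleByteBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2000, 3000, 4000}) {
            String data = "\u00e9".repeat(n); // n copies of 'é' (requires Java 11+)
            System.out.printf("%d chars -> %d UTF-8 bytes, %d single-byte bytes%n",
                    n, utf8Bytes(data), singleByteBytes(data));
        }
    }
}
```

Running it shows that 3000 "é" characters already need 6000 bytes in UTF-8, well past the 4000-byte limit, while the single-byte encoding needs exactly one byte per character.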
Since you have longer strings, would it be possible to declare the column as a CLOB rather than as a VARCHAR2? That would (effectively) remove the length limitation (there is a limit on the size of a CLOB that depends on the Oracle version and the block size, but it's at least in the multiple GB range).
If you happen to be using Oracle 12.1 or later, setting the max_string_size initialization parameter to EXTENDED allows you to increase the maximum size of a VARCHAR2 column from 4000 bytes to 32767 bytes.
Solution 2
Solved this problem by cutting the String to the required byte length. Note that this can't be done by simply using stat.substring(0, length), since substring counts Java chars rather than bytes, and the result's UTF-8 encoding might be up to three times longer than allowed.
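while (stat.getBytes(StandardCharsets.UTF_8).length > length) { // needs java.nio.charset.StandardCharsets
    int n = stat.length();
    // drop the last char, or both halves if the string ends with a surrogate pair
    int cut = (n >= 2 && Character.isSurrogatePair(stat.charAt(n - 2), stat.charAt(n - 1))) ? 2 : 1;
    stat = stat.substring(0, n - cut);
}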
Note: do not use stat.getBytes() without a charset argument, since it depends on the platform default charset (controlled by 'file.encoding') and produces either Windows-1252 or UTF-8 bytes depending on the machine!
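The loop above re-encodes the whole string once per removed character. As an alternative (a swapped-in technique, not the one described above), a java.nio CharsetEncoder can find the cut point in a single pass: encode into a ByteBuffer sized to the byte limit and keep exactly as many chars as the encoder consumed. This is a sketch assuming Java 11+; Utf8Truncate and truncateToUtf8Bytes are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Truncate {
    /**
     * Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes.
     * The UTF-8 encoder writes a surrogate pair as one 4-byte sequence, so the
     * cut never separates the two halves of a supplementary character.
     */
    static String truncateToUtf8Bytes(String s, int maxBytes) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                // Lone surrogates become '?', matching what String.getBytes does.
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        // encode() stops with OVERFLOW when the next character no longer fits;
        // in.position() is then the number of chars that were fully encoded.
        encoder.encode(in, out, true);
        return s.substring(0, in.position());
    }

    public static void main(String[] args) {
        String longData = "\u00e9".repeat(4000); // same data as in the question (Java 11+)
        String data = truncateToUtf8Bytes(longData, 4000);
        System.out.println(data.length()); // prints 2000
    }
}
```

In the question's code, a call like truncateToUtf8Bytes(longData, 4000) would take the place of longData.substring(0, Math.min(4000, longData.length())); the 4000 assumes the default 4000-byte VARCHAR2 limit.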
If you use Hibernate, you can apply this truncation centrally with an org.hibernate.Interceptor.
Arolition
Updated on July 13, 2022
Comments
-
Arolition almost 2 years
We are trying to store a UTF-16 encoded String into an AL32UTF8 Oracle database.
Our program works perfectly on a database that uses WE8MSWIN1252 as its character set. When we try to run it on a database that uses AL32UTF8, it fails with java.sql.SQLException: ORA-01461: can bind a LONG value only for insert into a LONG column. In the test case below, everything works fine as long as our input data doesn't get too long.
The input String can exceed 4000 chars. We wish to retain as much information as possible, even though we realise the input will have to be cut off.
Our database tables are defined using the CHAR keyword (see below). We hoped that this would allow us to store up to 4000 characters in any character set. Can this be done? If so, how? We have tried converting the String to UTF8 using a ByteBuffer, without success. OraclePreparedStatement.setFormOfUse(...) also didn't help. Switching to a CLOB is not an option; if the string is too long, it needs to be cut. This is our code at the moment:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public static void main(String[] args) throws Exception {
    String ip = "193.53.40.229";
    int port = 1521;
    String sid = "ora11";
    String username = "obasi";
    String password = "********";
    String driver = "oracle.jdbc.driver.OracleDriver";
    String url = "jdbc:oracle:thin:@" + ip + ":" + port + ":" + sid;
    Class.forName(driver);

    String shortData = "";
    String longData = "";
    String data;
    for (int i = 0; i < 5; i++) shortData += "é";
    for (int i = 0; i < 4000; i++) longData += "é";

    // try-with-resources closes the connection and each statement
    try (Connection conn = DriverManager.getConnection(url, username, password)) {
        try (PreparedStatement stat = conn.prepareStatement("insert into test_table_short values (?)")) {
            data = shortData.substring(0, Math.min(5, shortData.length()));
            stat.setString(1, data);
            stat.execute();
        }
        try (PreparedStatement stat = conn.prepareStatement("insert into test_table_long values (?)")) {
            // cuts to 4000 chars, which can still be more than 4000 bytes in AL32UTF8
            data = longData.substring(0, Math.min(4000, longData.length()));
            stat.setString(1, data);
            stat.execute();
        }
    }
}
This is the create script of the simple tables:
CREATE TABLE test_table_short ( DATA VARCHAR2(5 CHAR) );
CREATE TABLE test_table_long ( DATA VARCHAR2(4000 CHAR) );
The test case works perfectly on the short data. On the long data, however, it keeps getting the error. Even when our longData is only 3000 characters long, it still doesn't execute successfully. Thanks in advance!
-
Arolition over 11 years
Thank you for your answer. Sadly, in this case, using CLOBs is out of the question for us. According to link this is the right answer. However, link is pretty misleading in my humble opinion. Would you know where this is explained in the documentation? We have been searching a lot, but could not find this.
-
Justin Cave over 11 years
@Arolition - I added a comment to the SO thread. The answer is correct as far as it goes. It just doesn't mention that if a particular 4000 characters require more than 4000 bytes of storage, the 4000-byte capacity limit still kicks in.
-
matbrgz about 10 years
UTF-8 is a variable-length encoding. Many Asian characters require at least three bytes to encode.
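To illustrate the variable width, this small Java sketch (class name hypothetical) prints the UTF-8 byte count of a few sample characters next to their length in Java chars. A supplementary character such as an emoji counts as two Java chars but four UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII letter, accented Latin letter, CJK ideograph, emoji (a surrogate pair in Java)
        String[] samples = {"a", "\u00e9", "\u4e2d", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: %d Java char(s), %d UTF-8 byte(s)%n",
                    s.codePointAt(0), s.length(), utf8Bytes(s));
        }
    }
}
```

At three bytes per character, a 4000-byte VARCHAR2 column holds at most 1333 CJK characters.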