Getting Hibernate and SQL Server to play nice with VARCHAR and NVARCHAR

25,498

Solution 1

I decided to try this as a hack that might work without touching the database. To do this I created a custom type for NVARCHAR fields. This requires JDBC 4 drivers (using the ones from Microsoft) and Hibernate 3.6.0. The sendStringParametersAsUnicode is false.

Here's the approach, I'm still verifying its correctness - any comments from folks with more experience than I are welcome

Add a new Dialect to support the new datatype

public class SQLAddNVarCharDialect extends SQLServerDialect {

    public SQLAddNVarCharDialect(){
        super();

        registerColumnType( Types.NVARCHAR, 8000, "nvarchar($1)" );     
        registerColumnType( Types.NVARCHAR,  "nvarchar(255)" );     
    }
}

Add the new Type. Notice the setNString in nullSafeSet

public class NStringUserType implements UserType  {

    @Override
    public Object assemble(Serializable arg0, Object owner)
            throws HibernateException {

        return deepCopy(arg0);
    }

    @Override
    public Object deepCopy(Object arg0) throws HibernateException {
        if(arg0==null) return null;
        return arg0.toString();
    }

    @Override
    public Serializable disassemble(Object arg0) throws HibernateException {
        return (Serializable)deepCopy(arg0);
    }

    @Override
    public boolean equals(Object arg0, Object arg1) throws HibernateException {
        if(arg0 == null )
            return arg1 == null;
        return arg0.equals(arg1);
    }

    @Override
    public int hashCode(Object arg0) throws HibernateException {
        return arg0.hashCode();
    }

    @Override
    public boolean isMutable() {
        return false;
    }


    @Override
    public void nullSafeSet(PreparedStatement st, Object value, int index)
            throws HibernateException, SQLException {
        if(value == null)
            st.setNull(index,Types.NVARCHAR);
        else
            st.setNString(index, value.toString());
    }

    @Override
    public Object replace(Object arg0, Object target, Object owner)
            throws HibernateException {
        return deepCopy(arg0);
    }

    @Override
    public Class returnedClass() {
        return String.class;
    }

    @Override
    public int[] sqlTypes() {
        return new int[]{Types.NVARCHAR};
    }


    @Override
    public Object nullSafeGet(ResultSet resultSet, String[] names, Object owner)
            throws HibernateException, SQLException {
        String result = resultSet.getString(names[0]);
        return result == null || result.trim().length() == 0 
            ? null : result;
    }

}

Update mappings for all NVARCHAR fields

    <property name="firstName" type="NStringUserType">
        <column name="firstName" length="40" not-null="false" />
    </property>    

Raw SQL before (with sendUnicode..=true):

 exec sp_prepexec @p1 output,N'@P0 nvarchar(4000),@P1 datetime,@P2 varchar(8000),@P3 nvarchar(4000),@P4 nvarchar(4000),@P5 nvarchar(4000),@P6 nvarchar(4000)... ,N'update Account set ... where AccountId=@P35    

And after:

 exec sp_prepexec @p1 output,N'@P0 varchar(8000),@P1  .... @P6 nvarchar(4000),@P7 ... ,N'update Account set ... Validated=@P4, prefix=@P5, firstName=@P6 ... where AccountId=@P35    

Seems to work similarly for 'SELECT.."

Solution 2

public class SQLServerUnicodeDialect extends org.hibernate.dialect.SQLServerDialect {
    public SQLServerUnicodeDialect() {
        super();
        registerColumnType(Types.CHAR, "nchar(1)");
        registerColumnType(Types.LONGVARCHAR, "nvarchar(max)" );
        registerColumnType(Types.VARCHAR, 4000, "nvarchar($l)");
        registerColumnType(Types.VARCHAR, "nvarchar(max)");
        registerColumnType(Types.CLOB, "nvarchar(max)" );

        registerColumnType(Types.NCHAR, "nchar(1)");
        registerColumnType(Types.LONGNVARCHAR, "nvarchar(max)");
        registerColumnType(Types.NVARCHAR, 4000, "nvarchar($l)");
        registerColumnType(Types.NVARCHAR, "nvarchar(max)");
        registerColumnType(Types.NCLOB, "nvarchar(max)");

        registerHibernateType(Types.NCHAR, StandardBasicTypes.CHARACTER.getName());
        registerHibernateType(Types.LONGNVARCHAR, StandardBasicTypes.TEXT.getName());
        registerHibernateType(Types.NVARCHAR, StandardBasicTypes.STRING.getName());
        registerHibernateType(Types.NCLOB, StandardBasicTypes.CLOB.getName() );
    }
}

Solution 3

One thought..

Hide your varchar columns behind indexed views. The views cast to nvarchar. This allows you to maintain 2 interfaces on the same data.

The same applies the other way... use views for your downstream stuff but these cast to varchar (all your tables are now nvarchar). In this case there would be no need to index them. A WHERE clause with a varchar value (compared against the nvarchar column) will be widened to nvarchar and the index will be used

Solution 4

This is less of a Hibernate issue than how the JDBC drivers work. In practice I think the only problem that will arise (aside from the obvious data corruption if you write Unicode data to a varchar column) is when you're querying trying to match on a string.

SQL Server will implicitly convert nvarchar to varchar in a SQL statement ok but when you run a query with a string in the where clause it will not find existing indexes if the types don't match exactly.

So, for example

SELECT * FROM Person WHERE last_name = N'Smith'

will result in a table scan if the last_name field is defined as varchar and there's an index on it.

One other workaround for this performance issue is to use stored procedures to do the type conversion before executing the query.

Solution 5

  1. Copy StringNVarcharType.java and NVarcharTypeDescriptor.java classes from hibernate-core 4.3.0.Final.

  2. StringNVarcharType.hbm.xml contents

  3. Use the following dependencies in Maven:

    <dependency>
        <groupId>com.mchange</groupId>
        <artifactId>c3p0</artifactId>
        <version>0.9.5-pre6</version> <!-- Make sure you don't use the default dependency version found in hibernate-c3p0! -->
    </dependency>
    <dependency>
        <groupId>org.hibernate</groupId>
        <artifactId>hibernate-c3p0</artifactId>
        <version>3.6.10.Final</version>
        <exclusions>
            <exclusion>
                <artifactId>c3p0</artifactId>
                <groupId>c3p0</groupId>
            </exclusion>
        </exclusions>
    </dependency>
    
  4. Make hibernate aware of the mapping:

    <!DOCTYPE hibernate-configuration PUBLIC
    "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
    "http://www.hibernate.org/dtd/hibernate-configuration-3.0.dtd">
    <hibernate-configuration>
        <session-factory>
            <mapping resource="StringNVarcharType.hbm.xml" />
    
            <!-- Continue with your other mappings here -->
        </session-factory>
    </hibernate-configuration>
    
  5. Use the nstring property type in your *.hbm.xml mappings files where you have nvarchar2 database columns types.

References:

  1. http://alenovarini.wikidot.com/mapping-a-custom-type-in-hibernate
  2. http://blog.xebia.com/2009/11/09/understanding-and-writing-hibernate-user-types/
Share:
25,498
dfb
Author by

dfb

Updated on March 26, 2021

Comments

  • dfb
    dfb about 3 years

    I'm currently in the process of enabling UTF-8 characters in some tables of a large database. These tables are already of MS-SQL type NVARCHAR. Additionally, I have several fields using VARCHAR as well.

    There is a well known issue with Hibernate's interactions with the JDBC driver (see e.g., Mapping to varchar and nvarchar in hibernate) . In short, Hibernate/JDBC generates SQL that passes all strings as Unicode, regardless of the underlying SQL type. When a non-unicode (varchar) field in the database is compared to a Unicode input string, the indicies for that column do not match the encoding so a full table scan is performed. In the JDBC driver (both JTDS and MS versions) there is a parameter to pass Unicode strings as ASCII, but this is an all or nothing proposition that disallows international characters from being input into the the database.

    Most posts I've seen on this issue have come up with one of two solutions - 1) change everything in the database to NVARCHAR or 2) set the sendStringParametersAsUnicode=false, My question then is this - is there any known solution for having VARCHAR and NVARCHAR play nicely together? It is a huge issue for my environment to change everything to NVARCHAR because of downstream dependencies and other external issues.