SQL LIKE in Spark SQL

17,214

You are only a little bit off. Spark SQL and Hive follow SQL standard conventions where LIKE operator accepts only two special characters:

  • _ (underscore) - which matches an arbitrary character.
  • % (percent) - which matches an arbitrary sequence of characters.

Square brackets have no special meaning and [4,8] matches only a [4,8] literal:

spark.sql("SELECT '[4,8]' LIKE '[4,8]'").show
+----------------+
|[4,8] LIKE [4,8]|
+----------------+
|            true|
+----------------+

To match complex patterns you can use RLIKE operator which suports Java regular expressions:

spark.sql("SELECT '8NXDPVAE' RLIKE '^[4,8]NXD.V.*$'").show
+-----------------------------+
|8NXDPVAE RLIKE ^[4,8]NXD.V.*$|
+-----------------------------+
|                         true|
+-----------------------------+
Share:
17,214
Dan Markhasin
Author by

Dan Markhasin

Updated on July 19, 2022

Comments

  • Dan Markhasin
    Dan Markhasin almost 2 years

    I'm trying to implement a join in Spark SQL using a LIKE condition.

    The row I am performing the join on looks like this and is called 'revision':

    Table A:

    8NXDPVAE
    

    Table B:

    [4,8]NXD_V%
    

    Performing the join on SQL server (A.revision LIKE B.revision) works just fine, but when doing the same in Spark SQL, the join returns no rows (if using inner join) or null values for Table B (if using outer join).

    This is the query I am running:

    val joined = spark.sql("SELECT A.revision, B.revision FROM RAWDATA A LEFT JOIN TPTYPE B ON A.revision LIKE B.revision")
    

    The plan looks like this:

    == Physical Plan ==
    BroadcastNestedLoopJoin BuildLeft, LeftOuter, revision#15 LIKE revision#282, false
    :- BroadcastExchange IdentityBroadcastMode
    :  +- *Project [revision#15]
    :     +- *Scan JDBCRelation(RAWDATA) [revision#15] PushedFilters: [EqualTo(bulk_id,2016092419270100198)], ReadSchema: struct<revision>
    +- *Scan JDBCRelation(TPTYPE) [revision#282] ReadSchema: struct<revision>
    

    Is it possible to perform a LIKE join like this or am I way off?