Split string by space and character as delimiter in Oracle with regexp_substr

68,354

Solution 1

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE TEST( str ) AS
          SELECT 'Hello world - test-test! - test' FROM DUAL
UNION ALL SELECT 'Hello world2 - test2 - test-test2' FROM DUAL;

Query 1:

SELECT Str,
       COLUMN_VALUE AS Occurrence,
       REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, COLUMN_VALUE, NULL, 1 ) AS split_value
FROM   TEST,
       TABLE(
         CAST(
           MULTISET(
             SELECT LEVEL
             FROM   DUAL
             CONNECT BY LEVEL < REGEXP_COUNT( str ,'(.*?)([[:space:]]-[[:space:]]|$)' )
           )
           AS SYS.ODCINUMBERLIST
         )
       )

Results:

|                               STR | OCCURRENCE |  SPLIT_VALUE |
|-----------------------------------|------------|--------------|
|   Hello world - test-test! - test |          1 |  Hello world |
|   Hello world - test-test! - test |          2 |   test-test! |
|   Hello world - test-test! - test |          3 |         test |
| Hello world2 - test2 - test-test2 |          1 | Hello world2 |
| Hello world2 - test2 - test-test2 |          2 |        test2 |
| Hello world2 - test2 - test-test2 |          3 |   test-test2 |

Solution 2

If i understood correctly, this will help you. Currently you are getting output as Helloworld(with space at the end). So i assume u don't want to have space at the end. If so you can simply use the space in the delimiter also like.

select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1)from dual;

OUTPUT
Helloworld(No space at the end)

As u mentioned in ur comment if u want two columns output with Helloworld and test!. you can do the following.

select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1),
       regexp_substr('Helloworld - test!' ,'[^ - ]+',1,3) from dual;

OUTPUT
col1         col2
Helloworld   test!

Solution 3

Trying to negate the match string '[[:space:]]-[[:space:]]' by putting it in a character class with a circumflex (^) to negate it will not work. Everything between a pair of square brackets is treated as a list of optional single characters except for named named character classes which expand out to a list of optional characters, however, due to the way character classes nest, it's very likely that your outer brackets are being interpreted as follows:

  • [^[[:space:]] A single non space non left square bracket character
  • - followed by a single hyphen
  • [[:space:]] followed by a single space character
  • ]+ followed by 1 or more closing square brackets.

It may be easier to convert your multi-character separator to a single character with regexp_replace, then use regex_substr to find you individual pieces:

select regexp_substr(regexp_replace('Helloworld - test!'
                                   ,'[[:space:]]-[[:space:]]'
                                   ,chr(11))
                    ,'([^'||chr(11)||']*)('||chr(11)||'|$)'
                    ,1 -- Start here
                    ,2 -- return 1st, 2nd, 3rd, etc. match
                    ,null
                    ,1 -- return 1st sub exp
                    )
  from dual;

In this code I first changed - to chr(11). That's the ASCII vertical tab (VT) character which is unlikely to appear in most text strings. Then the match expression of the regexp_substr matches all non VT characters followed by either a VT character or the end of line. Only the non VT characters are returned (the first subexpression).

Share:
68,354
aleko_vp
Author by

aleko_vp

Updated on July 05, 2022

Comments

  • aleko_vp
    aleko_vp almost 2 years

    I'm trying to split a string with regexp_subtr, but i can't make it work.

    So, first, i have this query

    select regexp_substr('Helloworld - test!' ,'[[:space:]]-[[:space:]]') from dual
    

    which very nicely extracts my delimiter - blank-blank

    But then, when i try to split the string with this option, it just doesn't work.

    select regexp_substr('Helloworld - test!' ,'[^[[:space:]]-[[:space:]]]+')from dual
    

    The query returns nothing.

    Help will be much appreciated! Thanks