Replace REGEXP_SUBSTR in SQL Server

19,308

SQL Fiddle

MS SQL Server 2014 Schema Setup:

CREATE TABLE table_name ( name VARCHAR(50) );
INSERT INTO table_name
SELECT 'AE 344592001H 6186694' UNION ALL
SELECT 'AE_161038002_6044777' UNION ALL
SELECT 'BC_VIVS_HNB011A_1WAM' UNION ALL
SELECT 'BC_56230A_30SP' UNION ALL
SELECT 'CG_3334902_NETWK_ ACTLM_3334912' UNION ALL
SELECT 'CG_3334574_HMO1_CORACT_3334575' UNION ALL
SELECT 'CG_3207160_POSC_1502AH_3207161' UNION ALL
SELECT 'UH_141015_RHM' UNION ALL
SELECT 'UH_127757_RIV' UNION ALL
SELECT 'UH 523725 RIV' UNION ALL
SELECT 'BS_W0055785_C500_M0005672';

Query 1:

WITH Names ( lvl, name, remaining, idx ) AS (
  SELECT 1,
         name,
         name,
         CHARINDEX( '_', name )
  FROM   table_name
  UNION ALL
  SELECT lvl+1,
         name,
         SUBSTRING(remaining,idx+1,LEN(remaining)-idx),
         CASE WHEN CHARINDEX( '_', remaining, idx+1 ) = 0
              THEN 0
              ELSE CHARINDEX( '_', remaining, idx+1 ) - idx
              END
  FROM   Names
  WHERE  idx > 0
)
SELECT Name,
       MAX( CASE WHEN lvl = 3 AND ( Name LIKE 'CG%' OR idx = 0 ) THEN remaining
                 WHEN lvl = 3 THEN SUBSTRING( remaining, 1, idx - 1 )
                 END ) AS OPT,
       MAX( CASE WHEN lvl = 2 AND ( Name LIKE 'CG%' OR idx = 0 ) THEN remaining
                 WHEN lvl = 2 THEN SUBSTRING( remaining, 1, idx - 1 )
                 END ) AS Name2
FROM   Names
GROUP BY Name

Results:

|                            Name |                  OPT |                        Name2 |
|---------------------------------|----------------------|------------------------------|
|           AE 344592001H 6186694 |               (null) |                       (null) |
|            AE_161038002_6044777 |              6044777 |                    161038002 |
|                  BC_56230A_30SP |                 30SP |                       56230A |
|            BC_VIVS_HNB011A_1WAM |              HNB011A |                         VIVS |
|       BS_W0055785_C500_M0005672 |                 C500 |                     W0055785 |
|  CG_3207160_POSC_1502AH_3207161 |  POSC_1502AH_3207161 |  3207160_POSC_1502AH_3207161 |
|  CG_3334574_HMO1_CORACT_3334575 |  HMO1_CORACT_3334575 |  3334574_HMO1_CORACT_3334575 |
| CG_3334902_NETWK_ ACTLM_3334912 | NETWK_ ACTLM_3334912 | 3334902_NETWK_ ACTLM_3334912 |
|                   UH 523725 RIV |               (null) |                       (null) |
|                   UH_127757_RIV |                  RIV |                       127757 |
|                   UH_141015_RHM |                  RHM |                       141015 |

Query 2:

CHARINDEX( expressionToFind, expressionToSerach[, startIndex] ) can be used to find the instances of _ in the word.

  • CHARINDEX( '_', name ) will find the index of the first instance of an _.
  • CHARINDEX( '_', name, CHARINDEX( '_', name ) + 1 ) will find the index of the second instance of an _ or will return 0 if there are not two _ characters.
  • CHARINDEX( '_', name, CHARINDEX( '_', name, CHARINDEX( '_', name ) + 1 ) + 1) will find the index of the third instance of an _ or will return 0 if there are not three _ characters.

Nesting this into an inner select you can use it to get the appropriate SUBSTRINGs in an outer select like this:

SELECT name,
       CASE WHEN idx2 > idx1 AND ( Name LIKE 'CG%' OR idx3 = 0 )THEN SUBSTRING( name, idx2 + 1, LEN( name ) )
            WHEN idx3 > idx2 THEN SUBSTRING( name, idx2 + 1, idx3 - idx2 - 1 )
            END AS OPT,
       CASE WHEN name LIKE 'CG%' THEN SUBSTRING( name, idx1 + 1, LEN( name ) )
            WHEN idx2 > idx1 THEN SUBSTRING( name, idx1 + 1, idx2 - idx1 - 1 )
            END AS Name2
FROM   (
  SELECT name,
         CHARINDEX( '_', name ) AS idx1,
         CHARINDEX( '_', name, CHARINDEX( '_', name ) + 1 ) AS idx2,
         CHARINDEX( '_', name, CHARINDEX( '_', name, CHARINDEX( '_', name ) + 1 ) + 1 ) AS idx3
  FROM   table_name
) t

Results:

|                            name |                  OPT |                        Name2 |
|---------------------------------|----------------------|------------------------------|
|           AE 344592001H 6186694 |               (null) |                       (null) |
|            AE_161038002_6044777 |              6044777 |                    161038002 |
|            BC_VIVS_HNB011A_1WAM |              HNB011A |                         VIVS |
|                  BC_56230A_30SP |                 30SP |                       56230A |
| CG_3334902_NETWK_ ACTLM_3334912 | NETWK_ ACTLM_3334912 | 3334902_NETWK_ ACTLM_3334912 |
|  CG_3334574_HMO1_CORACT_3334575 |  HMO1_CORACT_3334575 |  3334574_HMO1_CORACT_3334575 |
|  CG_3207160_POSC_1502AH_3207161 |  POSC_1502AH_3207161 |  3207160_POSC_1502AH_3207161 |
|                   UH_141015_RHM |                  RHM |                       141015 |
|                   UH_127757_RIV |                  RIV |                       127757 |
|                   UH 523725 RIV |               (null) |                       (null) |
|       BS_W0055785_C500_M0005672 |                 C500 |                     W0055785 |
Share:
19,308
russds
Author by

russds

web developer

Updated on June 04, 2022

Comments

  • russds
    russds almost 2 years

    I'm Looking for a way of replacing the use of INSTR(...) and REPLACE(REGEXP_SUBSTR(...)) oracle functions in SQL Server.

    Original Oracle:

       SELECT 
          Name,
          CASE
               WHEN SUBSTR (NAME, 1, 2) = 'CG'
               THEN SUBSTR (NAME,INSTR (NAME,'_',1,2)+ 1,LENGTH (NAME))
               ELSE REPLACE (REGEXP_SUBSTR (NAME,'_[^_]+',1,2),'_','')
            END AS OPT,
            CASE
               WHEN SUBSTR (NAME, 1, 2) = 'CG'
               THEN SUBSTR (NAME,INSTR (NAME, '_',1,1) + 1, LENGTH (NAME))
               ELSE REPLACE (REGEXP_SUBSTR (NAME,'_[^_]+',1,1),'_','')
            END as Name2
    

    Updated Example Output:

    +---------------------------------+----------------------+------------------------------+
    |              NAME               |         OPT          |            Name2             |
    +---------------------------------+----------------------+------------------------------+
    | AE 344592001H 6186694           | NULL                 | NULL                         |
    | AE_161038002_6044777            | 6044777              | 161038002                    |
    | BC_VIVS_HNB011A_1WAM            | HNB011A              | VIVS                         |
    | BC_56230A_30SP                  | 30SP                 | 56230A                       |
    | CG_3334902_NETWK_ ACTLM_3334912 | NETWK_ ACTLM_3334912 | 3334902_NETWK_ ACTLM_3334912 |
    | CG_3334574_HMO1_CORACT_3334575  | HMO1_CORACT_3334575  | 3334574_HMO1_CORACT_3334575  |
    | CG_3207160_POSC_1502AH_3207161  | POSC_1502AH_3207161  | 3207160_POSC_1502AH_3207161  |
    | UH_141015_RHM                   | RHM                  | 141015                       |
    | UH_127757_RIV                   | RIV                  | 127757                       |
    | UH 523725 RIV                   | NULL                 | NULL                         |
    | BS_W0055785_C500_M0005672       | C500                 | W0055785                     |
    +---------------------------------+----------------------+------------------------------+
    

    I tried looking at charindex and patindex but nothing panned out, I think the original regex expression is over my head to begin with. Any ideas on how I could mimic this logic in sql server?

  • russds
    russds about 8 years
    Thank you! Incredible solution. I've updated my examples to include the "CG" rows, and some better examples. Your solution is so close, but the CG records aren't parsed. Any thoughts on that? you're code is beyond me to try and dissect. Thanks!
  • MT0
    MT0 about 8 years
    @russds Updated - both solutions appear to give the desired output. Query 2 is more efficient (and much simpler).