Replace REGEXP_SUBSTR in SQL Server
19,308
MS SQL Server 2014 Schema Setup:
CREATE TABLE table_name ( name VARCHAR(50) );
INSERT INTO table_name
SELECT 'AE 344592001H 6186694' UNION ALL
SELECT 'AE_161038002_6044777' UNION ALL
SELECT 'BC_VIVS_HNB011A_1WAM' UNION ALL
SELECT 'BC_56230A_30SP' UNION ALL
SELECT 'CG_3334902_NETWK_ ACTLM_3334912' UNION ALL
SELECT 'CG_3334574_HMO1_CORACT_3334575' UNION ALL
SELECT 'CG_3207160_POSC_1502AH_3207161' UNION ALL
SELECT 'UH_141015_RHM' UNION ALL
SELECT 'UH_127757_RIV' UNION ALL
SELECT 'UH 523725 RIV' UNION ALL
SELECT 'BS_W0055785_C500_M0005672';
Query 1:
WITH Names ( lvl, name, remaining, idx ) AS (
SELECT 1,
name,
name,
CHARINDEX( '_', name )
FROM table_name
UNION ALL
SELECT lvl+1,
name,
SUBSTRING(remaining,idx+1,LEN(remaining)-idx),
CASE WHEN CHARINDEX( '_', remaining, idx+1 ) = 0
THEN 0
ELSE CHARINDEX( '_', remaining, idx+1 ) - idx
END
FROM Names
WHERE idx > 0
)
SELECT Name,
MAX( CASE WHEN lvl = 3 AND ( Name LIKE 'CG%' OR idx = 0 ) THEN remaining
WHEN lvl = 3 THEN SUBSTRING( remaining, 1, idx - 1 )
END ) AS OPT,
MAX( CASE WHEN lvl = 2 AND ( Name LIKE 'CG%' OR idx = 0 ) THEN remaining
WHEN lvl = 2 THEN SUBSTRING( remaining, 1, idx - 1 )
END ) AS Name2
FROM Names
GROUP BY Name
| Name | OPT | Name2 |
|---------------------------------|----------------------|------------------------------|
| AE 344592001H 6186694 | (null) | (null) |
| AE_161038002_6044777 | 6044777 | 161038002 |
| BC_56230A_30SP | 30SP | 56230A |
| BC_VIVS_HNB011A_1WAM | HNB011A | VIVS |
| BS_W0055785_C500_M0005672 | C500 | W0055785 |
| CG_3207160_POSC_1502AH_3207161 | POSC_1502AH_3207161 | 3207160_POSC_1502AH_3207161 |
| CG_3334574_HMO1_CORACT_3334575 | HMO1_CORACT_3334575 | 3334574_HMO1_CORACT_3334575 |
| CG_3334902_NETWK_ ACTLM_3334912 | NETWK_ ACTLM_3334912 | 3334902_NETWK_ ACTLM_3334912 |
| UH 523725 RIV | (null) | (null) |
| UH_127757_RIV | RIV | 127757 |
| UH_141015_RHM | RHM | 141015 |
Query 2:
CHARINDEX( expressionToFind, expressionToSerach[, startIndex] )
can be used to find the instances of _
in the word.
-
CHARINDEX( '_', name )
will find the index of the first instance of an_
. -
CHARINDEX( '_', name, CHARINDEX( '_', name ) + 1 )
will find the index of the second instance of an_
or will return0
if there are not two_
characters. -
CHARINDEX( '_', name, CHARINDEX( '_', name, CHARINDEX( '_', name ) + 1 ) + 1)
will find the index of the third instance of an_
or will return0
if there are not three_
characters.
Nesting this into an inner select you can use it to get the appropriate SUBSTRING
s in an outer select like this:
SELECT name,
CASE WHEN idx2 > idx1 AND ( Name LIKE 'CG%' OR idx3 = 0 )THEN SUBSTRING( name, idx2 + 1, LEN( name ) )
WHEN idx3 > idx2 THEN SUBSTRING( name, idx2 + 1, idx3 - idx2 - 1 )
END AS OPT,
CASE WHEN name LIKE 'CG%' THEN SUBSTRING( name, idx1 + 1, LEN( name ) )
WHEN idx2 > idx1 THEN SUBSTRING( name, idx1 + 1, idx2 - idx1 - 1 )
END AS Name2
FROM (
SELECT name,
CHARINDEX( '_', name ) AS idx1,
CHARINDEX( '_', name, CHARINDEX( '_', name ) + 1 ) AS idx2,
CHARINDEX( '_', name, CHARINDEX( '_', name, CHARINDEX( '_', name ) + 1 ) + 1 ) AS idx3
FROM table_name
) t
| name | OPT | Name2 |
|---------------------------------|----------------------|------------------------------|
| AE 344592001H 6186694 | (null) | (null) |
| AE_161038002_6044777 | 6044777 | 161038002 |
| BC_VIVS_HNB011A_1WAM | HNB011A | VIVS |
| BC_56230A_30SP | 30SP | 56230A |
| CG_3334902_NETWK_ ACTLM_3334912 | NETWK_ ACTLM_3334912 | 3334902_NETWK_ ACTLM_3334912 |
| CG_3334574_HMO1_CORACT_3334575 | HMO1_CORACT_3334575 | 3334574_HMO1_CORACT_3334575 |
| CG_3207160_POSC_1502AH_3207161 | POSC_1502AH_3207161 | 3207160_POSC_1502AH_3207161 |
| UH_141015_RHM | RHM | 141015 |
| UH_127757_RIV | RIV | 127757 |
| UH 523725 RIV | (null) | (null) |
| BS_W0055785_C500_M0005672 | C500 | W0055785 |
Comments
-
russds almost 2 years
I'm Looking for a way of replacing the use of INSTR(...) and REPLACE(REGEXP_SUBSTR(...)) oracle functions in SQL Server.
Original Oracle:
SELECT Name, CASE WHEN SUBSTR (NAME, 1, 2) = 'CG' THEN SUBSTR (NAME,INSTR (NAME,'_',1,2)+ 1,LENGTH (NAME)) ELSE REPLACE (REGEXP_SUBSTR (NAME,'_[^_]+',1,2),'_','') END AS OPT, CASE WHEN SUBSTR (NAME, 1, 2) = 'CG' THEN SUBSTR (NAME,INSTR (NAME, '_',1,1) + 1, LENGTH (NAME)) ELSE REPLACE (REGEXP_SUBSTR (NAME,'_[^_]+',1,1),'_','') END as Name2
Updated Example Output:
+---------------------------------+----------------------+------------------------------+ | NAME | OPT | Name2 | +---------------------------------+----------------------+------------------------------+ | AE 344592001H 6186694 | NULL | NULL | | AE_161038002_6044777 | 6044777 | 161038002 | | BC_VIVS_HNB011A_1WAM | HNB011A | VIVS | | BC_56230A_30SP | 30SP | 56230A | | CG_3334902_NETWK_ ACTLM_3334912 | NETWK_ ACTLM_3334912 | 3334902_NETWK_ ACTLM_3334912 | | CG_3334574_HMO1_CORACT_3334575 | HMO1_CORACT_3334575 | 3334574_HMO1_CORACT_3334575 | | CG_3207160_POSC_1502AH_3207161 | POSC_1502AH_3207161 | 3207160_POSC_1502AH_3207161 | | UH_141015_RHM | RHM | 141015 | | UH_127757_RIV | RIV | 127757 | | UH 523725 RIV | NULL | NULL | | BS_W0055785_C500_M0005672 | C500 | W0055785 | +---------------------------------+----------------------+------------------------------+
I tried looking at charindex and patindex but nothing panned out, I think the original regex expression is over my head to begin with. Any ideas on how I could mimic this logic in sql server?
-
russds about 8 yearsThank you! Incredible solution. I've updated my examples to include the "CG" rows, and some better examples. Your solution is so close, but the CG records aren't parsed. Any thoughts on that? you're code is beyond me to try and dissect. Thanks!
-
MT0 about 8 years@russds Updated - both solutions appear to give the desired output. Query 2 is more efficient (and much simpler).