(PowerShell) split string with escaped separator characters

11,553

Solution 1

To summarize and complement the existing, helpful answers:

  • mjolinor's answer works well if you needn't worry about \\ appearing in the input as an escaped \.
    If \\ were present, the solution would misinterpret the , in \\, as escaped (rather than an escaped \ followed by an unescaped ,).

  • iRon's own answer addresses that problem with a more sophisticated regex.

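Additionally, you may want to remove the escape characters after splitting. The following builds on a regex provided by Wiktor Stribiżew here and adds a -replace operation with regex \\(.), which replaces each escape character and the character following it with just that character: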

PS> 'foo,bar\,baz,bang\\,last' -split '(?<=(?<!\\)(?:\\\\)*),' -replace '\\(.)', '$1'
foo
bar,baz
bang\
last

Here's a simple utility function that wraps the above, with a configurable separator and escape character:

function Split-Text {
  param(
      [Parameter(Mandatory=$True)] [string] $Text,
      [Parameter(Mandatory=$True)] [string] $Separator,
      [string] $EscapeChar = '\'
  )
  # Split only at separators preceded by an even number (including zero) of
  # escape chars, then strip each escape char, keeping the char it escapes.
  $Text -split
      ('(?<=(?<!{0})(?:{0}{0})*){1}' -f [regex]::Escape($EscapeChar), [regex]::Escape($Separator)) `
          -replace ('{0}(.)' -f [regex]::Escape($EscapeChar)), '$1'
}
# Sample call - yields the same as above.
Split-Text 'foo,bar\,baz,bang\\,last' ','

# With "/" as the separator - analogous output.
Split-Text 'foo/bar\/baz/bang\\/last' '/'

Solution 2

Using a negative lookbehind, which splits at every separator that is not immediately preceded by a backslash:

$text = 'CN=Test User,OU=Comma\,Test,OU=Test,DC=domain,DC=com'
$text -split '(?<!\\),'

CN=Test User
OU=Comma\,Test
OU=Test
DC=domain
DC=com

$text = 'Domain.com/Test/Slash\/Test/Test User'
$text -split '(?<!\\)/'

Domain.com
Test
Slash\/Test
Test User

Solution 3

I think there is still a little trap, as RDNs could potentially end with a backslash (which is itself escaped with an additional backslash):

$text = 'CN=Test User,OU=EndSlash\\,OU=Comma\,Test,DC=domain,DC=com'
$text -split '(?<!\\),'
CN=Test User
OU=EndSlash\\,OU=Comma\,Test
DC=domain
DC=com

In other words, a separator should only be skipped if it is preceded by an odd number of backslashes. To cover this, I think the complete regular expressions should be: (?<![^\\](\\\\)*\\), (for Distinguished Names) and (?<![^\\](\\\\)*\\)/ (for Canonical Names).
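To make the odd/even rule concrete, here is a minimal sketch of the same test done without a regex. Test-EscapedSeparator is a hypothetical helper name, not something from the answers; it simply counts the run of backslashes directly in front of a given separator position and treats an odd count as "escaped".

```powershell
# Hypothetical helper (not from the answers above): decides whether the
# separator at index $Index in $Text is escaped, by counting the run of
# backslashes immediately in front of it. An odd count means the last
# backslash escapes the separator; an even count means the backslashes
# only escape each other.
function Test-EscapedSeparator {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [int] $Index
    )
    $count = 0
    $i = $Index - 1
    while ($i -ge 0 -and $Text[$i] -eq '\') {
        $count++
        $i--
    }
    return ($count % 2) -eq 1
}

$text = 'OU=EndSlash\\,OU=Comma\,Test'
Test-EscapedSeparator $text ($text.IndexOf(','))      # -> False (two backslashes)
Test-EscapedSeparator $text ($text.LastIndexOf(','))  # -> True (one backslash)
```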

$text = 'CN=Test User,OU=EndSlash\\,OU=Comma\,Test,DC=domain,DC=com'
$text -split '(?<![^\\](\\\\)*\\),'
CN=Test User
OU=EndSlash\\
OU=Comma\,Test
DC=domain
DC=com

$text = 'Domain.com/Slash\/Test/EndSlash\\/Test/Test User'
$text -split '(?<![^\\](\\\\)*\\)/'
Domain.com
Slash\/Test
EndSlash\\
Test
Test User
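One edge case the pattern above may not cover, as far as I can tell: [^\\] requires a non-backslash character in front of the backslash run, so an escaped separator at the very start of the string (e.g. \,foo) still gets split. The sketch below is my own adjustment, not one of the answers' patterns: it also accepts the start of the string via (?:^|[^\\]) and uses non-capturing groups as a precaution, since -split includes text captured by capturing groups in its output.

```powershell
# Variant of the Solution 3 pattern (an adjustment, not from the answers):
# a separator is skipped when it is preceded by an odd-length backslash run
# that starts either after a non-backslash character or at the string start.
$pattern = '(?<!(?:^|[^\\])(?:\\\\)*\\),'

# The original pattern splits the escaped comma at position 1 of this input...
'\,foo,bar' -split '(?<![^\\](\\\\)*\\),'
# ...while the variant keeps '\,foo' together.
'\,foo,bar' -split $pattern

# On the Distinguished Name from above, the variant yields the same five parts.
$text = 'CN=Test User,OU=EndSlash\\,OU=Comma\,Test,DC=domain,DC=com'
$text -split $pattern
```

For real Distinguished Names and Canonical Names the difference is mostly academic, since they never start with an escaped separator.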
Author: iRon

Updated on June 05, 2022

Comments

  • iRon
    almost 2 years ago

    The -split operator (or the .Split() method) is often used to split Active Directory Distinguished Names and Canonical Names into RDNs, conveniently forgetting about the escaped separator characters that might be used in OUs and CNs, such as:

    Distinguished Name Example with an escaped comma:

    CN=Test User,OU=Comma\,Test,OU=Test,DC=domain,DC=com
    

    Canonical Name Example with an escaped slash:

    Domain.com/Test/Slash\/Test/Test User
    

    There are several splitting examples on the internet that do not even mention this trap; they might work for a long time, but sooner or later this programming flaw will cause a lot of painful troubleshooting.

    I don’t think there is an easy way to correctly split escaped strings using a regular expression (see also: Is there a pure regex split of a string containing escape sequences?).

  • iRon
    about 10 years ago
    I had seen this answer as well but couldn't get it to work before. Apparently I made a typo, because it does work now. Thanks!
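Regarding the doubt about splitting escaped strings with a regular expression: a plain left-to-right scan avoids lookbehinds altogether. The following is a minimal sketch, assuming single-character separator and escape characters; Split-EscapedText is a hypothetical name, not a built-in cmdlet. Like the -replace step in Solution 1, it also removes the escape characters, so its output differs from the Solution 2 and 3 outputs, which keep them. It does not decode the two-hex-digit escapes (such as \2C) that RFC 4514 also permits in Distinguished Names.

```powershell
# Hypothetical helper (not part of any answer above): splits $Text on
# $Separator while honoring $EscapeChar, and drops the escape characters.
# A character following the escape character is always taken literally, so
# '\\' yields '\' and '\,' yields ',' without ending the current field.
function Split-EscapedText {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [char] $Separator,
        [char] $EscapeChar = '\'
    )
    $fields  = [System.Collections.Generic.List[string]]::new()
    $current = [System.Text.StringBuilder]::new()
    $escaped = $false
    foreach ($ch in $Text.ToCharArray()) {
        if ($escaped) {
            # Previous char was the escape char: keep this one verbatim.
            [void] $current.Append($ch)
            $escaped = $false
        }
        elseif ($ch -ceq $EscapeChar) {
            $escaped = $true
        }
        elseif ($ch -ceq $Separator) {
            # Unescaped separator: close the current field.
            $fields.Add($current.ToString())
            [void] $current.Clear()
        }
        else {
            [void] $current.Append($ch)
        }
    }
    # A dangling escape char at the very end is kept as-is.
    if ($escaped) { [void] $current.Append($EscapeChar) }
    $fields.Add($current.ToString())
    return $fields.ToArray()
}

# -> CN=Test User | OU=EndSlash\ | OU=Comma,Test | DC=domain | DC=com
Split-EscapedText 'CN=Test User,OU=EndSlash\\,OU=Comma\,Test,DC=domain,DC=com' ','

# -> Domain.com | Slash/Test | EndSlash\ | Test | Test User
Split-EscapedText 'Domain.com/Slash\/Test/EndSlash\\/Test/Test User' '/'
```

The scan makes the "odd number of backslashes" rule implicit: each escape character consumes exactly the next character, so pairs of backslashes cancel out without any counting.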