How to create a powershell script that uses regex to replace a string in text with a tab

19,777

Solution 1

What you're doing wrong is using the escape character (`) inside 'single-quoted strings'. Single-quoted strings are treated as literals, so escape sequences in them are not expanded. You need to use "double quotes" for this to work properly:

$text = '"Item 1","Item 2"'
$expr1 = '"([^"]+?)","([^"]+?)"'
$expr2 = "`$1`t`$2"
$line = [System.Text.RegularExpressions.Regex]::Replace($text, $expr1, $expr2);
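
A minimal sketch to confirm that a real tab character ends up in the result. It reuses the variable names from this answer; `$hasTab` is a name introduced only for this illustration, and `[regex]` is PowerShell's shorthand for `System.Text.RegularExpressions.Regex`.

```powershell
$text  = '"Item 1","Item 2"'
$expr1 = '"([^"]+?)","([^"]+?)"'   # single quotes: the regex is passed through literally
$expr2 = "`$1`t`$2"                 # double quotes: `$ is a literal $, `t is a tab

$line = [regex]::Replace($text, $expr1, $expr2)

# Compare against a string built with an explicit tab (character code 9).
$hasTab = $line -ceq ("Item 1" + [char]9 + "Item 2")
$hasTab
```

If the replacement worked, `$hasTab` should be `True` even when the console output looks like it contains only a space.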

-edit-

I only replaced single quotes with double quotes in $expr2. You're having issues with it because you replaced single quotes with double quotes in $text and $expr1 but did not escape the quote characters in the string.

$text can either use single quotes like this:

$text = '"Text1","Text2"'

Or double quotes like this (escape the " inside the string):

$text = "`"Text1`",`"Text2`""

$expr1 can use single quotes like this:

$expr1 = '"([^"]+?)","([^"]+?)"'

Or double quotes like this:

$expr1 = "`"([^`"]+?)`",`"([^`"]+?)`""

And $expr2 should only use double quotes like this:

$expr2 = "`$1`t`$2"
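
As a rough check, the single-quoted and double-quoted spellings above should produce identical strings. The variable names below are made up for this comparison.

```powershell
$textSingle = '"Text1","Text2"'
$textDouble = "`"Text1`",`"Text2`""

$exprSingle = '"([^"]+?)","([^"]+?)"'
$exprDouble = "`"([^`"]+?)`",`"([^`"]+?)`""

# -ceq is the case-sensitive equality operator.
$sameText = $textSingle -ceq $textDouble
$sameExpr = $exprSingle -ceq $exprDouble
$sameText, $sameExpr
```

Both comparisons should report `True`, which suggests the choice of quote style for `$text` and `$expr1` is a matter of readability, as long as the inner quotes are escaped.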

-edit again-

I'm actually not sure what your issue is now. I know the 4 lines I originally posted work in PowerShell: I copied them from my answer, pasted them into a PowerShell console, and they worked. I don't know about passing strings as arguments to a script, and I can't test it where I am now. But try what I added and see what happens.

-edit 3-

This works, but I don't know why the ^ has to be doubled the 2nd time. If I have just one ^, it does not appear in the string, so the pattern won't match; if I double it, it works perfectly. No clue why ^ needs to be doubled in one place but not in the other.

powershell .\regex.ps1 '\"Test1\",\"Test2\"' '\"([^"]+?)\",\"([^^\"]+?)\"' "`$1`t`$2"

In PowerShell, if you want to use escape characters you have to use double-quoted strings and escape with the backtick (`). On the command line, for the 1st and 2nd strings I have to use single quotes and a backslash (\) to escape the " character. For some reason, in the 2nd parameter the 2nd ^ has to be doubled in order for it to actually appear in the string. I have no idea why. For the 3rd parameter I had to use the backtick (`) again for the string to work correctly when it's passed into the PowerShell script.

I blame Microsoft for handling strings with a high degree of inconsistency.
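
One possible way to sidestep some of the command-line quoting, sketched here as an assumption rather than as what the asker ended up doing: let the caller pass the two characters `\t` and convert them inside the script with `[regex]::Unescape`. The function name `Invoke-RegexReplace` is hypothetical; the parameters mirror the ones in the question's script.

```powershell
function Invoke-RegexReplace {
    param(
        [string] $Text,
        [string] $InputPattern,
        [string] $ReplacePattern
    )

    # Regex.Unescape converts backslash escapes such as \t into the real character.
    # Caveat: any other backslash sequence in the replacement is converted as well.
    $replacement = [regex]::Unescape($ReplacePattern)

    [regex]::Replace($Text, $InputPattern, $replacement)
}

# Single quotes throughout: nothing here needs a backtick.
$result = Invoke-RegexReplace '"Text1","Text2"' '"([^"]+?)","([^"]+?)"' '$1\t$2'
$result
```

This only removes the need for backticks in the replacement argument; the double quotes in the first two arguments still have to survive whatever shell launches the script.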

Solution 2

The escape character in PowerShell is the backtick (`, same key as the ~). For `t to expand into a tab, it needs to be enclosed in double quotes:

PS> $text -replace '","',"""`t"""

You can also escape the quotes:

PS> $text -replace '","',"`"`t`""

Type this in your console for more help:

PS> Get-Help about_Escape_Characters
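
A short sketch that combines the -replace operator with the capture groups from the question, so the surrounding quotes are dropped as well. The `$result` variable is introduced here for illustration only.

```powershell
$text = '"Item 1","Item 2"'

# -replace takes a regex on the left and a replacement string on the right.
# In the double-quoted replacement, `$ keeps the $ literal so the regex engine
# sees $1 and $2, and `t becomes a tab.
$result = $text -replace '"([^"]+?)","([^"]+?)"', "`$1`t`$2"
$result
```

The result should be `Item 1`, a tab, then `Item 2`, with no quotes left.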

Solution 3

If one enters the following into a console:

"This`tTest"

they will get:

This    Test

Clearly a tab has been placed.

Now, if one enters THIS into a console:

"Testing`tThis"

they will instead get:

Testing This

Did the tab go away? No. The console does not render a tab as a fixed number of spaces; it advances the cursor to the next tab stop (as if lining up columns in a table). What happens if PowerShell is given a string that fills up the entire space before a tab stop?

We can test this by entering the following:

"Testings`tThis"

We end up getting this output:

Testings        This
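
The three outputs above are consistent with tab stops every 8 columns. The sketch below computes the visible gap under that assumption; the 8-column width is a common console default, not something stated in this answer, and `Get-TabPadding` is a hypothetical helper.

```powershell
function Get-TabPadding {
    param(
        [string] $Prefix,
        [int] $TabWidth = 8   # assumed tab-stop spacing
    )

    # Columns the cursor moves to reach the next tab stop after $Prefix.
    $TabWidth - ($Prefix.Length % $TabWidth)
}

$padThis     = Get-TabPadding 'This'       # 4
$padTesting  = Get-TabPadding 'Testing'    # 1
$padTestings = Get-TabPadding 'Testings'   # 8
$padThis, $padTesting, $padTestings
```

Under this assumption a 7-character prefix leaves a gap of exactly one column, which is why the tab can look like a single space.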

It just so happens that OP's $text string "Item 1","Item 2" falls into the second test case above: when "," is replaced with `t, the tab gets "eaten" and looks like just a space rather than a tab character. Indeed, Shay's answer will work, but because of how tabs are displayed it simply won't look like it does (with this string).
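
To see that the tab really is there, one option is to inspect the string rather than trust how the console draws it. This is a sketch; `$containsTab` and `$visible` are names chosen for illustration.

```powershell
$text   = '"Item 1","Item 2"'
$result = $text -replace '","', "`t"

# Ask the string directly whether it contains a tab character.
$containsTab = $result.Contains("`t")

# Swap the tab for a visible marker, for display purposes only.
$visible = $result -replace "`t", '<TAB>'

$containsTab
$visible
```

The marker version should print `"Item 1<TAB>Item 2"`, showing the tab sits where the `","` used to be.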

In summary, I advise using a literal number of spaces instead of `t, like this:

$text -replace '","','     '

(That's 5 spaces between the last two 's)

Or, if this output is going to be read by some other program, then the previously mentioned solution:

$text -replace '","',"`t"

will work, but you'll just have to live with Powershell displaying it funny.

Note:

If you are REALLY positive you want to use your script, then do something like the following:

$inputPattern = '","'
$replacePattern = "`t"

`t doesn't expand into a tab if enclosed in single quotes (') but will if enclosed in double quotes ("), just as variables do.
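
A quick way to check the difference between the two quote styles, using string lengths and character codes (the variable names are illustrative):

```powershell
$single = '`t'   # literal: a backtick followed by the letter t
$double = "`t"   # expanded: one tab character

$singleLength = $single.Length        # 2
$doubleLength = $double.Length        # 1
$tabCode      = [int][char]$double    # 9, the character code of a tab
$singleLength, $doubleLength, $tabCode
```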

Author: Tola Odejayi

Updated on June 07, 2022

Comments

  • Tola Odejayi
    Tola Odejayi almost 2 years

    I have a string like this:

    "Item 1","Item 2"

    I would like to replace it so that it looks like this, using a powershell script:

    Item 1{tab character}Item 2

    I have this:

    $text = '"Item 1","Item 2"'
    $expr1 = '"([^"]+?)","([^"]+?)"'
    $expr2 = "$1\t$2"
    $line = [System.Text.RegularExpressions.Regex]::Replace($text, $expr1, $expr2);
    

    but it doesn't work.

    As an aside, is there a definitive reference for how to deal with escaping quotes and special characters in Powershell? I find it very confusing indeed.

    .

    .

    EDIT:

    The reason I want to do this is so that I can wrap this up in a parameterised script and call it using parameters. The script (regex-rs.ps1) is like this:

    param
    (
        [string] $text,
        [string] $inputPattern,
        [string] $replacePattern
    )
    
    function Main()
    {
        $text2 = [System.Text.RegularExpressions.Regex]::Replace($text, $inputPattern, $replacePattern);
        [System.Console]::WriteLine($text2);
    }
    
    Main;
    

    Unfortunately, when I call the script like this:

    powershell .\regex-rs.ps1 '"Text1","Text2"' '`"([^`"]+?)`",`"([^`"]+?)`"' '`$1`t`$2'
    

    It outputs:

    Text1,Text2
    

    In other words, no tab. What am I doing wrong?

    .

    .

    FURTHER EDIT IN RESPONSE TO NICK'S ANSWER BELOW: (I have to put this here, because the comment formatting in StackOverflow messes around with backticks)

    I replaced the single quotes with double quotes in my powershell call, like so:

    powershell .\regex-rs.ps1 ""Text1","Text2"" "`"([^`"]+?)`",`"([^`"]+?)`"" "`$1`t`$2"
    

    But I got this error:

    Missing ] at end of type token.
    

    Any further ideas?

    .

    FINAL EDIT: This is the call to the script that fixed the issue (have to post as an image, because it's so powerful that it's defied StackOverflow's formatting, even here): .

    [Image: final script call]

    • Nick
      Nick over 12 years
      I don't know if this is your issue, but shouldn't $line = [System.RegularExpressions.Text]::Replace($text, $expr1, $expr2); be $line = [System.Text.RegularExpressions.Regex]::Replace($text, $expr1, $expr2);?
    • Tola Odejayi
      Tola Odejayi over 12 years
      Thanks. Fixed this above, but that's not the issue.
  • Tola Odejayi
    Tola Odejayi over 12 years
    See my second edit above. Can't add it here because of formatting issues.
  • Nick
    Nick over 12 years
    See my edit. I'm having formatting issues in the comments too.
  • Nick
    Nick over 12 years
    You consider that handling tabs oddly? That's exactly how I expect tabs to work in any application.
  • Tola Odejayi
    Tola Odejayi over 12 years
    Unfortunately, I really need this to work as a powershell script. :( I thought it would be fairly straightforward, but I may have to offer a bounty on this one...
  • Tola Odejayi
    Tola Odejayi over 12 years
    Thanks, @Shay-Levy; your help command should actually be get-help about_escape_characters
  • Nick
    Nick over 12 years
    fortunately, it does work in a powershell script. It's just an issue of getting the strings from the commandline args to pass in correctly. Look at the last edit. I just tested this and it does work, I just don't know why.
  • Tola Odejayi
    Tola Odejayi over 12 years
    Your answer was slightly off (see final edit above), but it definitely put me on the right track, and you more than deserve to have your answer marked as correct.
  • Tola Odejayi
    Tola Odejayi over 12 years
    Incidentally @Nick, the reason that you need double carets is because caret is an escape character in batch commands that you run from the command prompt, so it needs to be doubled if you want to pass a caret as an argument.
  • SpellingD
    SpellingD over 12 years
      Semantics. I guess it really depends on what applications one works with on a day-to-day basis. To be frank, most editors I use treat a Tab key press (and perhaps this is just an error of attributing a key to its character) as a set number of space characters, or an indent. Nevertheless, I've removed the line from my answer.
  • Nick
    Nick over 12 years
    @TolaOdejayi I'm still confused why the 1st caret does not need to be doubled but the 2nd one does.
  • Prid
    Prid over 2 years
    the escape character only works in double quotes