UTF-8 output from PowerShell

Solution 1

This is a bug in .NET. When PowerShell launches, it caches the output handle (Console.Out). The Encoding property of that text writer does not pick up the value of the StandardOutputEncoding property.

When you change it from within PowerShell, the Encoding property of the cached output writer returns the cached value, so the output is still encoded with the default encoding.

As a workaround, I would suggest not changing the encoding. It will be returned to you as a Unicode string, at which point you can manage the encoding yourself.

Caching example:

102 [C:\Users\leeholm]
>> $r1 = [Console]::Out

103 [C:\Users\leeholm]
>> $r1

Encoding                                          FormatProvider
--------                                          --------------
System.Text.SBCSCodePageEncoding                  en-US



104 [C:\Users\leeholm]
>> [Console]::OutputEncoding = [System.Text.Encoding]::UTF8

105 [C:\Users\leeholm]
>> $r1

Encoding                                          FormatProvider
--------                                          --------------
System.Text.SBCSCodePageEncoding                  en-US
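
The caching behaviour shown in this transcript can be modelled outside .NET. The following Python sketch is only an analogy, not how .NET or PowerShell is implemented: `FakeConsole` and `CachingWriter` are hypothetical names, and cp850 is an assumed startup code page. It shows why a writer that captured its encoding at creation keeps using it after the console setting changes.

```python
class FakeConsole:
    """Hypothetical stand-in for the console: a byte sink plus a mutable encoding setting."""

    def __init__(self, output_encoding: str) -> None:
        self.output_encoding = output_encoding
        self.sink = bytearray()


class CachingWriter:
    """Hypothetical writer that reads the console's encoding once, when it is created."""

    def __init__(self, console: FakeConsole) -> None:
        self._console = console
        self.encoding = console.output_encoding  # captured here, never refreshed

    def write(self, text: str) -> None:
        self._console.sink.extend(text.encode(self.encoding))


console = FakeConsole("cp850")        # assumed OEM code page at startup
cached = CachingWriter(console)       # analogue of the host caching [Console]::Out
console.output_encoding = "utf-8"     # analogue of setting [Console]::OutputEncoding

cached.write("é")                     # still encoded as cp850
CachingWriter(console).write("é")     # a writer created now sees utf-8

print(cached.encoding)                # cp850
print(bytes(console.sink))            # b'\x82\xc3\xa9'
```

The first byte (0x82) is "é" in cp850 and comes from the cached writer; the following two bytes (0xC3 0xA9) are "é" in UTF-8 and come from the writer created after the change.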

Solution 2

Not an expert on encoding, but after reading these...

... it seems fairly clear that the $OutputEncoding variable only affects data piped to native applications.

If sending to a file from within PowerShell, the encoding can be controlled by the -Encoding parameter on the Out-File cmdlet, e.g.

write-output "hello" | out-file "enctest.txt" -encoding utf8

Nothing else you can do on the PowerShell front then, but the following post may well help you:

Solution 3

Set [Console]::OutputEncoding to whatever encoding you want, and print with [Console]::WriteLine.

If PowerShell's own output methods have a problem, then don't use them. It feels a bit bad, but works like a charm :)

Author: Paul Stovell

I live in Brisbane and work full time bootstrapping my own product company around Octopus Deploy, an automated deployment tool for .NET applications. Prior to Octopus Deploy, I worked for an investment bank in London building WPF applications, and before that I worked for Readify, an Australian .NET consulting firm, where I was lucky enough to work with some very talented people. I also worked on a number of open source projects and was an active user group presenter. I've been a Microsoft MVP for WPF since 2006. I have a blog at paulstovell.com.

Updated on July 09, 2022

Comments

  • Paul Stovell almost 2 years

    I'm trying to use Process.Start with redirected I/O to call PowerShell.exe with a string, and to get the output back, all in UTF-8. But I don't seem to be able to make this work.

    What I've tried:

    • Passing the command to run via the -Command parameter
    • Writing the PowerShell script as a file to disk with UTF-8 encoding
    • Writing the PowerShell script as a file to disk with UTF-8 with BOM encoding
    • Writing the PowerShell script as a file to disk with UTF-16
    • Setting Console.OutputEncoding in both my console application and in the PowerShell script
    • Setting $OutputEncoding in PowerShell
    • Setting Process.StartInfo.StandardOutputEncoding
    • Doing it all with Encoding.Unicode instead of Encoding.UTF8

    In every case, when I inspect the bytes I'm given, I get different values to my original string. I'd really love an explanation as to why this doesn't work.

    Here is my code:

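    // Requires: using System; using System.Diagnostics; using System.Linq; using System.Text; using System.Threading;
    static void Main(string[] args)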
    {
        DumpBytes("Héllo");
    
        ExecuteCommand("PowerShell.exe", "-Command \"$OutputEncoding = [System.Text.Encoding]::UTF8 ; Write-Output 'Héllo';\"",
            Environment.CurrentDirectory, DumpBytes, DumpBytes);
    
        Console.ReadLine();
    }
    
    static void DumpBytes(string text)
    {
        Console.Write(text + " " + string.Join(",", Encoding.UTF8.GetBytes(text).Select(b => b.ToString("X"))));
        Console.WriteLine();
    }
    
    static int ExecuteCommand(string executable, string arguments, string workingDirectory, Action<string> output, Action<string> error)
    {
        try
        {
            using (var process = new Process())
            {
                process.StartInfo.FileName = executable;
                process.StartInfo.Arguments = arguments;
                process.StartInfo.WorkingDirectory = workingDirectory;
                process.StartInfo.UseShellExecute = false;
                process.StartInfo.CreateNoWindow = true;
                process.StartInfo.RedirectStandardOutput = true;
                process.StartInfo.RedirectStandardError = true;
                process.StartInfo.StandardOutputEncoding = Encoding.UTF8;
                process.StartInfo.StandardErrorEncoding = Encoding.UTF8;
    
                using (var outputWaitHandle = new AutoResetEvent(false))
                using (var errorWaitHandle = new AutoResetEvent(false))
                {
                    process.OutputDataReceived += (sender, e) =>
                    {
                        if (e.Data == null)
                        {
                            outputWaitHandle.Set();
                        }
                        else
                        {
                            output(e.Data);
                        }
                    };
    
                    process.ErrorDataReceived += (sender, e) =>
                    {
                        if (e.Data == null)
                        {
                            errorWaitHandle.Set();
                        }
                        else
                        {
                            error(e.Data);
                        }
                    };
    
                    process.Start();
    
                    process.BeginOutputReadLine();
                    process.BeginErrorReadLine();
    
                    process.WaitForExit();
                    outputWaitHandle.WaitOne();
                    errorWaitHandle.WaitOne();
    
                    return process.ExitCode;
                }
            }
        }
        catch (Exception ex)
        {
            throw new Exception(string.Format("Error when attempting to execute {0}: {1}", executable, ex.Message),
                ex);
        }
    }
    

    Update 1

    I found that if I make this script:

    [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
    Write-Host "Héllo!"
    [Console]::WriteLine("Héllo")
    

    Then invoke it via:

    ExecuteCommand("PowerShell.exe", "-File C:\\Users\\Paul\\Desktop\\Foo.ps1",
      Environment.CurrentDirectory, DumpBytes, DumpBytes);
    

    The first line is corrupted, but the second isn't:

    H?llo! 48,EF,BF,BD,6C,6C,6F,21
    Héllo 48,C3,A9,6C,6C,6F
    

    This suggests to me that my redirection code is all working fine; when I use Console.WriteLine in PowerShell I get UTF-8 as I expect.

    This means that PowerShell's Write-Output and Write-Host commands must be doing something different with the output, and not simply calling Console.WriteLine.
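
The byte dump above is consistent with the text being written in a single-byte code page and then decoded as UTF-8 on the receiving side. The following Python sketch reproduces both lines under that assumption. The choice of cp850 is a guess (the source only says the output "seems to be the OEM codepage"), and `hex_bytes` and `round_trip` are hypothetical helpers that mimic the hex part of DumpBytes and the write/read path respectively.

```python
def hex_bytes(text: str) -> str:
    """UTF-8 bytes of text as comma-separated uppercase hex, like the question's DumpBytes."""
    return ",".join(format(b, "X") for b in text.encode("utf-8"))


def round_trip(text: str, writer_encoding: str, reader_encoding: str = "utf-8") -> str:
    """Encode with the writer's encoding, then decode with the reader's, replacing bad bytes."""
    raw = text.encode(writer_encoding)
    return raw.decode(reader_encoding, errors="replace")


# Writer uses an assumed OEM code page, reader expects UTF-8: "é" becomes U+FFFD.
garbled = round_trip("Héllo!", "cp850")
print(hex_bytes(garbled))   # 48,EF,BF,BD,6C,6C,6F,21

# Writer and reader agree on UTF-8: the text survives.
clean = round_trip("Héllo", "utf-8")
print(hex_bytes(clean))     # 48,C3,A9,6C,6C,6F
```

EF,BF,BD is the UTF-8 form of U+FFFD, the replacement character a decoder emits for a byte it cannot interpret. Once that substitution has happened the original byte is gone, so re-encoding the received string afterwards cannot recover the "é".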

    Update 2

    I've even tried the following to force the PowerShell console code page to UTF-8, but Write-Host and Write-Output continue to produce broken results while [Console]::WriteLine works.

    $sig = @'
    [DllImport("kernel32.dll")]
    public static extern bool SetConsoleCP(uint wCodePageID);
    
    [DllImport("kernel32.dll")]
    public static extern bool SetConsoleOutputCP(uint wCodePageID);
    '@
    
    $type = Add-Type -MemberDefinition $sig -Name Win32Utils -Namespace Foo -PassThru
    
    $type::SetConsoleCP(65001)
    $type::SetConsoleOutputCP(65001)
    
    Write-Host "Héllo!"
    
    & chcp    # Tells us 65001 (UTF-8) is being used
    
    • alroc about 10 years
      Why start Powershell.exe instead of using System.Management.Automation to embed PowerShell right in the app?
    • Paul Stovell about 10 years
      I had an, um, friend, who built an entire app on top of System.Management.Automation. But after the 765th complaint from users of "I have a script that works like this under PowerShell.exe but like that under your host" I, I mean my friend, decided to give up on the idea.
    • Jaykul about 10 years
      I think you're overthinking this ... you just need to accept the fact that it's UTF-16 ;-)
    • Paul Stovell about 10 years
      Why do you say UTF-16? It seems to be the OEM codepage
    • mihca about 3 years
      "Writing the PowerShell script as a file to disk with UTF-8 with BOM encoding". This has solved my issue. I guess Powershell got it wrong already when reading the input without BOM.
    • Paul Williams over 2 years
      Consider using [System.Web.HttpUtility]::UrlEncode() in PS and UrlDecode() in C#. Hopefully $OutputEncoding has been fixed in PS 7, but my code will need to work w/ PS 5 for a long time yet, and also be easily understood by other team members.
  • Paul Stovell about 10 years
    Thanks for the info Lee. What do you mean by "It will be returned to you as a Unicode string, at which point you can manage the encoding yourself."? I'm trying to call Encoding.GetEncoding(850).GetBytes(textOutputByPowershell), followed by Encoding.UTF8.GetString(), but this also seems to produce the wrong output.
  • Jaykul about 10 years
    It should be a .net string (thus, UTF16) already. Setting StandardOutputEncoding is never guaranteed to work anyway, because "setting this property does not guarantee that the process will use the specified encoding..." Having said that, I think it defaults to your Windows CodePage :-/
  • Paul Stovell about 10 years
    Thanks everyone, got it working! (Solution added to original post)
  • aggieNick02 over 4 years
    What was your solution, @PaulStovell? I don't quite see it up above. I'm dealing with Python code that writes to PowerShell's stdout, and going through cp1252 doesn't work so well when the source has Unicode.