UTF-8 output from PowerShell
Solution 1
This is a bug in .NET. When PowerShell launches, it caches the output handle (Console.Out). The Encoding property of that text writer does not pick up the value of the StandardOutputEncoding property.
When you change it from within PowerShell, the Encoding property of the cached output writer returns the cached value, so the output is still encoded with the default encoding.
As a workaround, I would suggest not changing the encoding. It will be returned to you as a Unicode string, at which point you can manage the encoding yourself.
Caching example:
102 [C:\Users\leeholm]
>> $r1 = [Console]::Out
103 [C:\Users\leeholm]
>> $r1
Encoding FormatProvider
-------- --------------
System.Text.SBCSCodePageEncoding en-US
104 [C:\Users\leeholm]
>> [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
105 [C:\Users\leeholm]
>> $r1
Encoding FormatProvider
-------- --------------
System.Text.SBCSCodePageEncoding en-US
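When the output arrives as raw bytes over a redirected pipe, one reading of "manage the encoding yourself" is to decode those bytes with the code page the child actually wrote in, rather than the one that was requested. The sketch below uses Python only to show the byte-level effect. The choice of cp850 is an assumption (it is the OEM code page mentioned in the comments), and `decode_child_output` is a hypothetical helper, not part of any PowerShell or .NET API.

```python
def decode_child_output(raw: bytes, code_page: str = "cp850") -> str:
    """Decode bytes captured from a child process with the code page it really used."""
    return raw.decode(code_page)


# "Héllo" as a cp850 console would emit it: é is the single byte 0x82.
raw = bytes([0x48, 0x82, 0x6C, 0x6C, 0x6F])
text = decode_child_output(raw)

# Re-encode as UTF-8 and show the bytes in the same style as DumpBytes.
print(text.encode("utf-8").hex(",").upper())  # 48,C3,A9,6C,6C,6F
```

This only works if the bytes are decoded once, with the right code page. Decoding with the wrong one first and converting afterwards does not recover the text.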
Solution 2
Not an expert on encoding, but after reading these...
- http://blogs.msdn.com/b/powershell/archive/2006/12/11/outputencoding-to-the-rescue.aspx
- http://technet.microsoft.com/en-us/library/hh847796.aspx
- http://www.johndcook.com/blog/2008/08/25/powershell-output-redirection-unicode-or-ascii/
... it seems fairly clear that the $OutputEncoding variable only affects data piped to native applications.
If sending to a file from within PowerShell, the encoding can be controlled by the -encoding parameter on the out-file cmdlet, e.g.
write-output "hello" | out-file "enctest.txt" -encoding utf8
There is nothing else you can do on the PowerShell front then, but the following post may well help you:
Solution 3
Set [Console]::OutputEncoding to whatever encoding you want, and print with [Console]::WriteLine.
If PowerShell's output method has a problem, then don't use it. It feels a bit bad, but works like a charm :)
Paul Stovell
Updated on July 09, 2022
Comments
-
Paul Stovell almost 2 years
I'm trying to use Process.Start with redirected I/O to call PowerShell.exe with a string, and to get the output back, all in UTF-8. But I don't seem to be able to make this work.
What I've tried:
- Passing the command to run via the -Command parameter
- Writing the PowerShell script as a file to disk with UTF-8 encoding
- Writing the PowerShell script as a file to disk with UTF-8 with BOM encoding
- Writing the PowerShell script as a file to disk with UTF-16
- Setting Console.OutputEncoding in both my console application and in the PowerShell script
- Setting $OutputEncoding in PowerShell
- Setting Process.StartInfo.StandardOutputEncoding
- Doing it all with Encoding.Unicode instead of Encoding.UTF8
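For reference, the three on-disk encodings in the list differ in their leading bytes, which is what a reader can use to tell them apart. Without a byte order mark, a reader has to guess. A small Python sketch of the difference follows; the script text is illustrative only, and UTF-16 is shown in its little-endian form as an assumption.

```python
import codecs

script = "Write-Output 'Héllo'"

utf8_plain = script.encode("utf-8")                             # no byte order mark
utf8_bom = script.encode("utf-8-sig")                           # prefixed with EF BB BF
utf16_le = codecs.BOM_UTF16_LE + script.encode("utf-16-le")     # prefixed with FF FE

print(utf8_plain[:3].hex(" ").upper())  # 57 72 69 (just the text "Wri")
print(utf8_bom[:3].hex(" ").upper())    # EF BB BF
print(utf16_le[:2].hex(" ").upper())    # FF FE
```

In the BOM-less file, é is stored as the two bytes C3 A9. A reader that assumes a single-byte code page will see two characters there instead of one.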
In every case, when I inspect the bytes I'm given, I get different values to my original string. I'd really love an explanation as to why this doesn't work.
Here is my code:
static void Main(string[] args)
{
    DumpBytes("Héllo");
    ExecuteCommand("PowerShell.exe", "-Command \"$OutputEncoding = [System.Text.Encoding]::UTF8 ; Write-Output 'Héllo';\"", Environment.CurrentDirectory, DumpBytes, DumpBytes);
    Console.ReadLine();
}

static void DumpBytes(string text)
{
    Console.Write(text + " " + string.Join(",", Encoding.UTF8.GetBytes(text).Select(b => b.ToString("X"))));
    Console.WriteLine();
}

static int ExecuteCommand(string executable, string arguments, string workingDirectory, Action<string> output, Action<string> error)
{
    try
    {
        using (var process = new Process())
        {
            process.StartInfo.FileName = executable;
            process.StartInfo.Arguments = arguments;
            process.StartInfo.WorkingDirectory = workingDirectory;
            process.StartInfo.UseShellExecute = false;
            process.StartInfo.CreateNoWindow = true;
            process.StartInfo.RedirectStandardOutput = true;
            process.StartInfo.RedirectStandardError = true;
            process.StartInfo.StandardOutputEncoding = Encoding.UTF8;
            process.StartInfo.StandardErrorEncoding = Encoding.UTF8;

            using (var outputWaitHandle = new AutoResetEvent(false))
            using (var errorWaitHandle = new AutoResetEvent(false))
            {
                process.OutputDataReceived += (sender, e) =>
                {
                    if (e.Data == null) { outputWaitHandle.Set(); }
                    else { output(e.Data); }
                };
                process.ErrorDataReceived += (sender, e) =>
                {
                    if (e.Data == null) { errorWaitHandle.Set(); }
                    else { error(e.Data); }
                };

                process.Start();
                process.BeginOutputReadLine();
                process.BeginErrorReadLine();
                process.WaitForExit();
                outputWaitHandle.WaitOne();
                errorWaitHandle.WaitOne();

                return process.ExitCode;
            }
        }
    }
    catch (Exception ex)
    {
        throw new Exception(string.Format("Error when attempting to execute {0}: {1}", executable, ex.Message), ex);
    }
}
Update 1
I found that if I make this script:
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
Write-Host "Héllo!"
[Console]::WriteLine("Héllo")
Then invoke it via:
ExecuteCommand("PowerShell.exe", "-File C:\\Users\\Paul\\Desktop\\Foo.ps1", Environment.CurrentDirectory, DumpBytes, DumpBytes);
The first line is corrupted, but the second isn't:
H?llo! 48,EF,BF,BD,6C,6C,6F,21
Héllo 48,C3,A9,6C,6C,6F
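The bytes on the first line are what you would expect if the child wrote é as a single non-UTF-8 byte and the reader then decoded the stream as UTF-8. The Python sketch below reproduces that effect. It assumes the byte was 0x82 (é in OEM code pages 437 and 850); the output above does not confirm which code page was actually used. `dump_bytes` is a stand-in for the DumpBytes method in the question.

```python
def dump_bytes(text: str) -> str:
    """Mirror DumpBytes: the UTF-8 bytes of a string as comma-separated uppercase hex."""
    return ",".join(format(b, "X") for b in text.encode("utf-8"))


# "Héllo!" as an OEM code page would encode it (assumed, see above).
oem_bytes = b"H\x82llo!"

# 0x82 is not a valid UTF-8 sequence on its own, so it becomes U+FFFD.
decoded = oem_bytes.decode("utf-8", errors="replace")

print(dump_bytes(decoded))  # 48,EF,BF,BD,6C,6C,6F,21
```

EF,BF,BD is the UTF-8 encoding of U+FFFD, the replacement character. Once it is in the string the original byte is gone, so re-encoding the decoded text afterwards cannot repair it.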
This suggests to me that my redirection code is all working fine; when I use Console.WriteLine in PowerShell I get UTF-8 as I expect.
This means that PowerShell's Write-Output and Write-Host commands must be doing something different with the output, and not simply calling Console.WriteLine.
Update 2
I've even tried the following to force the PowerShell console code page to UTF-8, but Write-Host and Write-Output continue to produce broken results while [Console]::WriteLine works.
$sig = @'
[DllImport("kernel32.dll")]
public static extern bool SetConsoleCP(uint wCodePageID);
[DllImport("kernel32.dll")]
public static extern bool SetConsoleOutputCP(uint wCodePageID);
'@
$type = Add-Type -MemberDefinition $sig -Name Win32Utils -Namespace Foo -PassThru
$type::SetConsoleCP(65001)
$type::SetConsoleOutputCP(65001)
Write-Host "Héllo!"
& chcp # Tells us 65001 (UTF-8) is being used
-
alroc about 10 years
Why start Powershell.exe instead of using System.Management.Automation to embed PowerShell right in the app?
-
Paul Stovell about 10 years
I had an, um, friend, who built an entire app on top of System.Management.Automation. But after the 765th complaint from users of "I have a script that works like this under PowerShell.exe but like that under your host" I, I mean my friend, decided to give up on the idea.
-
Paul Stovell about 10 years
I posted some info here: octopusdeploy.com/blog/improving-powershell
-
Jaykul about 10 years
I think you're overthinking this ... you just need to accept the fact that it's UTF-16 ;-)
-
Paul Stovell about 10 years
Why do you say UTF-16? It seems to be the OEM codepage
-
mihca about 3 years
"Writing the PowerShell script as a file to disk with UTF-8 with BOM encoding". This has solved my issue. I guess PowerShell got it wrong already when reading the input without BOM.
-
Paul Williams over 2 years
Consider using [System.Web.HttpUtility]::UrlEncode() in PS and UrlDecode() in C#. Hopefully $OutputEncoding has been fixed in PS 7, but my code will need to work w/ PS 5 for a long time yet, and also be easily understood by other team members.
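The percent-encoding idea works because the text crossing the pipe becomes pure ASCII, and ASCII-compatible code pages such as cp850, cp1252 and UTF-8 all encode ASCII the same way. The Python sketch below shows the round trip. It stands in for the PowerShell and C# sides and does not call HttpUtility itself; the trip through cp850 is an assumed example of a mismatched pipe.

```python
from urllib.parse import quote, unquote

original = "Héllo"

# Sender side: UTF-8 bytes, percent-encoded, so only ASCII goes over the pipe.
wire = quote(original)
print(wire)  # H%C3%A9llo

# Simulate the pipe using an OEM code page on both ends (assumed: cp850).
received = wire.encode("cp850").decode("cp850")

# Receiver side: undo the percent-encoding to get the original text back.
restored = unquote(received)
```

The cost is that the output is no longer human-readable on the wire, and every producer and consumer has to agree on the extra encoding step.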
-
Paul Stovell about 10 years
Thanks for the info Lee. What do you mean by "It will be returned to you as a Unicode string, at which point you can manage the encoding yourself."? I'm trying to call Encoding.GetEncoding(850).GetBytes(textOutputByPowershell), followed by Encoding.UTF8.GetString(), but this also seems to produce the wrong output.
-
Jaykul about 10 years
It should be a .NET string (thus, UTF-16) already. Setting StandardOutputEncoding is never guaranteed to work anyway, because "setting this property does not guarantee that the process will use the specified encoding..." Having said that, I think it defaults to your Windows code page :-/
-
Paul Stovell about 10 years
Thanks everyone, got it working! (Solution added to original post)
-
aggieNick02 over 4 years
What was your solution @PaulStovell? I don't quite see it up above. Dealing with Python code that writes to PowerShell's stdout, and going through cp1252 doesn't work so well when the source has Unicode.
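For the Python case, one option is to stop relying on the inherited code page and write UTF-8 bytes explicitly. The sketch below is minimal and assumes that whatever reads the pipe decodes it as UTF-8. `emit_utf8` is a hypothetical helper, and the in-memory buffer stands in for the real stdout so the bytes can be inspected.

```python
import io
import sys


def emit_utf8(text: str, stream=None) -> None:
    """Write text as UTF-8 bytes to a binary stream, bypassing stdout's default encoding."""
    target = stream if stream is not None else sys.stdout.buffer
    target.write(text.encode("utf-8") + b"\n")
    target.flush()


# Capture into a buffer instead of the real stdout to show the bytes produced.
buf = io.BytesIO()
emit_utf8("Héllo", buf)
print(buf.getvalue().hex(",").upper())  # 48,C3,A9,6C,6C,6F,0A
```

An alternative with the same intent is `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) or setting the `PYTHONIOENCODING` environment variable before launching Python. Either way, the reading side still has to decode the bytes as UTF-8 for the text to survive.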