Using PowerShell to write a file in UTF-8 without the BOM
Solution 1
Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:
$MyRawString = Get-Content -Raw $MyPath
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyRawString, $Utf8NoBomEncoding)
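A quick way to sanity-check the result is to inspect the file's first bytes: a UTF-8 BOM would appear as 0xEF 0xBB 0xBF. This sketch uses a temp file and sample text (not the original $MyPath) for illustration:

```powershell
# Write a sample string with UTF8Encoding($false), then confirm the file
# does not start with the UTF-8 BOM byte sequence 0xEF 0xBB 0xBF.
$MyPath = [System.IO.Path]::GetTempFileName()
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, 'Hello, world', $Utf8NoBomEncoding)
$bytes = [System.IO.File]::ReadAllBytes($MyPath)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
```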
Solution 2
The proper way as of now is to use the solution recommended by @Roman Kuzmin in comments on @M. Dudley's answer:
[IO.File]::WriteAllLines($filename, $content)
(I've also shortened it a bit by stripping the unnecessary System namespace qualifier - it is resolved automatically by default.)
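One caveat worth illustrating (it comes up in the comments below): .NET resolves relative paths against [Environment]::CurrentDirectory, which Set-Location does not update, so a relative path can land in the wrong directory. A sketch, using a sample temp-file path and sample content:

```powershell
# Sketch: resolve the path to a full provider path before handing it to .NET.
# $filename and $content are sample values for illustration.
$filename = Join-Path ([System.IO.Path]::GetTempPath()) 'WriteAllLines-demo.txt'
$content  = @('first line', 'second line')

# Works for not-yet-existing files, unlike Convert-Path:
$fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($filename)
[IO.File]::WriteAllLines($fullPath, $content)
```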
Solution 3
I figured this wouldn't be UTF, but I just found a pretty simple solution that seems to work...
Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext
For me this results in a utf-8 without bom file regardless of the source format.
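For context on the trade-off (spelled out in the comments): ASCII output is indeed BOM-less, but it is lossy - any character outside 7-bit ASCII is silently replaced with a literal ?. A small sketch with temp files:

```powershell
# Sketch: -Encoding ASCII avoids a BOM but replaces non-ASCII characters
# with '?', so it is only safe for pure-ASCII input.
$src = [System.IO.Path]::GetTempFileName()
$dst = [System.IO.Path]::GetTempFileName()
Set-Content -Path $src -Value 'äb'
Get-Content $src | Out-File -Encoding ASCII $dst
$result = Get-Content $dst   # the ä does not survive the ASCII round-trip
Remove-Item $src
```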
Solution 4
Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition (v6+), UTF-8 without BOM is the default encoding, across all cmdlets.
In other words: If you're using PowerShell [Core] version 6 or higher, you get BOM-less UTF-8 files by default (which you can also explicitly request with -Encoding utf8 / -Encoding utf8NoBOM, whereas -Encoding utf8BOM gives you with-BOM encoding).
If you're running Windows 10 and you're willing to switch to BOM-less UTF-8 encoding system-wide - which can have side effects - even Windows PowerShell can be made to use BOM-less UTF-8 consistently - see this answer.
To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):
For convenience, here's advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:
- you can use it just like Out-File in a pipeline.
- input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.
- an additional -UseLF switch allows you to transform Windows-style CRLF newlines to Unix-style LF-only newlines.
Example:
(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath # Add -UseLF for Unix newlines
Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before sending the result through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
Generally, though, this technique is not advisable for 2 reasons: (a) the whole file must fit into memory and (b) if the command is interrupted, data will be lost.
A note on memory use:
- M. Dudley's own answer requires that the entire file contents be built up in memory first, which can be problematic with large files.
- The function below improves on this only slightly: all input objects are still buffered first, but their string representations are then generated and written to the output file one by one.
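If neither buffering trade-off is acceptable, a fully streaming approach is possible whenever the source and target are different files. This sketch (not part of the function below) assumes plain-text input and relies on StreamWriter defaulting to BOM-less UTF-8:

```powershell
# Sketch: copy a text file line by line without buffering it in memory.
# Sample input/output temp files stand in for real paths.
$in  = [System.IO.Path]::GetTempFileName()
$out = [System.IO.Path]::GetTempFileName()
Set-Content $in -Value "line 1`nline 2"
$reader = New-Object System.IO.StreamReader $in
$writer = New-Object System.IO.StreamWriter $out, $false   # BOM-less UTF-8 by default
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        $writer.WriteLine($line)
    }
} finally {
    $reader.Dispose()
    $writer.Dispose()
}
```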
Source code of function Out-FileUtf8NoBom
:
Note: The function is also available as an MIT-licensed Gist, and only the Gist version will be maintained going forward.
You can install it directly with the following command (while I can personally assure you that doing so is safe, you should always check the content of a script before directly executing it this way):
# Download and define the function.
irm https://gist.github.com/mklement0/8689b9b5123a9ba11df7214f82a673be/raw/Out-FileUtf8NoBom.ps1 | iex
function Out-FileUtf8NoBom {
<#
.SYNOPSIS
Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).
.DESCRIPTION
Mimics the most important aspects of Out-File:
* Input objects are sent to Out-String first.
* -Append allows you to append to an existing file, -NoClobber prevents
overwriting of an existing file.
* -Width allows you to specify the line width for the text representations
of input objects that aren't strings.
However, it is not a complete implementation of all Out-File parameters:
* Only a literal output path is supported, and only as a parameter.
* -Force is not supported.
* Conversely, an extra -UseLF switch is supported for using LF-only newlines.
Caveat: *All* pipeline input is buffered before writing output starts,
but the string representations are generated and written to the target
file one by one.
.NOTES
The raison d'être for this advanced function is that Windows PowerShell
lacks the ability to write UTF-8 files without a BOM: using -Encoding UTF8
invariably prepends a BOM.
Copyright (c) 2017, 2020 Michael Klement <[email protected]> (http://same2u.net),
released under the [MIT license](https://spdx.org/licenses/MIT#licenseText).
#>
[CmdletBinding()]
param(
[Parameter(Mandatory, Position=0)] [string] $LiteralPath,
[switch] $Append,
[switch] $NoClobber,
[AllowNull()] [int] $Width,
[switch] $UseLF,
[Parameter(ValueFromPipeline)] $InputObject
)
#requires -version 3
# Convert the input path to a full one, since .NET's working dir. usually
# differs from PowerShell's.
$dir = Split-Path -LiteralPath $LiteralPath
if ($dir) { $dir = Convert-Path -ErrorAction Stop -LiteralPath $dir } else { $dir = $pwd.ProviderPath}
$LiteralPath = [IO.Path]::Combine($dir, [IO.Path]::GetFileName($LiteralPath))
# If -NoClobber was specified, throw an exception if the target file already
# exists.
if ($NoClobber -and (Test-Path $LiteralPath)) {
Throw [IO.IOException] "The file '$LiteralPath' already exists."
}
# Create a StreamWriter object.
# Note that we take advantage of the fact that the StreamWriter class by default:
# - uses UTF-8 encoding
# - without a BOM.
$sw = New-Object System.IO.StreamWriter $LiteralPath, $Append
$htOutStringArgs = @{}
if ($Width) {
$htOutStringArgs += @{ Width = $Width }
}
# Note: By not using begin / process / end blocks, we're effectively running
#       in the end block, which means that all pipeline input has already
#       been collected in automatic variable $Input.
#       We must use this approach, because using | Out-String individually
#       in each iteration of a process block would format each input object
#       with an individual header.
try {
$Input | Out-String -Stream @htOutStringArgs | % {
if ($UseLF) {
$sw.Write($_ + "`n")
}
else {
$sw.WriteLine($_)
}
}
} finally {
$sw.Dispose()
}
}
Solution 5
Starting with version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.
So in the above example it should simply be like this:
$MyFile | Out-File -Encoding UTF8NoBOM $MyPath
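A self-contained sketch of this, version-guarded because UTF8NoBOM is unknown to Windows PowerShell (5.1 and earlier); the temp path and 'some text' are sample values:

```powershell
# Sketch: request BOM-less UTF-8 explicitly on PowerShell 6+; on Windows
# PowerShell, fall back to the .NET UTF8Encoding($false) approach instead,
# because -Encoding UTF8NoBOM is not available there.
$path = [System.IO.Path]::GetTempFileName()
if ($PSVersionTable.PSVersion.Major -ge 6) {
    'some text' | Set-Content -Encoding UTF8NoBOM $path
} else {
    [IO.File]::WriteAllText($path, "some text`r`n", (New-Object System.Text.UTF8Encoding $false))
}
$bytes = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
```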
Comments
-
sourcenouveau almost 2 years
Out-File seems to force the BOM when using UTF-8:
$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "UTF8" $MyPath
How can I write a file in UTF-8 with no BOM using PowerShell?
Update 2021
PowerShell has changed a bit since I wrote this question 10 years ago. Check multiple answers below, they have a lot of good information!
-
Signal15 over 9 yearsBOM = Byte-Order Mark. Three chars placed at the beginning of a file (0xEF,0xBB,0xBF) that look like "ï»¿" when the file is misread as ANSI text.
-
MichaelGG about 9 yearsThis is incredibly frustrating. Even third party modules get polluted, like trying to upload a file over SSH? BOM! "Yeah, let's corrupt every single file; that sounds like a good idea." -Microsoft.
-
Paul Shiryaev almost 5 yearsThe default encoding is UTF8NoBOM starting with Powershell version 6.0 docs.microsoft.com/en-us/powershell/module/…
-
Dragas over 4 yearsTalk about breaking backwards compatibility...
-
-
Scott Muc almost 13 yearsUgh, I hope that's not the only way.
-
Roman Kuzmin over 12 yearsOne line [System.IO.File]::WriteAllLines($MyPath, $MyFile) is enough. This WriteAllLines overload writes exactly UTF8 without BOM.
-
darksoulsong over 10 yearsThis one fails without any warning. What version of powershell should I use to run it?
-
Groostav about 9 yearsCreated an MSDN feature request here: connect.microsoft.com/PowerShell/feedbackdetail/view/1137121/…
-
BermudaLamb about 9 yearsThe WriteAllLines solution works great for small files. However, I need a solution for larger files. Every time I try to use this with a larger file I'm getting an OutOfMemory error.
-
sourcenouveau almost 9 yearsPer the Out-File documentation specifying the Default encoding will use the system's current ANSI code page, which is not UTF-8, as I required.
-
ForNeVeR over 8 yearsNo, it will convert the output to current ANSI codepage (cp1251 or cp1252, for example). It is not UTF-8 at all!
-
Greg over 8 yearsThanks Robin. This may not have worked for writing a UTF-8 file without the BOM but the -Encoding ASCII option removed the BOM. That way I could generate a bat file for gvim. The .bat file was tripping up on the BOM.
-
mklement0 over 8 years@ForNeVeR: You're correct that encoding ASCII is not UTF-8, but it's also not the current ANSI codepage - you're thinking of Default; ASCII truly is 7-bit ASCII encoding, with codepoints >= 128 getting converted to literal ? instances.
-
ForNeVeR over 8 years@mklement0 AFAIK ASCII really means the default single-byte encoding in this API and generally in Windows. Yes, it is not in sync with the official ASCII definition, but is just a historical legacy.
-
mklement0 over 8 years@ForNeVeR: You're probably thinking of "ANSI" or "extended ASCII". Try this to verify that -Encoding ASCII is indeed 7-bit ASCII only: 'äb' | out-file ($f = [IO.Path]::GetTempFileName()) -encoding ASCII; '?b' -eq $(Get-Content $f; Remove-Item $f) - the ä has been transliterated to a ?. By contrast, -Encoding Default ("ANSI") would correctly preserve it.
-
Liam almost 8 yearsThis (for whatever reason) did not remove the BOM for me, where as the accepted answer did
-
ForNeVeR almost 8 years@Liam, probably some old version of PowerShell or .NET?
-
eythort over 7 yearsThis does seem to work for me, at least for Export-CSV. If you open the resulting file in a proper editor, the file encoding is UTF-8 without BOM, and not Western Latin ISO 9 as I would have expected with ASCII
-
TNT over 7 years@rob This is the perfect answer for everybody who just doesn't need utf-8 or anything else that is different to ASCII and is not interested in understanding encodings and the purpose of unicode. You can use it as utf-8 because the equivalent utf-8 characters to all ASCII characters are identical (means converting an ASCII-file to an utf-8-file results in an identical file (if it gets no BOM)). For all who have non-ASCII characters in their text this answer is just false and misleading.
-
sschuberth over 7 yearsNote that WriteAllLines seems to require $MyPath to be absolute.
-
Just Rudy over 7 yearsThis worked for me, except I used -encoding utf8 for my requirement.
-
codewario over 7 years@sschuberth I just tried WriteAllLines with a relative path, works fine for me. Does it give you an error with a relative path?
-
sschuberth over 7 years@AlexanderMiles It "works", but the file ends up being in some weird directory (not relative to the current working directory). IIRC it was the path of the PowerShell interpreter binary.
-
codewario over 7 yearsI believe older versions of the .NET WriteAllLines function did write the BOM by default. So it could be a version issue.
-
ForNeVeR over 7 years@AlexanderMiles best I can tell from .NET 2.0 documentation, it still uses BOMless UTF-8 there.
-
xdhmoore about 7 yearsFor me, it seems to write the file to my Desktop even if I'm currently in another directory.
-
user1529294 about 7 yearsThank you very much. I am working with dump logs of a tool - which had tabs inside it. UTF-8 was not working. ASCII solved the problem. Thanks.
-
mklement0 about 7 yearsYes, -Encoding ASCII avoids the BOM problem, but you obviously only get 7-bit ASCII characters. Given that ASCII is a subset of UTF-8, the resulting file is technically also a valid UTF-8 file, but all non-ASCII characters in your input will be converted to literal ? characters.
-
Rosberg Linhares almost 7 yearsIf you don't want an extra new line at the end of the file, you can do this: [IO.File]::WriteAllText($MyPath, $MyFile).
-
emptyother almost 7 yearsMany editors open the file as UTF-8 if they can't detect the encoding.
-
BobHy over 6 yearsCan confirm this writes UTF8 no BOM on Win10 / .Net 4.6. But still needs an absolute path .
-
Shayan Toqraee over 6 years@xdhmoore WriteAllLines gets the current directory from [System.Environment]::CurrentDirectory. If you open PowerShell and then change your current directory (using cd or Set-Location), then [System.Environment]::CurrentDirectory will not be changed and the file will end up being in the wrong directory. You can work around this with [System.Environment]::CurrentDirectory = (Get-Location).Path.
-
chazbot7 over 6 yearsConfirmed: writes with a BOM in Powershell 3, but without a BOM in Powershell 4. I had to use M. Dudley's original answer.
-
Johny Skovdal over 6 yearsSo it works on Windows 10 where it's installed by default. :) Also, suggested improvement: [IO.File]::WriteAllLines(($filename | Resolve-Path), $content)
-
mklement0 about 6 yearsGood pointers; suggestions: the simpler alternative to $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath) is Convert-Path $MyPath; if you want to ensure a trailing CRLF, simply use [System.IO.File]::WriteAllLines() even with a single input string (no need for Out-String).
-
mklement0 about 6 yearsYes, -Encoding ASCII avoids the BOM problem, but you obviously only get support for 7-bit ASCII characters. Given that ASCII is a subset of UTF-8, the resulting file is technically also a valid UTF-8 file, but all non-ASCII characters in your input will be converted to literal ? characters.
-
Amit Naidu about 6 yearsThis answer needs more votes. The sqlplus incompatibility with BOM is a cause of many headaches.
-
watery about 6 yearsThis looks to be the solution still in 2018 with Out-File from PowerShell 6; but Notepad++ states the file has no encoding, any hint?
-
DoubleOZ almost 6 years$MyFile variable does not have to be object that is created by a Get-Content. It can also be a plain string, i.e. $MyFile = "utf8 string of some kind..."
-
pholpar about 5 yearsInstead of New-Object System.Text.UTF8Encoding $False you can use simply New-Object System.Text.UTF8Encoding, since "This constructor creates an instance that does not provide a Unicode byte order mark", see docs.microsoft.com/en-us/dotnet/api/…
-
PolarBear almost 5 yearsAs @RosbergLinhares noted, WriteAllLines adds an extra new line at the end of a file. But to make WriteAllText work you have to use the -Raw parameter for Get-Content, otherwise all text will be squashed into a single line. $fileContent = Get-Content -Raw "$fileFullName"; [System.IO.File]::WriteAllText($fileFullName, $fileContent)
-
KCD over 4 yearsNice. FYI check version with $PSVersionTable.PSVersion
-
mklement0 over 3 yearsWorth noting that in PowerShell [Core] v6+ -Encoding UTF8NoBOM is never required, because it is the default encoding.
-
mklement0 over 3 yearsNice - works great with strings (which may be all that is needed and certainly meets the requirements of the question). In case you need to take advantage of the formatting that Out-File, unlike Set-Content, provides, pipe to Out-String first; e.g., $MyFile = Get-ChildItem | Out-String
-
mklement0 over 3 yearsTo spell it out: This is a system-wide setting that makes Windows PowerShell default to BOM-less UTF-8 across all cmdlets, which may or may not be desired, not least because the feature is still in beta (as of this writing) and can break legacy console applications - see this answer for background information.
-
duct_tape_coder about 3 yearsIn powershell 5.1 on Win 10: [IO.File]::WriteAllLines("c:\users\user\file.txt", $content) gives me Cannot find an overload for "WriteAllLines" and the argument count "2"
-
Joel Coehoorn about 2 years@AmitNaidu No, this is the wrong answer, because it won't work if the text has any non-ascii characters: any accents, umlauts, oriental/cryllic, etc.
-
Erik Anderson about 2 years@JoelCoehoorn This is a correct answer according to what the user asked. Since the user asked for a way to "force", they're not expecting any issues or don't care probably because the source doesn't use any non-ASCII characters. For those who do care about the preservation of those characters, this will not work.
-
ygoe about 2 yearsWarning: Definitely not. This deletes all non-ASCII characters and replaces them with question marks. Don't do this or you will lose data! (Tried with PS 5.1 on Windows 10)