Convert a text file from ANSI to UTF-8 in Windows batch scripting


Solution 1

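The PowerShell syntax is rather straightforward. In Windows PowerShell, this command reads a file using the system's default (ANSI) encoding and saves it as UTF-8 with BOM. Use Oem instead of Default only if the source file was written in the console (OEM) code page:

Get-Content <SrcFile.txt> -Encoding Default | Out-File <DestFile.txt> -Encoding utf8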

In Windows PowerShell, the Encoding parameter accepts the following values: Ascii, BigEndianUnicode, BigEndianUTF32, Byte, Default, Oem, String, Unicode, Unknown, UTF32, UTF7, UTF8
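
Since the question asks for something that can be launched from a batch process, here is a minimal sketch of the same pipeline wrapped in a function. The function name Convert-AnsiToUtf8 and its parameter names are hypothetical, and the encoding behavior described in the comments assumes Windows PowerShell 5.1 (in PowerShell 6 and later, Default means UTF-8 and utf8 writes no BOM).

```powershell
# Hypothetical helper: re-encode a text file from the system ANSI code page to UTF-8.
function Convert-AnsiToUtf8 {
    param(
        [Parameter(Mandatory = $true)][string]$Source,
        [Parameter(Mandatory = $true)][string]$Destination
    )

    # Assumption: Windows PowerShell 5.1, where "Default" is the system ANSI
    # code page and "utf8" writes UTF-8 with a BOM.
    # Source and Destination must be different files, because the source is
    # still being read while the destination is written.
    Get-Content -LiteralPath $Source -Encoding Default |
        Out-File -LiteralPath $Destination -Encoding utf8
}
```

From a batch file, the one-line form can be run directly, for example: powershell -NoProfile -Command "Get-Content 'SrcFile.txt' -Encoding Default | Out-File 'DestFile.txt' -Encoding utf8"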

Solution 2

Get-Content may not be optimal, because by default (unless you use the Raw switch, described later) it processes the input file line by line, which can change the line endings (for example, if you move text files between Unix and Windows systems). I had serious problems in a script just because of that, and it took about an hour to find the exact reason. See more about that in this post. Because of this line-by-line behavior, Get-Content is also not the best choice if performance matters.

Instead, you can use PowerShell in combination with the .NET classes (as long as you have a version of the .NET Framework installed on your system):

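# Read with the system ANSI code page (Encoding.Default on the .NET Framework)
$sr = New-Object System.IO.StreamReader($infile, [System.Text.Encoding]::Default)
# Write as UTF-8 (with BOM), overwriting the output file instead of appending
$sw = New-Object System.IO.StreamWriter($outfile, $false, [System.Text.Encoding]::UTF8)

$sw.Write($sr.ReadToEnd())

# Dispose flushes and closes the underlying files, so a separate Close is not needed
$sw.Dispose()
$sr.Dispose()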
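
If the BOM is not wanted, the same idea can be sketched with the static System.IO.File methods and an explicit UTF8Encoding instance. The function name Convert-FileToUtf8NoBom and its parameters are hypothetical. The default source encoding assumes the .NET Framework, where Encoding.Default is the system ANSI code page; on newer .NET runtimes it is UTF-8, so pass the source encoding explicitly there.

```powershell
# Hypothetical helper: re-encode a file to UTF-8 without a BOM, keeping line endings as they are.
function Convert-FileToUtf8NoBom {
    param(
        [Parameter(Mandatory = $true)][string]$Source,
        [Parameter(Mandatory = $true)][string]$Destination,
        # Assumption: on the .NET Framework this is the system ANSI code page.
        [System.Text.Encoding]$SourceEncoding = [System.Text.Encoding]::Default
    )

    # .NET resolves relative paths against the process working directory,
    # which can differ from the current PowerShell location, so full paths are safest.
    $text = [System.IO.File]::ReadAllText($Source, $SourceEncoding)

    # UTF8Encoding($false) means "do not emit a byte order mark".
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)
    [System.IO.File]::WriteAllText($Destination, $text, $utf8NoBom)
}
```

Because the whole file is read and written as a single string, the original mix of CR/LF and LF line endings is carried over unchanged.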

Or, even more simply, use the Raw switch (available since PowerShell 3.0) as described here to avoid that overhead and read the text in a single block:

Get-Content $inFile -Raw
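
The Raw read on its own does not write anything, so here is a rough sketch of a complete round trip built on it. The function name Convert-RawToUtf8 is hypothetical, and the sketch assumes PowerShell 5.0 or later for the NoNewline switch; the encoding names behave as in Windows PowerShell 5.1 (Default is ANSI, UTF8 writes a BOM).

```powershell
# Hypothetical helper: convert a file to UTF-8 in one block, preserving line endings.
function Convert-RawToUtf8 {
    param(
        [Parameter(Mandatory = $true)][string]$Source,
        [Parameter(Mandatory = $true)][string]$Destination
    )

    # -Raw returns the whole file as a single string, line endings intact.
    $text = Get-Content -LiteralPath $Source -Raw -Encoding Default

    # -NoNewline stops Set-Content from appending an extra line break at the end.
    Set-Content -LiteralPath $Destination -Value $text -Encoding UTF8 -NoNewline
}
```

Without NoNewline, Set-Content would add one trailing line break to the output, which is usually not wanted when the goal is to change only the encoding.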

Author by

Admin

Updated on September 18, 2022

Comments

  • Admin
    Admin over 1 year

    We have a text file in the default ANSI format that needs to be converted to UTF-8. Is there any way to do the conversion with the standard Windows DOS commands? We can use PowerShell, but this command line has to be run from a different batch process.

  • Admin
    Admin about 6 years
    You've initialized your StreamReader and StreamWriter with the wrong encoding.