Script to save a file as Unicode


Solution 1

You can use iconv, which writes the converted text to standard output. On Windows you can run it under Cygwin.

iconv -f from_encoding -t to_encoding file > converted_file
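Notepad's "Unicode" option saves UTF-16LE text with a byte order mark (the bytes FF FE). With iconv, `-t UTF-16LE` produces no BOM, so one way to get a Notepad-style file is to write the BOM yourself first. Below is a minimal sketch for a POSIX shell such as Cygwin's. It assumes the "ANSI" files use code page CP1252 (Western European Windows); run `iconv -l` to see which encoding names your iconv accepts. `ansi_to_unicode` is a hypothetical helper name.

```shell
# Sketch: convert one "ANSI" text file to UTF-16LE with a byte order mark,
# the format Notepad writes for "Unicode".
# Assumption: the source code page is CP1252 unless a third argument is given.
ansi_to_unicode() {
    src=$1
    dst=$2
    from=${3:-CP1252}
    {
        # UTF-16LE byte order mark (FF FE), as octal escapes for POSIX printf
        printf '\377\376'
        # iconv writes the converted text to standard output
        iconv -f "$from" -t UTF-16LE "$src"
    } > "$dst"
    # the function returns iconv's exit status
}
```

For example, `ansi_to_unicode notes.txt notes-unicode.txt` leaves `notes.txt` untouched and writes the converted copy next to it. Because the function returns iconv's exit status, a caller can detect a failed conversion.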

Solution 2

This could work for you, but note that it processes every file in the current folder and writes each result to a new file whose name has a "u" appended:


Get-ChildItem | Foreach-Object { $c = (Get-Content $_); `
Set-Content -Encoding UTF8 $c -Path ($_.name + "u") }

Same thing using aliases for brevity:


gci | %{ $c = (gc $_); sc -Encoding UTF8 $c -Path ($_.name + "u") }

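Steven Murawski suggests using Out-File instead. The differences between the two cmdlets are the following:

  • Out-File formats the input it receives the way the console would display it, whereas Set-Content writes each object's string representation.
  • Out-File's default encoding is Unicode (UTF-16LE), whereas Set-Content uses the system's default "ANSI" code page. This applies to Windows PowerShell; in PowerShell 6 and later, both cmdlets default to UTF-8 without a BOM.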

Here's an example assuming the file test.txt doesn't exist in either case:


PS> [system.string] | Out-File test.txt
PS> Get-Content test.txt

IsPublic IsSerial Name                                     BaseType          
-------- -------- ----                                     --------          
True     True     String                                   System.Object     

# test.txt encoding is "Unicode" (UTF-16LE) with a BOM


PS> [system.string] | Set-Content test.txt
PS> Get-Content test.txt

System.String

# test.txt encoding is "ANSI" (Windows character set)
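To check which encoding a file actually ended up with, such as the two test.txt files in the example above, you can look at its first bytes: FF FE marks UTF-16LE, FE FF marks UTF-16BE and EF BB BF marks UTF-8 with a BOM. A file without a BOM may be ANSI, plain ASCII or BOM-less UTF-8, and the first bytes alone cannot tell those apart. Here is a minimal sketch for a POSIX shell (for example under Cygwin); `detect_bom` is a hypothetical helper name.

```shell
# Sketch: report which byte order mark, if any, a file starts with.
detect_bom() {
    # first three bytes as lowercase hex without spaces, e.g. "fffe41"
    bytes=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case $bytes in
        efbbbf*) echo "UTF-8 with BOM" ;;
        # note: UTF-32LE files also start with FF FE and land here
        fffe*)   echo "UTF-16LE with BOM" ;;
        feff*)   echo "UTF-16BE with BOM" ;;
        *)       echo "no BOM (ANSI, ASCII or BOM-less UTF-8)" ;;
    esac
}
```

Running `detect_bom test.txt` after each of the two commands above should distinguish the Out-File result from the Set-Content one.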

In fact, if you don't need a specific Unicode encoding, you could just as well use the redirection operator to convert a text file to Unicode:


PS> Get-Content sourceASCII.txt > targetUnicode.txt

Out-File is, in effect, a redirection operator with optional parameters.

Solution 3

The easiest way would be Get-Content 'path/to/text/file' | out-file 'name/of/file'.

Out-File has an -Encoding parameter, whose default in Windows PowerShell is Unicode (UTF-16LE).

If you wanted to convert a batch of them in place, you could do something like this:

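$files = Get-ChildItem 'directory/of/text/files' -File
foreach ($file in $files)
{
  # The parentheses make Get-Content read the whole file before Out-File
  # opens (and truncates) the same path for writing.
  (Get-Content $file.FullName) | Out-File $file.FullName
}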
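Writing the converted text back onto the file that is still being read can leave you with an empty file, because the writing side may open, and thereby truncate, the file before the reading side has consumed it. In a shell, `iconv ... file > file` always fails this way, since the shell truncates the target before iconv starts. A common remedy is to write to a temporary file and rename it over the original only after the conversion succeeds. Below is a sketch for a POSIX shell, assuming CP1252 input and Notepad-style UTF-16LE output with a BOM; `convert_dir_in_place` is a hypothetical name.

```shell
# Sketch: convert every *.txt file in a directory in place, from an assumed
# CP1252 ("ANSI") source to UTF-16LE with a byte order mark.
convert_dir_in_place() {
    dir=$1
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        tmp=$(mktemp "$f.XXXXXX") || return 1
        # Never redirect straight onto "$f": the shell would truncate it
        # before iconv had a chance to read it.
        if { printf '\377\376'; iconv -f CP1252 -t UTF-16LE "$f"; } > "$tmp"; then
            # note: the result keeps the temp file's (usually 0600) permissions
            mv "$tmp" "$f"
        else
            rm -f "$tmp"
            echo "conversion failed: $f" >&2
        fi
    done
}
```

If a conversion fails, the original file is left untouched and a message goes to standard error.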
Author: river0

Updated on June 19, 2022

Comments

  • river0
    river0 about 2 years

    Do you know any way that I could programmatically or via script transform a set of text files saved in ANSI character encoding to Unicode encoding?

    I would like to do the same as I do when I open the file with Notepad and choose to save it as a Unicode file.

  • guillermooo
    guillermooo over 15 years
    Why's the accepted answer related to Cygwin? The question is tagged as powershell...
  • river0
    river0 over 15 years
    Yes, at the beginning I was looking for a PowerShell solution, but it turns out that this worked really well for me and I could also use Cygwin. Anyway, all the responses given seem to be valid approaches.
  • jyao
    jyao over 3 years
    Using Out-File somehow results in an empty file. I am using PS v5.1.