Batch convert latin-1 files to utf-8 using iconv
Solution 1
You shouldn't use ls like that, and a for loop is not appropriate either. Also, the destination directory should be outside the source directory.
mkdir /path/to/destination
find . -type f -exec iconv -f iso-8859-1 -t utf-8 "{}" -o /path/to/destination/"{}" \;
No need for a loop. The -type f option includes files and excludes directories.
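To illustrate why looping over the output of ls -R breaks, here is a minimal sketch on a throwaway tree. The directory and file names (sub, top.php, nested.php) are made up for the demonstration. It counts how many of the words produced by ls -R are not usable as paths from the current directory, then prints what find reports for the same tree.

```shell
#!/bin/sh
# Build a small throwaway tree (all names here are hypothetical).
orig_dir=$PWD
demo=$(mktemp -d)
mkdir -p "$demo/sub"
: > "$demo/top.php"
: > "$demo/sub/nested.php"

cd "$demo" || exit 1

# ls -R prints bare file names plus "sub:" style headers, so entries
# that live in subdirectories cannot be opened from the current directory.
ls_missing=0
for a in $(ls -R *); do
    [ -e "$a" ] || ls_missing=$((ls_missing + 1))
done
echo "ls -R words that are not valid paths: $ls_missing"

# find prints a usable relative path for every regular file.
echo "paths reported by find:"
find . -type f

# Clean up the throwaway tree.
cd "$orig_dir" || exit 1
rm -rf "$demo"
```

File names containing spaces would make the ls loop fail in further ways, because the shell splits the output on whitespace.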
Edit:
The OS X version of iconv doesn't have the -o option. Try this:
find . -type f -exec bash -c 'iconv -f iso-8859-1 -t utf-8 "$1" > /path/to/destination/"$1"' _ {} \;
Solution 2
Some good answers, but I found this a lot easier in my case with a nested directory of hundreds of files to convert:
WARNING: This will write the files in place, so make a backup
$ vim $(find . -type f)
# in vim, go into command mode (:)
:set nomore
:bufdo set fileencoding=utf8 | w
Solution 3
This converts all files with the .php filename extension, in the current directory and its subdirectories, preserving the directory structure:
find . -name "*.php" -exec sh -c 'iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.utf8" && mv "$1.utf8" "$1"' _ {} \;
Notes:
To get a list of files that will be targeted beforehand, just run the command without the -exec flags (like this: find . -name "*.php"). Making a backup is a good idea.
Using sh like this allows piping and redirecting with -exec, which is necessary because not all versions of iconv support the -o flag.
Adding .utf8 to the filename of the output and then removing it might seem strange, but it is necessary. Using the same name for output and input files can cause the following problems:
For large files (around 30 KB in my experience) it causes a core dump (or termination by signal 7).
Some versions of iconv seem to create the output file before they read the input file, which means that if the input and output files have the same name, the input file is overwritten with an empty file before it is read.
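The write-to-a-temporary-file-then-rename idea from the notes above can be sketched as a small helper. This is only an illustration: convert_in_place is a made-up name, and it assumes every input really is ISO-8859-1. The original is replaced only when iconv exits successfully, so a failed conversion leaves the file untouched.

```shell
#!/bin/sh
# convert_in_place FILE
# Hypothetical helper (not part of iconv): converts FILE from ISO-8859-1
# to UTF-8 through a temporary file created next to it.
convert_in_place() {
    src=$1
    # mktemp replaces the trailing XXXXXX with random characters.
    tmp=$(mktemp "${src}.XXXXXX") || return 1
    if iconv -f ISO-8859-1 -t UTF-8 "$src" > "$tmp"; then
        # Conversion succeeded: replace the original with the converted copy.
        mv "$tmp" "$src"
    else
        # Conversion failed: keep the original and discard the partial output.
        rm -f "$tmp"
        return 1
    fi
}
```

A call such as convert_in_place ./index.php converts a single file. Note that mktemp creates the temporary file with restrictive permissions, so the converted file may end up with a different mode than the original.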
Solution 4
I needed to convert a complete directory tree recursively from iso-8859-1 to utf-8, including the creation of subdirectories. None of the short solutions above worked for me because the directory structure was not created in the target. Based on Dennis Williamson's answer I came up with the following solution:
find . -type f -exec bash -c 't="/tmp/dest"; mkdir -p "$t/$(dirname "$1")"; iconv -f iso-8859-1 -t utf-8 "$1" > "$t/$1"' _ {} \;
It will create a clone of the current directory subtree in /tmp/dest (adjust to your needs), including all subdirectories, with all iso-8859-1 files converted to utf-8. Tested on Mac OS X.
Btw: Check your file encodings with:
file -I file.php
to get the encoding information (on Linux the equivalent flag is lowercase: file -i).
Hope this helps.
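For reuse, the same mirroring approach can be wrapped in a function that takes the source and destination as arguments. This is a sketch, not a tested tool: mirror_convert is a made-up name, it assumes every regular file under the source is ISO-8859-1 text, and the destination must lie outside the source tree.

```shell
#!/bin/sh
# mirror_convert SRC DEST
# Hypothetical helper: recreates the directory layout of SRC under DEST
# and writes a UTF-8 copy of every regular file found in SRC.
mirror_convert() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Resolve DEST to an absolute path before changing directory.
    dest=$(cd "$dest" && pwd) || return 1
    (
        cd "$src" || exit 1
        # The inline script receives the destination root as its first
        # argument, followed by a batch of relative file paths.
        find . -type f -exec sh -c '
            dest=$1
            shift
            for f do
                mkdir -p "$dest/$(dirname "$f")" &&
                iconv -f ISO-8859-1 -t UTF-8 "$f" > "$dest/$f"
            done
        ' _ "$dest" {} +
    )
}
```

For example, mirror_convert . /tmp/dest mirrors the current directory. Using {} + passes many files to each sh invocation, which starts fewer processes than one bash per file.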
Solution 5
I created the following script that (i) checks the encoding of every tex file, (ii) backs up each tex file it is about to convert into the directory "converted", and (iii) converts to UTF-8 only the tex files in the ISO-8859-1 encoding.
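# Loop over every .tex file in the current directory.
for f in *.tex
do
    filename="${f%.*}"
    printf '%s' "$f"
    # file -I prints the MIME type and charset of the file.
    if file -I "$f" | grep -wq "iso-8859-1"
    then
        # Back up the original before converting it.
        mkdir -p converted
        cp "$f" ./converted
        # Convert via a temporary file, then replace the original.
        iconv -f ISO-8859-1 -t UTF-8 "$f" > "${filename}_utf8.tex"
        mv "${filename}_utf8.tex" "$f"
        echo ": CONVERTED TO UTF-8."
    else
        echo ": UTF-8 ALREADY."
    fi
done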
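Where the file command is unavailable or guesses wrongly, a different heuristic is to ask iconv itself whether a file already decodes as UTF-8. The sketch below swaps that check in for the file -I test. The name is_valid_utf8 is made up, and a failed check only shows the file is not valid UTF-8, not that it is ISO-8859-1.

```shell
#!/bin/sh
# is_valid_utf8 FILE
# Hypothetical helper: succeeds when FILE decodes cleanly as UTF-8.
# Pure ASCII files also pass, because ASCII is a subset of UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Report which .tex files in the current directory would need converting.
for f in *.tex; do
    [ -e "$f" ] || continue    # the glob matched nothing
    if is_valid_utf8 "$f"; then
        echo "$f: already valid UTF-8, skipped"
    else
        echo "$f: not valid UTF-8, candidate for conversion"
    fi
done
```

This loop only reports. It does not modify any file.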
Comments
Jasmo almost 4 years
I have a PHP project on my OS X machine that is in latin1 encoding. Now I need to convert the files to UTF-8. I'm not much of a shell coder, so I tried something I found on the internet:
mkdir new
for a in `ls -R *`; do iconv -f iso-8859-1 -t utf-8 <"$a" >new/"$a" ; done
But that does not create the directory structure, and it gives me a heck of a lot of errors when run. Can anyone come up with a neat solution?