Tool to convert accented characters to HTML entities?
Recode can convert to HTML entities:
$ echo "é" | recode ..html
&eacute;
There are a few slightly different HTML transformations available in recode; see info recode HTML.
If you want to recode one or more files, list them after the request (recode rewrites each file in place):
$ recode ..html one_file another_file
For recursive action, use the find command, e.g.
$ find your_directory -type f -name "*.html"
The above find command only lists the files. Make sure that it finds only the right files: no binaries and nothing from unwanted directories. It is also a good idea to make a backup first, or to work on a copy rather than on the real files. Once the find command is right, append -exec your_command {} +, where your_command is the recode ..html from above and {} stands for the file(s) that find passes to recode:
$ find your_directory -type f -name "*.html" -exec recode ..html {} +
But wait a moment, there's one big caveat: recode ..html assumes that your input files are in the same character set (encoding) that you are using on the command line. If all of your files use the "modern" UTF-8, it will work fine, because Ubuntu uses UTF-8 by default. But if some of your files use the older ISO-8859-1 or other charsets, it will be a lot more complicated.
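One way to handle the mixed-encoding case is to normalise everything to UTF-8 before running recode. The following is only a sketch under assumptions: the helper names is_utf8 and to_utf8 are made up for illustration, and the source charset you pass in (ISO-8859-1 here) is your own guess about the old files, which the script cannot verify. A file that is pure ASCII counts as valid UTF-8 and is left alone.

```shell
#!/bin/sh
# Sketch only: is_utf8 and to_utf8 are hypothetical helper names,
# not part of recode or iconv.

# is_utf8 FILE: succeeds when FILE decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# to_utf8 FROM_CHARSET FILE: rewrite FILE as UTF-8 unless it already is.
# FROM_CHARSET is an assumption about the old encoding (e.g. ISO-8859-1).
to_utf8() {
    if is_utf8 "$2"; then
        return 0
    fi
    # Convert into a temporary file first so a failed run never truncates FILE.
    if iconv -f "$1" -t UTF-8 "$2" > "$2.utf8.tmp"; then
        mv "$2.utf8.tmp" "$2"
    else
        rm -f "$2.utf8.tmp"
        return 1
    fi
}

# Possible usage (not run here), on a copy of your files:
#   for f in your_directory/*.html; do to_utf8 ISO-8859-1 "$f"; done
#   find your_directory -type f -name "*.html" -exec recode ..html {} +
```

The variables are double-quoted throughout, so file names containing spaces should not need renaming first.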
bafromca
Updated on September 18, 2022
Comments
-
bafromca over 1 year
Is there a tool (command-line is fine) that can convert accented characters to HTML entities in Ubuntu? Preferably recursively and without also converting html/php tags.
e.g. from: é to: &eacute; or: &#233;
-
bafromca about 13 years
I'm aware of those tools, but I need to convert hundreds of files (so gedit is out) and I need to convert all accented characters (and there are a lot of those).
-
Denwerko about 13 years
If you need to convert hundreds of files, you can use that sed with find, maybe like this:
find /folder_where_you_have_files -mindepth 0 -name "*.html" -exec sh -c 'sed "s/é/\&eacute;/g" "$1" > "$1.new"' sh {} \;
sed can also read its instructions from a file, so you can replace all the characters at once. I'm not sure that I typed the command exactly right; I will try it on some examples and post if anything changes.
-
bafromca about 13 years
Ya, I ran a rename command to get rid of all the spaces in the file names with
rename 's/\ /_/g' *
and then
for i in *.php; do iconv --from-code=ISO-8859-15 --to-code=UTF-8 $i > $i.iconv; mv $i.iconv $i; done
to convert to UTF-8. The problem with that program is that it converts every character imaginable, including html and php tags.
-
elmicha about 13 years
You didn't need to rename the files. You can use double quotes around your variable values, i.e. "$i". These double quotes make sure that your variable values are not split.
-
That Brazilian Guy over 2 years
The link for the solution is broken.
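The sed idea from the comments (one script file holding all the replacements) could look roughly like the sketch below. It only touches the characters listed in the script, so html and php tags stay as they are. The function names and the entities.sed file name are hypothetical, the character list is just a starting point that you would have to extend yourself, and it assumes both the script and the input files are UTF-8.

```shell
#!/bin/sh
# Sketch only: function names and the script file name are made up.

# write_entity_script FILE: create a sed script with one substitution per
# accented character. "\&" is needed because a bare "&" in a sed replacement
# means "the matched text".
write_entity_script() {
    cat > "$1" <<'EOF'
s/é/\&eacute;/g
s/è/\&egrave;/g
s/à/\&agrave;/g
s/ü/\&uuml;/g
EOF
}

# apply_entity_script SCRIPT DIR: run the sed script over every *.html file
# below DIR. Each result goes to FILE.new first and replaces FILE only if
# sed succeeded. All variables are double-quoted, so spaces in names are fine.
apply_entity_script() {
    find "$2" -type f -name '*.html' -exec sh -c '
        script=$1
        shift
        for f; do
            sed -f "$script" "$f" > "$f.new" && mv "$f.new" "$f"
        done
    ' sh "$1" {} +
}

# Possible usage (not run here), on a copy of your files:
#   write_entity_script entities.sed
#   apply_entity_script entities.sed your_directory
```

Note that the rewritten files are created fresh, so they get default permissions rather than those of the originals.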