Short way to escape HTML in Bash?
Solution 1
Escaping HTML really just involves replacing three characters: <, >, and &. For extra points, you can also replace " and '. So, it's not a long sed script:
sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&#39;/g'
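As a rough sketch, the same script can be wrapped in a small filter function and tried on a sample string. The name `html_escape` is made up for illustration and does not come from the answer; the sketch assumes only a POSIX `sed`.

```shell
#!/usr/bin/env bash
# html_escape is a hypothetical wrapper name, not part of the answer above.
# It reads text on stdin and writes the HTML-escaped text to stdout.
html_escape() {
    # The ampersand rule comes first so that the "&" introduced by the
    # later rules is not escaped a second time.
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e "s/'/\&#39;/g"
}

printf '%s\n' '<a href="x">Tom & Jerry</a>' | html_escape
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

Splitting the rules into separate `-e` expressions avoids the `'"'"'` quoting trick for the apostrophe, at the cost of a few more lines.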
Solution 2
You can use the recode utility:
echo 'He said: "Not sure that - 2<1"' | recode ascii..html
Output:
He said: &quot;Not sure that - 2&lt;1&quot;
Solution 3
Pure bash, no external programs:
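function htmlEscape () {
    local s
    # Escape & first. The \& keeps the ampersand literal on bash 5.2+,
    # where an unquoted & in the replacement means "the matched text".
    s=${1//&/\&amp;}
    s=${s//</\&lt;}
    s=${s//>/\&gt;}
    s=${s//'"'/\&quot;}
    printf -- %s "$s"
}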
Just simple string substitution.
Solution 4
Or use xmlstarlet, which can escape/unescape special XML characters:
$ echo '<abc&def>'| xml esc
&lt;abc&amp;def&gt;
Author by
James Evans
Updated on July 09, 2022
Comments
-
James Evans 6 months
The box has no Ruby/Python/Perl etc. Only bash, sed, and awk.
One way is to replace characters by a map, but it becomes tedious.
Perhaps there is some built-in functionality I'm not aware of?
-
Ruud Helderman about 7 years
Big mistake. When I HTML-encode a string &amp;, it is because I want it to be rendered by some web browser as &amp;. That is why it must be turned into &amp;amp;. That way, HTML-encoding and HTML-decoding are in balance. You don't suppress HTML-encoding just because the input looks like it has already been HTML-encoded. HTML-encoding is not idempotent. Failure to grasp that eventually leads to XSS vulnerabilities.
-
Brian McCutchon almost 7 years
@Ruud is right; the right way to accomplish this is to escape ampersands first, like in ruakh's answer.
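To make the ordering point concrete, here is a small sketch comparing the two orders on the same input. The function names are hypothetical, and only the three mandatory characters are handled to keep it short.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; each reads stdin and writes stdout.
escape_amp_first() { sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g'; }
escape_amp_last()  { sed 's/</\&lt;/g; s/>/\&gt;/g; s/&/\&amp;/g'; }

input='a < b & c'
printf '%s\n' "$input" | escape_amp_first   # a &lt; b &amp; c
printf '%s\n' "$input" | escape_amp_last    # a &amp;lt; b &amp; c  (double-escaped)

# Input that already looks escaped must still be escaped again,
# otherwise decoding would not give back the original text.
printf '%s\n' '&amp;' | escape_amp_first    # &amp;amp;
```

With the ampersand rule last, the `&` produced by the `<` rule is itself escaped, so a browser would display `&lt;` instead of `<`.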
-
tbodt about 6 years
Probably not available if there's no Python/Ruby/Perl.
-
kmkaplan almost 6 years
I totally agree with what @Ruud said, except that he should have emphasized "failure to grasp that leads to XSS vulnerabilities".
-
WinEunuuchs2Unix over 5 years
+1 for elegance and efficiency. You should post your answer here: stackoverflow.com/questions/5929492/… where they recommend installing recode, perl, php, xmlstarlet and w3m (a web browser, for crying out loud). The last answer recommends using Python3 which, although installed by default (in Ubuntu at least), is overkill too.
-
ruakh over 5 years
@WinEunuuchs2Unix: Thanks for your kind words! That question is asking about the opposite direction (&lt; to <), and the answers there are trying to cover the possibility of random other entity references like &eacute; and numeric character references like &#201;, rather than minimally-escaped HTML. For many purposes that might be overengineering, but on Stack Overflow it can be hard to tell exactly what someone's purpose is, so I don't blame the answerers there for wanting to provide something universal.
-
WinEunuuchs2Unix over 5 years
@ruakh You're welcome :) Can't your sed search and replace simply be reversed to accomplish the same result as those answers?
-
ruakh over 5 years
@WinEunuuchs2Unix: There are many ways to HTML-escape a given piece of text; for example, &lt;, &#60;, and &#x3C; are all valid ways to escape <. My sed script only does one kind of HTML-escaping, since you only need one; but if you want to do HTML-unescaping, then either you need to handle all valid ways of escaping, or you need to know beforehand exactly what way of escaping was used. Do you see what I mean?
-
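For the second case ruakh describes, where the exact escaping is known in advance, a reversed script might look like the sketch below. It assumes the input uses only the five entities `&lt;`, `&gt;`, `&quot;`, `&#39;` and `&amp;`; the name `html_unescape_minimal` is hypothetical, and any other entity or numeric reference is left untouched.

```shell
#!/usr/bin/env bash
# Hypothetical name. Reverses only the five known entities; it is not a
# general HTML decoder and leaves any other reference (such as &#60;) as-is.
html_unescape_minimal() {
    # &amp; is decoded last, mirroring "escape & first" in the other
    # direction, so that "&amp;lt;" becomes "&lt;" rather than "<".
    sed -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&#39;/'/g" \
        -e 's/&amp;/\&/g'
}

printf '%s\n' '&lt;b&gt;Tom &amp; Jerry&lt;/b&gt; &amp;lt;' | html_unescape_minimal
# prints: <b>Tom & Jerry</b> &lt;
```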
WinEunuuchs2Unix over 5 years
Yes. My HTML-unescaping is limited to the Stack Exchange site Ask Ubuntu, and so far I've only noticed &amp;, &lt; and &quot;. The goal is to compare all the scripts on my drive that I've published on Ask Ubuntu to see if they have been changed locally or revised by someone else on Ask Ubuntu. For fun I'm also extracting upvotes from the HTML file and putting them in the local file. This is the work in progress from a few nights ago: askubuntu.com/questions/894888/…
-
geotheory about 4 years
Really useful. Same as a function:
escape_html() { sed $1 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&#39;/g'; }
-
Ohiovr almost 3 years
I want to try this but I don't know how to install xml esc. I don't even know what it is. Could you elaborate?
-
schemacs almost 3 years
Just brew install xmlstarlet if you are using MacOS.
-
vhs over 2 years
Tested on 30 or so text files containing ASCII, and it even handles the null character \0. I use it to sandbox text file contents for the srcdoc attribute of a sandboxed iframe in HTML and allow background styling via the parent frame to cascade.