How can I encode and decode percent-encoded strings on the command line?
Solution 1
These commands do what you want (using Python 2):
python -c "import urllib, sys; print urllib.quote(sys.argv[1])" æ
python -c "import urllib, sys; print urllib.unquote(sys.argv[1])" %C3%A6
If you want to encode spaces as +, replace urllib.quote with urllib.quote_plus (and urllib.unquote with urllib.unquote_plus to decode them).
I'm guessing you will want to alias them ;-)
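The commands above only run under Python 2 (print statement, urllib.quote). A possible Python 3 equivalent is sketched below, assuming python3 is on your PATH. In Python 3 the same functions live in urllib.parse. The shell function names urlencode and urldecode are only illustrative:

```shell
# Python 3 sketch: quote/unquote moved from urllib to urllib.parse.
# quote() leaves "/" unescaped by default, like Python 2's urllib.quote.
urlencode() {
  python3 -c 'import sys, urllib.parse as up; print(up.quote(sys.argv[1]))' "$1"
}

# Swap in up.quote_plus / up.unquote_plus if "+" should stand for a space.
urldecode() {
  python3 -c 'import sys, urllib.parse as up; print(up.unquote(sys.argv[1]))' "$1"
}
```

For example, urlencode æ prints %C3%A6, and urldecode %C3%A6ndr%C3%BCk prints ændrük.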
Solution 2
shell
Try the following command line:
$ echo "%C3%A6ndr%C3%BCk" | sed 's@+@ @g;s@%@\\x@g' | xargs -0 printf "%b"
ændrük
You can define it as an alias and add it to your shell's rc file (e.g. ~/.bashrc):
$ alias urldecode='sed "s@+@ @g;s@%@\\\\x@g" | xargs -0 printf "%b"'
Then, whenever you need it, simply run:
$ echo "http%3A%2F%2Fwww" | urldecode
http://www
bash
When scripting, you can use the following syntax:
input="http%3A%2F%2Fwww"
decoded=$(printf '%b' "${input//%/\\x}")
However, the above syntax won't handle pluses (+) correctly, so you have to replace them with spaces first, e.g. via sed.
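A minimal sketch of the scripting syntax with the plus handling added. It assumes the encoded text is in a variable named input, and the sample value is made up. sed first turns every + into a space, then the parameter expansion rewrites every % as \x so that printf '%b' decodes the hex escapes:

```shell
input="oil+and+gas%21"
# Replace "+" with a space; the here-string feeds the value to sed.
spaced=$(sed 's/+/ /g' <<< "$input")
# Turn every "%" into "\x" so printf %b decodes the hex escapes.
decoded=$(printf '%b' "${spaced//%/\\x}")
printf '%s\n' "$decoded"   # prints: oil and gas!
```

In bash alone, ${input//+/ } performs the same replacement without starting sed.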
You can also use the following urlencode() and urldecode() functions:
urlencode() {
    # urlencode <string>
    local LC_ALL=C  # work on bytes, so UTF-8 characters become %XX%XX
    local length="${#1}" i c
    for (( i = 0; i < length; i++ )); do
        c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done
}
urldecode() {
    # urldecode <string>
    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}
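Note that this urldecode() assumes the data contains no backslashes, because printf '%b' also interprets any backslash escapes (such as \t or \n) already present in its input.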
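If the data may contain backslashes, one workaround is to avoid passing the whole string through printf '%b'. Instead, decode only well-formed %XX pairs and copy everything else unchanged. The sketch below does this. The name urldecode_safe is hypothetical, and like urldecode() above it still treats + as a space:

```shell
# Decode %XX pairs byte by byte; other characters (including "\") pass through.
urldecode_safe() {
  local LC_ALL=C                  # index bytes, not multibyte characters
  local s="${1//+/ }" i c
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    if [[ $c == % && ${s:i+1:2} =~ ^[0-9A-Fa-f]{2}$ ]]; then
      # Only the two validated hex digits go through %b, never the input text.
      printf '%b' "\\x${s:i+1:2}"
      (( i += 2 ))                # skip the two hex digits just consumed
    else
      printf '%s' "$c"
    fi
  done
}
```

For example, urldecode_safe 'C:\temp%20dir' prints C:\temp dir. A lone % that is not followed by two hex digits is also left as it is.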
bash + xxd
Bash function using the xxd tool:
urlencode() {
    local LC_ALL=C  # bytes only; also stops [a-zA-Z] from matching letters like ä
    local length="${#1}" i c x
    for (( i = 0; i < length; i++ )); do
        c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            *) printf '%s' "$c" | xxd -p -c1 | while read -r x; do printf '%%%s' "$x"; done ;;
        esac
    done
}
Found in cdown's gist; also on Stack Overflow.
Python
Try defining the following aliases (Python 2 syntax):
alias urldecode='python -c "import sys, urllib as ul; print ul.unquote_plus(sys.argv[1])"'
alias urlencode='python -c "import sys, urllib as ul; print ul.quote_plus(sys.argv[1])"'
Usage:
$ urlencode "ændrük"
%C3%A6ndr%C3%BCk
$ urldecode "%C3%A6ndr%C3%BCk"
ændrük
Source: ruslanspivak
PHP
Using PHP you can try the following command:
$ echo oil+and+gas | php -r 'echo urldecode(fgets(STDIN));'  # or read from php://stdin instead of STDIN
oil and gas
or just:
php -r 'echo urldecode("oil+and+gas");'
Use -R for multi-line input: it runs the code once for every input line, which is available as $argn.
Perl
In Perl you can use URI::Escape.
decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")
Or to decode a file in place:
perl -i -MURI::Escape -pe '$_ = uri_unescape($_)' file
sed
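Using sed, this can be achieved with:
cat file | sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e
This only matches uppercase hex digits. Also, xargs splits its input on whitespace and treats quotes specially, so runs of spaces and newlines collapse into single spaces and unmatched quotes cause an error.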
awk
Try anon's solution (requires GNU awk, because -i and RT are gawk extensions):
awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..
See: Using awk printf to urldecode text.
decoding file names
If you need to remove URL encoding from file names, use the deurlname tool from renameutils (e.g. deurlname *.*).
See also:
- Can wget decode uri file names when downloading in batch?
- How to remove URI encoding from file names?
Related:
- How to decode URL-encoded string in shell? at SO
- Decoding URL encoding (percent encoding) at unix SE
Solution 3
Percent-encode reserved URI characters and non-ASCII characters
jq -s -R -r @uri
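-s (--slurp) reads all of the input into a single array, and combined with -R (--raw-input), -s -R reads the whole input as one string. -r (--raw-output) outputs the contents of strings instead of JSON string literals. Note that a trailing newline in the input is encoded too (as %0A), so use printf '%s' rather than echo to pipe in a single value.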
Percent-encode all characters
xxd -p|tr -d \\n|sed 's/../%&/g'
tr -d \\n removes the linefeeds that xxd -p adds after every 60 characters.
Percent-encode all characters except ASCII alphanumeric characters in Bash
eu () {
    local LC_ALL=C c
    while IFS= read -r -n1 -d '' c
    do
        if [[ $c = [[:alnum:]] ]]
        then
            printf %s "$c"
        else
            printf %%%02x "'$c"
        fi
    done
}
Without -d '' this would skip linefeeds and null bytes. Without IFS= this would replace characters in IFS with %00. Without LC_ALL=C this would, for example, replace あ with %3042 in a UTF-8 locale.
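A possible inverse in the same style, also reading from standard input, is sketched below. The name ud is hypothetical. This sketch decodes only well-formed %XX escapes and passes anything else through unchanged. It does not treat + as a space:

```shell
# Hypothetical inverse of eu(): decode %XX escapes read from standard input.
ud () {
    local LC_ALL=C c hex
    while IFS= read -r -n1 -d '' c
    do
        if [[ $c != % ]]
        then
            printf %s "$c"
            continue
        fi
        # Read (up to) the two characters after the percent sign.
        IFS= read -r -n2 -d '' hex || true
        if [[ $hex =~ ^[0-9A-Fa-f]{2}$ ]]
        then
            printf "\\x$hex"     # emit the byte named by the hex pair
        else
            printf %s "%$hex"    # not a valid escape: pass it through
        fi
    done
}
```

For example, printf '%s' 'a%20b%2Fc' | ud prints a b/c.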
Solution 4
Pure bash solution for decoding only:
$ a='%C3%A6ndr%C3%BCk'
$ echo -e "${a//%/\\x}"
ændrük
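In a script you may want the decoded value in a variable rather than on the terminal. Below is a small variation using bash's printf -v, where the variable name decoded is only illustrative. It avoids the trailing newline that echo adds and the subshell of a command substitution. Like echo -e, %b still interprets any backslashes already present in the data:

```shell
a='%C3%A6ndr%C3%BCk'
# printf -v stores the decoded bytes in $decoded instead of printing them.
printf -v decoded '%b' "${a//%/\\x}"
printf '%s\n' "$decoded"   # ændrük in a UTF-8 terminal
```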
Solution 5
I can't comment on the best answer in this thread, so here is mine.
Personally, I use these aliases for URL encoding and decoding:
alias urlencode='python -c "import urllib, sys; print urllib.quote( sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1])"'
alias urldecode='python -c "import urllib, sys; print urllib.unquote(sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1])"'
Both commands convert data passed either as a command-line argument or on standard input. Each one-liner checks whether there is a command-line argument (even an empty one) and processes it. Otherwise it reads standard input and drops the last character, which is normally the trailing newline.
Update 2015-07-16 (empty first argument): changed according to @muru's comment.
Update 2017-05-28 (slash encoding): if you also need to encode the slash, pass an empty string as the second (safe) argument of the quote function, and the slash will be encoded as well.
So the final urlencode alias in bash looks like this:
alias urlencode='python -c "import urllib, sys; print urllib.quote(sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1], \"\")"'
Example
$ urlencode "Проба пера/Pen test"
%D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test
$ echo "Проба пера/Pen test" | urlencode
%D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test
$ urldecode %D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test
Проба пера/Pen test
$ echo "%D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test" | urldecode
Проба пера/Pen test
$ urlencode "Проба пера/Pen test" | urldecode
Проба пера/Pen test
$ echo "Проба пера/Pen test" | urlencode | urldecode
Проба пера/Pen test
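These aliases also rely on Python 2. A possible Python 3 port is sketched below as shell functions, assuming python3 is installed. quote and unquote now live in urllib.parse, and the argument-or-stdin logic is kept as is. The functions pass "$@" so that calling them with no argument still reads standard input:

```shell
# Encode the argument (even an empty one), or stdin minus its last character.
# safe="" makes quote() encode "/" as well.
urlencode() {
  python3 -c '
import sys, urllib.parse as up
data = sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1]
print(up.quote(data, safe=""))
' "$@"
}

# Same input handling, decoding %XX sequences instead.
urldecode() {
  python3 -c '
import sys, urllib.parse as up
data = sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1]
print(up.unquote(data))
' "$@"
}
```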
Comments
-
RusGraf over 1 year
How can I encode and decode percent-encoded (URL encoded) strings on the command line?
I'm looking for a solution that can do this:
$ percent-encode "ændrük"
%C3%A6ndr%C3%BCk
$ percent-decode "%C3%A6ndr%C3%BCk"
ændrük
-
samme4life almost 13 years
Do you want to incorporate different encodings too? %E6ndr%FCk doesn't look like (standard) UTF-8 to me. Or is it just an example?
-
RusGraf almost 13 years
@arrange Thanks for catching that. Apparently I chose the bad apple among search results for online converters.
-
kenorb about 9 years
For file names, see: How to remove URI encoding in file names.
-
muru almost 9 years
I think sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1] might be more appropriate, especially if you use this in scripts and accidentally give an empty first argument.
-
DIG mbl almost 9 years
As per @muru's comment, I changed the check for a command-line argument. It was:
len(sys.argv) < 2 and sys.stdin.read()[0:-1] or sys.argv[1]
Now:
sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1]
That is, if there is even an empty first argument, the command does not wait for standard input but processes the empty argument.
-
TMG over 6 years
What is that æ character at the end of the first line? Edit: answering myself, got it: it's just a single-character UTF-8 string to be encoded, for example purposes :-)
-
RicardoE over 5 years
How about python3?
-
Pablo Bianchi over 5 years
@RicardoE check this answer.
-
12431234123412341234123 over 4 years
The bash + xxd version does not work with strings that contain a %. Maybe you could replace printf "$c" with printf "%c" "$c"? Another problem is that some non-ASCII characters (such as ä) are not encoded in some language settings. Maybe add an export LC_ALL=C in the function (that should not affect anything outside the function)?
-
rimkashox over 2 years
For folks in 2022, that's the only version that works lol
-
Admin almost 2 years
urldecode is printing extra new lines in output (python3.10).