How can I encode and decode percent-encoded strings on the command line?


Solution 1

These commands do what you want (using Python 2):

python -c "import urllib, sys; print urllib.quote(sys.argv[1])" æ
python -c "import urllib, sys; print urllib.unquote(sys.argv[1])" %C3%A6

If you want to encode spaces as +, replace urllib.quote with urllib.quote_plus.

I'm guessing you will want to alias them ;-)
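
For Python 3, where these functions moved to urllib.parse, a minimal sketch of the equivalent calls might look like this (the sample strings are only illustrations):

```python
from urllib.parse import quote, quote_plus, unquote

# quote() percent-encodes the UTF-8 bytes of non-ASCII characters.
print(quote("æ"))          # %C3%A6
# unquote() reverses it, decoding the bytes as UTF-8 by default.
print(unquote("%C3%A6"))   # æ
# quote_plus() additionally turns spaces into "+".
print(quote_plus("a b"))   # a+b
```

The same calls should fit into a python3 -c one-liner in place of the Python 2 versions above.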

Solution 2

shell

Try the following command line:

$ echo "%C3%A6ndr%C3%BCk" | sed 's@+@ @g;s@%@\\x@g' | xargs -0 printf "%b"
ændrük

You can define it as an alias and add it to your shell rc file:

$ alias urldecode='sed "s@+@ @g;s@%@\\\\x@g" | xargs -0 printf "%b"'

Then, whenever you need it, simply run:

$ echo "http%3A%2F%2Fwww" | urldecode
http://www

bash

When scripting, you can use the following syntax:

input="http%3A%2F%2Fwww"
decoded=$(printf '%b' "${input//%/\\x}")

However, the above syntax won't handle pluses (+) correctly, so you have to replace them with spaces, for example via sed.
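
The order of the two steps matters: pluses should become spaces before the %XX sequences are decoded, otherwise an encoded plus (%2B) would also end up as a space. As an illustration in Python 3 (not part of the original answer; the sample string is made up), urllib.parse.unquote_plus applies the steps in that order:

```python
from urllib.parse import unquote, unquote_plus

encoded = "oil+and+gas%2Bwater"

# unquote() only decodes %XX sequences and leaves "+" alone.
plain = unquote(encoded)
# unquote_plus() first maps "+" to a space, then decodes %XX,
# so the encoded plus (%2B) survives as a literal "+".
form = unquote_plus(encoded)

print(plain)  # oil+and+gas+water
print(form)   # oil and gas+water
```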

You can also use the following urlencode() and urldecode() functions:

urlencode() {
    # urlencode <string>
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            *) printf '%%%02X' "'$c"
        esac
    done
}

urldecode() {
    # urldecode <string>

    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}

Note that this urldecode() assumes the data contains no backslashes.
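
To make the logic of the two functions explicit, here is a rough Python 3 sketch of the same idea. It is an illustration rather than a translation: the names url_encode and url_decode are made up, and it encodes the UTF-8 bytes of each character, which is an assumption about the desired output. Backslashes are not a problem here, since no printf-style escapes are involved.

```python
import re

# Same unreserved set as the bash case pattern [a-zA-Z0-9.~_-].
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789.~_-"
)

def url_encode(text: str) -> str:
    """Percent-encode every character outside the unreserved set."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
    return "".join(out)

def url_decode(text: str) -> str:
    """Turn "+" into spaces, then decode %XX sequences as UTF-8."""
    raw = text.replace("+", " ").encode("utf-8")
    # Replace each %XX with the single byte it stands for.
    decoded = re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        raw,
    )
    return decoded.decode("utf-8", errors="replace")

print(url_encode("ændrük"))            # %C3%A6ndr%C3%BCk
print(url_decode("http%3A%2F%2Fwww"))  # http://www
```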


bash + xxd

Bash function with xxd tool:

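urlencode() {
  local length="${#1}"
  for (( i = 0; i < length; i++ )); do
    local c="${1:i:1}"
    # Pass $c as an argument to printf so that a literal % or \ is not
    # interpreted as part of the format string.
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      *) printf '%s' "$c" | xxd -p -c1 | while read x; do printf "%%%s" "$x"; done ;;
    esac
  done
}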

Found in cdown's gist file, also at stackoverflow.


Python

Try to define the following aliases:

alias urldecode='python -c "import sys, urllib as ul; print ul.unquote_plus(sys.argv[1])"'
alias urlencode='python -c "import sys, urllib as ul; print ul.quote_plus(sys.argv[1])"'

Usage:

$ urlencode "ændrük"
%C3%A6ndr%C3%BCk
$ urldecode "%C3%A6ndr%C3%BCk"
ændrük

Source: ruslanspivak


PHP

Using PHP you can try the following command:

$ echo oil+and+gas | php -r 'echo urldecode(fgets(STDIN));' // Or: php://stdin
oil and gas

or just:

php -r 'echo urldecode("oil+and+gas");'

Use -R for multiple line input.


Perl

In Perl you can use URI::Escape.

decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")

Or to process a file:

perl -i -MURI::Escape -pe '$_ = uri_unescape($_)' file

sed

With sed, this can be achieved by:

cat file | sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e

awk

Try anon's solution:

awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..

See: Using awk printf to urldecode text.


decoding file names

If you need to remove URL encoding from file names, use the deurlname tool from renameutils (e.g. deurlname *.*).


Solution 3

Percent-encode reserved URI characters and non-ASCII characters

jq -s -R -r @uri

-s (--slurp) reads input lines into an array and -s -R (--slurp --raw-input) reads the input into a single string. -r (--raw-output) outputs the contents of strings instead of JSON string literals.

Percent-encode all characters

xxd -p|tr -d \\n|sed 's/../%&/g'

tr -d \\n removes the linefeeds that are added by xxd -p after every 60 characters.
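
The same pipeline can be sketched in Python 3 for comparison. This is only an illustration (the function name encode_all is made up): bytes.hex() plays the role of xxd -p, and the substitution mirrors the sed expression.

```python
import re

def encode_all(data: bytes) -> str:
    """Percent-encode every byte, like xxd -p | sed 's/../%&/g'."""
    # bytes.hex() yields two lowercase hex digits per byte, with no line breaks,
    # so no equivalent of tr -d \\n is needed.
    return re.sub(r"..", r"%\g<0>", data.hex())

print(encode_all("a b".encode("utf-8")))  # %61%20%62
```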

Percent-encode all characters except ASCII alphanumeric characters in Bash

eu () {
    local LC_ALL=C c
    while IFS= read -r -n1 -d '' c
    do 
        if [[ $c = [[:alnum:]] ]]
        then 
            printf %s "$c"
        else
            printf %%%02x "'$c"
        fi
    done
}

Without -d '' this would skip linefeeds and null bytes. Without IFS= this would replace characters in IFS with %00. Without LC_ALL=C this would, for example, replace あ with %3042 in a UTF-8 locale.
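
For comparison, here is a byte-oriented sketch of the same rule in Python 3 (illustrative only; the name eu_bytes is made up). Working on bytes sidesteps the locale issue, because each byte is handled on its own.

```python
def eu_bytes(data: bytes) -> str:
    """Percent-encode every byte except ASCII letters and digits."""
    out = []
    for b in data:
        ch = chr(b)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # Lowercase hex, matching printf %%%02x in the bash function.
            out.append("%%%02x" % b)
    return "".join(out)

# The three UTF-8 bytes of あ, the space and the linefeed are all encoded.
print(eu_bytes("あ b\n".encode("utf-8")))  # %e3%81%82%20b%0a
```

To use it as a filter, one could pass it sys.stdin.buffer.read() so that the raw bytes, including linefeeds, reach the function unchanged.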

Solution 4

Pure bash solution for decoding only:

$ a='%C3%A6ndr%C3%BCk'
$ echo -e "${a//%/\\x}"
ændrük

Solution 5

I can't comment on the best answer in this thread, so here is mine.

Personally, I use these aliases for URL encoding and decoding:

alias urlencode='python -c "import urllib, sys; print urllib.quote(  sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1])"'

alias urldecode='python -c "import urllib, sys; print urllib.unquote(sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1])"'

Both commands convert data that is either passed as a command-line argument or read from standard input: each one-liner checks whether any command-line arguments were given (even empty ones) and processes them, and reads standard input otherwise.

update 2015-07-16 (empty 1st arg)

... according to @muru comment.

update 2017-05-28 (slash encoding)

If you also need to encode the slash, just add an empty second argument to the quote function, then the slash will also be encoded.

So the final urlencode alias in bash looks like this:

alias urlencode='python -c "import urllib, sys; print urllib.quote(sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1], \"\")"'

Example

$ urlencode "Проба пера/Pen test"
%D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test

$ echo "Проба пера/Pen test" | urlencode
%D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test

$ urldecode %D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test
Проба пера/Pen test

$ echo "%D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test" | urldecode
Проба пера/Pen test

$ urlencode "Проба пера/Pen test" | urldecode
Проба пера/Pen test

$ echo "Проба пера/Pen test" | urlencode | urldecode
Проба пера/Pen test
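
For Python 3, the logic of the two aliases could be sketched as follows. This is not the author's code: the helper names get_input, encode and decode are made up, and io.StringIO merely stands in for standard input in the demonstration.

```python
import io
from urllib.parse import quote, unquote

def get_input(argv, stdin):
    """Return argv[1] if given (even if empty), else stdin minus one trailing newline."""
    if len(argv) > 1:
        return argv[1]
    data = stdin.read()
    return data[:-1] if data.endswith("\n") else data

def encode(text):
    # safe="" means that "/" is percent-encoded as well.
    return quote(text, safe="")

def decode(text):
    return unquote(text)

# Simulate: echo "Проба пера/Pen test" | urlencode
sample = get_input(["prog"], io.StringIO("Проба пера/Pen test\n"))
print(encode(sample))
# %D0%9F%D1%80%D0%BE%D0%B1%D0%B0%20%D0%BF%D0%B5%D1%80%D0%B0%2FPen%20test
print(decode(encode(sample)))
# Проба пера/Pen test
```

In a real script one would call get_input(sys.argv, sys.stdin) instead of passing the stand-in values.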
Author: RusGraf

My goal here is usually to document.

Updated on September 18, 2022

Comments

  • RusGraf
    RusGraf over 1 year

    How can I encode and decode percent-encoded (URL encoded) strings on the command line?

    I'm looking for a solution that can do this:

    $ percent-encode "ændrük"
    %C3%A6ndr%C3%BCk
    $ percent-decode "%C3%A6ndr%C3%BCk"
    ændrük
    
    • samme4life
      samme4life almost 13 years
      Do you want to incorporate different encodings too? %E6ndr%FCk doesn't look like (standard) UTF8 to me. Or it's just an example?
    • RusGraf
      RusGraf almost 13 years
      @arrange Thanks for catching that. Apparently I chose the bad apple among search results for online converters.
  • muru
    muru almost 9 years
    I think sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1] might be more appropriate. Especially if you use this in scripts and accidentally give an empty first argument.
  • DIG mbl
    DIG mbl almost 9 years
    As per @muru comment I changed the checking for an argument on the command line. It was: len(sys.argv) < 2 and sys.stdin.read()[0:-1] or sys.argv[1] Now: sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1] That is, if there is even an empty first argument, the command does not wait for input from the standard input, but processes an empty argument.
  • TMG
    TMG over 6 years
    What is that æ character at the end of first line? Edit: answering to myself - got it, it's just a single character UTF8 to-be-encoded string for example purpose :-)
  • RicardoE
    RicardoE over 5 years
    how about python3?
  • Pablo Bianchi
    Pablo Bianchi over 5 years
    @RicardoE check this answer.
  • 12431234123412341234123
    12431234123412341234123 over 4 years
    The bash + xxd version does not work with strings that contain a %; maybe you could replace printf "$c" with printf "%c" "$c"? Another problem is that some non-ASCII characters (such as ä) are not encoded in some language settings; maybe add an export LC_ALL=C in the function (that should not affect anything outside the function)?
  • rimkashox
    rimkashox over 2 years
    For folks in 2022, that's the only version that works lol
  • Admin
    Admin almost 2 years
    urldecode is printing extra new lines in output. (python3.10)