Associative arrays in Shell scripts

223,089

Solution 1

To add to Irfan's answer, here is a shorter and faster version of get(); it is faster because it doesn't iterate over the map contents:

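get() {
    local mapName=$1 key=$2 map
    map=${!mapName}    # bash indirection: the value of the variable named in $mapName
    # A quoted printf avoids the word splitting and globbing of an unquoted echo.
    # The result is left in the global variable $value.
    value=$(printf '%s\n' "$map" | sed -e "s/.*--${key}=\([^ ]*\).*/\1/" -e 's/:SP:/ /g')
}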
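Irfan's original put() is not shown here. Judging from the sed expressions, get() expects the map to be a single string of space-separated --key=value pairs, with spaces in values encoded as :SP:. Below is a self-contained sketch of a matching put() under that assumption. The put() function is hypothetical, and keys are assumed to contain no spaces or regex metacharacters.

```shell
#!/bin/bash
# ASSUMED storage format (inferred from get()'s sed expressions, not from
# Irfan's code): one string of space-separated "--key=value" pairs, with
# spaces in values encoded as ":SP:". Keys must not contain spaces or
# sed/regex metacharacters.

# Hypothetical put(): append a "--key=value" pair to the variable named by $1.
put() {
    local mapName=$1 key=$2 value=${3// /:SP:}
    # printf -v (bash 3.1+) assigns to the global variable named in $mapName.
    printf -v "$mapName" '%s --%s=%s' "${!mapName}" "$key" "$value"
}

# Copy of the get() above so this sketch runs on its own; sets global $value.
get() {
    local mapName=$1 key=$2 map
    map=${!mapName}
    value=$(printf '%s\n' "$map" | sed -e "s/.*--${key}=\([^ ]*\).*/\1/" -e 's/:SP:/ /g')
}

put newMap name "Irfan Zulfiqar"
put newMap company "My Own Company"
get newMap company
echo "$value"    # My Own Company
```

Because the leading .* in the sed pattern is greedy, get() returns the last pair stored for a key, so this put() can simply append and a later value shadows an earlier one (at the cost of an ever-growing string). If the key is absent, the pattern doesn't match and get() returns the whole decoded map string rather than an empty value.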

Solution 2

Another option, if portability is not your main concern, is to use the associative arrays built into the shell. This should work in bash 4.0 (available now on most major distros, though not on OS X unless you install it yourself), ksh, and zsh:

declare -A newmap
newmap[name]="Irfan Zulfiqar"
newmap[designation]=SSE
newmap[company]="My Own Company"

echo "${newmap[company]}"
echo "${newmap[name]}"

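Depending on the shell, you may need typeset -A newmap instead of declare -A newmap. Don't leave the declaration out in bash, though: without it, newmap is an ordinary indexed array, non-numeric keys are evaluated arithmetically, and keys like name and company usually all land on index 0.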
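Bash 4's associative arrays also cover the other map operations: listing keys, testing whether a key exists (the ${map[key]+x} test mentioned in the comments), and deleting a key. A short sketch, assuming bash 4.0 or later:

```shell
#!/bin/bash
# Requires bash 4.0+ for declare -A.
declare -A newmap=(
    [name]="Irfan Zulfiqar"
    [designation]=SSE
    [company]="My Own Company"
)

# "${!newmap[@]}" expands to the keys; bash does not guarantee their order.
for key in "${!newmap[@]}"; do
    printf '%s => %s\n' "$key" "${newmap[$key]}"
done

# ${newmap[k]+x} expands to "x" only if the key k is set, so this
# distinguishes a missing key (here a typo) from an empty value.
if [ -n "${newmap[designatio]+x}" ]; then
    echo "found"
else
    echo "no such key: designatio"
fi

# Delete a key; the quotes stop the shell from treating [..] as a glob.
unset 'newmap[designation]'
echo "${#newmap[@]} keys left"    # 2 keys left
```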

Solution 3

Another way that doesn't require bash 4.

#!/bin/bash

# A pretend Python dictionary with bash 3 
ARRAY=( "cow:moo"
        "dinosaur:roar"
        "bird:chirp"
        "bash:rock" )

for animal in "${ARRAY[@]}" ; do
    KEY=${animal%%:*}
    VALUE=${animal#*:}
    printf "%s likes to %s.\n" "$KEY" "$VALUE"
done

echo -e "${ARRAY[1]%%:*} is an extinct animal which likes to ${ARRAY[1]#*:}\n"

You could add an if statement to search for a particular key as well, e.g. if [[ $KEY == dinosaur ]], or a regex test such as if [[ $KEY =~ ^dino ]] (bash regexes are not wrapped in slashes).
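That search can be wrapped in a function. A minimal sketch, where lookup is a hypothetical name: it scans the entries linearly, so unlike a real associative array each lookup costs time proportional to the number of entries.

```shell
#!/bin/bash
ARRAY=( "cow:moo" "dinosaur:roar" "bird:chirp" "bash:rock" )

# Hypothetical helper: print the value stored for key $1 and return 0,
# or return 1 if no entry has that key.
lookup() {
    local wanted=$1 entry
    for entry in "${ARRAY[@]}"; do
        # ${entry%%:*} is the key; the quoted right side is matched literally.
        if [[ ${entry%%:*} == "$wanted" ]]; then
            printf '%s\n' "${entry#*:}"
            return 0
        fi
    done
    return 1
}

lookup dinosaur                          # roar
lookup unicorn || echo "no such key: unicorn"
```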

Solution 4

I think that you need to step back and think about what a map, or associative array, really is. It is just a way to store a value for a given key and get that value back quickly and efficiently. You may also want to be able to iterate over the keys to retrieve every key-value pair, or to delete keys and their associated values.

Now, think about a data structure you use all the time in shell scripting, and even just in the shell without writing a script, that has these properties. Stumped? It's the filesystem.

Really, all you need to have an associative array in shell programming is a temp directory. mktemp -d is your associative array constructor:

prefix=$(basename -- "$0")
map=$(mktemp -d "${TMPDIR:-/tmp}/${prefix}.XXXXXX")
echo somevalue >"${map}/key"
value=$(cat "${map}/key")

If you don't feel like using echo and cat, you can always write some little wrappers. These are modelled on Irfan's, though they just print the value rather than setting a global variable like $value:

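#!/bin/sh

prefix=$(basename -- "$0")
# An explicit template works with both GNU and BSD mktemp
# (GNU rejects "mktemp -dt prefix" because the template has no X's).
mapdir=$(mktemp -d "${TMPDIR:-/tmp}/${prefix}.XXXXXX") || exit 1
trap 'rm -rf -- "$mapdir"' EXIT

# put MAPNAME KEY VALUE: store VALUE in the file $mapdir/MAPNAME/KEY
put() {
  [ "$#" -eq 3 ] || return 1
  mapname=$1; key=$2; value=$3
  [ -d "${mapdir}/${mapname}" ] || mkdir "${mapdir}/${mapname}"
  printf '%s\n' "$value" >"${mapdir}/${mapname}/${key}"
}

# get MAPNAME KEY: print the stored value (fails if the key is absent)
get() {
  [ "$#" -eq 2 ] || return 1
  mapname=$1; key=$2
  cat "${mapdir}/${mapname}/${key}"
}

put "newMap" "name" "Irfan Zulfiqar"
put "newMap" "designation" "SSE"
put "newMap" "company" "My Own Company"

value=$(get "newMap" "company")
echo "$value"

value=$(get "newMap" "name")
echo "$value"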

edit: This approach is actually quite a bit faster than the linear search using sed suggested by the questioner, and more robust: it allows keys and values to contain -, =, space, and ":SP:" (keys still cannot contain /, since each key is a file name). The fact that it uses the filesystem does not make it slow; these files are not guaranteed to be written to the disk unless you call sync, and for short-lived temporary files like these, it's quite possible that many of them will never be written to disk at all.
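The directory-backed map also handles the other operations described earlier, iterating over keys and deleting them, with a glob and rm. A minimal sketch: keys(), delete() and has() are hypothetical helpers that are not part of the answer, and put() and get() are condensed copies so the block runs on its own.

```shell
#!/bin/sh
# Sketch only: keys(), delete() and has() are hypothetical helpers.
mapdir=$(mktemp -d "${TMPDIR:-/tmp}/map.XXXXXX") || exit 1
trap 'rm -rf -- "$mapdir"' EXIT

put() { mkdir -p "$mapdir/$1" && printf '%s\n' "$3" >"$mapdir/$1/$2"; }
get() { cat "$mapdir/$1/$2"; }

# keys MAPNAME: print each key on its own line. The * glob skips names
# starting with ".", so such keys would not be listed.
keys() {
  for f in "$mapdir/$1"/*; do
    [ -e "$f" ] || continue      # map is empty: the glob stayed unexpanded
    printf '%s\n' "${f##*/}"     # strip the directory part
  done
}

# delete MAPNAME KEY: remove a key and its value.
delete() { rm -f -- "$mapdir/$1/$2"; }

# has MAPNAME KEY: exit status 0 only if the key exists.
has() { [ -f "$mapdir/$1/$2" ]; }

put newMap name "Irfan Zulfiqar"
put newMap company "My Own Company"
keys newMap                      # company, then name
delete newMap name
has newMap name || echo "name was deleted"
```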

I did a few benchmarks of Irfan's code, Jerry's modification of Irfan's code, and my code, using the following driver program:

#!/bin/bash
# The (( )) loops below are bash syntax, so run the driver with bash, not sh.

mapimpl=$1
numkeys=$2
numvals=$3

. "./${mapimpl}.sh"    # load that implementation's put() and get()

for (( i = 0; i < numkeys; i++ ))
do
    for (( j = 0; j < numvals; j++ ))
    do
        put "newMap" "key$i" "value$j"
        get "newMap" "key$i"
    done
done

The results:

    $ time ./driver.sh irfan 10 5

    real    0m0.975s
    user    0m0.280s
    sys     0m0.691s

    $ time ./driver.sh brian 10 5

    real    0m0.226s
    user    0m0.057s
    sys     0m0.123s

    $ time ./driver.sh jerry 10 5

    real    0m0.706s
    user    0m0.228s
    sys     0m0.530s

    $ time ./driver.sh irfan 100 5

    real    0m10.633s
    user    0m4.366s
    sys     0m7.127s

    $ time ./driver.sh brian 100 5

    real    0m1.682s
    user    0m0.546s
    sys     0m1.082s

    $ time ./driver.sh jerry 100 5

    real    0m9.315s
    user    0m4.565s
    sys     0m5.446s

    $ time ./driver.sh irfan 10 500

    real    1m46.197s
    user    0m44.869s
    sys     1m12.282s

    $ time ./driver.sh brian 10 500

    real    0m16.003s
    user    0m5.135s
    sys     0m10.396s

    $ time ./driver.sh jerry 10 500

    real    1m24.414s
    user    0m39.696s
    sys     0m54.834s

    $ time ./driver.sh irfan 1000 5

    real    4m25.145s
    user    3m17.286s
    sys     1m21.490s

    $ time ./driver.sh brian 1000 5

    real    0m19.442s
    user    0m5.287s
    sys     0m10.751s

    $ time ./driver.sh jerry 1000 5

    real    5m29.136s
    user    4m48.926s
    sys     0m59.336s

Solution 5

Yet another non-bash-4 (i.e., bash 3, Mac-compatible) way:

val_of_key() {
    case $1 in
        'A1') echo 'aaa';;
        'B2') echo 'bbb';;
        'C3') echo 'ccc';;
        *) echo 'zzz';;
    esac
}

for x in 'A1' 'B2' 'C3' 'D4'; do
    y=$(val_of_key "$x")
    echo "$x => $y"
done

Prints:

A1 => aaa
B2 => bbb
C3 => ccc
D4 => zzz

The function with the case statement acts like an associative array. Unfortunately it cannot return a string (return only sets a numeric exit status), so it has to echo its output, which the caller captures with $(...). This is not a problem unless you are a purist who shuns forking subshells.
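If you do want to avoid the $(...) subshell, the function can assign to a global variable instead of echoing. A sketch, where the variable name val is an arbitrary choice:

```shell
#!/bin/bash
# Same lookup table, but the result goes into the global variable $val
# instead of stdout, so no command substitution (and no subshell) is needed.
val_of_key() {
    case $1 in
        'A1') val='aaa';;
        'B2') val='bbb';;
        'C3') val='ccc';;
        *)    val='zzz';;
    esac
}

for x in 'A1' 'B2' 'C3' 'D4'; do
    val_of_key "$x"
    echo "$x => $val"
done
```

This prints the same four lines as the version above.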

Author: Irfan Zulfiqar (One more software engineer)

Updated on December 07, 2020

Comments

  • Irfan Zulfiqar, over 3 years

    We required a script that simulates associative arrays or a map-like data structure for shell scripting. Anybody?

  • Irfan Zulfiqar, over 15 years
    I don't think you should be using the file system for maps; that basically uses IO for something that can be done fairly fast in memory.
  • Irfan Zulfiqar, over 15 years
    Thanks for posting your answer. I think that would be the best way to do it for anyone using bash 4.0 or above.
  • Brian Campbell, over 15 years
    The files won't necessarily ever be written to the disk; unless you call sync, the operating system may just leave them in memory. Your code is calling out to sed and doing several linear searches, which are all very slow. I did some quick benchmarks, and my version is 5-35 times faster.
  • Tim Post, about 15 years
    I'd add a little kludge to make sure BASH_VERSION is set, and >= 4. And yes, BASH 4 is really, really cool!
  • lhunath, almost 14 years
    Forking a subshell and sed is hardly optimal. Bash 4 supports this natively and bash 3 has better alternatives.
  • lhunath, almost 14 years
    On the other hand, bash 4's native arrays are a significantly better approach, and in bash 3 you can still keep everything off the disk without forking by using declare and indirection.
  • lhunath, almost 14 years
    You're eval'ing data as though it's bash code, and what's more, you fail to quote it properly. Both cause masses of bugs and arbitrary code injection.
  • jmtd, over 13 years
    "Fast" and "shell" don't really go together anyway, certainly not for the sort of speed issues we're talking about at the "avoid minuscule IO" level. You could look for and use /dev/shm to guarantee no IO.
  • Paul, over 11 years
    This method is good when you don't have Bash 4 indeed. But I think the line that fetches the VALUE would be safer this way: VALUE=${animal#*:}. With only one # character, the matching will stop on the first ":". That allows values to contain ":", too.
  • Bubnoff, over 11 years
    @Ced-le-pingouin That's a great point! I didn't catch that. I've edited my post to reflect your suggested improvements.
  • Mark K Cowan, almost 11 years
    Might be worth sticking it up on GitHub.
  • Bubnoff, over 10 years
    It's a pretty hackish emulation of associative arrays using BASH parameter substitution. The "key" substitution keeps everything before the colon and the "value" substitution keeps everything after it, similar to a regex wildcard match. So it is NOT a true associative array. Not recommended unless you need an easy-to-understand way to get hash/associative-array-like functionality in BASH 3 or below. It works though! More here: tldp.org/LDP/abs/html/parameter-substitution.html#PSOREX2
  • Jer, over 10 years
    I'm using something like this. What's the best way to "catch" the error where the array index/subscript doesn't exist? For example, what if I was taking the subscript as a command line option, and the user made a typo and entered "designatio"? I get a "bad array subscript" error but don't know how to validate the input at the time of array lookup, if that's possible.
  • Brian Campbell, over 10 years
    @Jer It's pretty obscure, but to determine if a variable is set in the shell, you can use test -z "${variable+x}" (the x doesn't matter, that could be any string). For an associative array in Bash, you can do something similar: test -z "${map[key]+x}".
  • Gordon, over 8 years
    This solution amazed me, and is just awesome. Still holds true in 2016. It really should be the accepted answer.
  • Bren, about 7 years
    I get "basename: illegal option -- b" for the first line with the prefix. How can I fix that?
  • Brian Campbell, about 7 years
    @Bren That happens if you run this directly in the shell instead of from within a script, because in that case $0 will be equal to -bash, and it looks like basename is interpreting that as flags rather than as the argument to take the basename of. One fix is to just hardcode prefix. It isn't used for anything very important; it just adds a string to the name of the temp dirs so that if you see them in /tmp, you can tell where they came from. Your implementation of basename may support using -- to separate flags from arguments: basename -- "$0".
  • Eric Postpischil, over 5 years
    This does not implement an associative array because it does not provide a way to look up an item by the key. It only provides a way to find each key (and value) from a numeric index. (An item could be found by key by iterating through the array, but that is not what is desired for an associative array.)
  • Bubnoff, over 5 years
    @EricPostpischil True. It is only a hack. It allows a person to use familiar syntax in the setup but still requires iterating through the array, as you say. I've tried to be clear in my previous comment that it is definitely not an associative array, and I don't even recommend it if you have alternatives. The only point in its favor, in my view, is that it is easy to write and use for those familiar with other languages like Python. If you are at a point where you actually want to implement associative arrays in BASH 3, then you may need to retrace your steps a bit.
  • Michael P, about 4 years
    I was looking for a fast approach to this problem and this is evil genius! Thank you very much, "Brian from 2009" :)
  • Stuart R. Jefferys, over 3 years
    These links are dead.
  • Bowen Peng, almost 3 years
    But if I want to assign hash_index "foo" || echo ${hash_vals[$(($? - 1))]} to a variable, this code gives output=foo" || echo ${hash_vals[$(($? - 1))]}. If you don't mind, could you tell me the right way to assign it?
  • harshainfo, about 2 years
    I had to use declare -a newmap in iTerm2 3.4
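lhunath's comment above mentions that bash 3 can keep a map in memory, without forking, by using declare and indirection. Here is a minimal sketch of that idea. The names map_put and map_get are hypothetical, and it uses printf -v instead of declare, because declare inside a bash 3 function creates a local variable. Keys are assumed to contain only letters, digits and underscores, since each key becomes part of a variable name.

```shell
#!/bin/bash
# One global variable per key, named map_<key>. Keys must be valid
# identifier characters (letters, digits, underscores).

map_put() {
    # printf -v (bash 3.1+) assigns to the variable whose name is given;
    # it is a global here because no local variable of that name exists.
    printf -v "map_$1" '%s' "$2"
}

map_get() {
    local var="map_$1"
    printf '%s\n' "${!var}"      # indirect expansion: no eval, no sed
}

map_put name "Irfan Zulfiqar"
map_put company "My Own Company"
map_get company                  # My Own Company

# "${!map_@}" expands to the names of all variables starting with "map_".
for var in "${!map_@}"; do
    printf '%s => %s\n' "${var#map_}" "${!var}"
done
```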