Associative arrays in Shell scripts
Solution 1
To add to Irfan's answer, here is a shorter and faster version of get(), since it requires no iteration over the map contents:
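get() {
    mapName=$1; key=$2
    # Indirect expansion: read the variable whose name is stored in mapName
    map=${!mapName}
    # Extract the value stored as --key=..., then decode :SP: back into spaces
    value="$(echo "$map" | sed -e "s/.*--${key}=\([^ ]*\).*/\1/" -e 's/:SP:/ /g')"
}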
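For context, here is a minimal sketch of how this get() might be paired with a put(). The put() below is a hypothetical illustration, not Irfan's original code: it assumes the map lives in an ordinary string variable as space-separated --key=value entries, with spaces inside values encoded as :SP:, which is the format the sed expression above implies.

```bash
#!/bin/bash
# Hypothetical put(): appends " --key=value" to the variable named by $1.
# Spaces in the value are encoded as :SP: so each entry stays one word.
put() {
    local mapName=$1 key=$2
    local val="${3// /:SP:}"
    printf -v "$mapName" '%s' "${!mapName} --${key}=${val}"
}

# get() as above; it sets the global variable $value.
get() {
    local mapName=$1 key=$2
    local map=${!mapName}
    value="$(echo "$map" | sed -e "s/.*--${key}=\([^ ]*\).*/\1/" -e 's/:SP:/ /g')"
}

put newMap name "Irfan Zulfiqar"
put newMap company "My Own Company"
get newMap company
echo "$value"
```

Because the leading .* in the sed pattern is greedy, putting the same key twice makes get() return the most recent value. Looking up a key that was never stored is not handled in this sketch.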
Solution 2
Another option, if portability is not your main concern, is to use associative arrays that are built in to the shell. This should work in bash 4.0 (available now on most major distros, though not on OS X unless you install it yourself), ksh, and zsh:
declare -A newmap
newmap[name]="Irfan Zulfiqar"
newmap[designation]=SSE
newmap[company]="My Own Company"
echo "${newmap[company]}"
echo "${newmap[name]}"
Depending on the shell, you may need to use typeset -A newmap instead of declare -A newmap, or in some it may not be necessary at all.
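As a short sketch of the other operations one usually wants from a map (iterating over keys, deleting an entry, counting entries), assuming bash 4.0 or later and reusing the newmap example from above:

```bash
#!/bin/bash
# Requires bash 4.0 or later for declare -A.
declare -A newmap=(
    [name]="Irfan Zulfiqar"
    [designation]=SSE
    [company]="My Own Company"
)

# "${!newmap[@]}" expands to the keys; the iteration order is unspecified.
for key in "${!newmap[@]}"; do
    printf '%s => %s\n' "$key" "${newmap[$key]}"
done

# Delete one entry, then report how many remain.
unset 'newmap[designation]'
echo "${#newmap[@]} entries left"
```

The quotes around 'newmap[designation]' keep the shell from treating the brackets as a glob pattern.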
Solution 3
Another non-bash 4 way.
#!/bin/bash
# A pretend Python dictionary with bash 3
ARRAY=( "cow:moo"
"dinosaur:roar"
"bird:chirp"
"bash:rock" )
for animal in "${ARRAY[@]}" ; do
    KEY=${animal%%:*}
    VALUE=${animal#*:}
    printf "%s likes to %s.\n" "$KEY" "$VALUE"
done
echo -e "${ARRAY[1]%%:*} is an extinct animal which likes to ${ARRAY[1]#*:}\n"
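You could throw an if statement for searching in there as well, e.g. if [[ $var =~ blah ]], or whatever. (Bash regexes are not delimited by slashes; writing /blah/ would match literal slash characters.)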
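The search mentioned above could be sketched as a small function. The name lookup is made up for this example; it scans the same ARRAY linearly and compares the part before the first colon with the wanted key.

```bash
#!/bin/bash
ARRAY=( "cow:moo" "dinosaur:roar" "bird:chirp" "bash:rock" )

# lookup KEY: print the value for KEY and return 0, or return 1 if absent.
# This is a linear scan, so it is only sensible for small tables.
lookup() {
    local wanted=$1 entry
    for entry in "${ARRAY[@]}"; do
        if [[ ${entry%%:*} == "$wanted" ]]; then
            printf '%s\n' "${entry#*:}"
            return 0
        fi
    done
    return 1
}

lookup dinosaur
lookup unicorn || echo "unicorn: no such key"
```

This works in bash 3, but it is still a scan over a plain indexed array rather than a true keyed lookup.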
Solution 4
I think that you need to step back and think about what a map, or associative array, really is. All it is is a way to store a value for a given key, and get that value back quickly and efficiently. You may also want to be able to iterate over the keys to retrieve every key value pair, or delete keys and their associated values.
Now, think about a data structure you use all the time in shell scripting, and even just in the shell without writing a script, that has these properties. Stumped? It's the filesystem.
Really, all you need to have an associative array in shell programming is a temp directory. mktemp -d is your associative array constructor:
prefix=$(basename -- "$0")
# An explicit XXXXXX template works with both GNU and BSD mktemp
map=$(mktemp -d "${TMPDIR:-/tmp}/${prefix}.XXXXXX")
echo somevalue >"${map}/key"
value=$(cat "${map}/key")
If you don't feel like using echo and cat, you can always write some little wrappers; these ones are modelled on Irfan's, though they just output the value rather than setting arbitrary variables like $value:
#!/bin/sh
prefix=$(basename -- "$0")
# An explicit XXXXXX template works with both GNU and BSD mktemp
mapdir=$(mktemp -d "${TMPDIR:-/tmp}/${prefix}.XXXXXX")
# Remove the whole map directory when the script exits
trap 'rm -r "${mapdir}"' EXIT
put() {
    [ "$#" != 3 ] && exit 1
    mapname=$1; key=$2; value=$3
    [ -d "${mapdir}/${mapname}" ] || mkdir "${mapdir}/${mapname}"
    echo "$value" >"${mapdir}/${mapname}/${key}"
}
get() {
    [ "$#" != 2 ] && exit 1
    mapname=$1; key=$2
    cat "${mapdir}/${mapname}/${key}"
}
put "newMap" "name" "Irfan Zulfiqar"
put "newMap" "designation" "SSE"
put "newMap" "company" "My Own Company"
value=$(get "newMap" "company")
echo "$value"
value=$(get "newMap" "name")
echo "$value"
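The same directory layout also covers the other map operations mentioned earlier: testing for a key, listing keys, and deleting a key. The following is only a sketch; the helper names has_key, keys and remove are invented for this example, and keys are assumed to be valid file names (no "/", and no leading "." if they should show up in the listing).

```sh
#!/bin/sh
# Explicit template keeps this working with both GNU and BSD mktemp.
mapdir=$(mktemp -d "${TMPDIR:-/tmp}/map.XXXXXX") || exit 1
trap 'rm -rf "$mapdir"' EXIT

put() {
    mkdir -p "$mapdir/$1" && printf '%s\n' "$3" >"$mapdir/$1/$2"
}
get() {
    cat "$mapdir/$1/$2"
}
# has_key MAP KEY: succeeds if the key exists.
has_key() {
    [ -f "$mapdir/$1/$2" ]
}
# keys MAP: print one key per line.
keys() {
    for f in "$mapdir/$1"/*; do
        [ -e "$f" ] || continue   # glob did not match: the map is empty
        printf '%s\n' "${f##*/}"
    done
}
# remove MAP KEY: delete the key and its value.
remove() {
    rm -f "$mapdir/$1/$2"
}

put newMap name "Irfan Zulfiqar"
put newMap designation SSE
remove newMap designation
has_key newMap designation || echo "designation is gone"
keys newMap | while IFS= read -r k; do
    echo "$k => $(get newMap "$k")"
done
```

Reading the key list with while read rather than a for loop over $(keys newMap) keeps keys that contain spaces intact.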
edit: This approach is actually quite a bit faster than the linear search using sed suggested by the questioner, as well as more robust (it allows keys and values to contain -, =, space, and ":SP:"). The fact that it uses the filesystem does not make it slow; these files are never guaranteed to be written to the disk unless you call sync, and for temporary files like this with a short lifetime, it's not unlikely that many of them will never be written to disk.
I did a few benchmarks of Irfan's code, Jerry's modification of Irfan's code, and my code, using the following driver program:
#!/bin/bash
# bash is needed here: the (( ... )) loops below are not POSIX sh
mapimpl=$1
numkeys=$2
numvals=$3
. "./${mapimpl}.sh"
for (( i = 0 ; $i < $numkeys ; i += 1 ))
do
    for (( j = 0 ; $j < $numvals ; j += 1 ))
    do
        put "newMap" "key$i" "value$j"
        get "newMap" "key$i"
    done
done
The results:
$ time ./driver.sh irfan 10 5
real 0m0.975s
user 0m0.280s
sys 0m0.691s
$ time ./driver.sh brian 10 5
real 0m0.226s
user 0m0.057s
sys 0m0.123s
$ time ./driver.sh jerry 10 5
real 0m0.706s
user 0m0.228s
sys 0m0.530s
$ time ./driver.sh irfan 100 5
real 0m10.633s
user 0m4.366s
sys 0m7.127s
$ time ./driver.sh brian 100 5
real 0m1.682s
user 0m0.546s
sys 0m1.082s
$ time ./driver.sh jerry 100 5
real 0m9.315s
user 0m4.565s
sys 0m5.446s
$ time ./driver.sh irfan 10 500
real 1m46.197s
user 0m44.869s
sys 1m12.282s
$ time ./driver.sh brian 10 500
real 0m16.003s
user 0m5.135s
sys 0m10.396s
$ time ./driver.sh jerry 10 500
real 1m24.414s
user 0m39.696s
sys 0m54.834s
$ time ./driver.sh irfan 1000 5
real 4m25.145s
user 3m17.286s
sys 1m21.490s
$ time ./driver.sh brian 1000 5
real 0m19.442s
user 0m5.287s
sys 0m10.751s
$ time ./driver.sh jerry 1000 5
real 5m29.136s
user 4m48.926s
sys 0m59.336s
Solution 5
Yet another non-bash-4 (i.e., bash 3, Mac-compatible) way:
val_of_key() {
    case $1 in
        'A1') echo 'aaa';;
        'B2') echo 'bbb';;
        'C3') echo 'ccc';;
        *) echo 'zzz';;
    esac
}
for x in 'A1' 'B2' 'C3' 'D4'; do
    y=$(val_of_key "$x")
    echo "$x => $y"
done
Prints:
A1 => aaa
B2 => bbb
C3 => ccc
D4 => zzz
The function with the case acts like an associative array. Unfortunately it cannot use return, so it has to echo its output, but this is not a problem, unless you are a purist that shuns forking subshells.
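If the subshell does matter, one possible variant (a sketch, not part of the original answer) has the function store its answer in a global variable instead of echoing it. The variable name result is chosen only for this example.

```sh
#!/bin/sh
# Variant of val_of_key that stores its answer in the global variable
# "result" (a name chosen for this sketch) instead of echoing it.
val_of_key() {
    case $1 in
        'A1') result='aaa' ;;
        'B2') result='bbb' ;;
        'C3') result='ccc' ;;
        *)    result='zzz' ;;
    esac
}

for x in 'A1' 'B2' 'C3' 'D4'; do
    val_of_key "$x"          # runs in the current shell, no $(...) needed
    echo "$x => $result"
done
```

The output is the same as before, but each lookup is a plain function call in the current shell. The trade-off is that every caller shares the one result variable.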
Comments
-
Irfan Zulfiqar over 3 years
We required a script that simulates associative arrays or a map-like data structure for shell scripting. Anybody?
-
Gabriel Staples over 2 years
See also: How to define hash tables in Bash?
-
Irfan Zulfiqar over 15 years
I don't think you should be using the file system for maps; that is basically using IO for something that you can do fairly fast in memory.
-
Irfan Zulfiqar over 15 years
Thanks for posting your answer. I think that would be the best way to do it for those using bash 4.0 or above.
-
Brian Campbell over 15 years
The files won't necessarily ever be written to the disk; unless you call sync, the operating system may just leave them in memory. Your code is calling out to sed and doing several linear searches, which are all very slow. I did some quick benchmarks, and my version is 5-35 times faster.
-
Tim Post about 15 years
I'd add a little kludge to make sure BASH_VERSION is set, and >= 4. And yes, BASH 4 is really, really cool!
-
lhunath almost 14 years
Forking a subshell and sed is hardly optimal. Bash4 supports this natively and bash3 has better alternatives.
-
lhunath almost 14 years
On the other hand, bash4's native arrays are a significantly better approach, and in bash3 you can still keep everything off the disk without forking by use of declare and indirection.
-
lhunath almost 14 years
You're eval'ing data as though it's bash code, and what's more: you fail to quote it properly. Both cause masses of bugs and arbitrary code injection.
-
jmtd over 13 years
"fast" and "shell" don't really go together anyway: certainly not for the sort of speed issues we're talking about at the "avoid minuscule IO" level. You could search for and use /dev/shm to guarantee no IO.
-
Paul over 11 years
This method is good when you don't have Bash 4 indeed. But I think the line that fetches the VALUE would be safer this way: VALUE=${animal#*:}. With only one # character, the matching will stop on the first ":". That allows values to contain ":", too.
-
Bubnoff over 11 years
@Ced-le-pingouin ~ That's a great point! I didn't catch that. I've edited my post to reflect your suggested improvements.
-
Mark K Cowan almost 11 years
Might be worth sticking it up on github
-
Bubnoff over 10 years
It's a pretty hackish emulation of associative arrays using BASH parameter substitution. The "key" param-sub substitutes everything before the colon and the value pattern substitutes everything after the colon. Similar to a regex wildcard match. So NOT a true associative array. Not recommended unless you need an easy to understand way to do hash/associative array-like functionality in BASH 3 or below. It works though! More here: tldp.org/LDP/abs/html/parameter-substitution.html#PSOREX2
-
Jer over 10 years
I'm using something like this. What's the best way to "catch" the error where the array index/subscript doesn't exist? For example, what if I was taking the subscript as a command line option, and the user made a typo and entered "designatio"? I get a "bad array subscript" error but don't know how to validate the input at the time of array lookup, if that's possible?
-
Brian Campbell over 10 years
@Jer It's pretty obscure, but to determine if a variable is set in the shell, you can use test -z ${variable+x} (the x doesn't matter, that could be any string). For an associative array in Bash, you can do similar; use test -z ${map[key]+x}.
-
Gordon over 8 years
This solution amazed me, and is just awesome. Still holds true in 2016. It really should be the accepted answer.
-
Bren about 7 years
I get "basename: illegal option -- b" for the first line with the prefix. How can I fix that?
-
Brian Campbell about 7 years
@Bren That happens if you run this directly in the shell, instead of from within a script, because in that case $0 will be equal to -bash, and it looks like basename is interpreting that as flags rather than as the argument to generate the basename from. One fix is to just hardcode prefix. That's not used for anything very important, it just adds a string to the name of the temp dirs so if you see them in /tmp, you can tell where they came from. Your implementation of basename may support using -- to separate flags from arguments: basename -- "$0".
-
Eric Postpischil over 5 years
This does not implement an associative array because it does not provide a way to look up an item by the key. It only provides a way to find each key (and value) from a numeric index. (An item could be found by key by iterating through the array, but that is not what is desired for an associative array.)
-
Bubnoff over 5 years
@EricPostpischil True. It is only a hack. It allows a person to use familiar syntax in the setup but still requires iterating through the array as you say. I've tried to be clear in my previous comment that it is definitely not an associative array and I don't even recommend it if you have alternatives. The only point in its favor, in my view, is that it is easy to write and use for those familiar with other languages like Python. If you are at a point where you actually want to implement associative arrays in BASH 3 then you may need to retrace your steps a bit.
-
Michael P about 4 years
I was looking for a fast approach for this problem and this is evil genius! Thank you very much "Brian from 2009" :)
-
Stuart R. Jefferys over 3 years
These links are dead.
-
Bowen Peng almost 3 years
BUT if I want to assign hash_index "foo" || echo ${hash_vals[$(($? - 1))]} to a variable, this code output=foo" || echo ${hash_vals[$(($? - 1))]}. So if you don't mind, could you tell me the right way to assign it?
-
harshainfo about 2 years
I had to use declare -a newmap in iTerm2 3.4
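
Picking up Jer's question and Brian Campbell's reply in the comments above, here is a small sketch of validating a key before looking it up in a bash 4 associative array. The helper name has_key is made up for this example, and the guard against an empty key is an addition of this sketch (an empty subscript is rejected by bash).

```bash
#!/bin/bash
declare -A newmap=([name]="Irfan Zulfiqar" [designation]=SSE)

# has_key KEY: succeeds if KEY is set in newmap, even when its value is empty.
# ${newmap[KEY]+x} expands to "x" only if the element exists.
has_key() {
    [ -n "$1" ] && [ -n "${newmap[$1]+x}" ]
}

for key in designation designatio; do
    if has_key "$key"; then
        echo "$key: ${newmap[$key]}"
    else
        echo "unknown key: $key" >&2
    fi
done
```

The first key prints its value and the misspelled one is reported on standard error.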