How to define hash tables in Bash?
Solution 1
Bash 4
Bash 4 natively supports this feature. Make sure your script's hashbang is #!/usr/bin/env bash or #!/bin/bash so you don't end up using sh. Make sure you either execute your script directly or run it with bash script. (Not actually executing a Bash script with Bash does happen, and will be really confusing!)
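If you want the script itself to notice when it is not running under Bash 4, a guard along these lines is one option. This is only a sketch: have_assoc_arrays is a hypothetical helper name, not something from the answer, and it relies on the BASH_VERSION and BASH_VERSINFO variables that Bash sets.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when running under Bash 4 or newer.
# BASH_VERSION is empty or unset in shells that are not Bash.
have_assoc_arrays() {
    [ -n "${BASH_VERSION:-}" ] && [ "${BASH_VERSINFO[0]}" -ge 4 ]
}

if have_assoc_arrays; then
    echo "associative arrays available (bash $BASH_VERSION)"
else
    echo "this script needs bash 4 or newer" >&2
fi
```

A real script would probably exit in the else branch instead of only printing a message.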
You declare an associative array by doing:
declare -A animals
You can fill it up with elements using the normal array assignment operator. For example, if you want to have a map of animal[sound(key)] = animal(value):
animals=( ["moo"]="cow" ["woof"]="dog")
Or declare and instantiate in one line:
declare -A animals=( ["moo"]="cow" ["woof"]="dog")
Then use them just like normal arrays. Use
- animals['key']='value' to set a value
- "${animals[@]}" to expand the values
- "${!animals[@]}" (notice the !) to expand the keys
Don't forget to quote them:
echo "${animals[moo]}"
for sound in "${!animals[@]}"; do echo "$sound - ${animals[$sound]}"; done
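A few more operations come up often once you have an associative array: testing for a key, deleting a key, and counting entries. The following is a minimal sketch assuming Bash 4; has_key is a hypothetical helper name, and it uses the ${var+word} expansion to tell "key missing" apart from "key set to an empty value".

```bash
#!/usr/bin/env bash
declare -A animals=( ["moo"]="cow" ["woof"]="dog" )

# ${animals[key]+x} expands to "x" only if the key is set,
# even when its value is the empty string.
has_key() {
    [[ -n "${animals[$1]+x}" ]]
}

has_key moo  && echo "moo is present"
has_key meow || echo "meow is absent"

# Delete a key (quote the subscript so the shell does not glob it).
unset 'animals[woof]'

# Count the remaining entries.
echo "entries: ${#animals[@]}"
```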
Bash 3
Before bash 4, you don't have associative arrays. Do not use eval to emulate them. Avoid eval like the plague, because it is the plague of shell scripting. The most important reason is that eval treats your data as executable code (there are many other reasons too).
First and foremost: Consider upgrading to bash 4. This will make the whole process much easier for you.
If there's a reason you can't upgrade, declare is a far safer option. It does not evaluate data as bash code like eval does, and as such does not allow arbitrary code injection quite so easily.
Let's prepare the answer by introducing the concepts:
First, indirection.
$ animals_moo=cow; sound=moo; i="animals_$sound"; echo "${!i}"
cow
Secondly, declare:
$ sound=moo; animal=cow; declare "animals_$sound=$animal"; echo "$animals_moo"
cow
Bring them together:
# Set a value:
declare "array_$index=$value"
# Get a value:
arrayGet() {
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}
Let's use it:
$ sound=moo
$ animal=cow
$ declare "animals_$sound=$animal"
$ arrayGet animals "$sound"
cow
Note: declare cannot be put in a function. Any use of declare inside a bash function makes the variable it creates local to that function, meaning we can't access or modify global arrays with it. (In bash 4.2 and later you can use declare -g to declare global variables - but in bash 4, you can use associative arrays in the first place, avoiding this workaround.)
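If you do want a setter function on an older Bash, one possible way around the scoping limitation is printf -v, which assigns to a named variable without making it local. This is a swapped-in technique, not part of the answer above, and the sketch assumes a Bash new enough to have printf -v (3.1 or later). arraySet is a hypothetical name; arrayGet is repeated from above so the block runs on its own.

```bash
#!/usr/bin/env bash
# arraySet NAME KEY VALUE: store VALUE in the global variable NAME_KEY.
# printf -v assigns to the named variable without declaring it local.
arraySet() {
    local array=$1 index=$2 value=$3
    # The key becomes part of a variable name, so reject anything
    # that is not a letter, digit, or underscore.
    case $index in
        ''|*[!A-Za-z0-9_]*)
            echo "arraySet: invalid key: $index" >&2
            return 1 ;;
    esac
    printf -v "${array}_$index" '%s' "$value"
}

arrayGet() {
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}

arraySet animals moo cow
arraySet animals woof dog
arrayGet animals moo; echo

# List the keys by expanding every variable name that starts with the prefix.
for var in "${!animals_@}"; do
    echo "${var#animals_} -> ${!var}"
done
```

The key check matters because the key ends up inside a variable name; keys with spaces or punctuation cannot be stored this way.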
Summary:
- Upgrade to bash 4 and use declare -A for associative arrays.
- Use the declare option if you can't upgrade.
- Consider using awk instead and avoid the issue altogether.
Solution 2
There's parameter substitution, though it may be un-PC as well ...like indirection.
#!/bin/bash
# Array pretending to be a Pythonic dictionary
ARRAY=( "cow:moo"
        "dinosaur:roar"
        "bird:chirp"
        "bash:rock" )
for animal in "${ARRAY[@]}" ; do
    KEY="${animal%%:*}"
    VALUE="${animal##*:}"
    printf "%s likes to %s.\n" "$KEY" "$VALUE"
done
printf "%s is an extinct animal which likes to %s\n" "${ARRAY[1]%%:*}" "${ARRAY[1]##*:}"
The BASH 4 way is better of course, but if you need a hack ...only a hack will do. You could search the array/hash with similar techniques.
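Searching the array by key could look something like the sketch below. lookup is a hypothetical function name; it walks the array linearly and prints the value of the first entry whose key matches exactly. It splits on the first colon with ${entry#*:}, so values that contain colons survive intact.

```bash
#!/usr/bin/env bash
ARRAY=( "cow:moo" "dinosaur:roar" "bird:chirp" "bash:rock" )

# lookup KEY: print the value of the first entry whose key matches exactly.
# Returns non-zero when no entry has that key.
lookup() {
    local key=$1 entry
    for entry in "${ARRAY[@]}"; do
        if [[ "${entry%%:*}" == "$key" ]]; then
            printf '%s\n' "${entry#*:}"
            return 0
        fi
    done
    return 1
}

lookup bird
lookup unicorn || echo "unicorn: no such key"
```

Every lookup scans the whole array in the worst case, so this only makes sense for small tables.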
Solution 3
This is what I was looking for here:
declare -A hashmap
hashmap["key"]="value"
hashmap["key2"]="value2"
echo "${hashmap["key"]}"
for key in "${!hashmap[@]}"; do echo "$key"; done
for value in "${hashmap[@]}"; do echo "$value"; done
echo "hashmap has ${#hashmap[@]} elements"
This did not work for me with bash 4.1.5:
animals=( ["moo"]="cow" )
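The answer does not say what went wrong, so the following is only a guess at one possible cause: if the assignment runs without a preceding declare -A, Bash creates an ordinary indexed array and evaluates each subscript as an arithmetic expression, in which an unset variable name counts as 0. The sketch below compares the two cases; the names plain and assoc are made up for the illustration.

```bash
#!/usr/bin/env bash
unset moo woof   # make sure no variables with these names exist

# No "declare -A" here, so Bash treats this as an indexed array and
# evaluates the subscripts arithmetically.
plain=( ["moo"]="cow" ["woof"]="dog" )
echo "plain keys:   ${!plain[@]}"
echo "plain values: ${plain[@]}"

# With "declare -A" the subscripts are kept as strings.
declare -A assoc=( ["moo"]="cow" ["woof"]="dog" )
echo "assoc count:  ${#assoc[@]}"
```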
Solution 4
Just use the file system
The file system is a tree structure that can be used as a hash map. Your hash table will be a temporary directory, your keys will be filenames, and your values will be file contents. The advantage is that it can handle huge hashmaps, and doesn't require a specific shell.
Hashtable creation
hashtable=$(mktemp -d)
Add an element
echo "$value" > "$hashtable/$key"
Read an element
value=$(< "$hashtable/$key")
Performance
Of course, it's slow, but not that slow. I tested it on my machine, with an SSD and btrfs, and it does around 3000 element reads/writes per second.
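Put together, the pieces above might be wrapped like this. It is a rough sketch, not a hardened implementation: fs_put and fs_get are hypothetical names, mktemp -d is assumed to be available, and keys are rejected if they are empty, ".", "..", or contain a slash, since they are used directly as filenames.

```bash
#!/usr/bin/env bash
hashtable=$(mktemp -d) || exit 1
# Remove the directory and its contents when the script exits.
trap 'rm -rf -- "$hashtable"' EXIT

# fs_put KEY VALUE: write VALUE to the file named KEY.
fs_put() {
    case $1 in ''|.|..|*/*) return 1 ;; esac
    printf '%s\n' "$2" > "$hashtable/$1"
}

# fs_get KEY: print the stored value, or fail if the key does not exist.
fs_get() {
    case $1 in ''|.|..|*/*) return 1 ;; esac
    [ -f "$hashtable/$1" ] && cat -- "$hashtable/$1"
}

fs_put France Paris
fs_put Spain Madrid
fs_get France

# The keys are simply the directory listing.
ls -- "$hashtable"
```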
Solution 5
You can further modify the hput()/hget() interface so that you have named hashes as follows:
hput() {
    eval "$1""$2"='$3'
}
hget() {
    eval echo '${'"$1$2"'#hash}'
}
and then
hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid
echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`
This lets you define other maps that don't conflict (e.g., 'rcapitals' which does country lookup by capital city). But, either way, I think you'll find that this is all pretty terrible, performance-wise.
If you really want fast hash lookup, there's a terrible, terrible hack that actually works really well. It is this: write your key/values out to a temporary file, one per line, then use 'grep "^$key"' to get them out, using pipes with cut or awk or sed or whatever to retrieve the values.
Like I said, it sounds terrible, and it sounds like it ought to be slow and do all sorts of unnecessary IO, but in practice it is very fast (disk cache is awesome, ain't it?), even for very large hash tables. You have to enforce key uniqueness yourself, etc. Even if you only have a few hundred entries, the output file/grep combo is going to be quite a bit faster - in my experience several times faster. It also eats less memory.
Here's one way to do it:
hinit() {
    rm -f /tmp/hashmap.$1
}
hput() {
    echo "$2 $3" >> /tmp/hashmap.$1
}
hget() {
    grep "^$2 " /tmp/hashmap.$1 | awk '{ print $2 };'
}
hinit capitals
hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid
echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`
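Since key uniqueness is left to the caller, one variation is to let the most recent hput win and to compare keys as exact strings instead of regular expressions. The sketch below is an assumption-laden rewrite, not the author's code: it keeps the hput/hget names but stores tab-separated lines in a mktemp -d directory (hashdir is a made-up name) and does the lookup in awk, passing the key through the environment so it is never treated as a pattern.

```bash
#!/usr/bin/env bash
hashdir=$(mktemp -d) || exit 1
trap 'rm -rf -- "$hashdir"' EXIT

# hput MAP KEY VALUE: append "KEY<TAB>VALUE" to the file for MAP.
# Keys must not contain tabs or newlines.
hput() {
    printf '%s\t%s\n' "$2" "$3" >> "$hashdir/$1"
}

# hget MAP KEY: print the value from the last line whose key matches
# exactly; return non-zero if the map or the key does not exist.
hget() {
    [ -f "$hashdir/$1" ] || return 1
    KEY=$2 awk -F '\t' '
        # Appending "" forces a string comparison, not a numeric one.
        ($1 "") == (ENVIRON["KEY"] "") {
            value = substr($0, length($1) + 2)
            found = 1
        }
        END { if (found) print value; else exit 1 }
    ' "$hashdir/$1"
}

hput capitals France Paris
hput capitals Spain Madrid
hput capitals France "Paris, Ile-de-France"   # a later put overrides the earlier one
hget capitals France
```

The file only ever grows, so a long-running script that overwrites keys often would need to compact it now and then.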
Sridhar Ratnakumar
Updated on November 04, 2021
Comments
-
Sridhar Ratnakumar over 2 years
What is the equivalent of Python dictionaries but in Bash (should work across OS X and Linux).
-
DigitalRoss almost 14 years
Sigh, that seems unnecessarily insulting and it's inaccurate anyway. One would not put input validation, escaping, or encoding (see, I actually do know) in the guts of the hash table, but rather in a wrapper and as soon as possible after input.
-
zhaorufei over 13 years
Great! You can even iterate it: for i in $(compgen -A variable capitols); do hget "$i" ""; done
-
Lenik over 13 years
+1 for declare -A, I can't believe that I never used it before! I've programmed bash for 10 years.
-
glenn jackman about 13 years
I would change that to VALUE=${animal#*:} to protect the case where ARRAY[$x]="caesar:come:see:conquer"
-
devguydavid about 12 years
It's also useful to put double quotes around the ${ARRAY[@]} in case there are spaces in the keys or values, as in for animal in "${ARRAY[@]}"; do
-
Richard almost 12 years
I am using bash 4.2, yet declare -A complains -A is not a valid option... any ideas why? linux distr is SUSE..
-
lhunath almost 12 years
@Richard: Presumably, you aren't actually using bash. Is your hashbang sh instead of bash, or are you otherwise invoking your code with sh? Try putting this right before your declare: echo "$BASH_VERSION $POSIXLY_CORRECT", it should output 4.x and not y.
-
Steve Pitchers about 10 years
Can't upgrade: the only reason I write scripts in Bash is for "run anywhere" portability. So relying on a non-universal feature of Bash rules this approach out. Which is a shame, because otherwise it would have been an excellent solution for me!
-
Steve Pitchers about 10 years
@lhunath Continued reliability of existing systems is seen (round here) as more important than new bangs and whistles no-one other than me even perceives the need for yet. Even where I have full freedom, normally I don't upgrade important components like Bash unless the need is really convincing. The answers and advice in several answers here is quite enough to solve my problem without introducing any additional risk!
-
lhunath about 10 years
@StevePitchers An adequate and also typical answer. Though at least ponder, what is the additional risk of not upgrading systems - even if it is not an "introduced" risk. Some consider this "technical debt".
-
Admin over 9 years
I think this is a pretty neat snippet. It could use a little cleanup (not much, though). In my version, I've renamed 'key' to 'pair' and made KEY and VALUE lowercase (because I use uppercase when variables are exported). I also renamed getHashKey to getHashValue and made both key and value local (sometimes you would want them not to be local, though). In getHashKeys, I do not assign anything to value. I use semicolon for separation, since my values are URLs.
-
ken over 9 years
It's a shame that OSX defaults to Bash 3 still as this represents the "default" for a lot of people. I thought the ShellShock scare might have been the push they needed but apparently not.
-
lhunath over 9 years
@ken it's a licensing issue. Bash on OSX is stuck at the latest non-GPLv3 licensed build.
-
Jonathan Cross about 9 years
Adding to comments above: OSX is still on Bash version 3.2.57 as of today.
-
rubo77 almost 9 years
I only get 0 - dog when I try your for loop using bash 4.3.30 on ubuntu 14.10
-
rubo77 almost 9 years
Note that the value may not contain spaces, otherwise you add more elements at once
-
CodeManX almost 9 years
But isn't the efficiency quite poor? I'm thinking O(n*m) if you want to compare to another list of keys, instead of O(n) with proper hashmaps (constant time lookup, O(1) for a single key).
-
Bubnoff almost 9 years
The idea is less about efficiency, more about understand/read-ability for those with a background in perl, python or even bash 4. Allows you to write in a similar fashion.
-
AsymLabs almost 9 years
@CharlieMartin: read is a very powerful feature and is under-utilized by many bash programmers. It allows compact forms of lisp-like list processing. For example, in the above example we can strip off just the first element and retain the rest (ie a similar concept to first and rest in lisp) by doing: IFS=$'|' read -r first rest <<< "$fields"
-
Tobia over 8 years
Thanks, simple ${!i} indirection does the trick and that's enough. Flaming war in 3, 2, 1…
-
Paul Draper over 8 years
lhunath, Bash 4 has slower adoption because of GPLv3. Like it or hate it, most would agree that changing all of your licensing or business model is not a "silly excuse".
-
jww about 8 years
"The future is now, stop living in the past ..." - That's fine and dandy until OS X reality hits: $ /bin/bash --version returns GNU bash, version 3.2.53(1)-release (x86_64-apple-darwin12). I'm still on OS X 10.8.5 due to operational requirements.
-
lhunath about 8 years
@jww Apple will not upgrade GNU bash beyond 3 due to its ill will against the GPLv3. But that should not be a deterrent. brew install bash (brew.sh)
-
thomanski over 7 years
Upvote for the hashmap["key"]="value" syntax which I, too, found missing from the otherwise fantastic accepted answer.
-
Charles Duffy over 7 years
...or sudo port install bash, for those (wisely, IMHO) unwilling to make directories in the PATH for all users writable without explicit per-process privilege escalation.
-
Sanjay over 7 years
@DigitalRoss can you explain what is the use of #hash in eval echo '${hash'"$1"'#hash}'. for me it seems me as a comment not more then that. does #hash have any special meaning here?
-
MestreLion about 7 years
@CoDEmanX: this is a hack, a clever and elegant but still rudimentary workaround to help the poor souls still stuck in 2007 with Bash 3.x. You cannot expect "proper hashmaps" or efficiency considerations in such a simple code.
-
Camille Goudeseune almost 7 years
Which version of bash supports mkdir -d? (Not 4.3, on Ubuntu 14. I'd resort to mkdir /run/shm/foo, or if that filled up RAM, mkdir /tmp/foo.)
-
Reid Ellis almost 7 years
Perhaps mktemp -d was meant instead?
-
Helin Wang over 6 years
Curious what is the difference between $value=$(< $hashtable/$key) and value=$(< $hashtable/$key)? Thanks!
-
1.61803 over 6 years
@lhunath, could you explain the order in which the pairs are stored?
-
MikeW over 6 years
Bash 4 is a no-no for many developers, especially for embedded devices, due to GPL3 requirements "Protecting your right to tinker" - which is a potential security nightmare, if you also want to guard against hacks (eg botnet subversion) on your device!
-
kirbyfan64sos over 6 years
"tested it on my machine" This sounds like a great way to burn a hole through your SSD. Not all Linux distros use tmpfs by default.
-
Sam F over 6 years
Be careful when looping because the keys are not visited in the order you may expect in a loop such as for sound in "${!animals[@]}"
-
Rolf over 6 years
I am processing about 50000 hashes. Perl and PHP do it a hair under 1/2 second. Node in 1 second and something. FS option sounds slow. However, can we make sure that the files only exist in RAM, somehow?
-
lovasoa over 6 years
Yes. In linux, you can put the files in /dev/shm, for instance. See howtoforge.com/storing-files-directories-in-memory-with-tmpfs
-
Xeverous about 6 years
@rubo77 key neither, it adds multiple keys. Any way to workaround this?
-
Shabirmean about 6 years
Shouldn't the line echo "${animals[moo]}" in the answer have a $ before moo, like echo "${animals[$moo]}"?
-
lhunath about 6 years
@Shabirmean moo is not a parameter in this context. It is a string "moo", the key of the entry in the array that has the value "cow".
-
Trevor Boyd Smith almost 6 years
i do something similar but slightly different --> each string is separated by a ' '. then i iterate over each line by index and do words=(${line}) and then i can pick each parameter out of the array words
-
Per Lundberg almost 6 years
FWIW: if you are thinking about "which Ubuntu version supports bash 4?" (as I was when reading this), it works on at least 14.04 and newer: packages.ubuntu.com/search?keywords=bash
-
jpaugh over 5 years
@Sanjay ${var#start} removes the text start from the beginning of the value stored in the variable var.
-
Rakib about 5 years
The link in the answer is scary! If you click it, you are stuck in a redirection loop. Please update.
-
Adam Katz about 5 years
@MohammadRakibAmin – Yeah, my website is down and I doubt I'll be resurrecting my blog. I have updated the above link to an archived version. Thanks for your interest!
-
santiago arizti almost 5 years
expanding... if you have animals_meaw=cat animals_woof=dog animals_moo=cow you can iterate with for animal in ${!animals_*}; do echo "the ${!animal} goes ${animal#animals_}"; done outputs: the cat goes meaw...
-
santiago arizti almost 5 years
... and if you have declare -A animals=([meaw]=cat [woof]=dog [moo]=cow) you can iterate with for sound in ${!animals[*]}; do echo "the ${animals[$sound]} goes ${sound}"; done outputs: the cow goes moo...
-
Miloš Černilovský over 4 years
This also works in ZSH which is the new default command line for Mac.
-
neuralmer almost 4 years
It doesn't look like this will handle hash collisions.
-
Adam Katz almost 4 years
@neuralmer – True. This was designed to be an actual hash implementation of a hash structure. If you want to handle hash collisions, I recommend a real hash implementation rather than a hack like this. Adapting this to manage collisions would remove all of its elegance.
-
Benjamin over 3 years
"Not actually executing a Bash script with Bash does happen, and will be really confusing!" Truer words were never spoken, especially in the context of the "test" command versus double brackets.
-
cycollins about 3 years
I like this solution a lot. It is very common to need only a handful of (< 100) elements because it makes the code more straightforward. If you need many thousands, probably best to use Python, Perl, whatever.
-
zomars over 2 years
This won't work with values that have "/" slashes in them