How to define hash tables in Bash?


Solution 1

Bash 4

Bash 4 natively supports this feature. Make sure your script's hashbang is #!/usr/bin/env bash or #!/bin/bash so you don't end up using sh. Make sure you either execute your script directly or run it explicitly with bash, as in bash script. (Not actually executing a Bash script with Bash does happen, and will be really confusing!)
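
A small guard at the top of a script can catch the wrong-interpreter case early. This is only a sketch: the function name require_bash4 is made up for illustration, while BASH_VERSION and BASH_VERSINFO are variables that bash itself sets.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds only when running under bash 4 or newer.
require_bash4() {
    # BASH_VERSION is set by bash; other shells normally leave it empty.
    [ -n "${BASH_VERSION:-}" ] || return 1
    # BASH_VERSINFO[0] holds the major version number.
    [ "${BASH_VERSINFO[0]}" -ge 4 ]
}

if require_bash4; then
    echo "bash $BASH_VERSION: associative arrays are available"
else
    echo "bash 4+ is required for declare -A" >&2
fi
```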

You declare an associative array by doing:

declare -A animals

You can fill it up with elements using the normal array assignment operator. For example, if you want to have a map of animal[sound(key)] = animal(value):

animals=( ["moo"]="cow" ["woof"]="dog")

Or declare and instantiate in one line:

declare -A animals=( ["moo"]="cow" ["woof"]="dog")

Then use them just like normal arrays. Use

  • animals['key']='value' to set value

  • "${animals[@]}" to expand the values

  • "${!animals[@]}" (notice the !) to expand the keys

Don't forget to quote them:

echo "${animals[moo]}"
for sound in "${!animals[@]}"; do echo "$sound - ${animals[$sound]}"; done
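
Beyond setting and expanding, you will usually also want to count entries, test whether a key exists, and delete a key. A minimal sketch, assuming bash 4 or newer; the helper name has_key is hypothetical and not part of bash:

```bash
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
declare -A animals=( ["moo"]="cow" ["woof"]="dog" )

# Number of entries in the map.
count=${#animals[@]}
echo "entries: $count"

# Hypothetical helper: ${var+word} expands to word only if the element is
# set, so this also detects keys whose value is the empty string.
has_key() {
    [[ -n ${animals[$1]+set} ]]
}

has_key moo && echo "moo is present"
has_key quack || echo "quack is absent"

# Remove an entry; quote the argument so the shell does not glob it.
unset 'animals[moo]'
echo "entries left: ${#animals[@]}"
```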

Bash 3

Before bash 4, you don't have associative arrays. Do not use eval to emulate them. Avoid eval like the plague, because it is the plague of shell scripting. The most important reason is that eval treats your data as executable code (there are many other reasons too).

First and foremost: Consider upgrading to bash 4. This will make the whole process much easier for you.

If there's a reason you can't upgrade, declare is a far safer option. It does not evaluate data as bash code like eval does, and as such does not allow arbitrary code injection quite so easily.

Let's prepare the answer by introducing the concepts:

First, indirection.

$ animals_moo=cow; sound=moo; i="animals_$sound"; echo "${!i}"
cow

Secondly, declare:

$ sound=moo; animal=cow; declare "animals_$sound=$animal"; echo "$animals_moo"
cow

Bring them together:

# Set a value:
declare "array_$index=$value"

# Get a value:
arrayGet() { 
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}

Let's use it:

$ sound=moo
$ animal=cow
$ declare "animals_$sound=$animal"
$ arrayGet animals "$sound"
cow

Note: declare cannot be put in a function. Any use of declare inside a bash function makes the variable it creates local to that function, meaning we can't access or modify global arrays with it. (In bash 4.2 and later you can use declare -g to declare global variables, but in bash 4 you can use associative arrays in the first place, avoiding this workaround.)
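
If you do want the setter inside a function on bash 3, one possible workaround is printf -v, which assigns to a named variable without making it local. This is a swapped-in technique, not something the answer itself uses. The name arraySet is hypothetical, and the sketch assumes keys consist only of letters, digits and underscores so that they form valid variable names.

```bash
#!/usr/bin/env bash

# Hypothetical setter: stores the value in the global variable <array>_<index>.
arraySet() {
    local array=$1 index=$2 value=$3
    # Reject empty keys and keys that would not form a valid variable name.
    case $index in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
    esac
    # printf -v writes to the named variable; unlike declare, it does not
    # create a function-local variable.
    printf -v "${array}_${index}" '%s' "$value"
}

# Getter as shown above, using indirection.
arrayGet() {
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}

arraySet animals moo cow
arrayGet animals moo    # prints cow
echo
```

The key check matters here: the key becomes part of a variable name, so anything outside that character set is refused rather than passed on.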

Summary:

  • Upgrade to bash 4 and use declare -A for associative arrays.
  • Use the declare option if you can't upgrade.
  • Consider using awk instead and avoid the issue altogether.
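
For the awk option, a rough sketch of what that might look like: awk arrays are associative by default, so the map lives inside the awk program and bash only feeds it data. The function name lookup_sound and the "sound animal" input format are assumptions made for this example.

```bash
#!/usr/bin/env bash

# Hypothetical helper: reads "sound animal" pairs on stdin and prints the
# animal for the requested sound, or fails if the sound is unknown.
lookup_sound() {
    awk -v want="$1" '
        { animals[$1] = $2 }    # build the map, one pair per input line
        END {
            if (want in animals) print animals[want]
            else exit 1
        }
    '
}

printf '%s\n' "moo cow" "woof dog" | lookup_sound woof    # prints dog
```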

Solution 2

There's parameter substitution, though it may be un-PC as well ...like indirection.

#!/bin/bash

# Array pretending to be a Pythonic dictionary
ARRAY=( "cow:moo"
        "dinosaur:roar"
        "bird:chirp"
        "bash:rock" )

for animal in "${ARRAY[@]}" ; do
    KEY="${animal%%:*}"
    VALUE="${animal##*:}"
    printf "%s likes to %s.\n" "$KEY" "$VALUE"
done

printf "%s is an extinct animal which likes to %s\n" "${ARRAY[1]%%:*}" "${ARRAY[1]##*:}"

The Bash 4 way is better, of course, but if you need a hack... only a hack will do. You could search the array/hash with similar techniques.
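
Searching could look something like the sketch below. The function name lookup is hypothetical. It splits each entry at the first colon with ${pair#*:}, so values may themselves contain colons, and it scans the whole array, so every lookup is linear in the number of entries.

```bash
#!/usr/bin/env bash

ARRAY=( "cow:moo"
        "dinosaur:roar"
        "bird:chirp"
        "caesar:come:see:conquer" )

# Hypothetical lookup: prints the value stored for key $1, or fails.
lookup() {
    local wanted=$1 pair
    for pair in "${ARRAY[@]}"; do
        # ${pair%%:*} is everything before the first colon (the key).
        if [[ ${pair%%:*} == "$wanted" ]]; then
            # ${pair#*:} is everything after the first colon (the value).
            printf '%s\n' "${pair#*:}"
            return 0
        fi
    done
    return 1
}

lookup bird      # prints chirp
lookup caesar    # prints come:see:conquer
```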

Solution 3

This is what I was looking for here:

declare -A hashmap
hashmap["key"]="value"
hashmap["key2"]="value2"
echo "${hashmap["key"]}"
for key in "${!hashmap[@]}"; do echo "$key"; done
for value in "${hashmap[@]}"; do echo "$value"; done
echo "hashmap has ${#hashmap[@]} elements"

This did not work for me with bash 4.1.5:

animals=( ["moo"]="cow" )
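
One common cause of that failure is running the assignment without declare -A first. That is an assumption here, since the full failing script isn't shown. Without the declaration, bash creates an ordinary indexed array and evaluates each subscript arithmetically, so an unset name like moo counts as 0. The sketch below contrasts the two cases; the variable names indexed_keys, indexed_value and assoc_count exist only for this example.

```bash
#!/usr/bin/env bash
# Requires bash 4+ for the second half.

# Case 1: no declare -A, so animals is an indexed array.
unset animals
animals=( [moo]=cow [woof]=dog )
indexed_keys="${!animals[*]}"
indexed_value="${animals[0]}"
# Both subscripts evaluate to 0, so "dog" overwrites "cow".
echo "indexed: keys=$indexed_keys value=$indexed_value"

# Case 2: declared as associative before the assignment.
unset animals
declare -A animals
animals=( [moo]=cow [woof]=dog )
assoc_count=${#animals[@]}
echo "associative: $assoc_count entries, moo=${animals[moo]}"
```

The first case is consistent with the "0 - dog" output reported in the comments.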

Solution 4

Just use the file system

The file system is a tree structure that can be used as a hash map. Your hash table will be a temporary directory, your keys will be filenames, and your values will be file contents. The advantage is that it can handle huge hashmaps, and doesn't require a specific shell.

Hashtable creation

hashtable=$(mktemp -d)

Add an element

echo "$value" > "$hashtable/$key"

Read an element

value=$(< "$hashtable/$key")

Performance

Of course, it's slow, but not that slow. I tested it on my machine, with an SSD and btrfs, and it does around 3000 element reads/writes per second.
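
Put together, the pieces might look like the sketch below. The function names fs_valid_key, fs_put and fs_get are hypothetical. Because keys become filenames, the sketch rejects keys that are empty, contain a slash, or are "." or "..". It also removes the directory when the script exits.

```bash
#!/usr/bin/env bash

hashtable=$(mktemp -d) || exit 1
# Clean up the temporary directory when the script exits.
trap 'rm -rf "$hashtable"' EXIT

# Hypothetical check: a key must be usable as a single filename.
fs_valid_key() {
    case $1 in
        ''|.|..|*/*) return 1 ;;
    esac
}

# Store value $2 under key $1.
fs_put() {
    fs_valid_key "$1" || return 1
    printf '%s' "$2" > "$hashtable/$1"
}

# Print the value stored under key $1, or fail if there is none.
fs_get() {
    fs_valid_key "$1" || return 1
    [ -f "$hashtable/$1" ] || return 1
    cat "$hashtable/$1"
}

fs_put France Paris
fs_get France    # prints Paris
echo
```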

Solution 5

You can further modify the hput()/hget() interface so that you have named hashes as follows:

hput() {
    eval "$1""$2"='$3'
}

hget() {
    eval echo '${'"$1$2"'#hash}'
}

and then

hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid
echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`

This lets you define other maps that don't conflict (e.g., 'rcapitals' which does country lookup by capital city). But, either way, I think you'll find that this is all pretty terrible, performance-wise.

If you really want fast hash lookup, there's a terrible, terrible hack that actually works really well. It is this: write your key/values out to a temporary file, one per line, then use 'grep "^$key"' to get them out, using pipes with cut or awk or sed or whatever to retrieve the values.

Like I said, it sounds terrible, and it sounds like it ought to be slow and do all sorts of unnecessary IO, but in practice it is very fast (disk cache is awesome, ain't it?), even for very large hash tables. You have to enforce key uniqueness yourself, etc. Even if you only have a few hundred entries, the output file/grep combo is going to be quite a bit faster - in my experience several times faster. It also eats less memory.

Here's one way to do it:

hinit() {
    rm -f "/tmp/hashmap.$1"
}

hput() {
    echo "$2 $3" >> "/tmp/hashmap.$1"
}

hget() {
    grep "^$2 " "/tmp/hashmap.$1" | awk '{ print $2 };'
}

hinit capitals
hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid

echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`
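
Since key uniqueness is left to the caller, here is one possible variant that enforces it. It is a sketch, not a drop-in replacement. It keeps the files in a mktemp directory (the variable hashdir is an assumption), drops any earlier line for a key before appending, and compares keys as plain strings in awk rather than as a grep pattern. Like the original, it assumes keys and values contain no whitespace.

```bash
#!/usr/bin/env bash

# Assumed location for the map files; one file per named map.
hashdir=$(mktemp -d) || exit 1

# hput <map> <key> <value>: store the pair, replacing any earlier value.
hput() {
    local file="$hashdir/$1"
    if [ -f "$file" ]; then
        # Keep every line whose key differs; appending "" forces a string
        # comparison even when both sides look like numbers.
        awk -v key="$2" '($1 "") != (key "")' "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    fi
    printf '%s %s\n' "$2" "$3" >> "$file"
}

# hget <map> <key>: print the value, or fail if the key is absent.
hget() {
    awk -v key="$2" '
        ($1 "") == (key "") { print $2; found = 1 }
        END { exit !found }
    ' "$hashdir/$1"
}

hput capitals France Paris
hput capitals France Lyon    # replaces the earlier value
hget capitals France         # prints Lyon
```

Rewriting the file on every hput costs more than a plain append, so this trades write speed for the uniqueness guarantee.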
Author by

Sridhar Ratnakumar

Updated on November 04, 2021

Comments

  • Sridhar Ratnakumar
    Sridhar Ratnakumar over 2 years

    What is the equivalent of Python dictionaries in Bash (should work across OS X and Linux)?

  • DigitalRoss
    DigitalRoss almost 14 years
    Sigh, that seems unnecessarily insulting and it's inaccurate anyway. One would not put input validation, escaping, or encoding (see, I actually do know) in the guts of the hash table, but rather in a wrapper and as soon as possible after input.
  • zhaorufei
    zhaorufei over 13 years
    Great! You can even iterate over it: for i in $(compgen -A variable capitals); do hget "$i" ""; done
  • Lenik
    Lenik over 13 years
    +1 for declare -A, I can't believe that I never used it before! I've programmed bash for 10 years.
  • glenn jackman
    glenn jackman about 13 years
    I would change that to VALUE=${animal#*:} to protect the case where ARRAY[$x]="caesar:come:see:conquer"
  • devguydavid
    devguydavid about 12 years
    It's also useful to put double quotes around the ${ARRAY[@]} in case there are spaces in the keys or values, as in for animal in "${ARRAY[@]}"; do
  • Richard
    Richard almost 12 years
    I am using bash 4.2, yet declare -A complains -A is not a valid option... any ideas why? linux distr is SUSE..
  • lhunath
    lhunath almost 12 years
    @Richard: Presumably, you aren't actually using bash. Is your hashbang sh instead of bash, or are you otherwise invoking your code with sh? Try putting this right before your declare: echo "$BASH_VERSION $POSIXLY_CORRECT", it should output 4.x and not y.
  • Steve Pitchers
    Steve Pitchers about 10 years
    Can't upgrade: the only reason I write scripts in Bash is for "run anywhere" portability. So relying on a non-universal feature of Bash rules this approach out. Which is a shame, because otherwise it would have been an excellent solution for me!
  • Steve Pitchers
    Steve Pitchers about 10 years
    @lhunath Continued reliability of existing systems is seen (round here) as more important than new bangs and whistles no-one other than me even perceives the need for yet. Even where I have full freedom, normally I don't upgrade important components like Bash unless the need is really convincing. The answers and advice in several answers here is quite enough to solve my problem without introducing any additional risk!
  • lhunath
    lhunath about 10 years
    @StevePitchers An adequate and also typical answer. Though at least ponder, what is the additional risk of not upgrading systems - even if it is not an "introduced" risk. Some consider this "technical debt".
  • Admin
    Admin over 9 years
    I think this is a pretty neat snippet. It could use a little cleanup (not much, though). In my version, I've renamed 'key' to 'pair' and made KEY and VALUE lowercase (because I use uppercase when variables are exported). I also renamed getHashKey to getHashValue and made both key and value local (sometimes you would want them not to be local, though). In getHashKeys, I do not assign anything to value. I use semicolon for separation, since my values are URLs.
  • ken
    ken over 9 years
    It's a shame that OSX defaults to Bash 3 still as this represents the "default" for a lot of people. I thought the ShellShock scare might have been the push they needed but apparently not.
  • lhunath
    lhunath over 9 years
    @ken it's a licensing issue. Bash on OSX is stuck at the latest non-GPLv3 licensed build.
  • Jonathan Cross
    Jonathan Cross about 9 years
    Adding to comments above: OSX is still on Bash version 3.2.57 as of today.
  • rubo77
    rubo77 almost 9 years
    I only get 0 - dog when I try your for loop using bash 4.3.30 on ubuntu 14.10
  • rubo77
    rubo77 almost 9 years
    Note that the value may not contain spaces; otherwise you add more elements at once.
  • CodeManX
    CodeManX almost 9 years
    But isn't the efficiency quite poor? I'm thinking O(n*m) if you want to compare to another list of keys, instead of O(n) with proper hashmaps (constant time lookup, O(1) for a single key).
  • Bubnoff
    Bubnoff almost 9 years
    The idea is less about efficiency, more about understand/read-ability for those with a background in perl, python or even bash 4. Allows you to write in a similar fashion.
  • AsymLabs
    AsymLabs almost 9 years
    @CharlieMartin : read is a very powerful feature and is under-utilized by many bash programmers. It allows compact forms of lisp-like list processing. For example, in the above example we can strip off just the first element and retain the rest (ie a similar concept to first and rest in lisp) by doing: IFS=$'|' read -r first rest <<< "$fields"
  • Tobia
    Tobia over 8 years
    Thanks, simple ${!i} indirection does the trick and that's enough. Flaming war in 3, 2, 1…
  • Paul Draper
    Paul Draper over 8 years
    lhunath, Bash 4 has slower adoption because of GPLv3. Like it or hate it, most would agree that changing all of your licensing or business model is not a "silly excuse".
  • jww
    jww about 8 years
    "The future is now, stop living in the past ..." - That's fine and dandy until OS X reality hits: $ /bin/bash --version returns GNU bash, version 3.2.53(1)-release (x86_64-apple-darwin12). I'm still on OS X 10.8.5 due to operational requirements.
  • lhunath
    lhunath about 8 years
    @jww Apple will not upgrade GNU bash beyond 3 due to its ill will against the GPLv3. But that should not be a deterrent. brew install bash brew.sh
  • thomanski
    thomanski over 7 years
    Upvote for the hashmap["key"]="value" syntax which I, too, found missing from the otherwise fantastic accepted answer.
  • Charles Duffy
    Charles Duffy over 7 years
    ...or sudo port install bash, for those (wisely, IMHO) unwilling to make directories in the PATH for all users writable without explicit per-process privilege escalation.
  • Sanjay
    Sanjay over 7 years
    @DigitalRoss can you explain what the use of #hash is in eval echo '${hash'"$1"'#hash}'? To me it looks like nothing more than a comment. Does #hash have any special meaning here?
  • MestreLion
    MestreLion about 7 years
    @CoDEmanX: this is a hack, a clever and elegant but still rudimentary workaround to help the poor souls still stuck in 2007 with Bash 3.x. You cannot expect "proper hashmaps" or efficiency considerations in such a simple code.
  • Camille Goudeseune
    Camille Goudeseune almost 7 years
    Which version of bash supports mkdir -d? (Not 4.3, on Ubuntu 14. I'd resort to mkdir /run/shm/foo, or if that filled up RAM, mkdir /tmp/foo.)
  • Reid Ellis
    Reid Ellis almost 7 years
    Perhaps mktemp -d was meant instead?
  • Helin Wang
    Helin Wang over 6 years
    Curious what is the difference between $value=$(< $hashtable/$key) and value=$(< $hashtable/$key)? Thanks!
  • 1.61803
    1.61803 over 6 years
    @lhunath, could you explain the order in which the pairs are stored?
  • MikeW
    MikeW over 6 years
    Bash 4 is a no-no for many developers, especially for embedded devices, due to GPL3 requirements "Protecting your right to tinker" - which is a potential security nightmare, if you also want to guard against hacks (eg botnet subversion) on your device !
  • kirbyfan64sos
    kirbyfan64sos over 6 years
    "tested it on my machine" This sounds like a great way to burn a hole through your SSD. Not all Linux distros use tmpfs by default.
  • Sam F
    Sam F over 6 years
    be careful when looping because the keys are not visited in the order you may expect in a loop such as for sound in "${!animals[@]}"
  • Rolf
    Rolf over 6 years
    I am processing about 50000 hashes. Perl and PHP do it a hair under 1/2 second. Node in 1 second and something. FS option sounds slow. However, can we make sure that the files only exist in RAM, somehow?
  • lovasoa
    lovasoa over 6 years
    Yes. In Linux, you can put the files in /dev/shm, for instance. See howtoforge.com/storing-files-directories-in-memory-with-tmpfs
  • Xeverous
    Xeverous about 6 years
    @rubo77 key neither, it adds multiple keys. Any way to workaround this?
  • Shabirmean
    Shabirmean about 6 years
    Shouldn't the line echo "${animals[moo]}" in the answer have a $ before moo, like echo "${animals[$moo]}"?
  • lhunath
    lhunath about 6 years
    @Shabirmean moo is not a parameter in this context. It is a string "moo", the key of the entry in the array that has the value "cow".
  • Trevor Boyd Smith
    Trevor Boyd Smith almost 6 years
    i do something similar but slightly different --> each string is separated by a ' '. then i iterate over each line by index and do words=(${line}) and then i can pick each parameter out of the array words
  • Per Lundberg
    Per Lundberg almost 6 years
    FWIW: if you are thinking about "which Ubuntu version supports bash 4?" (as I was when reading this), it works on at least 14.04 and newer: packages.ubuntu.com/search?keywords=bash
  • jpaugh
    jpaugh over 5 years
    @Sanjay ${var#start} removes the text start from the beginning of the value stored in the variable var.
  • Rakib
    Rakib about 5 years
    The link in the answer is scary! If you click it, you are stuck in a redirection loop. Please update.
  • Adam Katz
    Adam Katz about 5 years
    @MohammadRakibAmin – Yeah, my website is down and I doubt I'll be resurrecting my blog. I have updated the above link to an archived version. Thanks for your interest!
  • santiago arizti
    santiago arizti almost 5 years
    expanding... if you have animals_meaw=cat animals_woof=dog animals_moo=cow you can iterate with for animal in ${!animals_*}; do echo "the ${!animal} goes ${animal#animals_}"; done outputs: the cat goes meaw...
  • santiago arizti
    santiago arizti almost 5 years
    ... and if you have declare -A animals=([meaw]=cat [woof]=dog [moo]=cow) you can iterate with for sound in ${!animals[*]}; do echo "the ${animals[$sound]} goes ${sound}"; done outputs: the cow goes moo...
  • Miloš Černilovský
    Miloš Černilovský over 4 years
    This also works in ZSH which is the new default command line for Mac.
  • neuralmer
    neuralmer almost 4 years
    It doesn't look like this will handle hash collisions.
  • Adam Katz
    Adam Katz almost 4 years
    @neuralmer – True. This was designed to be an actual hash implementation of a hash structure. If you want to handle hash collisions, I recommend a real hash implementation rather than a hack like this. Adapting this to manage collisions would remove all of its elegance.
  • Benjamin
    Benjamin over 3 years
    "Not actually executing a Bash script with Bash does happen, and will be really confusing!" Truer words were never spoken, especially in the context of the "test" command versus double brackets.
  • cycollins
    cycollins about 3 years
    I like this solution a lot. It is very common to need only handful of (< 100) elements because it makes the code more straightforward. If you need many thousands, probably best to use Python, Perl, whatever.
  • zomars
    zomars over 2 years
    This won't work with values that have "/" slashes in them