How to define hash tables in Bash?


Solution 1

Bash 4

Bash 4 natively supports this feature. Make sure your script's shebang is #!/usr/bin/env bash or #!/bin/bash so you don't end up using sh. Make sure you're either executing your script directly, or executing it with bash script. (Not actually executing a Bash script with Bash does happen, and will be really confusing!)
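If the script must never run under sh or an old Bash, a small guard at the top makes the failure obvious instead of confusing. This is a sketch, assuming you want the script to exit with an error. It tests BASH_VERSION with a case pattern because that variable is empty outside Bash, and a case statement parses in any POSIX shell:

```shell
#!/usr/bin/env bash
# Stop early unless we are running in Bash 4 or newer (needed for declare -A).
# BASH_VERSION is unset in sh/dash, and starts with e.g. "3.2." in an old Bash.
case "${BASH_VERSION:-}" in
    '' | [0-3].*)
        echo "This script requires Bash 4 or newer." >&2
        exit 1
        ;;
esac
```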

You declare an associative array by doing:

declare -A animals

You can fill it up with elements using the normal array assignment operator. For example, if you want a map of animals[sound (key)] = animal (value):

animals=( ["moo"]="cow" ["woof"]="dog")

Or declare and instantiate in one line:

declare -A animals=( ["moo"]="cow" ["woof"]="dog")

Then use them just like normal arrays. Use

  • animals['key']='value' to set value

  • "${animals[@]}" to expand the values

  • "${!animals[@]}" (notice the !) to expand the keys

Don't forget to quote them:

echo "${animals[moo]}"
for sound in "${!animals[@]}"; do echo "$sound - ${animals[$sound]}"; done
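A slightly longer sketch of everyday operations (add, test for a key, count, delete, iterate in a stable order), assuming Bash 4 or newer. The key-existence test uses the ${name[key]+set} expansion, and the keys are piped through sort because Bash does not keep them in insertion order:

```shell
#!/usr/bin/env bash
# Requires Bash 4+ (associative arrays).
declare -A animals=( ["moo"]="cow" ["woof"]="dog" )

animals["quack"]="duck"              # add a key (or overwrite an existing one)

# ${animals[moo]+set} expands to "set" only if the key exists, even when its value is empty.
if [ -n "${animals[moo]+set}" ]; then
    echo "moo is a key"
fi

echo "pairs: ${#animals[@]}"         # number of key/value pairs: 3

unset 'animals[woof]'                # delete one key; quoted so [ ] is not treated as a glob

# Bash does not keep keys in insertion order, so sort them when order matters.
while IFS= read -r sound; do
    printf '%s - %s\n' "$sound" "${animals[$sound]}"
done < <(printf '%s\n' "${!animals[@]}" | sort)
```

Run as is, it should print "moo is a key", "pairs: 3", "moo - cow" and "quack - duck".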

Bash 3

Before bash 4, you don't have associative arrays. Do not use eval to emulate them. Avoid eval like the plague, because it is the plague of shell scripting. The most important reason is that eval treats your data as executable code (there are many other reasons too).

First and foremost: Consider upgrading to bash 4. This will make the whole process much easier for you.

If there's a reason you can't upgrade, declare is a far safer option. It does not evaluate data as bash code like eval does, and as such does not allow arbitrary code injection quite so easily.
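To make the difference concrete, here is a small sketch with a hypothetical malicious value. eval re-parses the string as shell code, so the text after the semicolon runs as a command; declare receives the same string as a single assignment and stores it verbatim:

```shell
# Hypothetical untrusted input: it looks like data but contains a command.
value='cow; echo INJECTED'

eval "via_eval=$value"                # runs "via_eval=cow", then runs "echo INJECTED"
echo "eval stored: $via_eval"         # prints: eval stored: cow

declare "via_declare=$value"          # the whole string is stored as plain data
echo "declare stored: $via_declare"   # prints: declare stored: cow; echo INJECTED
```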

Let's prepare the answer by introducing the concepts:

First, indirection.

$ animals_moo=cow; sound=moo; i="animals_$sound"; echo "${!i}"

Secondly, declare:

$ sound=moo; animal=cow; declare "animals_$sound=$animal"; echo "$animals_moo"

Bring them together:

# Set a value:
declare "array_$index=$value"

# Get a value:
arrayGet() {
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}

Let's use it:

$ sound=moo
$ animal=cow
$ declare "animals_$sound=$animal"
$ arrayGet animals "$sound"

Note: declare cannot be used inside a function for this. Any use of declare inside a Bash function makes the variable it creates local to that function, meaning we can't access or modify global arrays with it. (Bash 4.2 and later offer declare -g to declare global variables, but in Bash 4 you can use associative arrays in the first place, avoiding this workaround.)
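If you are on Bash 3.1 or later, printf -v offers another way to assign to a computed variable name from inside a function, without eval and without the scoping problem of declare: it assigns to whichever variable is visible, which is the global one here. A sketch under that assumption; arraySet is a hypothetical companion to arrayGet, and it rejects keys that could not form part of a variable name:

```shell
# Assumes Bash 3.1+ for printf -v; no eval and no declare involved.
arraySet() {
    local array=$1 index=$2 value=$3
    # Refuse keys that would not form a valid variable name such as animals_moo.
    case $index in
        '' | *[!A-Za-z0-9_]*) echo "arraySet: bad key '$index'" >&2; return 1 ;;
    esac
    # printf -v assigns to the named variable; it is not made local to this function.
    printf -v "${array}_$index" '%s' "$value"
}

arrayGet() {
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}

arraySet animals moo cow
arrayGet animals moo     # prints: cow (no trailing newline)
echo
```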


Summary:

  • Upgrade to bash 4 and use declare -A for associative arrays.
  • Use the declare-based workaround if you can't upgrade.
  • Consider using awk instead and avoid the issue altogether.
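For the awk route, awk arrays are associative by definition, so no emulation is needed. A minimal sketch; sound_to_animal is a hypothetical helper, and the inline "sound animal" table stands in for real input:

```shell
# Hypothetical helper: look a sound up in a small inline table of "sound animal" lines.
sound_to_animal() {
    printf '%s\n' 'moo cow' 'woof dog' |
        awk -v key="$1" '{ animals[$1] = $2 } END { if (key in animals) print animals[key] }'
}

sound_to_animal moo      # prints: cow
```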

Solution 2

There's parameter substitution, though it may be as un-PC as indirection.


# Array pretending to be a Pythonic dictionary
ARRAY=( "cow:moo"
        "bash:rock" )

for animal in "${ARRAY[@]}" ; do
    KEY="${animal%%:*}"     # text before the first ":"
    VALUE="${animal##*:}"   # text after the last ":"
    printf "%s likes to %s.\n" "$KEY" "$VALUE"
done

printf "%s is an extinct animal which likes to %s\n" "${ARRAY[1]%%:*}" "${ARRAY[1]##*:}"

The Bash 4 way is better, of course, but if you need a hack, only a hack will do. You could search the array/hash with similar techniques.
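Searching the same "key:value" array could look like the sketch below. lookup is a hypothetical helper doing a linear scan, so each lookup is O(n), and it works in Bash 3. It takes the value after the first colon with ${pair#*:} rather than ${pair##*:}, a choice that keeps values containing ":" intact:

```shell
# Hypothetical helper: print the value stored for key $1 among "key:value" strings.
# Returns 1 if the key is absent. Works in Bash 3 (no associative arrays needed).
lookup() {
    local key=$1 pair
    shift
    for pair in "$@"; do
        if [ "${pair%%:*}" = "$key" ]; then
            printf '%s\n' "${pair#*:}"   # everything after the FIRST ":" (values may contain ":")
            return 0
        fi
    done
    return 1
}

ARRAY=( "cow:moo"
        "bash:rock" )
lookup bash "${ARRAY[@]}"      # prints: rock
```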

Solution 3

This is what I was looking for here:

declare -A hashmap
hashmap["key"]="value"
echo "${hashmap["key"]}"
for key in "${!hashmap[@]}"; do echo "$key"; done
for value in "${hashmap[@]}"; do echo "$value"; done
echo "hashmap has ${#hashmap[@]} elements"

This did not work for me with bash 4.1.5:

animals=( ["moo"]="cow" )
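A likely explanation for that report, offered as a hedged guess: without a prior declare -A, Bash creates an indexed array, evaluates the subscript "moo" arithmetically (an unset variable counts as 0), and the string key is lost. This sketch shows both cases and needs Bash 4:

```shell
unset animals moo
animals=( ["moo"]="cow" )            # no declare -A: this is an INDEXED array
indexed_keys="${!animals[*]}"
echo "without declare -A: keys = $indexed_keys"   # prints: without declare -A: keys = 0

unset animals
declare -A animals                   # must come before the assignment
animals=( ["moo"]="cow" )
assoc_keys="${!animals[*]}"
echo "with declare -A: keys = $assoc_keys"        # prints: with declare -A: keys = moo
```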

Solution 4

Just use the file system

The file system is a tree structure that can be used as a hash map. Your hash table will be a temporary directory, your keys will be filenames, and your values will be file contents. The advantage is that it can handle huge hashmaps, and doesn't require a specific shell.

Hashtable creation

hashtable=$(mktemp -d)

Add an element

printf '%s\n' "$value" > "$hashtable/$key"

Read an element

value=$(< "$hashtable/$key")


Of course, it's slow, but not that slow. I tested it on my machine, with an SSD and btrfs, and it handles around 3000 element reads/writes per second.
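A sketch of the approach wrapped in small functions. The names ht_put, ht_get, ht_has and ht_del are hypothetical, and so is the cleanup trap. Apart from mktemp -d (available on Linux and macOS, though not part of POSIX), only POSIX shell features are used. Keys must be valid file names: a key containing "/" would be interpreted as a subdirectory path and fail:

```shell
# Hypothetical helper names. Keys must be usable as file names: no "/", not "." or "..".
hashtable=$(mktemp -d) || exit 1
trap 'rm -rf "$hashtable"' EXIT                   # delete the table when the script exits

ht_put() { printf '%s' "$2" > "$hashtable/$1"; }  # key $1, value $2 (overwrites)
ht_get() { cat "$hashtable/$1" 2>/dev/null; }     # non-zero exit status if the key is missing
ht_has() { [ -f "$hashtable/$1" ]; }
ht_del() { rm -f "$hashtable/$1"; }

ht_put France Paris
ht_put Spain Madrid
value=$(ht_get France)       # "Paris"; note that $( ) strips trailing newlines from values
ht_del Spain
ht_has Spain || echo "Spain is gone"
```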

Solution 5

You can further modify the hput()/hget() interface so that you have named hashes as follows:

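hput() {
    # Assigns $3 to a variable named "$1$2" (e.g. capitalsFrance);
    # the single-quoted '$3' is expanded by eval, not spliced into the code.
    eval "$1""$2"='$3'
}

hget() {
    # Expands to e.g.: echo "${capitalsFrance}"
    eval echo '"${'"$1$2"'}"'
}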

and then

hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid
echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`

This lets you define other maps that don't conflict (e.g., 'rcapitals' which does country lookup by capital city). But, either way, I think you'll find that this is all pretty terrible, performance-wise.

If you really want fast hash lookup, there's a terrible, terrible hack that actually works really well. It is this: write your key/value pairs out to a temporary file, one per line, then use 'grep "^$key"' to get them out, using pipes with cut or awk or sed or whatever to retrieve the values.

Like I said, it sounds terrible, and it sounds like it ought to be slow and do all sorts of unnecessary IO, but in practice it is very fast (disk cache is awesome, ain't it?), even for very large hash tables. You have to enforce key uniqueness yourself, etc. Even if you only have a few hundred entries, the output file/grep combo is going to be quite a bit faster - in my experience several times faster. It also eats less memory.

Here's one way to do it:

hinit() {
    rm -f "/tmp/hashmap.$1"
}

hput() {
    echo "$2 $3" >> "/tmp/hashmap.$1"
}

hget() {
    grep "^$2 " "/tmp/hashmap.$1" | awk '{ print $2 };'
}

hinit capitals
hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid

echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`
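The key-uniqueness chore mentioned above can be pushed into the lookup. This is a sketch of a variant of hinit/hput/hget, assuming any POSIX awk; hashdir (a private mktemp directory instead of fixed /tmp names) is a hypothetical addition. It matches keys as fixed strings rather than regexes and lets the last hput for a key win. Keys cannot contain spaces, and awk -v interprets backslash escapes in the key:

```shell
# Variant of hinit/hput/hget; hashdir is a hypothetical private temp directory.
hashdir=$(mktemp -d) || exit 1
trap 'rm -rf "$hashdir"' EXIT

hinit() { : > "$hashdir/$1"; }                            # create or empty map $1

hput() { printf '%s %s\n' "$2" "$3" >> "$hashdir/$1"; }   # append "key value"

# Exact key match (no regex): the line must start with "key ". The LAST match wins,
# so calling hput again for a key acts as an update. Prints the rest of the line,
# so values may contain spaces; exits 1 if the key is absent.
hget() {
    awk -v key="$2" '
        index($0, key " ") == 1 { val = substr($0, length(key) + 2); found = 1 }
        END { if (!found) exit 1; print val }
    ' "$hashdir/$1"
}

hinit capitals
hput capitals France Paris
hput capitals Spain Madrid
hinit colors
hput colors sky blue
hput colors sky "light blue"   # later entry overrides the earlier one
hget colors sky                # prints: light blue
```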
    What is the equivalent of Python dictionaries but in Bash (should work across OS X and Linux).

    This won't work with values that have "/" slashes in them