How does a hash table work? Is it faster than "SELECT * from .."

21,402

Solution 1

A simple hash table works by keeping the items on several lists, instead of just one. It uses a very fast and repeatable (i.e. non-random) method to choose which list to keep each item on. So when it is time to find the item again, it repeats that method to discover which list to look in, and then does a normal (slow) linear search in that list.

By dividing the items up into 17 lists, the search becomes 17 times faster, which is a good improvement.

Although of course this is only true if the lists are roughly the same length, so it is important to choose a good method of distributing the items between the lists.

In your example table, the first column is the key, the thing we need to find the item. And lets suppose we will maintain 17 lists. To insert something, we perform an operation on the key called hashing. This just turns the key into a number. It doesn't return a random number, because it must always return the same number for the same key. But at the same time, the numbers must be "spread out" widely.

Then we take the resulting number and use modulus to shrink it down to the size of our list:

Hash(key) % 17

This all happens extremely fast. Our lists are in an array, so:

_lists[Hash(key % 17)].Add(record);

And then later, to find the item using that key:

Record found = _lists[Hash(key % 17)].Find(key);

Note that each list can just be any container type, or a linked list class that you write by hand. When we execute a Find in that list, it works the slow way (examine the key of each record).

Solution 2

Do not worry about what MySQL is doing internally to locate records quickly. The job of a database is to do that sort of thing for you. Just run a SELECT [columns] FROM table WHERE [condition]; query and let the database generate a query plan for you. Note that you don't want to use SELECT *, since if you ever add a column to the table that will break all your old queries that relied on there being a certain number of columns in a certain order.

If you really want to know what's going on under the hood (it's good to know, but do not implement it yourself: that is the purpose of a database!), you need to know what indexes are and how they work. If a table has no index on the columns involved in the WHERE clause, then, as you say, the database will have to search through every row in the table to find the ones matching your condition. But if there is an index, the database will search the index to find the exact location of the rows you want, and jump directly to them. Indexes are usually implemented as B+-trees, a type of search tree that uses very few comparisons to locate a specific element. Searching a B-tree for a specific key is very fast. MySQL is also capable of using hash indexes, but these tend to be slower for database uses. Hash indexes usually only perform well on long keys (character strings especially), since they reduce the size of the key to a fixed hash size. For data types like integers and real numbers, which have a well-defined ordering and fixed length, the easy searchability of a B-tree usually provides better performance.

You might like to look at the chapters in the MySQL manual and PostgreSQL manual on indexing.

Solution 3

http://en.wikipedia.org/wiki/Hash_table

Hash tables may be used as in-memory data structures. Hash tables may also be adopted for use with persistent data structures; database indices sometimes use disk-based data structures based on hash tables, although balanced trees are more popular.

Share:
21,402
roa3
Author by

roa3

Updated on July 09, 2022

Comments

  • roa3
    roa3 almost 2 years

    Let's say, I have :

    Key | Indexes | Key-values
    ----+---------+------------
    001 | 100001  | Alex
    002 | 100002  | Micheal
    003 | 100003  | Daniel
    

    Lets say, we want to search 001, how to do the fast searching process using hash table?

    Isn't it the same as we use the "SELECT * from .. " in mysql? I read alot, they say, the "SELECT *" searching from beginning to end, but hash table is not? Why and how?

    By using hash table, are we reducing the records we are searching? How?

    Can anyone demonstrate how to insert and retrieve hash table process in mysql query code? e.g.,

    SELECT * from table1 where hash_value="bla" ...
    

    Another scenario: If the indexes are like S0001, S0002, T0001, T0002, etc. In mysql i could use:

    SELECT * from table WHERE value = S*
    

    isn't it the same and faster?

  • roa3
    roa3 over 15 years
    I already read at en.wikipedia.org/wiki/Hash_table and some research over the internet, however I just could not grab the idea of how the search process can be fastened?
  • kquinn
    kquinn over 15 years
    Using that strategy to "optimize" a database is inviting disaster. It's the database's job to make data retrieval fast and easy. "Shortcuts" like this will usually only undermine it and make its job that much harder.
  • Daniel Earwicker
    Daniel Earwicker over 15 years
    NB if any part of this is confusing, leave a comment and I'll try to improve it.
  • roa3
    roa3 over 15 years
    maybe you could help me answer this question: stackoverflow.com/questions/540848/…