What is the algorithm for query search in the database?

23,996

Solution 1

If there's no indexes, then yes, a linear search is performed.

But, databases typically use a B Tree index when you specify a column(s) as a key. These are special data structure formats that are specifically tuned(high B Tree branching factors) to perform well on magnetic disk hardware, where the most significant time consuming factor is the seek operation(the magnetic head has to move to a diff part of the file).

You can think of the index as a sorted/structured copy of the values in a column. It can be determined quickly if the value being searched for is in the index. If it finds it, then it will also find a pointer that points back to the correct location of the corresponding row in the main data file(so it can go and read the other columns in the row). Sometimes a multi-column index contains all the data requested by the query, and then it doesn't need to skip back to the main file, it can just read what it found and then its done.

There's other types of indexes, but I think you get the idea - duplicate data and arrange it in a way that's fast to search.

On a large database, indexes make the difference between waiting a fraction of a second, vs possibly days for a complex query to complete.

btw- B tree's aren't a simple and easy to understand data structure, and the traversal algorithm is also complex. In addition, the traversal is even uglier than most of the code you will find, because in a database they are constantly loading/unloading chunks of data from disk and managing it in memory, and this significantly uglifies the code. But, if you're familiar with binary search trees, then I think you understand the concept well enough.

Solution 2

Well, it depends on how the data is stored and what are you trying to do.

  • As already indicated, a common structure for maintaining entries is a B+ tree. The tree is well optimized for disk since the actual data is stored only in leaves - and the keys are stored in the internal nodes. It usually allows a very small number of disk accesses since the top k levels of the tree can be stored in RAM, and only the few bottom levels will be stored on disk and require a disk read for each.
  • Other alternative is a hash table. You maintain in memory (RAM) an array of "pointers" - these pointers indicate a disk address, which contains a bucket that includes all entries with the corresponding hash value. Using this method, you only need O(1) disk accesses (which is usually the bottleneck when dealing with data bases), so it should be relatively fast.
    However, a hash table does not allow efficient range queries (which can be efficiently done in a B+ tree).

The disadvantage of all of the above is that it requires a single key - i.e. if the hash table or B+ tree is built according to the field "id" of the relation, and then you search according to "key" - it becomes useless.
If you want to guarantee fast search for all fields of the relation - you are going to need several structures, each according to a different key - which is not very memory efficient.

Now, there are many optimizations to be considered according to the specific usage. If for example, number of searches is expected to be very small (say smaller loglogN of total ops), maintaining a B+ tree is overall less efficient then just storing the elements as a list and on the rare occasion of a search - just do a linear search.

Solution 3

Very gOod question, but it can have many answers depending on the structure of your table and how is normalized...

Usually to perform a seacrh in a SELECT query the DBMS sorts the table (it uses mergesort because this algorithm is good for I/O in disc, not quicksort) then depending on indexes (if the table has) it just match the numbers, but if the structure is more complex the DBMS can perform a search in a tree, but this is too deep, let me research again in my notes I took.

I recommend activating the query execution plan, here is an example in how to do so in Sql Server 2008. And then execute your SELECT statement with the WHERE clause and you will be able to begin understanding what is going on inside the DBMS.

Share:
23,996
Treize
Author by

Treize

I'm currently studying Computer Science at the Polytechnic University of the Philippines.

Updated on July 30, 2022

Comments

  • Treize
    Treize almost 2 years

    Good day everyone, I'm currently doing research on search algorithm optimization.

    As of now, I'm researching on the Database.

    In a database w/ SQL Support.

    I can write the query for a specific table.

    1. Select Number from Table1 where Name = "Test";
    2. Select * from Table1 where Name = "Test";

    1 searches the number from Table1 from where the Name is Test and 2 searches all the column for name Test.

    I understand the concept of the function however what I'm interested in learning what is the approach of the search?

    Is it just plain linear search where from the first index until the nth index it will grab so long as the condition is true thus having O(n) speed or does it have a unique algorithm that speeds its process?

    • nullpotent
      nullpotent over 11 years
      Most likely MySQL (InnoDB) optimizes search queries with B-tree.