Log Structured Merge Tree in HBase


Solution 1

You can take a look at these two articles, which describe exactly what you want:

http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

http://blog.cloudera.com/blog/2012/06/hbase-write-path/

In brief:

  • The client sends data to the region server responsible for that key (see the client-side sketch after this list)
  • (.META. contains the key ranges for each region)
  • The user operation (e.g. put) is written to the Write-Ahead-Log (WAL, the HLog)
  • (The log is used just for "safety": if the region server crashes, the log is replayed to recover the data not yet written to disk)
  • After writing to the log, the data is also written to the MemStore
  • ...once the MemStore reaches a threshold (conf property)
  • the MemStore is flushed to disk, creating a single HFile
  • ...when the number of HFiles grows too large (conf property), compaction kicks in (merge)
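
For reference, here is a minimal client-side sketch of the put that enters this path, assuming the HBase 1.x+ Java client and an already-existing table; the table name "my_table", column family "cf", and the other names are illustrative, not taken from the articles above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
            // The client locates (via meta) the region server owning "row-1";
            // that server then writes to its WAL and MemStore as described above.
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
            table.put(put);
        }
    }
}
```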

In terms of on-disk data structures, the first article above (http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/) covers the HFile format. It is an append-only format, and can be thought of as a B+tree (keeping in mind that this B+tree cannot be modified in place).
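
To make the "append-only, B+tree-like" idea concrete, here is a toy, self-contained sketch (not the actual HFile code): a sorted file made of data blocks plus a sparse index of each block's first key, so a reader can seek a key by loading a single block instead of the whole file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of an immutable, sorted file: data blocks plus a sparse index
// of each block's first key (roughly the role of the HFile block index).
public class ToyHFile {
    private final List<TreeMap<String, String>> blocks = new ArrayList<>();
    private final TreeMap<String, Integer> index = new TreeMap<>(); // first key -> block number

    // "Append" a pre-sorted block; once written, blocks are never modified.
    public void appendBlock(TreeMap<String, String> block) {
        index.put(block.firstKey(), blocks.size());
        blocks.add(block);
    }

    // Seek a key by consulting the index, then reading only one block.
    public String get(String key) {
        Map.Entry<String, Integer> e = index.floorEntry(key);
        if (e == null) return null;               // key sorts before the first block
        return blocks.get(e.getValue()).get(key); // only this block is "read"
    }
}
```

A real HFile adds block encoding, checksums, bloom filters and a multi-level index, but the lookup idea is the same.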

The HLog is only used for "safety"; once the data is written to HFiles, the logs can be thrown away.

Solution 2

According to the LSM-tree model, in HBase the data consists of two parts: an in-memory tree containing the most recent updates to the data, and a disk store tree which arranges the rest of the data as immutable, sequential B-trees located on the hard drive. From time to time the HBase service decides that it has accumulated enough changes in memory to flush them to file storage. In that case it performs a rolling merge of the data from memory to disk, executing an operation similar to the merge step of the merge sort algorithm.
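
As a rough illustration of that merge step (a sketch of the idea, not HBase's actual flush code), think of it as a linear merge of two key-sorted runs, where the newer in-memory value wins on duplicate keys:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RollingMergeSketch {
    // Linear merge of two key-sorted runs (in-memory updates and an on-disk run),
    // just like the merge step of merge sort; the in-memory value wins on equal keys.
    static List<Map.Entry<String, String>> merge(List<Map.Entry<String, String>> mem,
                                                 List<Map.Entry<String, String>> disk) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < mem.size() || j < disk.size()) {
            if (j == disk.size()
                    || (i < mem.size() && mem.get(i).getKey().compareTo(disk.get(j).getKey()) <= 0)) {
                if (i < mem.size() && j < disk.size()
                        && mem.get(i).getKey().equals(disk.get(j).getKey())) {
                    j++; // same key on disk is older, drop it
                }
                out.add(mem.get(i++));
            } else {
                out.add(disk.get(j++));
            }
        }
        return out; // written out as a new, immutable sorted run
    }

    public static void main(String[] args) {
        TreeMap<String, String> mem = new TreeMap<>(Map.of("b", "2", "c", "3-new"));
        TreeMap<String, String> disk = new TreeMap<>(Map.of("a", "1", "c", "3-old"));
        // prints [a=1, b=2, c=3-new]
        System.out.println(merge(new ArrayList<>(mem.entrySet()), new ArrayList<>(disk.entrySet())));
    }
}
```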

In the HBase infrastructure this data model is based on several components which organize all data across the cluster as a collection of LSM-trees located on slave servers and driven by the main master service. The system consists of the following components:

HMaster - the primary HBase service, which maintains the correct state of the slave Region Server nodes by managing and balancing the data among them. It also drives changes to metadata in the storage, such as table or column creations and updates.

ZooKeeper - a distributed cache used by HBase services and their clients to store reconciled, up-to-date information about naming and configuration.

Region Servers - HBase worker nodes which manage and store pieces of the information in LSM-tree fashion.

HDFS - used by Region Servers behind the scenes for the actual storage of the data.

At a low level, most of the HBase functionality is located within the Region Server, which performs the read-write work on the tables. Every table can technically be distributed across different Region Servers as a collection of separate pieces called HRegions. A single Region Server node can hold several HRegions of one table. Each HRegion holds a certain range of rows, shared between memory and disk space and sorted by the key attribute. These ranges do not intersect between different regions, so we can rely on their sequential behaviour across the cluster. An individual Region Server HRegion includes the following parts:

Write Ahead Log (WAL) file - the first place where data is persisted on every write operation, before it gets into memory. As I've mentioned earlier, the first part of the LSM-tree is kept in memory, which means it can be affected by external factors such as power loss. Keeping a log of these operations in a separate place allows that part to be restored easily, without any losses. (A small combined sketch of the WAL, Memstore and HFile follows below.)

Memstore - keeps a sorted collection of the most recent updates to the information in memory. It is the actual implementation of the first part of the LSM-tree structure described earlier. It periodically performs rolling merges into store files called HFiles on the local hard drives.

HFile - represents a small piece of data received from the Memstore and saved in HDFS. Each HFile contains a sorted collection of KeyValues and a B+Tree index which allows the data to be sought without reading the whole file. Periodically HBase performs merge sort operations on these files to make them fit the configured size of a standard HDFS block and to avoid the small-files problem.
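
As a purely illustrative sketch of how these three parts cooperate inside one HRegion (the toy classes and the threshold value are my own, not HBase's real ones): every write is appended to the log first, then applied to a sorted in-memory map, which is flushed to a new immutable "file" once it crosses a threshold; after a crash, the unflushed part is rebuilt by replaying the log:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative HRegion write path: WAL append -> MemStore -> flush to immutable HFile.
public class RegionSketch {
    private final List<String> wal = new ArrayList<>();                       // stands in for the WAL
    private final ConcurrentSkipListMap<String, String> memStore = new ConcurrentSkipListMap<>();
    private final List<TreeMap<String, String>> hFiles = new ArrayList<>();   // immutable flushed runs
    private static final int FLUSH_THRESHOLD = 1000;                          // stands in for the conf property

    public void put(String rowKey, String value) {
        wal.add(rowKey + "=" + value);      // 1. persist the edit to the log for crash recovery
        memStore.put(rowKey, value);        // 2. apply it to the sorted in-memory store
        if (memStore.size() >= FLUSH_THRESHOLD) {
            flush();                        // 3. spill to disk once the threshold is reached
        }
    }

    private void flush() {
        hFiles.add(new TreeMap<>(memStore)); // write a new, never-modified sorted file
        memStore.clear();                    // log entries covered by this flush could now be discarded
    }

    // After a crash, replaying the saved WAL rebuilds the MemStore contents that were never flushed.
    public void replay(List<String> savedWal) {
        for (String edit : savedWal) {
            String[] kv = edit.split("=", 2);
            memStore.put(kv[0], kv[1]);
        }
    }
}
```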


You can walk through these elements manually by pushing data into HBase and following it through the whole LSM-tree process. I described how to do it in my recent article:

https://oyermolenko.blog/2017/02/21/hbase-as-primary-nosql-hadoop-storage/


Comments

  • Amit Nagar, almost 2 years ago

    I am working on HBase. I have a query regarding how HBase stores data in sorted order with LSM.

    As per my understanding, HBase uses an LSM tree for data transfer in large-scale data processing. When data comes from a client, it is first stored in memory sequentially, then sorted and stored as a B-tree in a store file. Then the store file is merged with the on-disk B-tree (of keys). Is that correct? Am I missing something?

    • If yes, then in a cluster environment there are multiple RegionServers that take client requests. In that case, how do all the HLogs (of each RegionServer) merge with the on-disk B-tree, given that existing keys are spread across all the DataNode disks?

    • Or does the HLog only merge data with the HFiles of the same RegionServer?