HBase - What's the difference between WAL and MemStore?
The WAL is for recovery, NOT for data duplication (for more, see my answer here).
Please go through the points below to understand more:
- An HBase Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
- The Write Ahead Log (WAL) records all changes to data in HBase to file-based storage. If a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed.
- With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be sequential. This causes the WAL to be a performance bottleneck.
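- The WAL can be disabled to relieve this performance bottleneck. This is done by calling the HBase client field `Mutation.writeToWAL(false)`. In more recent client versions the same intent is expressed with `Mutation.setDurability(Durability.SKIP_WAL)`.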
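To make the roles of the two components concrete, here is a minimal sketch of the write path described in the list above. It is a toy model, not HBase code: the class `WalReplaySketch` and its methods are made up for illustration, an in-memory list stands in for the WAL file on HDFS, and a `TreeMap` stands in for the sorted MemStore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of one Store's write path; not the HBase implementation. */
public class WalReplaySketch {
    // Stand-in for the WAL file: append-only, assumed to survive a crash.
    private final List<String[]> wal = new ArrayList<>();
    // Stand-in for the MemStore: sorted, held in memory, lost on a crash.
    private TreeMap<String, String> memStore = new TreeMap<>();

    /** Append the edit to the log first, then apply it in memory. */
    public void put(String rowKey, String value) {
        wal.add(new String[] {rowKey, value});
        memStore.put(rowKey, value);
    }

    /** Simulate a RegionServer crash: everything held in memory is gone. */
    public void crash() {
        memStore = new TreeMap<>();
    }

    /** Rebuild the in-memory state by replaying the log in order. */
    public void replay() {
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    public String get(String rowKey) {
        return memStore.get(rowKey);
    }

    public static void main(String[] args) {
        WalReplaySketch store = new WalReplaySketch();
        store.put("row2", "b");
        store.put("row1", "a");
        store.crash();
        System.out.println(store.get("row1")); // null: the MemStore was lost
        store.replay();
        System.out.println(store.get("row1")); // a: recovered from the log
    }
}
```

The point of the sketch is that the two structures serve different purposes: the log is only ever appended to and read back during recovery, while the sorted in-memory map is what serves reads until it is flushed.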
General note: it is common practice to disable the WAL while bulk loading data, to gain speed. The side effect is that with the WAL disabled there is nothing to replay, so edits that were only in memory cannot be recovered if the RegionServer crashes before the MemStore is flushed.
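The trade-off in the note above can be illustrated with another toy model. Again, this is only a sketch under assumptions: `SkipWalSketch` and its `writeToWal` flag are hypothetical names that mirror the idea of `writeToWAL(false)`, and none of this is the HBase client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model only: shows what is lost when an edit skips the log. */
public class SkipWalSketch {
    // Durable stand-in for the WAL.
    private final List<String[]> wal = new ArrayList<>();
    // Volatile stand-in for the MemStore.
    private TreeMap<String, String> memStore = new TreeMap<>();

    /** writeToWal is a hypothetical flag mirroring the idea of writeToWAL(false). */
    public void put(String rowKey, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(new String[] {rowKey, value});
        }
        memStore.put(rowKey, value);
    }

    /** Crash before any flush, then recover from whatever the log holds. */
    public void crashAndRecover() {
        memStore = new TreeMap<>();
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    public boolean contains(String rowKey) {
        return memStore.containsKey(rowKey);
    }

    public static void main(String[] args) {
        SkipWalSketch store = new SkipWalSketch();
        store.put("logged", "1", true);
        store.put("unlogged", "2", false);
        store.crashAndRecover();
        System.out.println(store.contains("logged"));   // true
        System.out.println(store.contains("unlogged")); // false
    }
}
```

Skipping the log saves one append per edit, which is where the speed-up comes from, but any edit that skipped it exists only in memory until the next flush.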
Moreover, if you use Solr + HBase + Lily (i.e. Lily Morphline NRT indexing with HBase), the indexer works off the WAL. If you disable the WAL for performance reasons, Solr NRT indexing won't work, since Lily relies on the WAL.
Please have a look at the HBase architecture section.
Comments
-
Shankar about 2 years
I am trying to understand the HBase architecture. I can see two different terms used for the same purpose: Write Ahead Logs and MemStore are both used to store new data that hasn't yet been persisted to permanent storage.
What's the difference between WAL and MemStore?
Update:
WAL - is used to recover not-yet-persisted data in case a server crashes. MemStore - stores updates in memory as sorted KeyValues.
It seems like a lot of duplication of data before writing the data to disk.
-
best wishes almost 5 years
What happens if things fail between the WAL and the MemStore? When the DB comes up after a crash, it will assume that whatever is in the WAL has been replicated, but that may not be the case.
-
Ram Ghadiyaram almost 5 years
WAL is for recovery. AFAIK it should recover. That's the primary reason to go for a WAL; otherwise it's a waste of time and effort to write to the WAL.
-
best wishes almost 5 years
I agree on that. The case I am concerned about is: does a failed write to the MemStore have any impact on the response code from HBase? If not, there will be an inconsistency, right? And if yes, do we keep an undo log? I know it is really an edge case, but how do we consider these cases while designing databases?