What are disadvantages of the Hadoop distribution MapR compared to Cloudera and Hortonworks?

hadoop hdfs cloudera mapr

11,303

I would define MapR a bit differently. It does not use HDFS; instead it provides its own distributed file system with an NFS interface, which, like HDFS, is built on top of the local file system.
The main differences come from the fact that HDFS is not POSIX-compliant, and from other design choices.
1. HDFS is not mutable, while MapR is. This can be viewed as an advantage, especially if you need it.
2. HDFS is not mountable, while MapR is. You can use any existing tools that work with a Linux file system.
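
To illustrate the two points above, here is a minimal sketch of what mutability on a mounted file system looks like from a client's side: ordinary file I/O can overwrite bytes in the middle of a file. The temporary directory is only a stand-in so the example runs anywhere; on a real cluster the path would be wherever the file system is NFS-mounted, which is an assumption and not something taken from MapR documentation. The helper name `overwrite_in_place` is hypothetical.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes at `offset` without rewriting the rest of the file."""
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


# Stand-in for a directory on a mounted, mutable file system (assumption:
# a real deployment would use its NFS mount point here instead).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "records.txt")

with open(path, "wb") as f:
    f.write(b"status=PENDING;id=42\n")

# Replace the 7-byte value "PENDING" with a 7-byte padded "DONE   ".
overwrite_in_place(path, len(b"status="), b"DONE   ")

with open(path, "rb") as f:
    result = f.read()

print(result)
```

On a write-once file system, the same change would mean writing a new copy of the file, which is the trade-off the two points above describe.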

Unrelated to POSIX: MapR has a small block size and no single point of failure (the NameNode in HDFS). MapR also has multisite replication.

Let's look at the dark side too:
a) Having mutable data (instead of immutable data, as in HDFS) makes the system more complicated.
b) It is not known (at least to me) to work on huge clusters. (I have heard of hundreds of nodes.)
c) From an architecture standpoint (given the small blocks), I am not sure how good the achievable data locality is.


Author by

Kai Wähner

Kai Wähner works as Technology Evangelist at TIBCO. Kai’s main area of expertise lies within the fields of Big Data, Advanced Analytics, Machine Learning, Integration, SOA, Microservices, BPM, Cloud and Internet of Things. He is a regular speaker at international IT conferences such as JavaOne, O’Reilly Software Architecture or ApacheCon, writes articles for professional journals, and shares his experiences with new technologies on his blog. Contact: [email protected] / @KaiWaehner. Find more details and references (presentations, articles) on his website: www.kai-waehner.de

Updated on June 17, 2022

Comments

Kai Wähner almost 2 years

Cloudera and Hortonworks use HDFS, one of the basic concepts of Apache Hadoop. MapR uses its own concept / implementation. Instead of HDFS, you use the native file system directly. You can find a lot of advantages using this approach on the website of MapR.

I wonder what the disadvantages of this approach are.
Ted Dunning about 11 years

Regarding David's dark-side comments: (a) mutability makes things much simpler for the user; (b) it works on large clusters, see the recent world sort record; (c) small blocks aren't the issue for locality. MapR separates the concepts of disk unit (small blocks), cluster striping unit (like a Hadoop block, hundreds of MB) and scaling constant (30GB instead of Hadoop's default 64MB).
David Gruzman about 11 years

Ted - please provide a link to the sort record
David Gruzman about 11 years

It is a very interesting document. I think it would be very useful to have a summary of the MapR improvements aside from the HDFS replacement.
David Gruzman about 11 years

In addition, it is not clear what the file server mentioned in the document is, and what the network was: 1 Gbit or 10 Gbit?
Ted Dunning about 11 years

Dave, Srivas already provided the link. See mapr.com/blog/hadoop-minutesort-record
Ted Dunning about 11 years

The file server is the standard MapR distributed file server. The network is 10GbE. See mapr.com/doc/display/MapR/Start+Here
cabad over 10 years

Any source other than a MapR blog? I don't see the sort record here: http://sortbenchmark.org/.
j.raymond over 9 years

Something else to note, "The TeraByte benchmark is now deprecated because it became essentially the same as MinuteSort." REF: sortbenchmark.org