What are the disadvantages of the Hadoop distribution MapR compared to Cloudera and Hortonworks?


I would define MapR a bit differently. It does not use HDFS; instead it provides its own distributed file system with an NFS interface, which, like HDFS, is built on top of the local file system.
The main differences come from the fact that HDFS is not POSIX-compliant, and from other design choices.
1. HDFS is not mutable, while MapR is. This can be viewed as an advantage, especially if you need it.
2. HDFS is not mountable, while MapR is. You can use any existing tools that work with a Linux file system.
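
To make the two points above concrete, here is a minimal sketch of an in-place update done with ordinary file calls, the kind of operation a mutable, mountable file system allows. It assumes nothing about MapR's own APIs: `mount_point` is a hypothetical stand-in (a temporary directory here) for wherever an NFS export would be mounted, and `overwrite_in_place` is an illustrative helper, not part of any library.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at the given offset without rewriting the file."""
    with open(path, "r+b") as f:  # read/write mode, no truncation
        f.seek(offset)
        f.write(data)


# Hypothetical stand-in for a mounted path; a temp dir keeps the sketch runnable.
mount_point = tempfile.mkdtemp()
path = os.path.join(mount_point, "records.txt")

# Write the file once.
with open(path, "wb") as f:
    f.write(b"status=PENDING;id=42\n")

# Update a field in the middle of the file (same width, padded with spaces).
overwrite_in_place(path, len(b"status="), b"DONE   ")

with open(path, "rb") as f:
    result = f.read()

print(result)
```

On HDFS the same update is not available as a random write: files are written once and can only be appended to, so the usual approach is to rewrite the file.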

Unrelated to POSIX: MapR has a small block size and no single point of failure (such as the NameNode). MapR also has multisite replication.

Let's look at the dark side as well:
a) Having mutable data (instead of immutable HDFS) makes the system more complicated.
b) It is not known (at least to me) to work on huge clusters. (I have heard about hundreds of nodes.)
c) From an architecture point of view (having small blocks), I am not sure how good a data locality can be achieved.

Author: Kai Wähner

Kai Wähner works as Technology Evangelist at TIBCO. Kai’s main area of expertise lies within the fields of Big Data, Advanced Analytics, Machine Learning, Integration, SOA, Microservices, BPM, Cloud and Internet of Things. He is a regular speaker at international IT conferences such as JavaOne, O’Reilly Software Architecture or ApacheCon, writes articles for professional journals, and shares his experiences with new technologies on his blog. Contact: [email protected] / @KaiWaehner. Find more details and references (presentations, articles) on his website: www.kai-waehner.de

Updated on June 17, 2022

Comments

  • Kai Wähner
    Kai Wähner almost 2 years

    Cloudera and Hortonworks use HDFS, one of the basic concepts of Apache Hadoop. MapR uses its own concept / implementation. Instead of HDFS, you use the native file system directly. You can find a lot of advantages using this approach on the website of MapR.

    I wonder what the disadvantages of this approach are.

  • Ted Dunning
    Ted Dunning about 11 years
    Regarding David's dark-side comments: (a) mutability makes things much simpler for the user; (b) it works on large clusters (see the recent world sort record); (c) small blocks aren't the issue for locality. MapR separates the concepts of disk unit (small blocks), cluster striping unit (like a Hadoop block, hundreds of MB) and scaling constant (30 GB instead of Hadoop's default 64 MB).
  • David Gruzman
    David Gruzman about 11 years
    Ted - please provide a link to the sort record
  • David Gruzman
    David Gruzman about 11 years
    It is a very interesting document. I think it would be very useful to have a summary of the MapR improvements aside from the HDFS replacement.
  • David Gruzman
    David Gruzman about 11 years
    In addition, it is not clear what the file server mentioned in the document is, and what the network was: 1 Gbit or 10 Gbit?
  • Ted Dunning
    Ted Dunning about 11 years
    Dave, Srivas already provided the link. See mapr.com/blog/hadoop-minutesort-record
  • Ted Dunning
    Ted Dunning about 11 years
    The file server is the standard MapR distributed file server. The network is 10GbE. See mapr.com/doc/display/MapR/Start+Here
  • cabad
    cabad over 10 years
    Any source other than a MapR blog? I don't see the sort record here: http://sortbenchmark.org/.
  • j.raymond
    j.raymond over 9 years
    Something else to note, "The TeraByte benchmark is now deprecated because it became essentially the same as MinuteSort." REF: sortbenchmark.org