What is the difference between AWS Elastic MapReduce and AWS Redshift?


You are correct that both Amazon EMR and Amazon Redshift are clustered systems that can scale out to offer more computing power. However, there are some very distinct differences between the two services.

Amazon EMR provides Apache Hadoop and applications that run on Hadoop. It is a very flexible system that can read and process unstructured data and is typically used for processing Big Data. However, learning Hadoop and related technologies can be quite difficult. ("With great power comes great responsibility!")

Amazon Redshift is a petabyte-scale data warehouse that is accessed via SQL. Data must be loaded into Redshift before being queried, which often requires some form of transformation ("ETL").
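To make the "load before you query" step concrete, here is a minimal sketch that builds a Redshift COPY statement for bulk-loading CSV files from Amazon S3. The table name, bucket path, and IAM role ARN are made-up placeholders, and the helper `build_copy_statement` is hypothetical (it is not part of any AWS SDK). It only produces the SQL text; you would still run it through whatever SQL client you use.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str,
                         ignore_header: int = 1) -> str:
    """Return a Redshift COPY statement that bulk-loads CSV files from S3.

    Values are interpolated directly into the SQL text, so they must come
    from a trusted source (this sketch does no escaping).
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"IGNOREHEADER {ignore_header};"
    )


# All three values below are placeholders, not real resources.
statement = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

Once the data is loaded, it is queried with ordinary SQL (SELECT, JOIN, GROUP BY and so on).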

So which one to choose?

  • If you want to use SQL and you have structured data (e.g. CSV files), then Redshift is the simplest solution.
  • If you want to process unstructured data (e.g. irregular formats rather than structured CSV files), Amazon EMR can provide a Hadoop system that is very capable.
  • Sometimes people use both -- use Hadoop to transform data, then use Redshift for querying the data.
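As a rough illustration of the "transform first, then query" pattern in the last bullet, the sketch below turns irregular log lines into CSV rows. It is plain Python standing in for the kind of per-line logic you might put in a Hadoop job on EMR; it is not Hadoop code. The log format and the names `LINE_PATTERN`, `to_csv_row`, and `transform` are assumptions made up for this example.

```python
import csv
import io
import re

# Hypothetical input format: "[2022-06-21 10:15:00] user=alice action=login"
LINE_PATTERN = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+user=(?P<user>\S+)\s+action=(?P<action>\S+)"
)


def to_csv_row(line: str):
    """Return one CSV-formatted row, or None if the line does not match."""
    match = LINE_PATTERN.search(line)
    if match is None:
        return None
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([match.group("ts"), match.group("user"), match.group("action")])
    return buffer.getvalue().rstrip("\n")


def transform(lines):
    """Yield CSV rows for the lines that parse, silently skipping the rest."""
    for line in lines:
        row = to_csv_row(line)
        if row is not None:
            yield row


if __name__ == "__main__":
    sample = [
        "[2022-06-21 10:15:00] user=alice action=login",
        "this line has no recognisable structure",
    ]
    for row in transform(sample):
        print(row)
```

The resulting CSV output is the kind of structured data that can then be loaded into Redshift and queried with SQL.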

If Amazon Redshift can fit your needs, then use it rather than Hadoop. Redshift is simpler to use because it presents itself as a standard SQL database that you can get going in a few minutes. All the cluster stuff is behind-the-scenes and you don't have to know much to use it.

If you need more flexible capabilities and you don't mind getting low-level and technical, then Hadoop on Amazon EMR will offer you more capabilities.

Author by

Cenxui

An Android mobile developer for 3 years and a Java application developer for over 3 years. I use Amazon Web Services for my projects. This year I earned an AWS architect certification at the associate level, and I use AWS for my blog and my Android application backend. I majored in math at university. I am interested in bodybuilding.

Updated on June 21, 2022

Comments

  • Cenxui
    Cenxui almost 2 years

    I see that AWS Elastic MapReduce and AWS Redshift both use a cluster structure and can be used for data analysis. What are the different use cases for them?

    Amazon Redshift supports client connections with many types of applications, including business intelligence (BI), reporting, data, and analytics tools.

    Amazon Elastic MapReduce (Amazon EMR) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.

  • Cenxui
    Cenxui almost 8 years
    Thank you for your perfect answer, it is really useful for me. It seems that Amazon EMR requires understanding more technologies because of Hadoop.