AWS Glue pricing against AWS EMR

Solution 1

Yes, EMR does work out cheaper than Glue. Glue is serverless and fully managed by AWS, so you don't have to worry about the infrastructure running behind the scenes, whereas EMR takes a good deal of configuration to set up. It is a trade-off between ease of use and cost, and for more technical users EMR can be the better option.

Solution 2

@user2889316 - Did you check my question, where I provided comparison numbers?

Also, please note that Glue costs roughly $0.44 per DPU-hour for a job. I don't think you will have an AWS Glue job that is expected to run throughout the day. Are you talking about the Glue dev endpoint or the job?

An AWS Glue job requires a minimum of 2 DPUs to run, which means $0.88 per hour, or roughly $21 per day if it ran for the full 24 hours. This covers only the Glue job; there are additional charges such as S3, database and connection charges, crawler charges, etc.

The corresponding EMR instance is m3.xlarge, priced at $0.266 per hour for EC2 plus $0.070 per hour for EMR. That works out to roughly $16 per day for 2 instances, plus other S3 and database charges, etc. I am comparing 2 EMR instances against the 2 DPUs of the AWS Glue job above.
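The daily figures above can be checked with a few lines of Python. This is only a sketch using the prices quoted in this answer; the function names are made up for illustration, and it assumes both services run for the full 24 hours and ignores S3, database, and crawler charges.

```python
HOURS_PER_DAY = 24

# Prices in USD per hour, as quoted in this answer (they may be out of date).
GLUE_PRICE_PER_DPU_HOUR = 0.44
EC2_M3_XLARGE_PER_HOUR = 0.266
EMR_M3_XLARGE_PER_HOUR = 0.070  # EMR fee charged on top of the EC2 price


def glue_daily_cost(dpus, hours=HOURS_PER_DAY):
    """Cost of a Glue job holding `dpus` DPUs for `hours` hours."""
    return dpus * GLUE_PRICE_PER_DPU_HOUR * hours


def emr_daily_cost(instances, hours=HOURS_PER_DAY):
    """Cost of `instances` m3.xlarge EMR nodes for `hours` hours."""
    per_instance_hour = EC2_M3_XLARGE_PER_HOUR + EMR_M3_XLARGE_PER_HOUR
    return instances * per_instance_hour * hours


glue = glue_daily_cost(2)
emr = emr_daily_cost(2)
print(f"Glue: ${glue:.2f}/day, EMR: ${emr:.2f}/day")
# prints: Glue: $21.12/day, EMR: $16.13/day
```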

Hope this would give you an idea.

Thanks

Solution 3

If you use Spot instances for EMR instead of On-Demand, the instances can cost roughly a third of the On-Demand price, which makes EMR much cheaper. AWS Glue doesn't offer that pricing benefit.
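As a rough illustration, the sketch below compares an all On-Demand cluster with a mixed On-Demand plus Spot cluster, using the m3.xlarge prices quoted in Solution 2. The one-third Spot fraction is the estimate from this answer, not a real quote (Spot prices fluctuate), and the sketch assumes the EMR fee per node is the same on Spot and On-Demand nodes. The function name and the 2 + 4 node split are hypothetical.

```python
EC2_ON_DEMAND_PER_HOUR = 0.266  # USD, m3.xlarge price quoted in Solution 2
EMR_FEE_PER_HOUR = 0.070        # USD, EMR fee on top of EC2
SPOT_FRACTION = 1 / 3           # assumption: Spot costs ~1/3 of On-Demand


def cluster_hourly_cost(on_demand_nodes, spot_nodes, spot_fraction=SPOT_FRACTION):
    """Hourly cost of a cluster mixing On-Demand and Spot nodes."""
    on_demand = on_demand_nodes * (EC2_ON_DEMAND_PER_HOUR + EMR_FEE_PER_HOUR)
    # Only the EC2 portion is discounted; the EMR fee is assumed unchanged.
    spot = spot_nodes * (EC2_ON_DEMAND_PER_HOUR * spot_fraction + EMR_FEE_PER_HOUR)
    return on_demand + spot


all_on_demand = cluster_hourly_cost(on_demand_nodes=6, spot_nodes=0)
mixed = cluster_hourly_cost(on_demand_nodes=2, spot_nodes=4)
print(f"6 On-Demand nodes:      ${all_on_demand:.3f}/hour")
print(f"2 On-Demand + 4 Spot:   ${mixed:.3f}/hour")
```

Keeping a couple of On-Demand nodes in the mix means the job can keep running at reduced capacity if the Spot nodes are reclaimed.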

Solution 4

If your infrastructure doesn't need drastic scaling (and mostly runs with a fixed configuration), use EMR. If it does, Glue is the better choice because it is serverless: you scale your infrastructure just by changing the number of DPUs. With EMR, you have to decide on the cluster type, the number of nodes, and the auto-scaling rules. For each change, you need to update the cluster creation script, test it, and deploy it, which adds the overhead of a standard release cycle. When the infrastructure configuration changes, you may also want to change the Spark configuration to optimize jobs accordingly, so releasing a new version takes longer. If you start with a large configuration, it will cost more. If you start with a small configuration, you will need to change the script frequently.

Having said that, AWS Glue has a fixed infrastructure configuration for each DPU, e.g. 16 GB of memory per DPU. If your ETL demands more memory than that, you may have to shift to EMR. However, if your ETL is designed so that it will not exceed 11 GB of driver memory with 1 executor, or 5.5 GB with 2 executors (e.g. by processing additional data volume in parallel on a new core, or by dividing the volume into 5 GB/11 GB batches and running them in a for loop on the same core), Glue is the right choice.
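The batching idea can be sketched as follows. This is an illustration only: `plan_batches` is a hypothetical helper, the file sizes are made up, and input size on disk is just a rough proxy for the memory a job will actually need. Each batch would then be processed in turn inside a loop.

```python
def plan_batches(file_sizes_gb, limit_gb=5.0):
    """Group inputs, in order, into batches whose total size stays within limit_gb."""
    batches, current, current_size = [], [], 0.0
    for size in file_sizes_gb:
        if size > limit_gb:
            raise ValueError(f"single input of {size} GB exceeds the {limit_gb} GB limit")
        # Start a new batch when adding this input would exceed the limit.
        if current and current_size + size > limit_gb:
            batches.append(current)
            current, current_size = [], 0.0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical input sizes in GB.
batches = plan_batches([2.0, 2.5, 1.0, 4.0, 3.0, 1.5])
for i, batch in enumerate(batches, start=1):
    print(f"batch {i}: {batch} ({sum(batch):.1f} GB)")
```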

If your ETL is complex and the jobs are going to keep the cluster busy throughout the day, I would recommend going with EMR, with a dedicated DevOps team to manage the EMR infrastructure.

Author: Yuva

A tech-savvy professional with more than 20 years of experience in development, project management, and team building. Currently exploring AWS services and big data components for real-time streaming and batch processing of data. Hobbies include watching cricket, listening to instrumental music, and watching western-classical fusion concerts.

Updated on June 06, 2022

Comments

  • Yuva
    Yuva almost 2 years

    I am doing a pricing comparison between AWS Glue and AWS EMR in order to choose between the two.

    I have considered 6 DPUs (each with 4 vCPUs + 16 GB of memory), with the ETL job running for 10 minutes a day for 30 days. Crawler requests are assumed to be 1 million above the free tier, calculated at $1 for the additional 1 million requests.

    For EMR I have considered m3.xlarge, paying for both EC2 and EMR (priced at $0.266 and $0.070 per hour respectively), with 6 nodes running for 10 minutes a day for 30 days.

    Calculated for a month, AWS Glue works out to around $14.64, whereas EMR works out to around $10.08. I have not taken into account additional expenses such as S3, RDS, Redshift, etc., or the optional dev endpoint, since my objective is to compare the price of the ETL job itself.

    It looks like EMR is cheaper than AWS Glue. Is the EMR pricing correct? Can someone please point out anything I am missing? I have tried the AWS price calculator for EMR, but I am confused and not sure whether normalized hours are included in the bill.

    Regards

    Yuva

  • Yuva
    Yuva about 6 years
    Thanks, I got it.
  • Sandeep Fatangare
    Sandeep Fatangare over 4 years
    Spot instances are not recommended in production. You don't want a server going down in the middle of an ETL run. :P
  • Srihari Karanth
    Srihari Karanth over 4 years
    If EMR is being used for only 10 minutes every day (as asked by the OP), then Spot instances are well suited for that. I have been using Spot instances for more than 5 hours every day for the last 2 months and have never had one disconnected abruptly. Also, Spot instances should be used alongside On-Demand instances, so that if the Spot instances go down for some reason the job is not killed and keeps running with reduced capacity.
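The monthly estimate in the original question (6 DPUs or 6 m3.xlarge nodes, 10 minutes a day for 30 days) can be sanity-checked with a short Python sketch. It uses the prices quoted on this page and assumes billing is strictly proportional to run time, with no minimum billing duration or rounding. It covers job compute only, so the Glue result leaves out crawler and request charges and is not expected to match the $14.64 in the question exactly. The function name is hypothetical.

```python
RUN_HOURS_PER_DAY = 10 / 60  # 10 minutes per day
DAYS = 30
UNITS = 6                    # 6 DPUs for Glue, 6 nodes for EMR

# Prices in USD per hour, as quoted on this page (they may be out of date).
GLUE_PER_DPU_HOUR = 0.44
EC2_PER_HOUR = 0.266
EMR_PER_HOUR = 0.070


def monthly_cost(hourly_rate_per_unit, units=UNITS,
                 hours_per_day=RUN_HOURS_PER_DAY, days=DAYS):
    """Compute cost assuming charges scale linearly with run time."""
    return hourly_rate_per_unit * units * hours_per_day * days


glue_monthly = monthly_cost(GLUE_PER_DPU_HOUR)
emr_monthly = monthly_cost(EC2_PER_HOUR + EMR_PER_HOUR)
print(f"Glue job compute: ${glue_monthly:.2f}/month")
print(f"EMR (EC2 + EMR):  ${emr_monthly:.2f}/month")
# prints:
# Glue job compute: $13.20/month
# EMR (EC2 + EMR):  $10.08/month
```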