Relationship between Glue DPU and max concurrency

AWS provides two key documents that describe this problem:

https://docs.aws.amazon.com/glue/latest/dg/troubleshooting-service-limits.html

https://docs.aws.amazon.com/glue/latest/dg/add-job.html

Based on these documents, the following service limits and job parameters are relevant to this topic:

Service limits:

  • "Number of concurrent job runs per account"
  • "Number of concurrent job runs per job"
  • "Maximum DPUs used by a role at one time"

Glue Job Parameters:

  • "Max concurrency"
  • "Concurrent DPUs per job run"

The following rules apply to a single Glue job:

  • "Max concurrency" * "Concurrent DPUs per job run" <= "Maximum DPUs used by a role at one time"
  • "Max concurrency" <= "Number of concurrent job runs per job"
  • number of glue job runs <= "Max concurrency"

If you run multiple Glue jobs at the same time, you must also meet the following rules:

  • number of glue job runs * "Concurrent DPUs per job run" <= "Maximum DPUs used by a role at one time"
  • number of glue job runs <= "Number of concurrent job runs per account"
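The rules above can be written as a small checker. This is only a sketch of the arithmetic, not anything AWS provides: the function name and its parameters are made up for illustration, and it assumes every run uses the same number of DPUs, as the rules do. The account-level limit of 50 in the example call is a placeholder, so substitute the limits from your own account.

```python
def violated_rules(max_concurrency, dpus_per_run, job_runs,
                   runs_per_job_limit, dpu_limit, runs_per_account_limit):
    """Return the rules listed above that the given numbers break.

    Assumes every run uses the same DPU count (hypothetical helper,
    not part of any AWS SDK).
    """
    checks = {
        "max_concurrency * dpus_per_run <= dpu_limit":
            max_concurrency * dpus_per_run <= dpu_limit,
        "max_concurrency <= runs_per_job_limit":
            max_concurrency <= runs_per_job_limit,
        "job_runs <= max_concurrency":
            job_runs <= max_concurrency,
        "job_runs * dpus_per_run <= dpu_limit":
            job_runs * dpus_per_run <= dpu_limit,
        "job_runs <= runs_per_account_limit":
            job_runs <= runs_per_account_limit,
    }
    # Keep only the rules whose condition is False.
    return [rule for rule, ok in checks.items() if not ok]


# 3 and 100 are the default limits quoted in this answer;
# 50 is a made-up placeholder for the per-account limit.
print(violated_rules(max_concurrency=3, dpus_per_run=50, job_runs=3,
                     runs_per_job_limit=3, dpu_limit=100,
                     runs_per_account_limit=50))
```

An empty list means the configuration satisfies every rule.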

Let's say that you use the default service limits and you don't run other jobs at the same time:

Number of concurrent job runs per job: 3

Maximum DPUs used by a role at one time: 100

This means that you can run up to three instances of the same Glue job in parallel, and together these runs cannot exceed the limit of 100 DPUs.
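Put differently, the number of runs that fit is the smallest of three values: the job's max concurrency, the per-job run limit, and how many runs fit into the DPU budget. A rough sketch, assuming no other jobs are running and using the default limits quoted above as defaults (the function name is hypothetical):

```python
def max_parallel_runs(dpus_per_run, max_concurrency,
                      runs_per_job_limit=3, dpu_limit=100):
    """Largest number of simultaneous runs of one job.

    Assumes no other jobs are consuming DPUs; the default limits are
    the ones quoted in this answer and may differ in your account.
    """
    if dpus_per_run <= 0:
        raise ValueError("dpus_per_run must be positive")
    # How many whole runs fit into the DPU budget.
    fits_in_dpu_budget = dpu_limit // dpus_per_run
    return min(max_concurrency, runs_per_job_limit, fits_in_dpu_budget)


print(max_parallel_runs(dpus_per_run=30, max_concurrency=3))  # 3
print(max_parallel_runs(dpus_per_run=50, max_concurrency=3))  # 2
print(max_parallel_runs(dpus_per_run=2, max_concurrency=2))   # 2
```

The last call uses the numbers from the question (2 DPUs, max concurrency 2): under these assumptions, max concurrency is the binding constraint.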

For example, you can run 3 instances of a Glue job with DPU=30 and max concurrency=3 (3 * 30 = 90 DPUs, which is within the limit of 100). But if you run 3 instances of a Glue job with DPU=50 and max concurrency=3 (3 * 50 = 150 DPUs), you will receive the following error:

"Exceeded maximum concurrent compute capacity for your account"

I hope this helps.

Author by

Admin

Updated on June 04, 2022

Comments

  • Admin
    Admin almost 2 years

    I have worked with Amazon EMR for more than a year, but recently we moved to AWS Glue for data processing.

    I am having difficulty understanding the relationship between the number of DPUs and the max concurrency we set for a Glue job.

    For example, I have created a job with 2 DPUs and max concurrency set to 2. On top of that, imagine I have two threads launching this endpoint (job) at once.

    Let's say I am performing some aggregation on a 60 GB file. I did find some posts, but they didn't really help, like this and this.

    How many job runs can I expect for this job on AWS Glue?

  • Admin
    Admin over 5 years
    Hi @gorski, thanks for your answer. I will test it; please give me a couple of days.
  • Admin
    Admin over 5 years
    @gorski I tested with max concurrency 4 and 20 DPUs on the same job. It ran fine and also created more than 4 job runs at once. I have 3 questions, friend. 1. What do you mean when you say max concurrent job runs per job is 3? 2. If I run 3 of the same Glue job in parallel with 20 DPUs, does that mean each job run consumes 20 DPUs and in total they take 60 DPUs? 3. So, to conclude, max concurrency * number of DPUs should not exceed 100?
  • j.b.gorski
    j.b.gorski over 5 years
    @JumpMan It is strange that you were able to set "Max concurrency" to 4, because the default limit is 3. "Number of concurrent job runs per job" is the service limit; "Max concurrency" is the Glue job parameter. As for questions 2 and 3: yes, you are right. I have also added an additional explanation to my answer.