Relationship between Glue DPU and max concurrency
AWS provides two key documents that describe this topic:
https://docs.aws.amazon.com/glue/latest/dg/troubleshooting-service-limits.html
https://docs.aws.amazon.com/glue/latest/dg/add-job.html
Based on these documents, the following job parameters and service limits are relevant to this topic:
Service limits:
- "Number of concurrent job runs per account"
- "Number of concurrent job runs per job"
- "Maximum DPUs used by a role at one time"
Glue Job Parameters:
- "Max concurrency"
- "Concurrent DPUs per job run"
The following rules apply to a single Glue job:
- "Max concurrency" * "Concurrent DPUs per job run" <= "Maximum DPUs used by a role at one time"
- "Max concurrency" <= "Number of concurrent job runs per job"
- number of glue job runs <= "Max concurrency"
If you run multiple Glue jobs at the same time, you must also meet the following rules:
- number of glue job runs * "Concurrent DPUs per job run" <= "Maximum DPUs used by a role at one time"
- number of glue job runs <= "Number of concurrent job runs per account"
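The rules above can be written down as a small check. This is only a sketch of the inequalities as stated in this answer, not an AWS API: the `GlueLimits` class, its field names, the rule labels, and the example limit values are all made up for illustration, and the real service may enforce limits differently.

```python
from dataclasses import dataclass


@dataclass
class GlueLimits:
    # Account-level service limits; the field names are hypothetical.
    runs_per_account: int
    runs_per_job: int
    dpus_per_role: int


def violated_rules(max_concurrency, dpus_per_run, requested_runs, limits):
    """Return labels of the rules above that the given settings break."""
    broken = []
    # "Max concurrency" * "Concurrent DPUs per job run" <= max DPUs per role
    if max_concurrency * dpus_per_run > limits.dpus_per_role:
        broken.append("dpu_budget")
    # "Max concurrency" <= "Number of concurrent job runs per job"
    if max_concurrency > limits.runs_per_job:
        broken.append("runs_per_job")
    # number of job runs <= "Max concurrency"
    if requested_runs > max_concurrency:
        broken.append("max_concurrency")
    # number of job runs <= "Number of concurrent job runs per account"
    if requested_runs > limits.runs_per_account:
        broken.append("runs_per_account")
    return broken


# Example inputs only; runs_per_account=50 is an arbitrary placeholder.
limits = GlueLimits(runs_per_account=50, runs_per_job=3, dpus_per_role=100)
print(violated_rules(3, 30, 3, limits))
print(violated_rules(4, 30, 5, limits))
```

An empty list means the settings satisfy every rule listed here; otherwise the labels show which inequality fails.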
Let's say that you use the default service limits and you don't run other jobs at the same time:
Number of concurrent job runs per job: 3
Maximum DPUs used by a role at one time: 100
This means that you can run up to three runs of the same Glue job in parallel, and these runs cannot exceed the limit of 100 DPUs in total.
For example, you can run 3 instances of the Glue job with DPU=30 and max concurrency=3 (3 * 30 = 90 DPUs, which is within the limit of 100), but when you run 3 instances of the Glue job with DPU=50 and max concurrency=3 (3 * 50 = 150 DPUs), you will receive the following error:
"Exceeded maximum concurrent compute capacity for your account"
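To turn this example into a number of runs you can expect, you can take the smallest of the three caps. This is a rough sketch under the limits quoted in this answer (3 runs per job, 100 DPUs); the function name and its parameters are hypothetical, and it assumes whole-number DPU values.

```python
def max_parallel_runs(max_concurrency, dpus_per_run,
                      runs_per_job_limit=3, dpu_limit=100):
    """Largest number of simultaneous runs that satisfies all three caps."""
    # How many runs of this size fit into the DPU budget.
    runs_that_fit_dpu_budget = dpu_limit // dpus_per_run
    return min(max_concurrency, runs_per_job_limit, runs_that_fit_dpu_budget)


for dpus in (30, 50):
    print(dpus, max_parallel_runs(max_concurrency=3, dpus_per_run=dpus))
# prints:
# 30 3
# 50 2
```

With DPU=50 only two runs fit into 100 DPUs, which is why the third run fails. By the same arithmetic, a job with 2 DPUs and max concurrency 2 would be capped at 2 parallel runs by its own max concurrency setting.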
I hope this helps.
Admin
Updated on June 04, 2022
Comments
-
Admin almost 2 years
I have worked on Amazon EMR for more than 1 year, but recently we have moved to AWS Glue for data processing. I am having difficulty understanding the relationship between the number of DPUs and the max concurrency we provide in a Glue job. For example, I have created a job with 2 DPUs and max concurrency set to 2. On top of that, imagine I have two threads launching this endpoint (job) at once. Let's say I am performing some aggregation on a 60GB file. I did find some posts, but they didn't really help, like this and this. How many job runs can I expect for this job on AWS Glue?
-
Admin over 5 years
Hi @gorski, thanks for your answer. I will test it; please give me a couple of days.
-
Admin over 5 years
@gorski I tested with max concurrency 4 and 20 DPUs on the same job; it ran fine and also created more than 4 job runs at once. I have 3 questions, friend. 1. So, what do you mean when you say max concurrent job runs per job is 3? 2. If I run 3 of the same Glue jobs in parallel with 20 DPUs, does that mean each job run consumes 20 DPUs and in total they all take 60 DPUs? 3. So, to conclude, max concurrency * no. of DPUs should not exceed 100?
-
j.b.gorski over 5 years
@JumpMan It is strange that you were able to set "Max concurrency" to 4, because the default limit is 3. "Number of concurrent job runs per job" is the service limit; "Max concurrency" is the Glue job parameter. As for questions 2 and 3, yes, you are right. I have also added an additional explanation to my answer.