Is there any way to trigger an AWS Lambda function at the end of an AWS Glue job?

Solution 1

No, not directly. Lambda does not currently offer AWS Glue as a trigger: if you look at the list of triggers available after you create a Lambda function, you will see most AWS services there, but not AWS Glue. So for now it is not possible from the Lambda side, though that may change in the future.

However, you can control the flow of Glue jobs from a Lambda function. (I did it in Python; other Lambda runtimes should support this as well.) In my use case, uploading an object to an S3 bucket triggered a Lambda function, which read the object and started my Glue job. Once the Glue job's status was complete, I wrote my file back to the S3 bucket linked to this Lambda function.
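A minimal sketch of the S3-to-Glue part of that flow, assuming the Lambda function is subscribed to the bucket's object-created notifications. The job name `my-glue-job` and the argument names `--s3_bucket` and `--s3_key` are hypothetical; `start_job_run` is the boto3 Glue client call that starts a job run.

```python
import urllib.parse


def start_glue_job_for_upload(event, glue_client, job_name):
    """Start one Glue job run per uploaded S3 object and return the run IDs."""
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=job_name,
            # Hypothetical job arguments; custom arguments take a leading "--".
            Arguments={"--s3_bucket": bucket, "--s3_key": key},
        )
        run_ids.append(response["JobRunId"])
    return run_ids


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without AWS access.
    import boto3

    # "my-glue-job" is a placeholder for your own job name.
    return start_glue_job_for_upload(event, boto3.client("glue"), "my-glue-job")
```

The Lambda execution role would need permission to call `glue:StartJobRun` for this to work. Starting the job returns immediately, so the function does not wait for the Glue job to finish.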

Solution 2

@oreoluwa is right, this can be done using CloudWatch Events.

From the CloudWatch dashboard:

  • Click on 'Rules' from the left menu
  • For 'Event Source', choose 'Event Pattern' and in 'Service Name' choose 'Glue'
  • For 'Event Type' choose 'Glue Job State Change'
  • On the right side of the page, in the 'Targets' section, click 'Add Target' -> 'Lambda Function' and then choose your function.

The event you'll get in Lambda will be of the format:

{
    'version': '0',
    'id': 'a9bc90be-xx00-03e0-9bc5-a0a0a0a0a0a0',
    'detail-type': 'Glue Job State Change',
    'source': 'aws.glue',
    'account': 'xxxxxxxxxx',
    'time': '2018-05-10T16:17:03Z',
    'region': 'us-east-2',
    'resources': [],
    'detail': {
        'jobName': 'xxxx_myjobname_yyyy',
        'severity': 'INFO',
        'state': 'SUCCEEDED',
        'jobRunId': 'jr_565465465446788dfdsdf546545454654546546465454654',
        'message': 'Job run succeeded'
    }
}
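A minimal sketch of a handler that consumes this event, assuming the rule forwards every state change and the function should act only on one job's success. The constant `EXPECTED_JOB_NAME` and the shape of the return value are illustrative choices, not part of any AWS API.

```python
# Hypothetical job name; replace with the Glue job you want to react to.
EXPECTED_JOB_NAME = "xxxx_myjobname_yyyy"


def lambda_handler(event, context):
    """React to a Glue Job State Change event delivered by CloudWatch Events."""
    detail = event.get("detail", {})
    if event.get("source") != "aws.glue":
        return {"handled": False, "reason": "not a Glue event"}
    if detail.get("jobName") != EXPECTED_JOB_NAME:
        return {"handled": False, "reason": "different job"}
    if detail.get("state") != "SUCCEEDED":
        return {"handled": False, "reason": "state is " + str(detail.get("state"))}
    # Placeholder: start the post-load work (for example, data cleansing) here.
    return {"handled": True, "jobRunId": detail.get("jobRunId")}
```

Checking `jobName` and `state` inside the function keeps it safe even if the rule is later broadened to match more jobs or states.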

Solution 3

Since AWS Glue supports Python, you can invoke the Lambda function from the job script itself. The sample script below shows how to do that:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3   ## Step-2

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

## Do all ETL stuff here

## Once the ETL completes
lambda_client = boto3.client('lambda')  ## Step-3
## Replace the placeholder with the name or ARN of your Lambda function
response = lambda_client.invoke(FunctionName='<YourLambdaFunctionName>')  ## Step-4

The steps marked in the script are:

  1. Create a Python-based Glue job (to perform ETL on Redshift).
  2. In the job script, import boto3 (this package needs to be provided as a script library).
  3. Create a Lambda client using boto3.
  4. Invoke the Lambda function using the boto3 Lambda invoke() call once the ETL completes.

Please make sure that the role you use when creating the Glue job has permission to invoke Lambda functions.

Refer to the Boto3 documentation for the Lambda client for details.
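As a sketch of what the invoke step could look like with a payload, assuming the Lambda function does not need to return anything to the Glue job: the helper name `notify_lambda` and the payload fields are hypothetical, while `InvocationType='Event'` is the boto3 option for an asynchronous invocation.

```python
import json


def notify_lambda(lambda_client, function_name, job_name, state="SUCCEEDED"):
    """Asynchronously invoke a Lambda function with a small JSON payload."""
    # Hypothetical payload shape; the receiving function decides what it needs.
    payload = json.dumps({"jobName": job_name, "state": state}).encode("utf-8")
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: do not wait for the function to finish
        Payload=payload,
    )
    # Lambda reports HTTP status 202 when an asynchronous invocation is accepted.
    return response["StatusCode"]
```

In the Glue script this would be called after the ETL code, for example `notify_lambda(boto3.client('lambda'), '<YourLambdaFunctionName>', args['JOB_NAME'])`. With an asynchronous invocation the Glue job does not keep running while the Lambda function does its work.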

Solution 4

@ace and @adeel have part of the solution, but you can resolve this by creating the CloudWatch rule with the following event pattern:

{
  "source": [
    "aws.glue"
  ],
  "detail-type": [
    "Glue Job State Change"
  ],
  "detail": {
    "jobName": [
      "<YourJobName>"
    ],
    "state": [
      "SUCCEEDED"
    ]
  }
}
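If you prefer to create the rule from code rather than the console, a minimal sketch with boto3 might look like the following. The function names, the rule name, and the target `Id` are hypothetical; `put_rule`, `put_targets`, and `add_permission` are the boto3 calls on the CloudWatch Events (`events`) and Lambda clients.

```python
import json


def build_glue_success_pattern(job_name):
    """Return the event pattern that matches successful runs of one Glue job."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {"jobName": [job_name], "state": ["SUCCEEDED"]},
    }


def create_rule_for_lambda(events_client, lambda_client, rule_name, job_name, function_arn):
    """Create the rule, allow it to invoke the function, and attach the target."""
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_glue_success_pattern(job_name)),
        State="ENABLED",
    )
    # The console adds this permission for you; through the API it is a separate call.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    events_client.put_targets(
        Rule=rule_name,
        # "glue-success-lambda" is an arbitrary target identifier.
        Targets=[{"Id": "glue-success-lambda", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

It would be called with `boto3.client('events')` and `boto3.client('lambda')` plus your own rule name, job name, and function ARN.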
Author by

dd.

Updated on June 21, 2022

Comments

  • dd.
    dd. almost 2 years

    Currently I'm using an AWS Glue job to load data into Redshift, but after that load I need to run some data cleansing tasks, probably using an AWS Lambda function. Is there any way to trigger a Lambda function at the end of a Glue job? Lambda functions can be triggered using SNS messages, but I couldn't find a way to send an SNS message at the end of the Glue job.

  • dd.
    dd. about 6 years
    I was able to orchestrate the tasks from a Python function. The cleanest way to do that, I guess, is to create a function that triggers the Glue job, waits for the job to end, and then triggers the data cleansing tasks. But the problem is the execution time limit of Lambda (<=300s). My Glue jobs run for much longer than that. I know I can do it in a different way, for example with a Lambda function that checks every n minutes whether there's a new successful run of the Glue job. But I don't like the idea; it seems very hard to monitor. Isn't there a better way to orchestrate ETL tasks?
  • CodeHunter
    CodeHunter about 6 years
    @dd. You can split your Lambda function into multiple functions and trigger each one after the previous one completes. Now, you can't directly invoke a Lambda function after another Lambda function, but you can trigger it through other components such as S3. When the first Lambda completes, update an S3 object to trigger the second Lambda, and so on. This is only a preliminary approach; if I find a better way, I will let you know.
  • Yuva
    Yuva over 5 years
    Hi CodeHunter, can you please provide some sample Lambda code to call a Glue job? I have a Lambda function listening to an S3 location, and when any object is uploaded to that S3 bucket/folder, the function should start my Glue job. I searched for references but couldn't find one.
  • CodeHunter
    CodeHunter over 5 years
    @Yuva: I trigger the Glue job as soon as a file is uploaded to S3 by having my upload service push a message to a Kafka queue (or possibly an SNS event). A Kafka consumer listens for that message and starts the Glue job as soon as it reads one. I found this better than using Lambda triggers because, first, you cannot trigger a Glue job directly from an S3 upload and, second, using a Kafka queue makes the setup cloud-agnostic.
  • Joe
    Joe about 5 years
    Hi @CodeHunter, I am trying to do what you did (start an ETL when a file arrives in my S3 bucket), but I haven't succeeded yet. I am a little confused by Lambda. Can you tell me how you managed to do it, or where there is a tutorial or something like that? Yesterday I even posted the question here [stackoverflow.com/questions/55367322/…