AWS Glue predicate push down condition has no effect
Pushdown predicates work on partition columns only. In other words, your data files should be placed in hierarchically structured folders. For example, if the data is located in s3://bucket/dataset/ and partitioned by year, month and day, then the structure should be as follows:
s3://bucket/dataset/year=2017/month=07/day=03/
In such a case the pushdown predicate would work for the columns year, month and day:
datasource = glueContext.create_dynamic_frame_from_catalog( database = source_catalog_db, table_name = source_catalog_tbl, push_down_predicate = "year = 2017 and month > 6 and day between 3 and 10", transformation_ctx = "datasource")
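To see why only partition columns help, here is a minimal pure-Python sketch of partition pruning. It is not how Glue implements pushdown; it only assumes Hive-style `key=value` folder names like the hypothetical paths below, and evaluates a Python translation of the predicate against the values parsed from each path. Folders that fail the predicate are skipped, so their files are never read.

```python
# Illustrative only: mimics partition pruning over Hive-style S3 paths.
# This is NOT Glue's implementation; all paths below are hypothetical.

def parse_partitions(path):
    """Extract {column: value} from the key=value segments of an S3 path."""
    cols = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            cols[key] = int(value)  # assumes numeric partition values
    return cols

def matches(cols):
    # Python translation of "year = 2017 and month > 6 and day between 3 and 10"
    return cols["year"] == 2017 and cols["month"] > 6 and 3 <= cols["day"] <= 10

def prune(paths, predicate):
    """Keep only the partition folders whose column values satisfy predicate."""
    return [p for p in paths if predicate(parse_partitions(p))]

# Hypothetical partition folders under s3://bucket/dataset/
paths = [
    "s3://bucket/dataset/year=2017/month=05/day=04/",
    "s3://bucket/dataset/year=2017/month=07/day=02/",
    "s3://bucket/dataset/year=2017/month=07/day=05/",
    "s3://bucket/dataset/year=2017/month=12/day=10/",
    "s3://bucket/dataset/year=2018/month=07/day=05/",
]

selected = prune(paths, matches)
for p in selected:
    print(p)
```

A column such as `id` from the question never appears in a folder name, so `parse_partitions` has nothing to match it against; that is the situation where a predicate on a non-partition column cannot prune anything.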
Besides that, keep in mind that pushdown predicates work with S3 data sources only.
Here is a nice blog post written by AWS Glue devs about data partitioning.
This is great! I was able to use it to obtain the last 30 days of data using my "dt" partition column:
datasource0 = glueContext.create_dynamic_frame.from_catalog( database = "my_db", table_name = "my_table", push_down_predicate = "to_date(dt) >= date_sub(current_date, 30)", transformation_ctx = "datasource0" )
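A rough Python analogue of which `dt` partitions that predicate keeps. It assumes `dt` values are `YYYY-MM-DD` strings (the comment does not show the format), and the reference date is passed in explicitly instead of using `current_date` so the result is reproducible. The function name `recent_partitions` is hypothetical.

```python
from datetime import date, datetime, timedelta

def recent_partitions(dt_values, today, days=30):
    """Return dt partition values on or after today - days.

    Mirrors the intent of the Spark SQL predicate
    "to_date(dt) >= date_sub(current_date, 30)", assuming dt strings
    are formatted as YYYY-MM-DD.
    """
    cutoff = today - timedelta(days=days)
    return [
        dt for dt in dt_values
        if datetime.strptime(dt, "%Y-%m-%d").date() >= cutoff
    ]

# Hypothetical partition values; the cutoff for 2020-03-31 is 2020-03-01.
values = ["2020-02-29", "2020-03-01", "2020-03-15", "2020-03-31"]
print(recent_partitions(values, today=date(2020, 3, 31)))
```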
I'm using Glue 1.0 - Spark 2.4 - Python 2.
Anas Ismail 3 months
I have a MySQL source from which I am creating a Glue DynamicFrame with a pushdown predicate condition, as follows:
datasource = glueContext.create_dynamic_frame_from_catalog( database = source_catalog_db, table_name = source_catalog_tbl, push_down_predicate = "id > 1531812324", transformation_ctx = "datasource")
I always get all the records in 'datasource', regardless of the condition I put in 'push_down_predicate'. What am I missing?
Anas Ismail over 4 years: Thanks @Yuriy, it makes complete sense. I am now using Glue's Filter transform to narrow down my results. It is not efficient, since it loads the complete table into memory and then applies the filter, but I believe this is the only option Glue offers right now.
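For the Filter workaround mentioned above, Glue's `Filter.apply` takes a frame and a function `f` that returns True for the records to keep. Below is a minimal sketch of such a function for the question's `id` condition; `keep_new_ids` and `ID_THRESHOLD` are hypothetical names, and plain dicts stand in for DynamicRecords so the sketch runs without Glue. Unlike a partition pushdown predicate, this filter only runs after the whole table has been read.

```python
# In a Glue job the function would be used roughly like this:
#     from awsglue.transforms import Filter
#     filtered = Filter.apply(frame=datasource, f=keep_new_ids)
# Here plain dicts stand in for DynamicRecords.

ID_THRESHOLD = 1531812324  # value taken from the question's predicate

def keep_new_ids(record):
    """Return True for records whose id is greater than the threshold."""
    return record["id"] > ID_THRESHOLD

rows = [{"id": 1531812000}, {"id": 1531812324}, {"id": 1531812400}]
kept = [r for r in rows if keep_new_ids(r)]
print(kept)
```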
Sujai Sivasamy almost 4 years: Is there any way that I could apply push_down_predicate on RDS data sources?
Yuriy Bondaruk almost 4 years: @Sujai No, unfortunately it works with S3 sources only.
Patrick Bray over 1 year: Is this being pushed down into the SQL WHERE clause? I'm trying to understand whether pushdown predicates are still unsupported for JDBC.
Patrick Bray over 1 year: Is this still only supported for S3, or can it be used for RDS now?