Resources Exceeded during query execution
JOIN EACH can fail if your join keys (in this case, user_key) are unevenly distributed. For example, if you have one user_key that appears abnormally often, you'll get a "resources exceeded" error from the node that handles that key. Alternatively, you could try running the query over a smaller set of user keys by filtering out some portion of the user keys before the join.
GROUP EACH BY can fail if you have too many distinct group keys. You could try whittling down the join output by adding a few more WHERE clauses in order to see if this is the case.
Basically, I'd recommend whittling down the inputs to either the JOIN EACH or the GROUP EACH BY until you get the query to work, and then you'll have a better sense for the limits you're running up against. Once you know that, you can (hopefully) structure your queries to get the most out of the available resources.
(BTW, we expect to tune these operations in the near future to remove some of the limits you may be hitting!)
Related videos on Youtube
user2388120
Updated on September 28, 2022Comments
-
user2388120 over 1 year
I'm trying to run a query joining 2 large sets of data and I'm hitting the resources exceeded during query execution error. I've read that there are work around when using Join Each and Group Each but not what those workaround would be.
SELECT year(users.firstseen) as first_year, month(users.firstseen) as first_month, DATEDIFF(orders.timestamp,users.firstseen) as days_elapsed, count(orders.user_key) as count_orders FROM [project.orders] as orders JOIN EACH [project.users] AS users ON orders.user_key = users.user_key WHERE orders.store = 'ios' GROUP EACH BY 1,2,3
Edit: the following worked:
SELECT year(users.firstseen) as firstyear, month(users.firstseen) as firstmonth, DATEDIFF(orders.timestamp, users.firstseen) as days_elapsed, COUNT(users.firstseen) AS count_orders FROM [project.orders] as orders JOIN EACH( SELECT user_key, firstseen FROM [project.users] WHERE store_key = 'ios') as users ON orders.user_key = users.user_key GROUP BY firstyear, firstmonth, days_elapsed ORDER BY firstyear, firstmonth, days_elapsed
-
user2388120 almost 11 yearsthis ended up working SELECT year(users.firstseen) as firstyear, month(users.firstseen) as firstmonth, DATEDIFF(orders.timestamp, users.firstseen) as days_elapsed, COUNT(users.firstseen) AS count_orders FROM [project.orders] as orders JOIN EACH( SELECT user_key, firstseen FROM [project.users] WHERE store_key = 'ios') as users ON orders.user_key = users.user_key GROUP BY firstyear, firstmonth, days_elapsed ORDER BY firstyear, firstmonth, days_elapsed
-
-
Mo'in Creemers about 10 yearsIt looks like having a JOIN EACH that yields 0 matches also triggers this error (tested on a join 2 million left, 180 million right).
-
Jeremy Condit about 10 yearsA JOIN EACH with zero matches is unlikely to be the culprit on its own. More likely, there is some other clause in your query either before or after the JOIN EACH that is exceeding the resources available for a given worker node. If you send a job ID to [email protected], I can take a closer look.
-
Nils Ziehn over 8 years@JeremyCondit Do you have an update on the constraints?
-
Jeremy Condit over 8 yearsIn progress! GROUP BY is already updated, and will likely do a better job of dynamically partitioning your data than GROUP EACH BY. We're still working out a few kinks with JOIN, but it'll be updated soon as well.
-
Charles Chow over 7 yearsHi, my order by query also failed
SELECT * FROM [kaggle_bank_raw.cleaned_train] order by ncodpers
could you take a look? Thanks!