Resources Exceeded during query execution

16,224

JOIN EACH can fail if your join keys (in this case, user_key) are unevenly distributed. For example, if you have one user_key that appears abnormally often, you'll get a "resources exceeded" error from the node that handles that key. Alternatively, you could try running the query over a smaller set of user keys by filtering out some portion of the user keys before the join.

GROUP EACH BY can fail if you have too many distinct group keys. You could try whittling down the join output by adding a few more WHERE clauses in order to see if this is the case.

Basically, I'd recommend whittling down the inputs to either the JOIN EACH or the GROUP EACH BY until you get the query to work, and then you'll have a better sense for the limits you're running up against. Once you know that, you can (hopefully) structure your queries to get the most out of the available resources.

(BTW, we expect to tune these operations in the near future to remove some of the limits you may be hitting!)

Share:
16,224

Related videos on Youtube

user2388120
Author by

user2388120

Updated on September 28, 2022

Comments

  • user2388120
    user2388120 over 1 year

    I'm trying to run a query joining 2 large sets of data and I'm hitting the resources exceeded during query execution error. I've read that there are work around when using Join Each and Group Each but not what those workaround would be.

    SELECT 
      year(users.firstseen) as first_year,
      month(users.firstseen) as first_month, 
      DATEDIFF(orders.timestamp,users.firstseen) as days_elapsed,
      count(orders.user_key) as count_orders
    FROM 
      [project.orders] as orders
    JOIN EACH
      [project.users] AS users
    ON
      orders.user_key = users.user_key
    WHERE orders.store = 'ios'
    GROUP EACH BY 1,2,3
    

    Edit: the following worked:

    SELECT
      year(users.firstseen) as firstyear,
      month(users.firstseen) as firstmonth,
      DATEDIFF(orders.timestamp, users.firstseen) as days_elapsed,
      COUNT(users.firstseen) AS count_orders FROM [project.orders] as orders
    JOIN EACH( SELECT user_key, firstseen FROM [project.users]
    WHERE store_key = 'ios') as users ON orders.user_key = users.user_key
    GROUP BY firstyear, firstmonth, days_elapsed
    ORDER BY firstyear, firstmonth, days_elapsed
    
    • user2388120
      user2388120 almost 11 years
      this ended up working SELECT year(users.firstseen) as firstyear, month(users.firstseen) as firstmonth, DATEDIFF(orders.timestamp, users.firstseen) as days_elapsed, COUNT(users.firstseen) AS count_orders FROM [project.orders] as orders JOIN EACH( SELECT user_key, firstseen FROM [project.users] WHERE store_key = 'ios') as users ON orders.user_key = users.user_key GROUP BY firstyear, firstmonth, days_elapsed ORDER BY firstyear, firstmonth, days_elapsed
  • Mo'in Creemers
    Mo'in Creemers about 10 years
    It looks like having a JOIN EACH that yields 0 matches also triggers this error (tested on a join 2 million left, 180 million right).
  • Jeremy Condit
    Jeremy Condit about 10 years
    A JOIN EACH with zero matches is unlikely to be the culprit on its own. More likely, there is some other clause in your query either before or after the JOIN EACH that is exceeding the resources available for a given worker node. If you send a job ID to [email protected], I can take a closer look.
  • Nils Ziehn
    Nils Ziehn over 8 years
    @JeremyCondit Do you have an update on the constraints?
  • Jeremy Condit
    Jeremy Condit over 8 years
    In progress! GROUP BY is already updated, and will likely do a better job of dynamically partitioning your data than GROUP EACH BY. We're still working out a few kinks with JOIN, but it'll be updated soon as well.
  • Charles Chow
    Charles Chow over 7 years
    Hi, my order by query also failed SELECT * FROM [kaggle_bank_raw.cleaned_train] order by ncodpers could you take a look? Thanks!