SQL - Unequal left join BigQuery

16,160

Below is for BigQuery Standard SQL and mostly reproduce logic in your query with exception of not including days where no activity at all is found

#standardSQL
SELECT
    daily_use.user_id
  , wd.date AS DATE
  , MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
FROM dates AS wd
CROSS JOIN daily_use
WHERE wd.date BETWEEN 
  daily_use.activity_date AND DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
GROUP BY 1,2
-- ORDER BY 1,2

if for whatever reason you still need to exactly reproduce your logic - you can embrace above with final left join as below:

#standardSQL
SELECT *
FROM dates AS wd
LEFT JOIN (
  SELECT
    daily_use.user_id
    , wd.date AS date
    , MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
  FROM dates AS wd
  CROSS JOIN daily_use
  WHERE wd.date BETWEEN 
    daily_use.activity_date AND DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
  GROUP BY 1,2
) AS daily_use
USING (date)
-- ORDER BY 1,2
Share:
16,160

Related videos on Youtube

SpasticCamel
Author by

SpasticCamel

Updated on June 04, 2022

Comments

  • SpasticCamel
    SpasticCamel almost 2 years

    New here. I am trying to get the Daily and Weekly active users over time. they have 30 days before they are considered inactive. My goal is to create graph's that can be split by user_id to show cohorts, regions, categories, etc.

    I have created a date table to get every day for the time period and I have the simplified orders table with the base info that I need to calculate this.

    I am trying to do a Left Join to get the status by date using the following SQL Query:

    WITH daily_use AS (
            SELECT
              __key__.id AS user_id
              , DATE_TRUNC(date(placeOrderDate), day) AS activity_date
            FROM `analysis.Order`
            where isBuyingGroupOrder = TRUE 
              AND testOrder = FALSE
            GROUP BY 1, 2
     ),
    dates AS (
            SELECT DATE_ADD(DATE "2016-01-01", INTERVAL d.d DAY) AS date
            FROM
              (
               SELECT ROW_NUMBER() OVER(ORDER BY __key__.id) -1 AS d
               FROM `analysis.Order`
               ORDER BY __key__.id
               LIMIT 1096
              ) AS  d
            ORDER BY 1 DESC
          )
    
    SELECT
          daily_use.user_id
        , wd.date AS date
        , MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
    FROM dates AS wd
    
    LEFT JOIN daily_use
        ON wd.date >= daily_use.activity_date
        AND wd.date < DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
    
    GROUP BY 1,2
    

    I am getting this Error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join. In BigQuery and was wondering how can I go around this. I am using Standard SQL within BigQuery.

    Thank you

    • phroureo
      phroureo over 6 years
    • Elliott Brossard
      Elliott Brossard over 6 years
      BigQuery will try to get you to use an equality condition since this isn't scalable for large tables (there is no common key to use when shuffling data around). What happens if you use CROSS JOIN with a WHERE clause?
  • SpasticCamel
    SpasticCamel over 6 years
    Hey Mikhail Berlyant, This answers my question perfectly. Your first suggestion works perfectly. I don't need to fill in the dates with no data. The dates within the range and the days between orders is what we are looking for. Thanks a bunch.
  • Endrju
    Endrju over 3 years
    That's a wonderful and creative solution, however in my case I have to check a billion of rows agains 300 thousand in the other table for geographical proximity. BigQuery is just unable to do so using this "CROSS JOIN" workaround. Are there any other ideas?... Thanks in advance.
  • Mikhail Berlyant
    Mikhail Berlyant over 3 years
    please post your question as a new question and provide all needed details and we will be glad to answer :o)