SQL - Unequal left join BigQuery
Below is for BigQuery Standard SQL and mostly reproduce logic in your query with exception of not including days where no activity at all is found
#standardSQL
SELECT
daily_use.user_id
, wd.date AS DATE
, MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
FROM dates AS wd
CROSS JOIN daily_use
WHERE wd.date BETWEEN
daily_use.activity_date AND DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
GROUP BY 1,2
-- ORDER BY 1,2
if for whatever reason you still need to exactly
reproduce your logic - you can embrace above with final left join as below:
#standardSQL
SELECT *
FROM dates AS wd
LEFT JOIN (
SELECT
daily_use.user_id
, wd.date AS date
, MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
FROM dates AS wd
CROSS JOIN daily_use
WHERE wd.date BETWEEN
daily_use.activity_date AND DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
GROUP BY 1,2
) AS daily_use
USING (date)
-- ORDER BY 1,2
Related videos on Youtube
SpasticCamel
Updated on June 04, 2022Comments
-
SpasticCamel almost 2 years
New here. I am trying to get the Daily and Weekly active users over time. they have 30 days before they are considered inactive. My goal is to create graph's that can be split by user_id to show cohorts, regions, categories, etc.
I have created a date table to get every day for the time period and I have the simplified orders table with the base info that I need to calculate this.
I am trying to do a Left Join to get the status by date using the following SQL Query:
WITH daily_use AS ( SELECT __key__.id AS user_id , DATE_TRUNC(date(placeOrderDate), day) AS activity_date FROM `analysis.Order` where isBuyingGroupOrder = TRUE AND testOrder = FALSE GROUP BY 1, 2 ), dates AS ( SELECT DATE_ADD(DATE "2016-01-01", INTERVAL d.d DAY) AS date FROM ( SELECT ROW_NUMBER() OVER(ORDER BY __key__.id) -1 AS d FROM `analysis.Order` ORDER BY __key__.id LIMIT 1096 ) AS d ORDER BY 1 DESC ) SELECT daily_use.user_id , wd.date AS date , MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action FROM dates AS wd LEFT JOIN daily_use ON wd.date >= daily_use.activity_date AND wd.date < DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY) GROUP BY 1,2
I am getting this Error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join. In BigQuery and was wondering how can I go around this. I am using Standard SQL within BigQuery.
Thank you
-
phroureo over 6 yearsPossible duplicate of: stackoverflow.com/questions/43858433/…
-
Elliott Brossard over 6 yearsBigQuery will try to get you to use an equality condition since this isn't scalable for large tables (there is no common key to use when shuffling data around). What happens if you use CROSS JOIN with a WHERE clause?
-
-
SpasticCamel over 6 yearsHey Mikhail Berlyant, This answers my question perfectly. Your first suggestion works perfectly. I don't need to fill in the dates with no data. The dates within the range and the days between orders is what we are looking for. Thanks a bunch.
-
Endrju over 3 yearsThat's a wonderful and creative solution, however in my case I have to check a billion of rows agains 300 thousand in the other table for geographical proximity. BigQuery is just unable to do so using this "CROSS JOIN" workaround. Are there any other ideas?... Thanks in advance.
-
Mikhail Berlyant over 3 yearsplease post your question as a new question and provide all needed details and we will be glad to answer :o)