BigQuery SQL - Is it better to unnest in SELECT or JOIN?
Solution 1
Working queries:
a)
SELECT visitId, ( SELECT COUNT( hitNumber ) FROM UNNEST( hits ) ) AS view_count
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
b)
SELECT visitId, COUNT( hitNumber ) AS view_count
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
LEFT JOIN UNNEST( hits )
GROUP BY visitId
The first query is shorter and more concise. Judging by the explain tab, it also looks better at execution time, probably because the GROUP BY visitId in the second query forces BigQuery to check whether any other sessions share the same id.
But if you are looking for an even more concise option:
SELECT visitId, ARRAY_LENGTH(hits) AS view_count
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
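To compare the subquery form with ARRAY_LENGTH, here is a minimal self-contained sketch. The inline sessions table and its values are made up for illustration and are not the LondonCycleHelmet data; only the names visitId, hits and hitNumber come from the queries above. It also shows one caveat: COUNT( hitNumber ) skips NULL values while ARRAY_LENGTH counts every array element, so the two only agree when hitNumber is never NULL.

```sql
-- Hypothetical inline data standing in for a ga_sessions table.
WITH sessions AS (
  SELECT 1 AS visitId,
         [STRUCT(1 AS hitNumber), STRUCT(2 AS hitNumber)] AS hits
  UNION ALL
  -- Second session has one hit whose hitNumber is NULL.
  SELECT 2,
         [STRUCT(1 AS hitNumber), STRUCT(CAST(NULL AS INT64) AS hitNumber)]
)
SELECT
  visitId,
  -- Counts only non-NULL hitNumber values: 2 for visitId 1, 1 for visitId 2.
  ( SELECT COUNT( hitNumber ) FROM UNNEST( hits ) ) AS count_via_subquery,
  -- Counts every element of the array: 2 for both sessions.
  ARRAY_LENGTH( hits ) AS count_via_array_length
FROM sessions;
```

If hitNumber is always populated, as it should be in Google Analytics exports, both columns return the same number.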
Solution 2
It is not just about which way is better; it is also about which way reflects your goal, because the results of the two queries are different! You can see this in Felipe's answer: the first query returns 63 rows and the second query returns 62 rows.
So the first query returns exactly as many rows as your sessions table has, each with the count of entries in the array field.
The second query, in addition to the above, groups rows by visitId and aggregates their respective counts.
Of course, if every visitId in your table is unique, both queries produce the same result.
But because of this extra grouping, I would expect the second query to be more expensive.
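The row-count difference can be reproduced on a tiny example. This is a sketch with made-up inline data (hypothetical values, with two sessions deliberately sharing visitId 1), not the dataset from the question:

```sql
-- Hypothetical inline data: three session rows, two of them with visitId = 1.
WITH sessions AS (
  SELECT 1 AS visitId, [STRUCT(1 AS hitNumber), STRUCT(2 AS hitNumber)] AS hits
  UNION ALL
  SELECT 1, [STRUCT(1 AS hitNumber), STRUCT(2 AS hitNumber), STRUCT(3 AS hitNumber)]
  UNION ALL
  SELECT 2, [STRUCT(1 AS hitNumber)]
),
-- Variant a): correlated subquery in SELECT, one output row per session row.
per_row AS (
  SELECT visitId, ( SELECT COUNT( hitNumber ) FROM UNNEST( hits ) ) AS view_count
  FROM sessions
),
-- Variant b): flatten with LEFT JOIN, then regroup by visitId.
grouped AS (
  SELECT visitId, COUNT( hitNumber ) AS view_count
  FROM sessions
  LEFT JOIN UNNEST( hits )
  GROUP BY visitId
)
-- Expected rows (in no guaranteed order):
--   'subquery in SELECT'   | 3 | 6
--   'LEFT JOIN + GROUP BY' | 2 | 6
SELECT 'subquery in SELECT' AS variant, COUNT(*) AS row_count, SUM( view_count ) AS total_views
FROM per_row
UNION ALL
SELECT 'LEFT JOIN + GROUP BY', COUNT(*), SUM( view_count )
FROM grouped;
```

The total number of hits is the same in both variants; only the number of output rows changes, because the two rows with visitId 1 collapse into one after grouping.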
A.S.
Updated on June 14, 2022
Comments
-
A.S. almost 2 years
I have a dataset where views are nested inside of sessions and I want a count of views for each session. What is the more efficient/proper way to structure such a query?
Is there any documentation that talks about the preferred way to write queries in BigQuery SQL?
SELECT session_key, ( SELECT COUNT( view_id ) FROM UNNEST( views ) views ) AS view_count
FROM sessions
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' );

SELECT session_key, COUNT( view_id ) AS view_count
FROM sessions
LEFT JOIN UNNEST( views ) views
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' )
GROUP BY session_key;
Thank you
-
A.S. over 6 years
Is there a shortcut like ARRAY_LENGTH(hits) if there is another level nested inside "views" called "clicks" and I want to get the click count on a session level?
-
Felipe Hoffa over 6 years
Navigate in with dots? I'll be able to give you a specific answer if you give me a specific dataset.
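For the nested "clicks" case, one possible approach is to unnest "views" in a subquery and sum the length of each inner array. This is only a sketch: no dataset was given in the thread, so the inline table, the field name click_id and all values are hypothetical.

```sql
-- Hypothetical schema: sessions.views is an array of structs,
-- and each view carries its own array of clicks.
WITH sessions AS (
  SELECT 'a' AS session_key, [
    STRUCT(1 AS view_id, [STRUCT(10 AS click_id), STRUCT(11 AS click_id)] AS clicks),
    STRUCT(2 AS view_id, [STRUCT(12 AS click_id)] AS clicks)
  ] AS views
  UNION ALL
  -- A session with one view and no clicks.
  SELECT 'b', [
    STRUCT(3 AS view_id, ARRAY<STRUCT<click_id INT64>>[] AS clicks)
  ]
)
SELECT
  session_key,
  ARRAY_LENGTH( views ) AS view_count,
  -- Sum the inner array lengths per session; IFNULL covers sessions with no views.
  ( SELECT IFNULL( SUM( ARRAY_LENGTH( v.clicks ) ), 0 )
    FROM UNNEST( views ) AS v ) AS click_count
FROM sessions;
-- Expected: session 'a' has 2 views and 3 clicks; session 'b' has 1 view and 0 clicks.
```

Like the subquery in SELECT from the question, this keeps one output row per session row and needs no GROUP BY.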