BigQuery SQL - Is it better to unnest in SELECT or JOIN?

Solution 1

Working queries:

a)

SELECT visitId, ( SELECT COUNT( hitNumber ) FROM UNNEST( hits ) ) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`

b)

SELECT visitId, COUNT( hitNumber ) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
LEFT JOIN UNNEST( hits )  
GROUP BY visitId

The first query looks shorter and more concise, but let's also look at the explain tab:

a) [screenshot of the execution plan for query a; image not preserved]

b) [screenshot of the execution plan for query b; image not preserved]

Query a) also looks better at execution time, probably because the GROUP BY visitId in query b) forces BigQuery to check whether any other sessions share the same id.

But if you are looking for an even more concise option:

SELECT visitId, ARRAY_LENGTH(hits) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
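
One caveat, sketched below with made-up inline data (the single row and its values are assumptions, not taken from the sample table): ARRAY_LENGTH counts every element of the array, while COUNT( hitNumber ) skips elements whose hitNumber is NULL. The two only agree when hitNumber is never NULL.

```sql
-- Hypothetical one-row input: two hits, one of them with a NULL hitNumber.
SELECT
  ARRAY_LENGTH(hits) AS array_length,                          -- counts both elements
  (SELECT COUNT(hitNumber) FROM UNNEST(hits)) AS counted_hits  -- skips the NULL hitNumber
FROM (
  SELECT [STRUCT(1 AS hitNumber),
          STRUCT(CAST(NULL AS INT64) AS hitNumber)] AS hits
);
```

If hitNumber is always populated, as it should be for Google Analytics exports, the two expressions can be used interchangeably.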

Solution 2

It is not just about which way is better; it is also about which way reflects your goal, because the results of the two queries are different. You can see this in Felipe's answer: the first query returns 63 rows and the second query returns 62 rows.

So, the first query returns exactly as many rows as your sessions table has, each with the count of entries in its array field.
The second query, in addition to the above, groups rows that share a visitId and aggregates their counts.

Of course, if every visitId in your table is unique, both queries produce the same result.
But because of this extra grouping, I would expect the second query to be more expensive.
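
To make the difference concrete, here is a minimal sketch on made-up inline data (the `sessions`, `per_row` and `grouped` names and all values are hypothetical). Two of the three rows share visitId 1, so the subquery form should keep three rows while the JOIN plus GROUP BY form collapses them into two.

```sql
WITH sessions AS (
  -- Hypothetical data: visitId 1 appears twice, visitId 2 has no hits.
  SELECT 1 AS visitId, [STRUCT(1 AS hitNumber), STRUCT(2 AS hitNumber)] AS hits
  UNION ALL
  SELECT 1, [STRUCT(1 AS hitNumber)]
  UNION ALL
  SELECT 2, ARRAY<STRUCT<hitNumber INT64>>[]
),
per_row AS (
  -- Form a): one output row per input row, counted inside a correlated subquery.
  SELECT visitId, (SELECT COUNT(hitNumber) FROM UNNEST(hits)) AS view_count
  FROM sessions
),
grouped AS (
  -- Form b): flatten the array, then aggregate per visitId.
  SELECT visitId, COUNT(hitNumber) AS view_count
  FROM sessions
  LEFT JOIN UNNEST(hits)
  GROUP BY visitId
)
SELECT 'per_row' AS variant, visitId, view_count FROM per_row
UNION ALL
SELECT 'grouped', visitId, view_count FROM grouped
ORDER BY variant, visitId;
```

In the `grouped` variant, the two rows for visitId 1 are merged and their hits summed. The LEFT JOIN keeps visitId 2 with a count of 0 in both variants; with a plain comma or INNER JOIN, that session would disappear from the grouped result.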

Author: A.S.

Updated on June 14, 2022

Comments

  • A.S.
    A.S. almost 2 years

    I have a dataset where views are nested inside of sessions and I want a count of views for each session. What is the more efficient/proper way to structure such a query?

    Is there any documentation that talks about the preferred way to write queries in BigQuery SQL?

    SELECT session_key, ( SELECT COUNT( view_id ) FROM UNNEST( views ) views ) AS view_count 
    FROM sessions 
    WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) ;
    
    SELECT session_key, COUNT( view_id ) AS view_count 
    FROM sessions 
      LEFT JOIN UNNEST( views ) views 
    WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) 
    GROUP BY session_key; 
    

    Thank you

  • A.S.
    A.S. over 6 years
    Is there a shortcut like ARRAY_LENGTH(hits) if there is another level nested inside "views" called "clicks" and I want to get the click count on a session level?
  • Felipe Hoffa
    Felipe Hoffa over 6 years
    Navigate in with dots? I'll be able to give you a specific answer if you give me a specific dataset.