how to do subqueries in bigquery?
You might want to do something like below (just guess):
SELECT
p.subreddit,
p.body AS first_body,
p.score AS first_score,
CONCAT('t1_',p.id) AS first_id ,
c.last_body,
c.last_score,
c.last_id
FROM
[fh-bigquery:reddit_comments.2016_01] p
JOIN (
SELECT
body AS last_body,
score AS last_score,
CONCAT('t1_',id) AS last_id,
parent_id,
author,
body
FROM [fh-bigquery:reddit_comments.2016_01]
WHERE body != '[deleted]'
AND author != '[deleted]'
AND score > 1
) c
ON p.link_id = c.parent_id
WHERE p.score > 1
AND p.author != '[deleted]'
AND p.body != '[deleted]'
LIMIT 100
See more about JOINs
Please note, I just transformed your query to proper use JOINs, but logic of query is still for you to polish as you see needed
Added to address additional info in your comment:
SELECT
subreddit,
first_body,
first_score,
first_id ,
last_body,
last_score,
last_id
FROM (
SELECT
subreddit,
body AS first_body,
score AS first_score,
CONCAT('t1_',id) AS first_id
FROM [fh-bigquery:reddit_comments.2016_01]
WHERE score > 1
AND author != '[deleted]'
AND body != '[deleted]'
) p
JOIN (
SELECT
body AS last_body,
score AS last_score,
CONCAT('t1_',id) AS last_id,
parent_id,
author,
body
FROM [fh-bigquery:reddit_comments.2016_01]
WHERE body != '[deleted]'
AND author != '[deleted]'
AND score > 1
) c
ON p.first_id = c.parent_id
LIMIT 100
jas
Updated on June 25, 2022Comments
-
jas almost 2 years
Im trying to play with the reddit data on bigquery and I want to see comments and replies in one row. I see bigquery supports subqueries, but I am unable to construct the query. I have to use a subquery to self join the same table because of the structure of the data, specifically i want to join id and parent_id together, but I need to modify id before I can join. Here is how im trying to do the query:
SELECT p.subreddit, p.body AS first_body, p.score AS first_score, CONCAT('t1_',p.id) AS first_id , c.last_body, c.last_score, c.last_id FROM [fh-bigquery:reddit_comments.2016_01] p, ( SELECT body AS last_body, score AS last_score, CONCAT('t1_',id) AS last_id, parent_id, author, body FROM [fh-bigquery:reddit_comments.2016_01] WHERE body != '[deleted]' AND author != '[deleted]' AND score > 1 ) c WHERE p.first_id = c.parent_id AND p.score > 1 AND p.author != '[deleted]' AND p.body != '[deleted]';
The error I get is:
Field 'c.parent_id' not found in table 'fh-bigquery:reddit_comments.2016_01'; did you mean 'parent_id'?
Here is where you can run the query: https://bigquery.cloud.google.com/table/fh-bigquery:reddit_comments.2016_01
Im not sure how to fix this. what is the proper way to join this and get this query to run?
-
jas about 8 yearsMikhail, I couldn't use this style specifically because of the join clause. I need to join: on concat('t1_',p.id) = c.parent_id . the id is missing the "t1_" string in front of it. bigquery does not allow joins for fields that dont exist in the table. So I need to modify the query to use subselect I believe. here is the error I get when I try to use concat: Query Failed Error: ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name. .
-
Mikhail Berlyant about 8 yearsAdded to my original answer to reflect your spec for ON clause