Hadoop Hive Query: Multi-join
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
Hive supports subqueries only in the FROM clause.
You can't use a subquery as a 'column' in Hive.
To work around this you'll want to use that subquery in a FROM clause and JOIN
to it. (the below won't work, but is the idea)
SELECT url,
COUNT(url) AS access_url,
t2.col1, t2.col2 ...
FROM aaa_hit
JOIN (SELECT events.event_id as evt,
COUNT(events.event_id) as access_evt
FROM aaa_event events
LEFT OUTER JOIN aaa_hit hits
ON ( events.hit_key = hit_key )
ORDER BY access_evt DESC LIMIT 1),
(SELECT sessions.remote_address as remote_address,
COUNT(sessions.remote_address) as access_addr
FROM aaa_session sessions
RIGHT OUTER JOIN aaa_hit hits
ON ( sessions.session_key = session_key )
ORDER BY access_addr DESC LIMIT 1) t2
ON (aaa_hit.THING = t2.THING)
Check out https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins for more information on using JOINs in Hive.
Comments
-
batman about 4 years
How can I do sub-selections in Hive? I think I might be making a really obvious mistake that's not so obvious to me...
Error I'm receiving:
FAILED: Parse Error: line 4:8 cannot recognize input 'SELECT' in expression specification
Here are my three source tables:
aaa_hit -> [SESSION_KEY, HIT_KEY, URL] aaa_event-> [SESSION_KEY,HIT_KEY,EVENT_ID] aaa_session->[SESSION_KEY,REMOTE_ADDRESS]
...and what I want to do is insert the result into a result table like this:
result -> [url, num_url, event_id, num_event_id, remote_address, num_remote_address]
...where column 1 is the URL, column 3 is the top 1 "event" per URL, and column 5 is the top 1 REMOTE_ADDRESS to visit that URL. (Even columns are "count"s of the previous column.)
Soooooo... what did I do wrong here?
INSERT OVERWRITE TABLE result2 SELECT url, COUNT(url) AS access_url, (SELECT events.event_id as evt, COUNT(events.event_id) as access_evt FROM aaa_event events LEFT OUTER JOIN aaa_hit hits ON ( events.hit_key = hit_key ) ORDER BY access_evt DESC LIMIT 1), (SELECT sessions.remote_address as remote_address, COUNT(sessions.remote_address) as access_addr FROM aaa_session sessions RIGHT OUTER JOIN aaa_hit hits ON ( sessions.session_key = session_key ) ORDER BY access_addr DESC LIMIT 1) FROM aaa_hit ORDER BY access_url DESC;
Thank you so much :)
-
batman almost 13 yearsSo do I have to make another table, then?
-
batman almost 13 yearsGood to know I can't do this, but how should I get around it?
-
QuinnG almost 13 years@Travis Powell: added details
-
WhatsThePoint about 6 yearsWhile this link may answer the question, link only answers are discouraged on Stack Overflow, you can improve this answer by taking vital parts of the link and putting it into your answer, this makes sure your answer is still an answer if the link gets changed or removed :)