Why does full outer join in HIVE gives weird result when one of the join fields is missing?
I could not simulate the result reported by @Candic3
I used the below statements along with the same "select" query as in the question.
CREATE TABLE IF NOT EXISTS sql_test_a (ID String, FIRST_NAME String, LAST_NAME String) COMMENT 'sql_test_a'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
CREATE TABLE IF NOT EXISTS sql_test_b (NUM String, FIRST_NAME String, LAST_NAME String) COMMENT 'sql_test_b'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
INSERT INTO sql_test_a VALUES ('1', 'John', 'Snow');
INSERT INTO sql_test_a VALUES ('2', 'Mike', 'Tyson');
INSERT INTO sql_test_b VALUES ('20', 'Mike', 'Tyson');
SELECT A.FIRST_NAME, A.LAST_NAME, A.ID, B.NUM
FROM
SQL_TEST_A A
FULL OUTER JOIN
SQL_TEST_B B
ON
A.FIRST_NAME = B.FIRST_NAME
AND
A.LAST_NAME = B.LAST_NAME;
Please find the result attached.
However, select query would return NULL due to unnoticed minor mistakes like data-type mismatch between the DDL and the actual data (say, from flat files) or mismatch among the delimiter mentioned in the DDL and the ones in the actual data.
Comments
-
makansij almost 2 years
I'm comparing the behavior between SQL engines. Oracle has the behavior I would expect from a SQL engine for full outer joins:
Oracle
CREATE TABLE sql_test_a ( ID VARCHAR2(4000 BYTE), FIRST_NAME VARCHAR2(200 BYTE), LAST_NAME VARCHAR2(200 BYTE) ); CREATE TABLE sql_test_b ( NUM VARCHAR2(4000 BYTE), FIRST_NAME VARCHAR2(200 BYTE), LAST_NAME VARCHAR2(200 BYTE) ); INSERT INTO sql_test_a (ID, FIRST_NAME, LAST_NAME) VALUES ('1', 'John', 'Snow'); INSERT INTO sql_test_a (ID, FIRST_NAME, LAST_NAME) VALUES ('2', 'Mike', 'Tyson'); INSERT INTO sql_test_b (NUM, FIRST_NAME, LAST_NAME) VALUES ('20', 'Mike', 'Tyson');
When I execute the following, it gives me the expected result. The resulting table contains two rows, with one of the rows containing
NULL
for theNUM
field, because there is no john snow in the tablesql_test_b
.SELECT A.FIRST_NAME, A.LAST_NAME, A.ID, B.NUM FROM SQL_TEST_A A FULL OUTER JOIN SQL_TEST_B B ON A.FIRST_NAME = B.FIRST_NAME AND A.LAST_NAME = B.LAST_NAME;
You can test the sql script here: http://sqltest.net/
HIVE
In HIVE, however, if you were to try the same thing, the full outer join results in a table with two rows. The row that should be the "John Snow" row contains
NULL
for the fields FIRST_NAME, LAST_NAME, and NUM. The1
is filled in forID
, but that's it.Why such weird behavior in HIVE? Is this a bug? Or am I missing something...because Oracle 11g seems to handle this much better. Thanks.