Skewed tables in Hive

17,272

Solution 1

What are skewed tables in Hive?

A skewed table is a special type of table where the values that appear very often (heavy skew) are split out into separate files and rest of the values go to some other file..

How do we create skewed tables?

create table <T> (schema) skewed by (keys) on ('value1', 'value2') [STORED as DIRECTORIES];

Example :

create table T (c1 string, c2 string) skewed by (c1) on ('x1')

How does it affect performance?

By specifying the skewed values Hive will split those out into separate files automatically and take this fact into account during queries so that it can skip (or include) whole files if possible thus enhancing the performance.

EDIT :

x1 is actually the value on which column c1 is skewed. You can have multiple such values for multiple columns. For example,

create table T (c1 string, c2 string) skewed by (c1) on ('x1', 'x2', 'x3')

Advantage of having such a setup is that for the values that appear more frequently than other values get split out into separate files(or separate directories if we are using STORED AS DIRECTORIES clause). And this information is used by the execution engine during query execution to make processing more efficient.

Solution 2

In Skewed Tables, partition will be created for the column value which has many records and rest of the data will be moved to another partition. Hence number of partitions, number of mappers and number of intermediate files will be reduced. For ex: out of 100 patients, 90 patients have high BP and other 10 patients have fever, cold, cancer etc. So one partition will be created for 90 patients and one partition will be created for other 10 patients. I hope this will answer your question.

Share:
17,272
thiru_k
Author by

thiru_k

Software Engineer @ OpenText

Updated on July 16, 2022

Comments

  • thiru_k
    thiru_k almost 2 years

    I am learning hive and came across skewed tables. Help me understanding it.

    What are skewed tables in Hive?

    How do we create skewed tables?

    How does it effect performance?

  • dfrankow
    dfrankow about 8 years
    Is there a real concrete example, to better illustrate what is happening?
  • d34th4ck3r
    d34th4ck3r over 7 years
    What does x1 mean in your example?
  • Tariq
    Tariq almost 6 years
    @d34th4ck3r My apologies for such a late response. I somehow missed this in the stream of so many other things. I'm sure you would have figured it by now. But just in case you couldn't I have edited my answer. Please feel free to let me know if it's still unclear.
  • Tariq
    Tariq almost 6 years
    @dfrankow My apologies to you too. Please let me know if you still need a concrete example to better understand this. I'll be more than happy to do that!
  • Tariq
    Tariq almost 6 years
    @CodingOwl You should probably remove Coding from your username. There is always a dignified manner to put your point forward. I have always tried to provide as descriptive answers as possible, but sometimes due to time crunch I tend to copy things from the source, and I don't see any harm in it. And it would have been more useful if could you have actually provided some help instead of ranting on my technical abilities. Anyway, I don't think I need to justify myself in front of a person like you. Good luck with your soft skills though!
  • CodingOwl
    CodingOwl almost 6 years
    @Tariq my apologies brother. I was so frustrated on something which i wrote down here perhaps. sorry again. :) Have a good day.
  • luckyluke
    luckyluke about 4 years
    Will the partion be created automatically ?