SSIS: Merge vs Union All

10,738

Ok, so the answer is kind of obvious now in hindsight. It seems that the key difference is in the output.

  • Merge: Output is Sorted
  • Union All: Output is Unsorted

The idea is that if we have sorted input data then why not keep it sorted after our "Merge"/"Union All"? Who knows, we may need sorted data as an input in another transformation down the line, so just use "Merge" if possible so that the data stays sorted.

What's funny is that a whole lot of articles on the internet don't mention this simple difference. Hope this helps another newbie on their way to SSIS mastery!

Share:
10,738
Omaer
Author by

Omaer

I'm hard working and intelligent. I question everything, and am always keen to learn new things. I am passionate about productivity, and using technology to solve problems. I will not lie or cheat, or do anything I believe is wrong. My biggest strengths are: My passion for learning and self improvement. My ability to learn fast, and then apply myself in the real world. My desire to solve problems and improve processes. My technical aptitude. My biggest weaknesses are: Punctuality: I'm on time for meetings, but not on time to office. I work hard from early morning till evening - but I start working from home, and don't leave for office till I've finish what I'm working on - and that can make me late. I don't plan to fix this. Struggle to say "No": Working on it - I'll get used to it soon! Poor career planning: I haven't prioritized managing my career. Poor health: I need to work out more often. Poor networking: I need to meet people outside of work some more. I read/heard this quote somewhere, and it is what drives me: "If you find something boring, it's because you don't know enough about it." - Unknown

Updated on June 21, 2022

Comments

  • Omaer
    Omaer almost 2 years

    Disclaimer: I'm in the process of learning SSIS - so forgive me if this is an obvious answer, but I couldn't find anything on searching.

    It seems that the Merge transformation is very similar to the Union All transformation, with the following differences:

    1. Merge can only merge 2 sources of data, whereas Union All supports more (I'm not sure how many - my instructor says 256 but there's no sources to confirm that - and MSDN doesn't mention an upper limit).
    2. Merge requires inputs to be sorted whereas Union All does not.

    Apart from these differences, there doesn't seem to be much else that is different. They are both partially blocking, and from my limited tests, I couldn't see any noticeable difference in performance either.

    Everywhere I've searched, I've found that the recommendation is to use "Merge" whenever possible, and to use "Union All" if there are more than 2 data sources, or if the source data is unsorted.

    My question is, why? Why would I not just use Union All everywhere? What is the point of Merge?

    Once again, I'm sorry if this is a very basic/newbie question, but my instructor doesn't know the answer, and searching hasn't been very fruitful either.

    Thanks!

    Note: It seems (from some other SO answers that people are mixing up "Merge" with "Merge Join". These are two different things.