Haskell Merge Sort

16,097

Solution 1

Halving a list is not an O(1) operation but O(n), so the given solutions introduce additional costs compared to the imperative version of merge sort. One way to avoid halving is to simply start merging directly by making singletons and then merging every two consecutive lists:

sort :: (Ord a) => [a] -> [a]
sort = mergeAll . map (:[]) 
  where
    mergeAll [] = []
    mergeAll [t] = t
    mergeAll xs  = mergeAll (mergePairs xs)

    mergePairs (x:y:xs) = merge x y:mergePairs xs
    mergePairs xs = xs

where merge is already given by others.

Solution 2

Haskell is an indentation sensitive programming language, you simply need to fix that (btw. if you are using tabs change that to using spaces).

mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
        | length xs < 2 = xs
        | otherwise = merge (mergeSort merge first) (mergeSort merge second)
        where first = take half xs 
              second = drop half xs 
              half = length xs `div` 2

Solution 3

Another msort implementation in Haskell;

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys         = ys
merge xs []         = xs
merge (x:xs) (y:ys) | x < y     = x:merge xs (y:ys)
                    | otherwise = y:merge (x:xs) ys

halve :: [a] -> ([a],[a])
halve xs = (take lhx xs, drop lhx xs)
           where lhx = length xs `div` 2

msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort  xs = merge (msort left) (msort right)
            where (left,right) = halve xs

Solution 4

None of these solutions is as smart as Haskell's own solution, which runs on the idea that in the worst case scenario's these proposed algorithms is still run Theta (n log n) even if the list to be sorted is already trivially sorted.

Haskell's solution is to merge lists of strictly decreasing (and increasing values). The simplified code looks like:

mergesort :: Ord a => [a] -> [a] 
mergesort xs = unwrap (until single (pairWith merge) (runs xs)) 

runs :: Ord a => [a] -> [[a]]
runs = foldr op [] 
 where op x [] = [[x]]
       op x ((y:xs):xss) | x <= y    = (x:y:xs):xss
                         | otherwise = [x]:(y:xs):xss`

This will run Theta(n)

Haskell's version is smarter still because it will do an up run and a down run.

As usual I am in awe with the cleverness of Haskell!

Share:
16,097
emuterisa
Author by

emuterisa

Updated on June 04, 2022

Comments

  • emuterisa
    emuterisa almost 2 years

    This is an implementation of Mergesort using higher order functions,guards,where and recursion.

    However getting an error from compiler 6:26: parse error on input ‘=’

     mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
        mergeSort merge xs
            | length xs < 2 = xs
            | otherwise = merge (mergeSort merge first) (mergeSort merge second)
            where first = take half xs 
                  second = drop half xs 
                   half = (length xs) `div` 2
    

    I can't see whats wrong? or rather I don't understand the compiler.

  • emuterisa
    emuterisa almost 8 years
    that seems to have fixed that error but now when I run the function mergeSort, it says merge not in scope. Does that mean, the code is incomplete and I need to further define what merge actually does?
  • Fabian Schneider
    Fabian Schneider about 5 years
    Shouldn't merge be a tail recursive function for efficiency reasons? I haven't benchmarked it but when just running it in repl.it with a list of 30000 values, tail recursion is a bit faster: repl.it/@faebl/Sorting
  • Cuban coffee
    Cuban coffee almost 5 years
    the msort is missing the equation "msort [] = []". Otherwise the thing will never stop if the array to sort is empty.
  • Redu
    Redu almost 5 years
    @Cuban coffee Thanks for the heads up... corrected.
  • dfeuer
    dfeuer almost 5 years
    @FabianSchneider, no, it should not. Haskell lists are lazy in both their elements and (what matters here) their tails. So constructing a Haskell list the way the above code does is efficient: the list constructor is allocated first, and only later is the tail evaluated.
  • Fabian Schneider
    Fabian Schneider almost 5 years
    @dfeuer thanks for the info; I thought about it and read a a bit more about laziness and the list constructor and it makes kind of sense. Also "benchmarking" it on repl.it isn't really a reliable thing...
  • Fabian Schneider
    Fabian Schneider almost 5 years
    @emuterisa this should be an accepted answer if it solved your problem; even if this is from three years ago...
  • dfeuer
    dfeuer almost 5 years
    A small but nice optimization: instead of breaking the list up into singletons, break it up into sorted lists of two elements. That saves O(n) allocation without much extra code complexity.
  • dfeuer
    dfeuer almost 5 years
    It's likely more efficient to halve a list using a two-pointer algorithm than to take the length and divide by two. And it's definitely prettier.
  • dfeuer
    dfeuer almost 5 years
  • dfeuer
    dfeuer over 2 years
    The version in Data.List is indeed clever as a general purpose sort. But not every application will expect elements to be partially sorted already. In such applications, it will generally be faster not to try to detect runs of sorted elements. Additionally, there are quite a few applications where it's more important for running time to be predictable than for it to be fast. In these applications too, it's better not to take advantage of sorted runs.