Mergesort with Python

115,356

Solution 1

You can initialise the whole result list in the top level call to mergesort:

result = [0]*len(x)   # replace 0 with a suitable default element if necessary. 
                      # or just copy x (result = x[:])

Then for the recursive calls you can use a helper function to which you pass not sublists, but indices into x. And the bottom level calls read their values from x and write into result directly.

That way you can avoid all that poping and appending which should improve performance.

Solution 2

The first improvement would be to simplify the three cases in the main loop: Rather than iterating while some of the sequence has elements, iterate while both sequences have elements. When leaving the loop, one of them will be empty, we don't know which, but we don't care: We append them at the end of the result.

def msort2(x):
    if len(x) < 2:
        return x
    result = []          # moved!
    mid = int(len(x) / 2)
    y = msort2(x[:mid])
    z = msort2(x[mid:])
    while (len(y) > 0) and (len(z) > 0):
        if y[0] > z[0]:
            result.append(z[0])
            z.pop(0)
        else:
            result.append(y[0])
            y.pop(0)
    result += y
    result += z
    return result

The second optimization is to avoid popping the elements. Rather, have two indices:

def msort3(x):
    if len(x) < 2:
        return x
    result = []
    mid = int(len(x) / 2)
    y = msort3(x[:mid])
    z = msort3(x[mid:])
    i = 0
    j = 0
    while i < len(y) and j < len(z):
        if y[i] > z[j]:
            result.append(z[j])
            j += 1
        else:
            result.append(y[i])
            i += 1
    result += y[i:]
    result += z[j:]
    return result

A final improvement consists in using a non recursive algorithm to sort short sequences. In this case I use the built-in sorted function and use it when the size of the input is less than 20:

def msort4(x):
    if len(x) < 20:
        return sorted(x)
    result = []
    mid = int(len(x) / 2)
    y = msort4(x[:mid])
    z = msort4(x[mid:])
    i = 0
    j = 0
    while i < len(y) and j < len(z):
        if y[i] > z[j]:
            result.append(z[j])
            j += 1
        else:
            result.append(y[i])
            i += 1
    result += y[i:]
    result += z[j:]
    return result

My measurements to sort a random list of 100000 integers are 2.46 seconds for the original version, 2.33 for msort2, 0.60 for msort3 and 0.40 for msort4. For reference, sorting all the list with sorted takes 0.03 seconds.

Solution 3

Code from MIT course. (with generic cooperator )

import operator


def merge(left, right, compare):
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while i < len(left):
        result.append(left[i])
        i += 1
    while j < len(right):
        result.append(right[j])
        j += 1
    return result


def mergeSort(L, compare=operator.lt):
    if len(L) < 2:
        return L[:]
    else:
        middle = int(len(L) / 2)
        left = mergeSort(L[:middle], compare)
        right = mergeSort(L[middle:], compare)
        return merge(left, right, compare)

Solution 4

def merge_sort(x):

    if len(x) < 2:return x

    result,mid = [],int(len(x)/2)

    y = merge_sort(x[:mid])
    z = merge_sort(x[mid:])

    while (len(y) > 0) and (len(z) > 0):
            if y[0] > z[0]:result.append(z.pop(0))   
            else:result.append(y.pop(0))

    result.extend(y+z)
    return result

Solution 5

Take my implementation

def merge_sort(sequence):
    """
    Sequence of numbers is taken as input, and is split into two halves, following which they are recursively sorted.
    """
    if len(sequence) < 2:
        return sequence

    mid = len(sequence) // 2     # note: 7//2 = 3, whereas 7/2 = 3.5

    left_sequence = merge_sort(sequence[:mid])
    right_sequence = merge_sort(sequence[mid:])

    return merge(left_sequence, right_sequence)

def merge(left, right):
    """
    Traverse both sorted sub-arrays (left and right), and populate the result array
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result += left[i:]
    result += right[j:]

    return result

# Print the sorted list.
print(merge_sort([5, 2, 6, 8, 5, 8, 1]))
Share:
115,356

Related videos on Youtube

Hans
Author by

Hans

Finished studying Computer Science from University of Tartu. Currently working at Playtech Estonia as a full-stack python developer

Updated on July 09, 2022

Comments

  • Hans
    Hans almost 2 years

    I couldn't find any working Python 3.3 mergesort algorithm codes, so I made one myself. Is there any way to speed it up? It sorts 20,000 numbers in about 0.3-0.5 seconds

    def msort(x):
        result = []
        if len(x) < 2:
            return x
        mid = int(len(x)/2)
        y = msort(x[:mid])
        z = msort(x[mid:])
        while (len(y) > 0) or (len(z) > 0):
            if len(y) > 0 and len(z) > 0:
                if y[0] > z[0]:
                    result.append(z[0])
                    z.pop(0)
                else:
                    result.append(y[0])
                    y.pop(0)
            elif len(z) > 0:
                for i in z:
                    result.append(i)
                    z.pop(0)
            else:
                for i in y:
                    result.append(i)
                    y.pop(0)
        return result
    
    • poke
      poke over 10 years
      You should not pop from lists, as that will unecessarily shift the array elements over and over. You should avoid changing the list anyway when iterating over it.
    • Tamás
      Tamás over 10 years
      Also, there is probably nothing specific to Python 3.3 in an ordinary implementation of mergesort so you can just Google for "python mergesort" and use any implementation you find, even if it is for older versions. For instance, this one: geekviewpoint.com/python/sorting/mergesort
    • siddhesh
      siddhesh over 7 years
      The question is too old but isn't it using more memory for result array merge sort already uses double memory of array to sort it we are again producing the array in result.
  • simonzack
    simonzack over 9 years
    Using sorted() feels like cheating.
  • Abhishek Prakash
    Abhishek Prakash over 9 years
    I tried your msort3 method in python 2.7.6 but I got the following error - Traceback (most recent call last): File "mergesort.py", line 21, in <module> msort3([5,24, 87, 55, 32, 1, 45]); File "mergesort.py", line 6, in msort3 y = msort3(x[:mid]) File "mergesort.py", line 10, in msort3 while i < len(y) and j < len(z): TypeError: object of type 'NoneType' has no len()
  • Abhishek Prakash
    Abhishek Prakash over 9 years
    I tried the same msort3 method in python 3.4.0 and got the following error - [24, 87] Traceback (most recent call last): File "mergesort.py", line 21, in <module> msort3([5,24, 87, 55, 32, 1, 45]); File "mergesort.py", line 6, in msort3 y = msort3(x[:mid]) File "mergesort.py", line 10, in msort3 while i < len(y) and j < len(z): TypeError: object of type 'NoneType' has no len()
  • anumi
    anumi over 9 years
    @AbhishekPrakash: I cannot reproduce the error in Python 2.7.5. Will try latter on another machine. Are the return statements well written?
  • anumi
    anumi over 9 years
    @AbhishekPrakash: I ran your test without problems under Python 2.7.6 and Python 3.4.0 (Ubuntu 14.04). If you used print rather than return, the function returns None (as no return is found) and breaks the recursivity.
  • rbaleksandar
    rbaleksandar over 8 years
    Posting test results is not helpful for the OP since he probably has different hardware.
  • Claudiu Creanga
    Claudiu Creanga over 7 years
    returns error: slice indices must be integers or None or have an index method
  • jerryjvl
    jerryjvl over 7 years
    Technically a good answer to the question, but it may need some explanation why you made the changes you did, to be maximally helpful to this and future users.
  • HaveNoDisplayName
    HaveNoDisplayName over 7 years
    Add some explanation
  • Dimitri W
    Dimitri W over 7 years
    Working fine with Python 2.7.5
  • Kishan Mehta
    Kishan Mehta almost 7 years
    After we are outside the first while loop: we can do: if len(left) == i: result.extend(right[j:]) else: result.extend(left[i:])
  • Hans
    Hans about 6 years
    Could use some explanation.
  • Alexandr Zhytenko
    Alexandr Zhytenko about 6 years
    I have only changed your variables names and in the end of your code. If you put print command after each result.append() you will understand better.
  • NoobEditor
    NoobEditor almost 6 years
    you are creating new list instead of modifying the original...not a good idea!
  • Admin
    Admin over 5 years
    This is the implementation of Tim Roughgarden's "Algorithms Illuminated" book.
  • grepit
    grepit over 5 years
    very minimalistic approach but using extend() fails to demonstrate the concept/algorithm for merge....I mean what's a merge sort without the merge algorithm implementation !
  • xiawi
    xiawi over 4 years
    While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value.
  • Vitaliy Terziev
    Vitaliy Terziev over 4 years
    I wonder if the while block is going to make your solution not stable, if i == j: append j to result, [1, 2, 3], [1, 8, 9], result will append from right list if I am not mistaken
  • Admin
    Admin over 4 years
    What is the reason for using left.append(float('inf')) and right.append(float('inf')). Is there any other alternative?
  • Admin
    Admin over 4 years
    @Hans Sensational assumption !
  • User
    User about 4 years
    How about saving values in sequence rather than creating a new list called result?
  • Hans
    Hans about 2 years
    Having single letter variables is a bad practice and is not helpful for beginners
  • greybeard
    greybeard about 2 years
    Naming a function mSort() does not make it an implementation of merge sort. Then, there is the Style Guide for Python Code.
  • Sar ibra
    Sar ibra about 2 years
    thank you I'm newbie in python in stackoverflow I just posted this implementation because most of what I found is confusing for me .