0

I am a beginner learning about algorithms for fun, and I am trying to implement merge sort in Python.

Below is my implementation. It is very slow when I feed it a 100000 list.

def mergesortedlist(L, R):
    LR = [0]*(len(L)+len(R))  ##merged output (LR combined)
    i= 0     ##counter for left side
    j= 0      ##counter for ride side
    k = 0
    while i <= len(L) and j <= len(R):

        if i == len(L): 
            LR[k:]= R[j:]
            return LR

        elif j == len(R): 
            LR[k:] = L[i:]
            return LR
        elif L[i] < R[j]: 
            LR[k]= L[i]
            i+=1
            k+=1
        else: ##left side is bigger than right side
            LR[k]=R[j]
            j+=1
            k+=1   

def mergesort(N):  

    if len(N) <= 1: 
        return N
    else:
        sub1 = N[0:round(len(N)/2)]  
        sub2 = N[round(len(N)/2):]
        return mergesortedlist(mergesort(sub1), mergesort(sub2))

Here's an implementation that I found somewhere online from this website (http://interactivepython.org/courselib/static/pythonds/SortSearch/TheMergeSort.html)

def mergeSort(alist):
    print("Splitting ",alist)
    if len(alist)>1:
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        mergeSort(lefthalf)
        mergeSort(righthalf)

        i=0
        j=0
        k=0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                alist[k]=lefthalf[i]
                i=i+1
            else:
                alist[k]=righthalf[j]
                j=j+1
            k=k+1

        while i < len(lefthalf):
            alist[k]=lefthalf[i]
            i=i+1
            k=k+1

        while j < len(righthalf):
            alist[k]=righthalf[j]
            j=j+1
            k=k+1
    print("Merging ",alist)

alist = [54,26,93,17,77,31,44,55,20]
mergeSort(alist)
print(alist)

It's very fast.

From what I can tell, my implementation is also O(Nlog(N)), so why is mine much slower?

Thanks for your help.

Cheers!

user5219763
  • 1,284
  • 12
  • 19

2 Answers2

0

I don't see a huge performance difference: 7.3s vs 5.4s for 1,000,000 random floats. But you're correct the second one is faster.

Here's how to profile them to see what's taking the time.

python -m cProfile -s time mergesort.py

Output from yours:

999999    8.517    0.000   11.088    0.000 mergesort.py:2(mergesortedlist)
1999999/1    2.735    0.000   14.151   14.151 mergesort.py:25(mergesort)
84184856/84184834    2.725    0.000    2.725    0.000 {len}
1999998    0.174    0.000    0.174    0.000 {round}

And the second one:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1999999/1    7.377    0.000    8.721    8.721 mergesort2.py:1(mergeSort)
40499148/40499126    1.344    0.000    1.344    0.000 {len}

You see yours spends more time with calling len() and round().

Also the second one operates on the array passed by reference while yours returns the array. That copying may take some time too. Since you're a beginner it would make sense to review the difference

Community
  • 1
  • 1
Ryan Jay
  • 838
  • 8
  • 9
0

My 2 cents. In my tests neither round() or len() functions affect significantly the running time. I think the problem is the following: In each mergesortedlist() call, you creates a new list. I tested this code with IPython's %%timeit

L = N[0:round(len(N)/2)]  
R = N[round(len(N)/2):]
LR = [0]*(len(L)+len(R))

2,000,000 elements -> 10 loops, best of 3: 34.4 ms per loop
1,000,000 elements -> 100 loops, best of 3: 17.2 ms per loop
500,000 elements -> 100 loops, best of 3: 8.3 ms per loop
250,000 elements -> 100 loops, best of 3: 3.49 ms per loop
125,000 elements -> 1000 loops, best of 3: 1.25 ms per loop
62,500 elements -> 1000 loops, best of 3: 526 µs per loop
and so...

And remember, the worst case for Mergesort is O(N*logN). (This fact rejects the case where we apply the first sort function to random numbers and the second one to sorted numbers.)

Jose Raul Barreras
  • 849
  • 1
  • 13
  • 19