Divide & Conquer: Merge Sort

Life is all about making mistakes and learning from them, right? Being a beginner is hard work, but it's also really exhilarating if you can accept that sucking at something is the first step to becoming really good at something. As Jake eloquently quoted ...

So what's with the sappy intro? Someone reading this will feel the same way, or will eventually feel this way. Whether it's about the topic of the blog, or some concept you're struggling to grasp. Be it a bombed technical interview or a failed test on a lab, you will feel that you are alone in what you have called, stupidity. Truth is, you're not alone. Embrace the rollercoaster and continue the never ending journey of learning...

Divide and Conquer

Recursion is a powerful element of programming but it is really difficult to understand. The goal of this blog is to try and make sense of recursion while implementing a sorting algorithm called Merge sort.

Divide and conquer is a strategy that allows us to break a problem into smaller, trivial, subtasks of the same or related type. For example, your job as promoter might be to make 100 phone calls to sponsors around the world. You don't have the time or the energy to do it, so you ask your colleague if they can take half of them. Your friend {{insertnamehere}}, being nice says yes, but also doesn't have the time or energy to accomplish said task. Person B solicits another friend to do half of their phone calls, and this happens again and again until each promoter makes 1 phone call. The end result is the same, but the work done by each promoter was incredibly easy.

Merge Sort

Okay, now that we have a picture of the type of recursion we are going to implement, let's start building out our sorting algorithm.

Merge sort is a sorting algorithm that utilizes the divide and conquer algorithm to split an unsorted list to n sublists, each containing 1 element. Then we will repeatedly merge sublists to produce new sorted sublists until there is one left.

Words, Antoin, you're just saying words. OKAY, let's code.

The first thing we want to do is build our merge_sort method, which will be the method we call recursively to split our array into two halves until there is one element left.

Our merge_sort method will take in an array as an argument.

def merge_sort(array)  
  return array if array.length == 1 
end  

BASE CASES, the corner stone of all recursive functions. Here we have defined our base case, which is, return the array if the length is equal to 1. When designing recursive functions you want to always be working towards some base case or you will be stuck in an endless loop. Ask Dolores how she feels about loops.

The trick here is to deconstruct our array until we have a one element array on the left and right. So here we are going to split our array at it's midpoint until there is one element left. We are setting the left and the right equal the the return value of the recursively called method merge_sort.

def merge_sort(array)  
  return array if array.length == 1
  midpoint = array.length / 2

  left = merge_sort(array[0..midpoint])
  right = merge_sort(array[midpoint..array.length])

  merge(left, right)
end  

Let's talk about what happens when we call this method recursively.

Say we have an 8 element array: [17,11,97,23,9,16]. The method call would look like this:

left = merge_sort([17,11,97])  

This creates a new Stack Frame, which is allocated to handle the local variables of the function until it is returned. Our variable left has not been set equal to anything yet.

So, since we have called the merge_sort method again, we are back inside of it with this array as the argument: [17,11,97]. Our base case will fail because we still have an array with a length greater than one. Midpoint is defined again, and the following line will execute again, except with a smaller array:

left = merge_sort([17])  

Again, we create a new recursive frame by calling merge_sort, and it looks like this:

def merge_sort(array)  
   ## array = [17,11,97]

   return array if array.length == 1
   midpoint = array.length / 2 

   left = merge_sort([17])
   right = merge_sort([11,97]) # still has not been called

   merge(left, right)
end  

This time when we call merge sort our method returns because our base case condition is met. So the execution context that we are in right now pops off the stack and now we are back in the frame listed above. Our recursive frame now will look like this:

def merge_sort(array)  
   ## array = [17,11,97]

   return array if array.length == 1
   midpoint = array.length / 2 

   left = [17]
   right = merge_sort([11,97]) 

   merge(left, right)
end  

Here, left has been assigned a value because of the return.
Our right side variable declaration will do the same thing until we end up with the following:

def merge_sort(array)  
   ## array = [17,11,97]

   return array if array.length == 1
   midpoint = array.length / 2 

   left = [11]
   right = [97]

   merge(left, right)
end  

Notice how left in this frame is equal to 11, rather than 17. If we followed the same pattern we discussed above, we can see how this set of recursive method calls would have set the left side to 11. Now that these two values have been set, we can call the method that will do the merging.

def merge(left = [11], right = [97])  
  result = []

  while(!left.length < 1 && !right.length < 1) 
    if left[0] < right[0]
      result << left.shit 
    else 
      result << right.shift 
    end 
   end 

  result + left + right 
end  

We have made this task a very simple one, and when this method returns, we will pop another frame off of the stack, which will then put us back into this context with a newly sorted sublist.

def merge_sort(array)  
   array = [17,11,97,23,9,16]
   return array if array.length == 1
   midpoint = array.length / 2 

   left = [11,17,97]
   right = merge_sort([23,9,16])

   merge(left, right)
end  

We are now back near the beginning of our stack of recursive frames, and since the return value of our merge_sort method is the result of merge, left has been set to a newly sorted sublist. The same process will occur until we end up with an execution context that looks like this:

def merge_sort(array)  
   array = [17,11,97,23,9,16]
   return array if array.length == 1
   midpoint = array.length / 2 

   left = [11,17,97]
   right = [9,16,23]

   merge(left, right)
end  

where we have two newly sorted arrays. This will execute merge one last time.

def merge(left = [11,17,97], right = [9,16,23])  
  result = []

  while(!left.length < 1 && !right.length < 1) 
    if left[0] < right[0]
      result << left.shit 
    else 
      result << right.shift 
    end 
   end 

    result + left + right 
  end 

end  

Wrapping up

Recursion is hard, but practice makes everything easier over time. Remember, we should always be working towards a base case -- that being something that will break us out of our recursive loop. A neat trick to make sense of recursive stack frames is to use sticky notes. Each time a method is called, a new sticky note should be added with the method call and the variables in the context of that method call. Whenever a method returns, the sticky note should be removed.

Example:

def merge_sort(array = [4,91,2])  
  return array if array.length == 1 
  midpoint = array.length / 2

  left = merge_sort([4])

end 

The method above should be one sticky note and when we reach this line:

left = merge_sort([4])  

we should add another sticky note on top of it that would look like this:

def merge_sort(array = [4])  
  return array if array.length == 1 
  midpoint = array.length / 2

  left = merge_sort([4])

end  

Since this method returns, we need to remove the sticky note, and edit our first sticky note to look like the following:

def merge_sort(array = [4,91,2])  
  return array if array.length == 1 
  midpoint = array.length / 2

  left = [4]
  right = merge_sort([91,2])

end  

These sticky notes are a physical representation of the stack frames that I have mentioned throughout this post. Keep adding and removing sticky notes until you get back to the original note with two sorted halves. This will give you a better idea of how recursion works and how it can simplify tasks. I hope this helps!

Show Comments
comments powered by Disqus