On the practicality of big O notation

Big O notation is a mathematical tool for comparing the growth of functions as their arguments tend to infinity. Computer scientists use it to compare the performance of algorithms, yet many developers ignore the importance of this fundamental construct of computer science. In this post, I am going to take a look at the importance of big O notation from a practical point of view.

Is there any meaningful difference between two algorithms of the same order?

In the worst case an O(n) operation is faster than an O(n²) operation, but is there any substantial difference between two O(n²) operations? The short answer is yes, but let’s compare the performance of two algorithms of the same order to get a tangible understanding of this difference. Consider an algorithm that computes the union of two sets. Just remember that, by definition, a set does not have duplicate items. Here are two algorithms of the same order, O(n²), that merge two lists.

The first algorithm is:

public static List<string> UnionAlgorithm1(List<string> list1, List<string> list2)
{
  var result = new List<string>();
  result.AddRange(list1);

  foreach (var item in list2)
    if (!list1.Contains(item))
      result.Add(item);

  return result;
}

According to MSDN, List<T>.Contains and List<T>.AddRange are both O(n) operations. Hence the above algorithm is O(n) + O(n²), which is equal to O(n²).

The second algorithm is:

public static List<string> UnionAlgorithm2(List<string> list1, List<string> list2)
{
  var result = new List<string>();
  result.AddRange(list1);

  foreach (var item in list2)
    result.Remove(item);

  result.AddRange(list2);
  return result;
}

Because List<T>.Remove is an O(n) operation, and we already know that List<T>.AddRange is O(n), the order of the above algorithm is O(n) + O(n²) + O(n), which again equals O(n²).

Theoretically there should not be any significant difference between the running times of the above algorithms, but “significant” has a specific meaning in mathematics. So let’s put these two algorithms to work and find out how different they really are. To make it more interesting, I am going to give you a chance to guess how long each of them takes to run. I know that the results depend on the hardware, but I have chosen answers far enough apart from each other to cancel that out.

Suppose list1 contains “1”, “2”, …, “100000” and list2 contains “90001”, “90002”, …, “190000”, and then answer the following questions:

The first algorithm will merge the two lists in ——.

  1. 4 minutes
  2. 2 minutes
  3. 1 minute
  4. 30 seconds
  5. 10 seconds
  6. Less than a second

The second algorithm will merge the two lists in ——.

  1. 3 minutes
  2. 1 minute and 30 seconds
  3. 45 seconds
  4. 20 seconds
  5. 10 seconds
  6. Less than a second

Before seeing the actual results, let’s analyse which algorithm should be faster. Theoretically, the second one should win. In the first algorithm, Contains scans the full, fixed-size list1 for every item of list2. In the second, Remove scans result instead, and every successful removal shrinks the very list being scanned, so on average it examines fewer elements per pass. Hence, we expect the second algorithm to outperform the first one. [If you have changed your mind, go ahead and pick new choices ;)]

Running the first algorithm on my machine took 136,665 milliseconds, so 2 minutes is the closest answer to the first question. The second one took 93,035 milliseconds, so 1 minute and 30 seconds is the closest answer to the second question. Notice that the second algorithm runs about 30% faster than the first one.

The moral of the story is that, while theory says two algorithms of the same order are not “significantly” different, in practice two algorithms of the same order can be “noticeably” different in terms of performance. Hence you should be careful about choosing one algorithm over another, even when they are of the same order.
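One way to see the shrinking-list effect concretely is to count element comparisons. Below is a small sketch of that idea in Python rather than the post’s C# (the list sizes and overlap are made up for illustration, not the post’s inputs); it re-implements both linear scans with a counter:

```python
def scan_contains(lst, x, stats):
    # Linear membership test, counting every element comparison.
    for y in lst:
        stats["cmp"] += 1
        if y == x:
            return True
    return False

def scan_remove(lst, x, stats):
    # Remove the first occurrence of x, counting comparisons on the way.
    for i, y in enumerate(lst):
        stats["cmp"] += 1
        if y == x:
            del lst[i]
            return True
    return False

def comparisons_algorithm1(list1, list2):
    # Mirrors UnionAlgorithm1: every lookup scans the fixed-size list1.
    stats = {"cmp": 0}
    result = list(list1)
    for item in list2:
        if not scan_contains(list1, item, stats):
            result.append(item)
    return stats["cmp"]

def comparisons_algorithm2(list1, list2):
    # Mirrors UnionAlgorithm2: every Remove scans result, which shrinks.
    stats = {"cmp": 0}
    result = list(list1)
    for item in list2:
        scan_remove(result, item, stats)
    result.extend(list2)
    return stats["cmp"]

a = [str(i) for i in range(1, 1001)]      # "1" .. "1000"
b = [str(i) for i in range(901, 1901)]    # overlaps a on "901" .. "1000"

print(comparisons_algorithm1(a, b))   # scans a fixed 1000-item list every time
print(comparisons_algorithm2(a, b))   # scans a list shrinking from 1000 to 900
```

On overlapping inputs like these, the second variant always performs fewer comparisons, which matches the measured gap in spirit if not in exact magnitude.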

How significant is the difference between an O(n²) and an O(n) algorithm?

As you may have already noticed, neither of the algorithms above is particularly clever. Both can be improved in the part that looks for duplicate items: currently that is an O(n) linear search, but by using a HashSet<string> instead of a List<string>, the lookup becomes an O(1) operation on average.

The optimised version of the first algorithm is:

public static List<string> UnionAlgorithm3(List<string> list1, List<string> list2)
{
  var result = new HashSet<string>();
  foreach (var item in list1)
    result.Add(item);

  foreach (var item in list2)
    if (!result.Contains(item))
      result.Add(item);

  return result.ToList();
}

The order of the above operation is O(n) + O(n) + O(n), which is equal to O(n) (taking the hash-set operations as O(1) on average).

The optimised version of the second algorithm is:

public static List<string> UnionAlgorithm4(List<string> list1, List<string> list2)
{
  var result = new HashSet<string>();
  foreach (var item in list1)
    result.Add(item);

  foreach (var item in list2)
    result.Remove(item);

  foreach (var item in list2)
    result.Add(item);

  return result.ToList();
}

The order of the above operation is O(n) + O(n) + O(n) + O(n), which is again equal to O(n).

Once again, I will give you a chance to guess how long these algorithms take to merge the two lists. However, the difference between the two is so tiny this time that I will only ask one question.

The optimised algorithms took ——.

  1. 30 seconds
  2. 10 seconds
  3. 3 seconds
  4. 1 second
  5. 500 milliseconds
  6. Less than 50 milliseconds

Despite being very similar in terms of performance, these two algorithms are not equal; in fact, we expect one of them to be a bit faster than the other. Theoretically, the third algorithm should run faster, because it consists of three O(n) passes, compared to the fourth one’s four.

Anyway, let’s take a look at the results. Running these algorithms on my machine, both took less than 50 milliseconds to finish. As expected, the third algorithm ran faster, taking 37 milliseconds, while the fourth took 44 milliseconds. Rather astonishingly, the first algorithm improved about 3,700 times and the second about 2,100 times. So, usually, the difference between two algorithms of different orders is very large.
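If you want to reproduce this order-of-magnitude gap without the original C# project, here is a rough Python sketch along the same lines. The list sizes are scaled down from the post’s so it runs in a blink; the absolute timings will not match mine, but the gap between the quadratic and the hash-based union shows up at any scale:

```python
import time

def union_quadratic(a, b):
    # Same shape as UnionAlgorithm1: an O(n) membership scan per item -> O(n^2).
    result = list(a)
    for item in b:
        if item not in a:          # linear scan of a Python list
            result.append(item)
    return result

def union_hashed(a, b):
    # Same shape as UnionAlgorithm3: average O(1) hash lookups -> O(n).
    seen = dict.fromkeys(a)        # insertion-ordered stand-in for a hash set
    for item in b:
        seen.setdefault(item)
    return list(seen)

a = [str(i) for i in range(1, 2001)]        # "1" .. "2000"
b = [str(i) for i in range(1801, 3801)]     # "1801" .. "3800"

for fn in (union_quadratic, union_hashed):
    start = time.perf_counter()
    out = fn(a, b)
    print(f"{fn.__name__}: {len(out)} items in {time.perf_counter() - start:.4f}s")
```

Both functions produce the same list in the same order (original items first, then the new items of the second list), so they are interchangeable for the measurement.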

Last Word

After reading this, some might think that choosing between an O(n) and an O(n²) algorithm is trivial. In practice, it simply is not. Big O bounds are usually quoted for the worst case, so there is a chance that, in the best case, an O(n²) algorithm outperforms an O(n) one. For example, if we change the input in the previous experiment, the results are very different. Consider that list1 contains “1”, “2”, …, “100000” and list2 is just an empty list. In this case the first two O(n²) algorithms took less than one millisecond, while the last two O(n) algorithms took around 10 milliseconds to complete. So an O(n²) operation can be 10 times faster than an O(n) operation.
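This best case is easy to sketch as well. The self-contained Python snippet below (mine, not the post’s C#) pits a quadratic union against a hash-based one on an empty second list: the quadratic version degenerates to a plain copy, while the hash-based one still pays to hash every element up front. The exact 10× figure from my runs will of course vary by machine:

```python
import time

def union_quadratic(a, b):
    # O(n^2) in general, but with an empty b this is just a copy of a.
    result = list(a)
    for item in b:
        if item not in a:
            result.append(item)
    return result

def union_hashed(a, b):
    # O(n) in general, but it always pays to hash every element of a.
    seen = dict.fromkeys(a)        # insertion-ordered stand-in for a hash set
    for item in b:
        seen.setdefault(item)
    return list(seen)

a = [str(i) for i in range(1, 100001)]   # "1" .. "100000"
b = []                                   # the quadratic algorithm's best case

for fn in (union_quadratic, union_hashed):
    start = time.perf_counter()
    out = fn(a, b)
    print(f"{fn.__name__}: {len(out)} items in {time.perf_counter() - start:.4f}s")
```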

As you can see, investigating the inputs before choosing an algorithm can be more critical than knowing “the best” algorithm. In other words, as discussed in my previous post, knowing the problem is as important as knowing the solution.

I hope this post has reminded you of the practical importance of big O notation. Here you can download the source code for the above experiments.
