This is the third article in a seven-part series on Algorithms and Computation, which explores how we use simple binary numbers to power our world. The first article, How Algorithms Run the World We Live In, can be found here.
If you watch a visualization of different sorting algorithms working on a data set, it becomes very obvious, very quickly, that some are much better than others. Where one algorithm takes seconds to finish, another will take minutes with even small data sets. Fortunately, we don’t need to see the algorithm at work to know if the algorithm can get the job done quickly or if it's going to collapse under the weight of its input. That is what Big O Notation is for, and it can tell us at a glance if the Time Complexity of an algorithm means we'll get it in a few hours or billions of years.
What is Time Complexity?
In more formal terms, time complexity is the measurement of how long an algorithm will take to produce a result, relative to the size of its input. Practically, it is the rate of growth in the number of operations required to produce a result for every additional unit of input.
In the first article in this series, I described a straight forward summation algorithm. To calculate the sum of all numbers between the numbers p and q, I declared another variable, r, and set it to zero. I then added p to r, then I added p+1 to r, then p+2, and so on until I finally added q itself to r, at which point I stopped and returned the result, r, which held the sum.
How many operations will this require, without knowing what the ultimate input size will be, using just the abstract variables p and q? What we are actually doing is running a loop where every iteration increases p by exactly 1 and adds it r. This is actually how our algorithm looks when presented somewhat more formally:
1. let r = 0
2. while p is <= to q
3. r = p + r
4. p = p + 1
5. return r
When laid out like this in what we call pseudocode, it becomes much easier to see how many operations this will take. Go down the steps number by number and count the operations. The first one is an instruction, so it's a single operation. The second line, however, is different. Loops like this repeat until the condition is satisfied, in this case, once p is greater than q. We know then that we are including q and p in the sum, so the number of iterations through this loop is equal to q - ( p - 1 ), which is the input size for our algorithm.
But that is just the number of times we iterate through the loop, we also have to count the operations inside it, which is where we add p and r and assign the result to r and then we add p and 1 and then assign the result to p. So we perform two operations for every iteration, and there are q - ( p - 1 ) iterations. All we need to do now is multiply 2 operations and q - ( p - 1 ) iterations to get 2 * ( q - ( p - 1 ) ) operations for the entire loop. Add the operations at 1. and 5., which gives us a final count of 2 * ( q - ( p - 1 ) ) + 2 operations for this algorithm.
This is a linear function since there are no exponents, so the rate of growth for our summation algorithm is directly tied to the input size, which we call linear time complexity. As a general rule, whatever the highest order term in a formula that defines our time complexity is what characterizes its growth over time, so we take the highest order term and scrap the rest, leaving q - ( p - 1), which we will call n for simplicity's sake.
Linear time complexity might sound inefficient when you image input sizes in the billions, but linear time isn't actually too bad. We can do better though.
We've known for a very long time that the sum of all the numbers from 1 and q is given by the formula ( q * ( q+1 ) ) /2. We also know that the commutative property of addition tells us that the result of ( p-1 * (p) ) / 2 subtracted from ( q * ( q+1 ) ) / 2 simply lops off the part of the sum that includes everything from 1 to p-1, leaving the sum of the numbers from p to q behind, which is exactly what we want.
This algorithm, depending on how it's coded, should take no more than three operations. Since direct math operations like this are exactly the things that computers do better than humans, we could string the formulas together into a single mathematical expression, and the processor will chew through it as easily as it would if we broke it into smaller chunks, but we'll stick to three operations for the moment.
1. p = ( p-1 * (p) ) / 2
2. q = ( q * ( q+1 ) ) / 2
3. return ( q - p )
No loops, just a constant number of operations that don't change, no matter what the difference between p and q is. This algorithm will always take three operations to perform, so we call this constant time complexity.
Now, if you didn’t know what your input size was going to be when you were designing an algorithm, which algorithm are you going to use between these two? Obviously, the second one, because constant time algorithms are essentially a computational free lunch as far as input is concerned. So as we scale up our program to handle more data, their run time will not appreciably change, whereas we know our first algorithm will grow exactly as much as our input, which could be a value in the billions or more. Of course, constant time complexity doesn't always means its more efficient, but in this case it's the one we want.
Before we ever sat down to write a line of code, we already figured out which algorithm was the better option for our input, and we never had to run it to find out how well it works. This is why we use time complexity, there just aren't that many tools out there that are as effective at grading the efficiency of algorithms.
What is Big O Notation?
We have a special way of annotating this efficiency to make things easier, something we call Big O Notation. Simply put, Big O Notation represents the algorithm's efficiency run through its worst-case scenario. If you had to search for a name in a directory by reading every name until you found the right one, the worst case scenario is that the name you want is the very last entry in the directory. This means you'd have to read through the whole directory of n names to get to the one you want. So we say this algorithm is O(n), where n is the highest order term in our operations formula.
There are other Big notations for the best case and average case, but what really matters is the worst case scenario; those are the ones that can crash your computer—or worse, your car or your airplane. They go right to the heart of why time complexity matters and point to why some algorithms simply cannot solve a problem without taking a few billion years to do it.
So how do we use Big O Notation? The chart above shows you how these different Big O Notations look in practice, with the x-axis being your input and your y-axis the time taken to finish. For a bit more detail, you'll find a quick list of all the Big O Notations that matter for now and the time complexity they represent:
* O(1): Constant Time Complexity- This is the effectively computational free pass we talked about before. This does not necessarily mean it is faster. It just means that the time complexity is unrelated to the size of the input.
* O(log n): Logarithmic Time Complexity- These are most common when using a divide-and-conquer strategy on a data set, where every operation discards a large portion of the input that it has ruled out as not having the solution. The classic example is looking a word up in a dictionary: Open at random, check what letter section you're in, then ignoring the part where you know your word won't be, and recursively subdividing and discarding the remaining sections until you find your word.
* O(n): Linear Time Complexity- This was our summation algorithm from the beginning.
* O(n log n): Linearithmic Time Complexity- Performing a fast Fourier transform is an O(n Log n) algorithm, as is a Mergesort.
* O(n2): Quadratic Time Complexity- This is usually found whenever you have nested loops. In our first algorithm, if we had a second loop within the first one, the function would have developed quadratic time complexity.
* O(nc, c > 1): Polynomial Time Complexity- Polynomial Time Complexity is very important because it more or less represents the upper bound on what a classical computer can solve in a practical amount of time.
* O(cn, n > 1, c > 1): Exponential Time Complexity- This is where you start to get the Billion-year algorithms. Any time a unit of input causes you to double the number of operations performed relative to the number that are performed if the input is n-1, you have exponential time complexity. The most common example most people use is trying to enumerate every possible subset of a set, but Brute-forcing a 128-bit encryption key is usually put into this category.
* O(n!): Factorial Time Complexity- These are algorithms that could probably run until the heat death of the Universe with if given a moderately large input size of a few thousand. Whenever you have something like “how many different ways can you arrange this sequence,” you have a permutation problem and brute-forcing your way to an answer requires creating n! different values, which is given by the result of: n * (n - 1) * (n - 2) * ... * 3 * 2 * 1. These are the algorithms that run almost parallel to the y-axis in the diagram above.
Why We Can’t Just Come Up With Better Algorithms
It’s not for lack of trying. The entire field of theoretical computer science is all about trying to find the most efficient algorithm for a given problem, but sometimes we just don’t know the math. And not just you and me, we’re talking Field’s Medal winners who've come up against some problems like the O(2n) and O(n!) and their guess is as good as ours.
There's a whole catalog of problems that we don't have the math to solve yet—we will get into that more toward the end of the series—, but these problems act as choke points that create inefficiencies in business, scientific research, and other administrative areas, and there are a lot of people waiting for these problems to finally be solved. Very lucrative prizes have even been offered to anyone who can solve some of these things but so far, no one has been able to and some of these problems have been sitting around for decades.
The reason why classical computers can’t solve these problems efficiently either is built into the circuits of the machine; every instruction it is given must be completed before it can start on the next one. Multicore systems or multiprocessors machines can speed things up, but you’d still probably need one or two trillion years to get a result after throwing all of our processors together and setting them loose on an O(n!) problem where n was something like 500.
Computer engineers have seen this coming for decades, but now we are finally coming to the end of the line for what we can squeeze out of a classical computer. You just can't make these operations go any faster, and by nature, they can only perform one operation at a time. If you have an algorithm that requires quintillions of operations to complete, a classical computer has to carry out all of those operations in order. At that point, the time it takes to get a result simply becomes a matter of math and physics. If we’re going to solve these problems that have exponential time complexity or greater, then something else is needed.
The good news is that we know it’s possible, in at least one case, to find an efficient enough algorithm to find the answer to one of these O(2n) problems in polynomial time complexity. If it can be done once, then it's possible to do it again; it just takes sidestepping the whole “physics” thing.
The fourth article in our series on Algorithms and Computation, How Peter Shor’s Algorithm Dooms RSA Encryption to Failure, can be found here.