**This is the third article in a seven-part series on Algorithms and Computation, which explores how we use simple binary numbers to power our world. The first article, ****How Algorithms Run the World We Live In****, can be found here.**

If you watch a visualization of different sorting algorithms working on a data set, it becomes very obvious, very quickly, that some are much better than others. Where one algorithm takes **seconds to finish**, another will take **minutes** with even small data sets. Fortunately, we don’t need to see the algorithm at work to know if the algorithm can get the job done quickly or if it's going to collapse under the weight of its input. That is what **Big O Notation** is for, and it can tell us at a glance if the **Time Complexity** of an algorithm means we'll get it in a few **hours** or **billions of years**.

## What is Time Complexity?

In more formal terms, **time complexity** is the measurement of how long an algorithm will take to produce **a result**, relative to the size of **its input**. Practically, it is the **rate of growth** in the **number of operations** required to produce a result for **every additional unit of input**.

In the first article in this series, I described a straight forward **summation algorithm**. To calculate the **sum** of **all numbers between** the numbers **p** and **q**, I declared another variable, **r**, and set it to **zero**. I then added **p** to **r**, then I added **p+1** to** r**, then **p+2**, and so on until I finally added **q** itself to **r**, at which point I stopped and returned the result, **r**, which held the **sum**.

How many operations will this require, without knowing what the ultimate input size will be, using just the abstract variables **p** and **q**? What we are actually doing is running a **loop** where every iteration increases **p** by exactly **1** and adds it **r**. This is actually how our algorithm looks when presented somewhat more formally:

*1. let r = 0*

*2. while*

**p**is**<=**to**q***3.*

**r****=****p****+****r***4.*

**p**=**p****+**

*1**5. return*

*r*When laid out like this in what we call **pseudocode, **it becomes much easier to see how many operations this will take. Go down the steps number by number and count the operations. The first one is an instruction, so it's a single operation. The second line, however, is different. Loops like this repeat until the condition is satisfied, in this case, once **p** is greater than **q. **We know then that we are including **q** and **p** in the **sum**, so the number of **iterations** through this **loop** is equal to **q - ( p - 1 ), **which is the **input size** for our algorithm.

But that is just the number of times we iterate through the loop, we also have to count the operations inside it, which is where we add **p** and **r **and **assign** the result to** r** and then we add **p** and **1** and then assign the result to **p.** So we perform **two operations** for every** iteration**, and there are **q - ( p - 1 )** **iterations.** All we need to do now is multiply **2 operations** and **q - ( p - 1 )** **iterations** to get **2 * ( q - ( p - 1 ) ) operations** for the entire loop. Add the operations at 1. and 5., which gives us a final count of **2 * ( q - ( p - 1 ) ) + 2 operations **for this algorithm.

This is a linear function since there are no exponents, so the **rate of growth** for our summation algorithm is **directly tied to **the **input** size, which we call **linear time complexity**. As a general rule, whatever the highest order term in a formula that defines our time complexity is what characterizes its growth over time, so we take the highest order term and scrap the rest, leaving q - ( p - 1), which we will call **n** for simplicity's sake.

Linear time complexity might sound inefficient when you image input sizes in the billions, but linear time isn't actually too bad. We can do better though.

We've known for a very long time that the **sum** of **all** **the numbers** from** 1** and **q** is given by the formula **( q * ( q+1 ) ) /2.** We also know that the **commutative property of addition** tells us that the result of **( p-1 * (p) ) / 2** subtracted from **( q * ( q+1 ) ) / 2** simply lops off the part of the sum that includes everything from **1** to **p****-1**, leaving the **sum** of the numbers from **p** to **q **behind, which is exactly what we want.

This algorithm, depending on how it's coded, should take no more than **three operations. **Since direct math operations like this are exactly the things that computers do better than humans, we could string the formulas together into a single mathematical expression, and the processor will chew through it as easily as it would if we broke it into smaller chunks, but we'll stick to **three operations** for the moment.

1. *p = ** ( p-1 * (p) ) / 2* =

2. q

**( q * ( q+1 ) ) / 2**

3.

*return*(

**q -**

*)*

**p**No **loops**, just a constant number of operations that don't change, no matter what the difference between **p** and **q** is. This algorithm will always take three operations to perform, so we call this **constant time complexity**.

Now, if you didn’t know what your **input size** was going to be when you were designing an algorithm, which algorithm are you going to use between these two? Obviously, the **second one**, because **constant time algorithms** are essentially a **computational free lunch as far as input is concerned.** So as we **scale up** our program to handle **more data**, their run time will not appreciably change, whereas we know our first algorithm will grow exactly as much as our input, which could be a value in the billions or more. Of course, constant time complexity doesn't always means its more efficient, but in this case it's the one we want.

Before we ever sat down to write a **line of code**, we already figured out which algorithm was **the better option for our input**, and **we never had to run it **to find out how well it works. This is why we use **time complexity, **there just aren't that many tools out there that are as effective at **grading the efficiency of algorithms**.

## What is Big O Notation?

We have a special way of annotating this efficiency to make things easier, something we call **Big O Notation**. Simply put, **Big O Notation** represents the **algorithm's** **efficiency** run through its **worst-case scenario**. If you had to search for a name in a directory by reading every name until you found the right one, the worst case scenario is that the name you want is the very last entry in the directory. This means you'd have to read through the whole directory of **n** names to get to the one you want. So we say this algorithm is O(**n**), where **n** is the **highest order term** in our operations formula.

There are other **Big notations** for the **best case** and **average case**, but what really matters is the **worst case scenario**; those are the ones that can crash your computer—or worse, **your car** or **your airplane**. They go right to the heart of why **time complexity** matters and point to why **some algorithms** simply cannot solve a problem without taking **a few billion years** to do it.

So how do we use **Big O Notation**? The chart above shows you how these different **Big O Notations** look in practice, with the **x-axis** being your input and your **y-axis** the time taken to finish. For a bit more detail, you'll find a quick list of all the **Big O Notations** that matter for now and the **time complexity** they represent:

* **O(1)**: **Constant Time Complexity**- This is the effectively computational free pass we talked about before. **This does not necessarily mean it is faster**. It just means that the **time complexity** is unrelated to the size of the input.

* **O(log n)**: **Logarithmic Time Complexity**- These are most common when using a **divide-and-conquer strategy** on a data set, where every operation discards a large portion of the input that it has ruled out as not having the solution. The classic example is looking a word up in a dictionary: Open at random, check what letter section you're in, then ignoring the part where you know your word won't be, and recursively subdividing and discarding the remaining sections until you find your word.

* **O(n)**: **Linear Time Complexity**- This was our summation algorithm from the beginning.

* **O(n log n)**: **Linearithmic Time Complexity**- Performing a fast Fourier transform is an **O(n Log n)** algorithm, as is a Mergesort.

* **O(n ^{2})**:

**Quadratic Time Complexity**- This is usually found whenever you have nested loops. In our first algorithm, if we had a second loop within the first one, the function would have developed quadratic time complexity.

* **O(n ^{c}, c > 1)**:

**Polynomial Time Complexity**-

**Polynomial Time Complexity**is

*because it more or less*

**very important****represents the upper bound**on what a

**classical computer**can solve in a practical amount of time.

* **O(c ^{n}, n > 1, c > 1)**:

**Exponential Time Complexity**- This is where you start to get the

**Billion-year algorithms**. Any time a

**unit of input**causes you to

**double the number of operations performed**relative to the number that are performed

**if the input is**

**n-1**, you have

**exponential time complexity**. The most common example most people use is trying to enumerate

**every possible subset of a set**, but

**Brute-forcing**a

**128-bit encryption key**is usually put into this category.

* **O(n!)**: **Factorial Time Complexity**- These are algorithms that could probably run until the heat death of the Universe with if given a moderately large input size of a few thousand. Whenever you have something like “how many different ways can you arrange this sequence,” you have a permutation problem and brute-forcing your way to an answer requires creating **n!** different values, which is given by the result of: **n * (n - 1) * (n - 2) * ... * 3 * 2 * 1**. These are the algorithms that run almost parallel to the **y-axis** in the diagram above.

## Why We Can’t Just Come Up With Better Algorithms

It’s not for lack of trying. The entire field of **theoretical computer science** is all about trying to find the most **efficient algorithm** for a given problem, but sometimes we just don’t know the math. And not just you and me, we’re talking Field’s Medal winners who've come up against some problems like the **O(2 ^{n})** and

**O(n!)**and their guess is as good as ours.

There's a whole catalog of problems that we don't have the math to solve yet—we will get into that more toward the end of the series—, but these problems act as choke points that create inefficiencies in business, scientific research, and other administrative areas, and there are a lot of people waiting for these problems to finally be solved. Very lucrative prizes have even been offered to anyone who can solve some of these things but so far, no one has been able to and some of these problems have been sitting around for decades.

[see-also]

The reason why **classical computers** can’t solve these problems efficiently either is built into the circuits of the machine; every instruction it is given must be completed before it can start on the next one. Multicore systems or multiprocessors machines can speed things up, but you’d still probably need **one or two trillion years** to get a result after throwing all of our processors together and setting them loose on an **O(n!)** problem where **n** was something like **500**.

Computer engineers have seen this coming for decades, but now we are finally coming to the end of the line for what we can squeeze out of a **classical computer**. You just can't make these **operations go any faster**, and by nature, they can **only perform one operation at a time**. If you have an algorithm that requires **quintillions of operations** to complete, a **classical computer** has to carry out all of those **operations in order.** At that point, the time it takes to get a result simply becomes a matter of **math and physics**. If we’re going to solve these problems that have **exponential time complexity or greater**, then **something else** is needed.

The good news is that we know it’s possible, in **at least one case**, to find an efficient enough algorithm to find the answer to one of these **O(2 ^{n})** problems in

**polynomial time complexity**. If it can be done once, then it's possible to do it again; it just takes

**sidestepping the whole “physics” thing**.

*The fourth article in our series on Algorithms and Computation, *How Peter Shor’s Algorithm Dooms RSA Encryption to Failure, *can be found here**.*