# Introduction to Algorithm Analysis

Before we dive into the subject of Algorithm Analysis, let’s get a clear understanding of what an algorithm actually is.

Whenever we want to solve a problem with a computer, we process large amounts of data. To actually make sense of this data, we need some set of rules on how we’re supposed to process it. This is where the algorithm comes in.

**An algorithm is a specified set of rules / instructions that the computer will follow to solve a particular problem. In other words, we need to tell the computer how to process the data, so we can make sense of it.**

This might be easier to understand with a practical example:

Let’s assume you work as a programmer, and you have to sort the data about your company’s customers. Your boss want you to sort it based on their last names. Instead of manually going through the list (which will be a full time job itself), you program a routine that compares the customers last names – and organize them in a sorted manner. That’s it – you’ve just created an *algorithm*, which sorts data alphabetically.

## What is Algorithm Analysis?

In today’s society, time is critical – and computer science is no exception. We want to process large amounts of data, and we want to do it as fast as possible. This is where algorithm analysis comes in. Whenever we come up with a new algorithm, or make use of a pre-existing one, we want to know how fast the data can be processed.

The first factor which comes to interest is of course the amount of input that must be processed. Surely, we can sort 10 elements faster than we can sort 10 000 elements. Thus, more data equals more time.

But, although the running time of an algorithm is largely dependent on the amount of input, we generally don’t focus much on this aspect. The amount of data we need to process is often out of our hands, especially with sorting algorithms. Referring to the example above, it would not make any sense to reduce the amount of input data – we need to sort all the customers, not just some of them.

Another factor which you may want to consider important, is the speed of the host machine. And although correct, this is not what algorithm analysis is about. Actually, in general, when performing algorithm analysis, we exclude external factors as the speed of the host machine, the quality of the compiler etc.

**When performing algorithm analysis, we want to evaluate the performance of an algorithm in terms of its input size. In other words, we’re evaluating how the performance of an algorithm changes according to its input size.**

## How to perform Algorithm Analysis

One naive way to perform the analysis is to measure the actual run-time of the algorithm. This can, and will, produce false results. Assuming we have two algorithms, **algorithm 1 **and **algorithm 2**:

**Algorithm 1**may produce better results on some inputs, and**algorithm 2**may produce better results on other inputs.**Algorithm 1**may have a lower run-time on one machine, and**algorithm 2**may have a lower run-time on other machines.

An example of performing algorithm analysis this way, and thus producing false results is as follows:

Let’s say we want to compare two different search algorithms, the **Linear Search (**the** **growth rate is a linear function of the input size), and the **Binary Search** (the growth rate is a logarithmic function of the input size). For simplicity, assume we run the algorithms on two different machines.

Clearly, the Linear Search is less efficient than the Binary Search, since the growth rate is linear, and not logarithmic. However, since we ran the algorithms on two different machines, it appeared as the linear search was faster than the binary search.

Instead of comparing the actual run-time, which will differ based on a whole lot of external factors, we want to use something called **Asymptotic Analysis**. In Asymptotic Analysis, we* evaluate how the performance changes according to the input size, *which is the correct way of performing algorithm analysis.

It is also worth mentioning that we have three cases to analyze an algorithm:

- Worst case scenario
- Average case scenario
- Best case scenario

### Algorithm running times

As previously mentioned, we want to evaluate how the algorithms run-time changes according to the input size N. When expressing these run-times, we need to make use of some kind of notation. In this article, we use Big-Oh Notation.** Big-Oh Notation is used to represent the growth rate of an algorithm.**

Five very common functions encountered in algorithm analysis is:

- Constant time. The algorithm’s run-time is constant. It will execute in the same time, regardless of the size of the input N. The Big-Oh notation is
**O(1).** - The linear function. The algorithm’s run-time increases proportionally with the input size N. The Big-Oh notation for a linear function is
**O(N).** - The logarithmic function. The run-time increases with a factor of log N. The Big-Oh notation for the logarithmic function is
**O(logN).** - The quadratic function. The growth rate is quadratic, thus it increases quadratic. The Big-Oh notation is
**O(N²).** - The cubic function. The growth rate is cubic, and the Big-Oh notation is
**O(N³).**

## Analysis of a Linear Search algorithm

Let’s take an example where we implement a Linear Search algorithm in Java, and analyses it’s performance. If you’re unfamiliar with the linear search algorithm, it is surprisingly simple:

Given an array of N objects, and you want to find the index of an object **x **in that array, the linear search algorithm simply loops through the array until **x **is found. If **x **is not found, it returns a negative value (indicating that the object was not found, since **arrays **can’t have negative indexes).

For further reference, keep in mind that the input size is represented by the character N, thus input size = N.

Consider the following Java implementation of a linear search algorithm:

1 2 3 4 5 6 7 | private int search(int array[], int x) { for (int i = 0; i < array.length; i++) { if (array[i] == x) return i; } return -1; } |

### Analysis of the worst case scenario

We start off with the worst case scenario, which is the analysis usually performed.

A worst case scenario is a general case, which indicates that the maximum number of operations is executed. Looking at our linear search algorithm, we can tell that the maximum number of operations happens when **x **is not found, or when **x **is in the last position of the array . Thus, in these two scenarios we have to loop through the entire array of size N.

According to this analysis, we can conclude that the worst case scenario for the linear search algorithm is **linear, O(N).**

### Analysis of the average case scenario

The average case scenario is a difficult thing to actually measure. When performing algorithm analysis, we want to produce a general term for the run-time of an algorithm. Without any prior information about the input, it is especially hard.

The steps to perform an analysis of the average case scenario is:

- Take all possible inputs, and calculate the computing time for all the inputs.
- Sum up all the calculated computing times, and divide it by the total number of inputs.

To actually perform this calculation, we need to make use of something called **mathematical induction**. Let’s apply this technique to the steps above:

Assuming all possible inputs is N + 1, and that the runtime is *i. *The computing time for all the inputs divided by the total number of inputs is then:

= = **O(N)**

Thus, the run-time for the average case is also linear, O(N).

### Analysis of the best case scenario

The best case scenario is not generally used. The best case scenario represents a lower bound run-time for the algorithm. In other words, we are looking for a situation which produces the lowest number of operations executed. Since we search through the array in a linear manner, it is clear that this scenario occurs if the element we’re looking for is in the first position in the array. Thus, the run-time for the best case scenario is constant, **O(1).**

However, the analysis of the best case is not particularly useful. It provides no information about the average and worst case, and no statistical information about how often it occurs. It might actually never occur, as far as we know.

### Note about Big-Oh notation

You might have noticed that I used the Big-Oh notation for all of the scenarios. This is done for simplicity, to avoid introducing other notation-techniques, while still being able to introduce the best, average and worst case scenario.

In a real life situation, you use Big-Oh notation only when describing the **worst case **scenario, known as the upper bound of the algorithm.