Greedy


In this article I describe Dijkstra’s algorithm for finding the shortest path from one source to all the other vertexes in a graph. Afterwards, I provide the source code in C of a simple implementation.

To understand this you should know what a graph is, and how to store one in memory. If in doubt check this and this.

Another solution for this problem is the Bellman-Ford algorithm.

The Problem

Given the following graph calculate the length of the shortest path from node 1 to every other node.
dj1.png

Lets take the nodes 1 and 3. There are several paths (1 -> 4 -> 3, 1 -> 2 -> 3, etc.), but the shortest of them is 1 -> 4 -> 2 -> 3 of length 9. Our job is to find it.

The Algorithm

Dijkstra’s algorithm is one of the most common solutions to this problem. Even so, it only works on graphs which have no edges of negative weight, and the actual speed of the algorithm can vary from O(n*lg(lg(n))) to O(n2).

The idea is somewhat simple:

Take the length of the shortest path to all nodes to be infinity. Mark the length of the shortest path to the source as 0.

dj2.png

Now, we already know that the graph has no edges of negative weight so the a path of length 0 is the best we can come up with. The path to the source is 0, so it’s optimal.

This algorithm works by making the paths to one more node optimal at each step. So, at the kth step, you know for sure that there are at least k nodes to which you know the shortest path.

At each step, choose the node, which is not yet optimal, but which is closest to the source; i.e. the node to which the current calculated shortest path is smallest. Then, from it, try to optimise the path to every node connected to it. Finally, mark the said node as optimal (visited, if you prefer). In the previous example, the node which is closest to the source and is not yet optimal is the source. From it, you can optimise the path to nodes 2 and 4.
dj3_1.png

At this point, the only visited/optimal node is 0. Now we have to redo this step 4 more times (to ensure that all nodes are optimal).

The next node to consider is 4:
dj4.png

It’s worthwhile to note that at this step, we’ve also found a better path to node 2.
Next is node 2:
dj5.png

Finally, we look at nodes 5 and 3 (none of which offer any optimisations):
dj5.png

dj5.png

The actual code in C looks something like this:

void dijkstra(int s) {
        int i, k, mini;
        int visited[GRAPHSIZE];

        for (i = 1; i <= n; ++i) {
                d&#91;i&#93; = INFINITY;
                visited&#91;i&#93; = 0; /* the i-th element has not yet been visited */
        }

        d&#91;s&#93; = 0;

        for (k = 1; k <= n; ++k) {
                mini = -1;
                for (i = 1; i <= n; ++i)
                        if (!visited&#91;i&#93; && ((mini == -1) || (d&#91;i&#93; < d&#91;mini&#93;)))
                                mini = i;

                visited&#91;mini&#93; = 1;

                for (i = 1; i <= n; ++i)
                        if (dist&#91;mini&#93;&#91;i&#93;)
                                if (d&#91;mini&#93; + dist&#91;mini&#93;&#91;i&#93; < d&#91;i&#93;)
                                        d&#91;i&#93; = d&#91;mini&#93; + dist&#91;mini&#93;&#91;i&#93;;
        }
}
&#91;/sourcecode&#93;

<h4>The Programme</h4>
Putting the above into context, we get the <strong>O(n<sup>2</sup>)</strong> implementation. This works well for most graphs (it will <strong>not</strong> work for graphs with negative weight edges), and it's quite fast.

Here's the source code in C (<a href='https://compprog.files.wordpress.com/2007/12/dijkstra.c' title='dijkstra.c'>dijkstra.c</a>):

#include <stdio.h>

#define GRAPHSIZE 2048
#define INFINITY GRAPHSIZE*GRAPHSIZE
#define MAX(a, b) ((a > b) ? (a) : (b))

int e; /* The number of nonzero edges in the graph */
int n; /* The number of nodes in the graph */
long dist[GRAPHSIZE][GRAPHSIZE]; /* dist[i][j] is the distance between node i and j; or 0 if there is no direct connection */
long d[GRAPHSIZE]; /* d[i] is the length of the shortest path between the source (s) and node i */

void printD() {
	int i;
	for (i = 1; i <= n; ++i)
		printf("%10d", i);
	printf("\n");
	for (i = 1; i <= n; ++i) {
		printf("%10ld", d&#91;i&#93;);
	}
	printf("\n");
}

void dijkstra(int s) {
	int i, k, mini;
	int visited&#91;GRAPHSIZE&#93;;

	for (i = 1; i <= n; ++i) {
		d&#91;i&#93; = INFINITY;
		visited&#91;i&#93; = 0; /* the i-th element has not yet been visited */
	}

	d&#91;s&#93; = 0;

	for (k = 1; k <= n; ++k) {
		mini = -1;
		for (i = 1; i <= n; ++i)
			if (!visited&#91;i&#93; && ((mini == -1) || (d&#91;i&#93; < d&#91;mini&#93;)))
				mini = i;

		visited&#91;mini&#93; = 1;

		for (i = 1; i <= n; ++i)
			if (dist&#91;mini&#93;&#91;i&#93;)
				if (d&#91;mini&#93; + dist&#91;mini&#93;&#91;i&#93; < d&#91;i&#93;) 
					d&#91;i&#93; = d&#91;mini&#93; + dist&#91;mini&#93;&#91;i&#93;;
	}
}

int main(int argc, char *argv&#91;&#93;) {
	int i, j;
	int u, v, w;

	FILE *fin = fopen("dist.txt", "r");
	fscanf(fin, "%d", &e);
	for (i = 0; i < e; ++i)
		for (j = 0; j < e; ++j)
			dist&#91;i&#93;&#91;j&#93; = 0;
	n = -1;
	for (i = 0; i < e; ++i) {
		fscanf(fin, "%d%d%d", &u, &v, &w);
		dist&#91;u&#93;&#91;v&#93; = w;
		n = MAX(u, MAX(v, n));
	}
	fclose(fin);

	dijkstra(1);

	printD();

	return 0;
}
&#91;/sourcecode&#93;

And here's a sample input file(<a href='https://compprog.files.wordpress.com/2007/12/dist.txt' title='dist.txt'>dist.txt</a>):

<code>10
1 2 10
1 4 5
2 3 1
2 4 3
3 5 6
4 2 2
4 3 9
4 5 2
5 1 7
5 3 4
</code>

The graph is given as an edge list:
<ul>
	<li>the first line contains <em>e</em>, the number of edges</li>
	<li>the following <em>e</em> lines contain <em>3</em> numbers: <em>u</em>, <em>v</em> and <em>w</em> signifying that there's an edge from <em>u</em> to <em>v</em> of weight <em>w</em></li>
</ul>

That's it. Good luck and have fun. Always open to comments.

<h4>Finding the shortest path</h4>
<strong>UPDATE</strong> In response to <strong>campOs</strong>' comment.

Now we know the distance between the source node and any other node (the distance to the ith node is remembered in <strong>d[i]</strong>). But suppose we also need the path (which nodes make up the path).

Look at the above code. Where is <strong>d</strong> modified? Where is the recorded distance between the source and a node modified? In two places:

Firstly, <strong>d[s]</strong> is initialised to be <em>0</em>.

d[s] = 0;

And then, when a new shortest path is found, d[i] is updated accordingly:

for (i = 1; i <= n; ++i)
	if (dist&#91;mini&#93;&#91;i&#93;)
		if (d&#91;mini&#93; + dist&#91;mini&#93;&#91;i&#93; < d&#91;i&#93;) 
			d&#91;i&#93; = d&#91;mini&#93; + dist&#91;mini&#93;&#91;i&#93;;
&#91;/sourcecode&#93;

The important thing to notice here is that <strong>when you update the shortest distance to node i, you know the previous node in the path to i</strong>. This is, of course, <strong>mini</strong>. This suggests the solution to our problem.

For every node <strong>i</strong> other than the source, remember not only the distance to it, but also the previous node in the path to it. Thus we have a new array, <strong>prev</strong>.

Now, we need to make to modifications.
First, we initialise the value of <strong>prev[i]</strong> to something impossible (say <em>-1</em>) at the start of <strong>dijkstra()</strong>.

for (i = 1; i <= n; ++i) {
	d&#91;i&#93; = INFINITY;
	prev&#91;i&#93; = -1; /* no path has yet been found to i */
	visited&#91;i&#93; = 0; /* the i-th element has not yet been visited */
}
&#91;/sourcecode&#93;

Secondly, we update the value of <strong>prev[i]</strong> every time a new shortest path is found to i.

for (i = 1; i <= n; ++i)
	if (dist&#91;mini&#93;&#91;i&#93;)
		if (d&#91;mini&#93; + dist&#91;mini&#93;&#91;i&#93; < d&#91;i&#93;) {
			d&#91;i&#93; = d&#91;mini&#93; + dist&#91;mini&#93;&#91;i&#93;;
			prev&#91;i&#93; = mini;
		}
&#91;/sourcecode&#93;

Good. For every node reachable from the source we know which node is just before it in the shortest path. For the above example, we would have the following array:
<code>i - prev[i]
1 - -1
2 - 4
3 - 2
4 - 1
5 - 4
</code>

Using this, how do you get the path? Let's say you want to get to <em>3</em>. Which node comes right before <em>3</em>? Node <em>2</em>. Which node comes right before node <em>2</em>? Node <em>4</em>. Which node comes before <em>4</em>? Node <em>1</em>. We've reached the source, so we're done. Go through this list backwards and you get the path: <em>1 -&gt; 4 -&gt; 2 -&gt; 3</em>. This is easily implemented with <a href="http://en.wikipedia.org/wiki/Recursion">recursion</a>.


void printPath(int dest) {
	if (prev[dest] != -1)
		printPath(prev[dest]);
	printf("%d ", dest);
}

Here is the updated source: dijkstraWithPath.c.

Good luck.

Advertisements

In this article, I describe the greedy algorithm for solving the Fractional Knapsack Problem and give an implementation in C.

The Problem

The Fractional Knapsack Problem usually sounds like this:

Ted Thief has just broken into the Fort Knox! He sees himself in a room with n piles of gold dust. Because the each pile has a different purity, each pile also has a different value (v[i]) and a different weight (c[i]). Ted has a knapsack that can only hold W kilograms.

Given n, v, c and W, calculate which piles Ted should completely put into his knapsack and which he should put only a fraction of.

So, for this input:
n = 5
c = {12, 1, 2, 1, 4}
v = {4, 2, 2, 1, 10}
W = 15

Ted should take piles 2, 3, 4 and 5 completely and about 58% of pile 1.

You’re usually dealling with a knapsack problem when you’re give the cost and the benefits of certain objects and asked to obtain the maximum benefit so that the sum of the costs is smaller than a given value. You’ve got the fractional knapsack problem when you can take fractions (as opposed to all or nothing) of the objects.

The Algorithm

This is a standard greedy algorithm. In fact, it’s one of the classic examples.

The idea is to calculate for each object the ratio of value/cost, and sort them according to this ratio. Then you take the objects with the highest ratios and add them until you can’t add the next object as whole. Finally add as much as you can of the next object.

So, for our example:
v = {4, 2, 2, 1, 10}
c = {12, 1, 2, 1, 4}
r = {1/3, 2, 1, 1, 5/2}

From this it’s obvious that you should add the objects: 5, 2, 3, 4 and then as much as possible of 1.
The output of my programme is this:
Added object 5 (10$, 4Kg) completly in the bag. Space left: 11.
Added object 2 (2$, 1Kg) completly in the bag. Space left: 10.
Added object 3 (2$, 2Kg) completly in the bag. Space left: 8.
Added object 4 (1$, 1Kg) completly in the bag. Space left: 7.
Added 58% (4$, 12Kg) of object 1 in the bag.
Filled the bag with objects worth 15.48$.

The Programme

Now, you could implement the algorithm as stated, but for practical reasons you may wish to trade speed for simplicity. That’s what I’ve done here: instead of sorting the objects, I simply go through them every time searching for the best ratio. This modification turns an O(n*lg(n)) algorithm into an O(n2) one. For small values of n, this doesn’t matter and n is usually small.

Here’s the code in C (fractional_knapsack.c):

#include <stdio.h>

int n = 5; /* The number of objects */
int c[10] = {12, 1, 2, 1, 4}; /* c[i] is the *COST* of the ith object; i.e. what
				YOU PAY to take the object */
int v[10] = {4, 2, 2, 1, 10}; /* v[i] is the *VALUE* of the ith object; i.e.
				what YOU GET for taking the object */
int W = 15; /* The maximum weight you can take */

void simple_fill() {
	int cur_w;
	float tot_v;
	int i, maxi;
	int used[10];

	for (i = 0; i < n; ++i)
		used&#91;i&#93; = 0; /* I have not used the ith object yet */

	cur_w = W;
	while (cur_w > 0) { /* while there's still room*/
		/* Find the best object */
		maxi = -1;
		for (i = 0; i < n; ++i)
			if ((used&#91;i&#93; == 0) &&
				((maxi == -1) || ((float)v&#91;i&#93;/c&#91;i&#93; > (float)v[maxi]/c[maxi])))
				maxi = i;

		used[maxi] = 1; /* mark the maxi-th object as used */
		cur_w -= c[maxi]; /* with the object in the bag, I can carry less */
		tot_v += v[maxi];
		if (cur_w >= 0)
			printf("Added object %d (%d$, %dKg) completly in the bag. Space left: %d.\n", maxi + 1, v[maxi], c[maxi], cur_w);
		else {
			printf("Added %d%% (%d$, %dKg) of object %d in the bag.\n", (int)((1 + (float)cur_w/c[maxi]) * 100), v[maxi], c[maxi], maxi + 1);
			tot_v -= v[maxi];
			tot_v += (1 + (float)cur_w/c[maxi]) * v[maxi];
		}
	}

	printf("Filled the bag with objects worth %.2f$.\n", tot_v);
}

int main(int argc, char *argv[]) {
	simple_fill();

	return 0;
}

That’s it. Good luck.

Always open to comments.

Update: The next article in this series is The 0-1 Knapsack Problem.