November 2007


In this article, I describe the Bellman-Ford algorithm for finding the single-source shortest paths in a graph, give an informal proof and provide the source code in C for a simple implementation.

To understand this you should know what a graph is, and how to store one in memory. If in doubt check this and this.

Another solution to this problem is Dijkstra’s algorithm.

The Problem

Given the following graph, calculate the length of the shortest path from node 1 to node 2.
bf1.png

It’s obvious that there’s a direct route of length 6, but take a look at the path 1 -> 4 -> 3 -> 2. The length of that path is 7 – 3 – 2 = 2, which is less than 6. BTW, you don’t need negative edge weights to get such a situation, but they do make the example clearer.

This also suggests a property of shortest path algorithms: to find the shortest path from x to y, you need to know, beforehand, the shortest paths to y‘s neighbours. For this, you need to know the paths to y‘s neighbours’ neighbours… In the end, you must calculate the shortest paths to every node in the connected component of the graph in which x and y are found.

That said, you usually calculate the shortest paths to all nodes and then pick the ones you’re interested in.

The Algorithm

The Bellman-Ford algorithm is one of the classic solutions to this problem. It calculates the shortest path to all nodes in the graph from a single source.

The basic idea is simple:
Start by considering that the shortest path to every node, except the source, is infinity. Mark the length of the path to the source as 0:
bf2.png

Take every edge and try to relax it:
bf3.png

Relaxing an edge means checking whether the path to the node the edge points to can be shortened, and if so, shortening it. In the above graph, by checking the edge 1 -> 2 of length 6, you find that the length of the shortest path to node 1 plus the length of the edge 1 -> 2 is less than infinity. So, you replace infinity in node 2 with 6. The same can be said for edge 1 -> 4 of length 7. It’s also worth noting that, in practice, you can’t relax an edge whose start node still has a shortest path of length infinity.
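In code, relaxing an edge from u to v with weight w comes down to a single comparison. Here’s a minimal sketch (d[] and INFINITY are the same as in the programme further down; the explicit INFINITY test is the “can’t relax from infinity” remark above):

/* if going through u gives a shorter path to v, take it */
if (d[u] != INFINITY && d[u] + w < d[v])
	d[v] = d[u] + w;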

Now, you apply the previous step n – 1 times, where n is the number of nodes in the graph. In this example, you have to apply it 4 times (that’s 3 more times).
bf4.png

bf5.png

bf6.png

That’s it, here’s the algorithm in a condensed form:

void bellman_ford(int s) {
	int i, j;

	for (i = 0; i < n; ++i)
		d[i] = INFINITY;
	d[s] = 0;

	for (i = 0; i < n - 1; ++i)
		for (j = 0; j < e; ++j)
			if (d[edges[j].u] + edges[j].w < d[edges[j].v])
				d[edges[j].v] = d[edges[j].u] + edges[j].w;
}

Here, d[i] is the shortest path to node i, e is the number of edges and edges[i] is the i-th edge.

It may not be obvious why this works, but take a look at what is certain after each step. After the first step, any path made up of at most 2 nodes will be optimal. After the second step, any path made up of at most 3 nodes will be optimal… After the (n – 1)-th step, any path made up of at most n nodes will be optimal.
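One practical consequence of this argument (not used in the listing below) is that you can stop as soon as a full pass over the edges relaxes nothing: if a pass changes no distance, no later pass will either. A minimal sketch of the modified main loop, assuming the same variables as bellman_ford() in the programme below:

for (i = 0; i < n - 1; ++i) {
	int changed = 0; /* did this pass shorten any path? */

	for (j = 0; j < e; ++j)
		if (d[edges[j].u] + edges[j].w < d[edges[j].v]) {
			d[edges[j].v] = d[edges[j].u] + edges[j].w;
			changed = 1;
		}

	if (!changed) /* nothing changed, so nothing can change later */
		break;
}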

The Programme

The following programme just puts the bellman_ford function into context. It runs in O(VE) time, so for the example graph (5 nodes and 10 non-zero entries in the matrix below) it will do something on the lines of 4 * 10 = 40 relaxation attempts. Keep in mind that this algorithm works quite well on graphs with few edges, but is very slow for dense graphs (graphs with almost n² edges). For graphs with lots of edges, you’re better off with Dijkstra’s algorithm.

Here’s the source code in C (bellmanford.c):

#include <stdio.h>

typedef struct {
int u, v, w;
} Edge;

int n; /* the number of nodes */
int e; /* the number of edges */
Edge edges[1024]; /* large enough for n <= 2^5=32 */
int d[32]; /* d[i] is the minimum distance from node s to node i */

#define INFINITY 10000

void printDist() {
	int i;

	printf("Distances:\n");
	for (i = 0; i < n; ++i)
		printf("to %d\t", i + 1);
	printf("\n");
	for (i = 0; i < n; ++i)
		printf("%d\t", d[i]);
	printf("\n\n");
}

void bellman_ford(int s) {
	int i, j;

	for (i = 0; i < n; ++i)
		d[i] = INFINITY;
	d[s] = 0;

	for (i = 0; i < n - 1; ++i)
		for (j = 0; j < e; ++j)
			if (d[edges[j].u] + edges[j].w < d[edges[j].v])
				d[edges[j].v] = d[edges[j].u] + edges[j].w;
}

int main(int argc, char *argv[]) {
	int i, j;
	int w;

	FILE *fin = fopen("dist.txt", "r");
	fscanf(fin, "%d", &n);
	e = 0;
	for (i = 0; i < n; ++i)
		for (j = 0; j < n; ++j) {
			fscanf(fin, "%d", &w);
			if (w != 0) {
				edges[e].u = i;
				edges[e].v = j;
				edges[e].w = w;
				++e;
			}
		}
	fclose(fin);

	bellman_ford(0);

	printDist();

	return 0;
}

And here's the input file used in the example (dist.txt):
5
0 6 0 7 0
0 0 5 8 -4
0 -2 0 0 0
0 0 -3 9 0
2 0 7 0 0

That’s an adjacency matrix.

That’s it. Have fun. Always open to comments.

The 0-1 Knapsack Problem (AKA The Discrete Knapsack Problem) is a famous problem solvable by dynamic-programming. In this article, I describe the problem, the most common algorithm used to solve it and then provide a sample implementation in C.

If you’ve never heard of the Knapsack Problems before, it will help to read this previous post.

The Problem

The Discrete (0-1) Knapsack Problem usually sounds like this:

Little Red Riding Hood wants to bring grandma a basket of goodies. She has an unlimited supply of n types of sweets, each weighing c[i] and having the nutritional value of v[i]. Her basket can hold at most W kilograms of sweets.

Given n, c, v and W, figure out which sweets and how many to take so that the nutritional value is maximal.

So, for this input:
n = 3
c = {8, 6, 4}
v = {16, 10, 7}
W = 10

LRRH should take one of 3 and one of 2, amassing 17 nutritional points.

You’re usually dealing with a knapsack problem when you’re given the costs and the benefits of certain objects and asked to obtain the maximum benefit so that the sum of the costs is smaller than a given value. You’ve got the Discrete Knapsack Problem when you can only take the whole object or none at all and you have an unlimited supply of objects.

The Algorithm

This is a dynamic-programming algorithm.

The idea is to first calculate the maximum benefit for weight x and only after that to calculate the maximum benefit for x+1. So, on the whole, you first calculate the maximum benefit for 1, then for 2, then for 3, …, then for W-1 and, finally, for W. I store the maximum benefits in an array named a.

Start with a[0] = 0. Then for every i between 1 … W use the formula:
a[i] = max{ v[j] + a[i - c[j]] | c[j] ≤ i }

The basic idea is that to reach weight x, you have to add an object of weight w to a lighter configuration, namely the best one for weight x – w. Now, there will probably be several ways to reach weight x, so you have to choose the one that maximises the benefit. That’s what the max is for.

Basically, the formula says: “To calculate the benefit of weight x, take every object (value: v; weight: w) and see if the benefit for x – w plus v is greater than the current benefit for x. If so, change it.”

So, for the example, the programme would output (and do) this:
Weight 0; Benefit: 0; Can't reach this exact weight.
Weight 1; Benefit: 0; Can't reach this exact weight.
Weight 2; Benefit: 0; Can't reach this exact weight.
Weight 3; Benefit: 0; Can't reach this exact weight.
Weight 4; Benefit: 7; To reach this weight I added object 3 (7$ 4Kg) to weight 0.
Weight 5; Benefit: 7; To reach this weight I added object 3 (7$ 4Kg) to weight 1.
Weight 6; Benefit: 10; To reach this weight I added object 2 (10$ 6Kg) to weight 0.
Weight 7; Benefit: 10; To reach this weight I added object 2 (10$ 6Kg) to weight 1.
Weight 8; Benefit: 16; To reach this weight I added object 1 (16$ 8Kg) to weight 0.
Weight 9; Benefit: 16; To reach this weight I added object 1 (16$ 8Kg) to weight 1.
Weight 10; Benefit: 17; To reach this weight I added object 2 (10$ 6Kg) to weight 4.
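The last line is just the formula at work: to reach weight 10 you can add object 1 to weight 2, object 2 to weight 4, or object 3 to weight 6, so
a[10] = max{16 + a[2], 10 + a[4], 7 + a[6]} = max{16 + 0, 10 + 7, 7 + 10} = 17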

The Programme

This programme runs in pseudo-polynomial time O(n * W), i.e. slow as hell for very large values of W. Also, because it holds two arrays of length at least W, it’s horribly memory inefficient. Unfortunately, there’s not much you can do.

Here’s the code in C (knapsack10.c):

#include <stdio.h>

#define MAXWEIGHT 100

int n = 3; /* The number of objects */
int c[10] = {8, 6, 4}; /* c[i] is the *COST* of the ith object; i.e. what
				YOU PAY to take the object */
int v[10] = {16, 10, 7}; /* v[i] is the *VALUE* of the ith object; i.e.
				what YOU GET for taking the object */
int W = 10; /* The maximum weight you can take */ 

void fill_sack() {
	int a[MAXWEIGHT]; /* a[i] holds the maximum value that can be obtained
				using at most i weight */
	int last_added[MAXWEIGHT]; /* I use this to calculate which objects were
					added */
	int i, j;
	int aux;

	for (i = 0; i <= W; ++i) {
		a[i] = 0;
		last_added[i] = -1;
	}

	a[0] = 0;
	for (i = 1; i <= W; ++i)
		for (j = 0; j < n; ++j)
			if ((c[j] <= i) && (a[i] < a[i - c[j]] + v[j])) {
				a[i] = a[i - c[j]] + v[j];
				last_added[i] = j;
			}

	for (i = 0; i <= W; ++i)
		if (last_added[i] != -1)
			printf("Weight %d; Benefit: %d; To reach this weight I added object %d (%d$ %dKg) to weight %d.\n", i, a[i], last_added[i] + 1, v[last_added[i]], c[last_added[i]], i - c[last_added[i]]);
		else
			printf("Weight %d; Benefit: 0; Can't reach this exact weight.\n", i);

	printf("---\n");

	aux = W;
	while ((aux > 0) && (last_added[aux] != -1)) {
		printf("Added object %d (%d$ %dKg). Space left: %d\n", last_added[aux] + 1, v[last_added[aux]], c[last_added[aux]], aux - c[last_added[aux]]);
		aux -= c[last_added[aux]];
	}

	printf("Total value added: %d$\n", a[W]);
}

int main(int argc, char *argv[]) {
	fill_sack();

	return 0;
}

That’s it. Good luck. Always open to comments.

In this article, I describe the greedy algorithm for solving the Fractional Knapsack Problem and give an implementation in C.

The Problem

The Fractional Knapsack Problem usually sounds like this:

Ted Thief has just broken into Fort Knox! He finds himself in a room with n piles of gold dust. Because each pile has a different purity, each pile also has a different value (v[i]) and a different weight (c[i]). Ted has a knapsack that can only hold W kilograms.

Given n, v, c and W, calculate which piles Ted should completely put into his knapsack and which he should put only a fraction of.

So, for this input:
n = 5
c = {12, 1, 2, 1, 4}
v = {4, 2, 2, 1, 10}
W = 15

Ted should take piles 2, 3, 4 and 5 completely and about 58% of pile 1.

You’re usually dealing with a knapsack problem when you’re given the costs and the benefits of certain objects and asked to obtain the maximum benefit so that the sum of the costs is smaller than a given value. You’ve got the fractional knapsack problem when you can take fractions (as opposed to all or nothing) of the objects.

The Algorithm

This is a standard greedy algorithm. In fact, it’s one of the classic examples.

The idea is to calculate for each object the ratio of value/cost, and sort the objects according to this ratio. Then you take the objects with the highest ratios and add them until you can’t add the next object as a whole. Finally, add as much as you can of the next object.

So, for our example:
v = {4, 2, 2, 1, 10}
c = {12, 1, 2, 1, 4}
r = {1/3, 2, 1, 1, 5/2}

From this it’s obvious that you should add the objects: 5, 2, 3, 4 and then as much as possible of 1.
The output of my programme is this:
Added object 5 (10$, 4Kg) completely in the bag. Space left: 11.
Added object 2 (2$, 1Kg) completely in the bag. Space left: 10.
Added object 3 (2$, 2Kg) completely in the bag. Space left: 8.
Added object 4 (1$, 1Kg) completely in the bag. Space left: 7.
Added 58% (4$, 12Kg) of object 1 in the bag.
Filled the bag with objects worth 17.33$.

The Programme

Now, you could implement the algorithm as stated, but for practical reasons you may wish to trade speed for simplicity. That’s what I’ve done here: instead of sorting the objects, I simply go through them every time searching for the best ratio. This modification turns an O(n*lg(n)) algorithm into an O(n²) one. For small values of n, this doesn’t matter, and n is usually small.
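If you do want the sorted version, here’s a minimal sketch of what it might look like. It assumes the same globals n, c, v and W as the programme below; idx and by_ratio_desc are hypothetical helpers introduced just for this sketch:

#include <stdlib.h> /* for qsort */

int idx[10]; /* object indices, to be sorted by decreasing value/cost ratio */

/* hypothetical comparator: v[i]/c[i] > v[j]/c[j] exactly when
   v[i]*c[j] > v[j]*c[i], which avoids floating point */
int by_ratio_desc(const void *pa, const void *pb) {
	int i = *(const int *)pa, j = *(const int *)pb;
	return v[j] * c[i] - v[i] * c[j];
}

float sorted_fill() {
	int i, cur_w = W;
	float tot_v = 0;

	for (i = 0; i < n; ++i)
		idx[i] = i;
	qsort(idx, n, sizeof(int), by_ratio_desc);

	for (i = 0; i < n && cur_w > 0; ++i) {
		int o = idx[i];
		if (c[o] <= cur_w) { /* the whole object fits */
			cur_w -= c[o];
			tot_v += v[o];
		} else { /* only a fraction of it fits */
			tot_v += (float)cur_w / c[o] * v[o];
			cur_w = 0;
		}
	}

	return tot_v; /* about 17.33 for the example data */
}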

Here’s the code in C (fractional_knapsack.c):

#include <stdio.h>

int n = 5; /* The number of objects */
int c[10] = {12, 1, 2, 1, 4}; /* c[i] is the *COST* of the ith object; i.e. what
				YOU PAY to take the object */
int v[10] = {4, 2, 2, 1, 10}; /* v[i] is the *VALUE* of the ith object; i.e.
				what YOU GET for taking the object */
int W = 15; /* The maximum weight you can take */

void simple_fill() {
	int cur_w;
	float tot_v = 0; /* total value of what's in the bag so far */
	int i, maxi;
	int used[10];

	for (i = 0; i < n; ++i)
		used[i] = 0; /* I have not used the ith object yet */

	cur_w = W;
	while (cur_w > 0) { /* while there's still room*/
		/* Find the best object */
		maxi = -1;
		for (i = 0; i < n; ++i)
			if ((used[i] == 0) &&
				((maxi == -1) || ((float)v[i]/c[i] > (float)v[maxi]/c[maxi])))
				maxi = i;

		if (maxi == -1) /* no objects left; nothing more can be added */
			break;

		used[maxi] = 1; /* mark the maxi-th object as used */
		cur_w -= c[maxi]; /* with the object in the bag, I can carry less */
		tot_v += v[maxi];
		if (cur_w >= 0)
			printf("Added object %d (%d$, %dKg) completly in the bag. Space left: %d.\n", maxi + 1, v[maxi], c[maxi], cur_w);
		else {
			printf("Added %d%% (%d$, %dKg) of object %d in the bag.\n", (int)((1 + (float)cur_w/c[maxi]) * 100), v[maxi], c[maxi], maxi + 1);
			tot_v -= v[maxi];
			tot_v += (1 + (float)cur_w/c[maxi]) * v[maxi];
		}
	}

	printf("Filled the bag with objects worth %.2f$.\n", tot_v);
}

int main(int argc, char *argv[]) {
	simple_fill();

	return 0;
}

That’s it. Good luck.

Always open to comments.

Update: The next article in this series is The 0-1 Knapsack Problem.

In this article I describe the Floyd-Warshall algorithm for finding the shortest path between all nodes in a graph. I give an informal proof and provide an implementation in C.

Shortest paths

The shortest path between two nodes of a graph is a sequence of connected nodes such that the sum of the weights of the edges that interconnect them is minimal.

Take this graph,
p2.png

There are several paths between A and E:
Path 1: A -> B -> E 20
Path 2: A -> D -> E 25
Path 3: A -> B -> D -> E 35
Path 4: A -> D -> B -> E 20

There are several things to notice here:

  1. There can be more than one route between two nodes
  2. The number of nodes in the route isn’t important (Path 4 has 4 nodes but is shorter than Path 2, which has 3 nodes)
  3. There can be more than one path of minimal length

Something else that should be obvious from the graph is that any path worth considering is simple. That is, you only go through each node once.

Unfortunately, this is not always the case. The problem appears when you allow negative weight edges. This isn’t by itself bad. But if a loop of negative weight appears, then there is no shortest path. Look at this example:
A graph containing a negative weight loop

Look at the path B -> E -> D -> B. This is a loop, because the starting node is also the end. What’s the cost? It’s 10 – 20 + 5 = -5. This means that adding this loop to a path once lowers the cost of the path by 5. Adding it twice would lower the cost by 2 * 5 = 10. So, whatever shortest path you may have come up with, you can make it smaller by going through the loop one more time. BTW, there’s no problem with a negative cost path.

The Floyd-Warshall Algorithm

This algorithm calculates the length of the shortest path between all nodes of a graph in O(V³) time. Note that it doesn’t actually find the paths, only their lengths.

Let’s say you have the adjacency matrix of a graph. Assuming no loop of negative weight, at this point you have the length of the best path between any two nodes that uses no intermediate nodes: the direct edge, if there is one.
A B C D E
A 0 10 0 5 0
B 10 0 5 5 10
C 0 5 0 0 0
D 5 5 0 0 20
E 0 10 0 20 0

The graph is the one shown above (the first one).

The idea is to try to interspace A between any two nodes in hopes of finding a shorter path.
A B C D E
A 0 10 0 5 0
B 10 0 5 5 10
C 0 5 0 0 0
D 5 5 0 0 20
E 0 10 0 20 0

Then try to interspace B between any two nodes:
A B C D E
A 0 10 15 5 20
B 10 0 5 5 10
C 15 5 0 10 15
D 5 5 10 0 15
E 20 10 15 15 0
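For instance, the entries that appear or improve in this step (and their mirror images below the diagonal) are exactly the paths through B:
A -> B -> C = 10 + 5 = 15
A -> B -> E = 10 + 10 = 20
C -> B -> D = 5 + 5 = 10
C -> B -> E = 5 + 10 = 15
D -> B -> E = 5 + 10 = 15 (better than the direct edge of 20)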

Do the same for C:
A B C D E
A 0 10 15 5 20
B 10 0 5 5 10
C 15 5 0 10 15
D 5 5 10 0 15
E 20 10 15 15 0

Do the same for D:
A B C D E
A 0 10 15 5 20
B 10 0 5 5 10
C 15 5 0 10 15
D 5 5 10 0 15
E 20 10 15 15 0

And for E:
A B C D E
A 0 10 15 5 20
B 10 0 5 5 10
C 15 5 0 10 15
D 5 5 10 0 15
E 20 10 15 15 0

This is the actual algorithm:

# dist(i,j) is "best" distance so far from vertex i to vertex j 

# Start with all single edge paths.
 For i = 1 to n do
     For j = 1 to n do
         dist(i,j) = weight(i,j) 

 For k = 1 to n do # k is the `intermediate' vertex
     For i = 1 to n do
         For j = 1 to n do
             if (dist(i,k) + dist(k,j) < dist(i,j)) then # shorter path?
                 dist(i,j) = dist(i,k) + dist(k,j)

The Programme

Here’s the code in C (floyd_warshall.c):

#include <stdio.h>

int n; /* The number of nodes */
int dist[16][16]; /* dist[i][j] is the length of the edge between i and j if
			it exists, or 0 if it does not */

void printDist() {
	int i, j;

	printf("    ");
	for (i = 0; i < n; ++i)
		printf("%4c", 'A' + i);
	printf("\n");

	for (i = 0; i < n; ++i) {
		printf("%4c", 'A' + i);
		for (j = 0; j < n; ++j)
			printf("%4d", dist[i][j]);
		printf("\n");
	}
	printf("\n");
}

/* floyd_warshall()
	after calling this function dist[i][j] will be the minimum distance
	between i and j if it exists (i.e. if there's a path between i and j)
	or 0, otherwise */
void floyd_warshall() {
	int i, j, k;

	for (k = 0; k < n; ++k) {
		printDist();
		for (i = 0; i < n; ++i)
			for (j = 0; j < n; ++j)
				/* If i and j are different nodes and if the paths between
					i and k and between k and j exist, do */
				if ((dist[i][k] * dist[k][j] != 0) && (i != j))
					/* See if you can't get a shorter path between i and j by
						interspacing k somewhere along the current path */
					if ((dist[i][k] + dist[k][j] < dist[i][j]) ||
							(dist[i][j] == 0))
						dist[i][j] = dist[i][k] + dist[k][j];
	}

	printDist();
}

int main(int argc, char *argv[]) {
	FILE *fin = fopen("dist.txt", "r");
	fscanf(fin, "%d", &n);

	int i, j;
	for (i = 0; i < n; ++i)
		for (j = 0; j < n; ++j)
			fscanf(fin, "%d", &dist[i][j]);
	fclose(fin);

	floyd_warshall();

	return 0;
}

Note that in the above programme all the work is done by a handful of lines: the triple loop inside floyd_warshall(). The rest is just input and output.

That's it. Good luck. Always open to comments.

In this article I give an informal definition of a graph and of the minimum spanning tree. Afterwards I describe Prim’s algorithm and then follow its execution on an example. Finally, the code in C is provided.

Graphs

Wikipedia gives one of the common definitions of a graph:

In computer science, a graph is a kind of data structure, specifically an abstract data type (ADT), that consists of a set of nodes and a set of edges that establish relationships (connections) between the nodes. The graph ADT follows directly from the graph concept from mathematics.
Informally, G=(V,E) consists of vertices, the elements of V, which are connected by edges, the elements of E. Formally, a graph, G, is defined as an ordered pair, G=(V,E), where V is a finite set and E is a set consisting of two element subsets of V.

This is a graph:
p1.png

It’s a set of nodes (A, B, C, D and E) and the edges (lines) that interconnect them.

An important thing to note about this graph is that the edges are bidirectional, i.e. if A is connected to B, then B is connected to A. This makes it an undirected graph.

A common extension is to attribute weights to the edges. This is what I’ve done with the previous graph:
p2

Minimum spanning trees

Basically, a minimum spanning tree is a subset of the edges of the graph, so that there’s a path from any node to any other node and the sum of the weights of the edges is minimum.

Here’s the minimum spanning tree of the example:
g3.png

Look at the above image closely. It contains all of the initial nodes and some of the initial edges. Actually it contains exactly n – 1 edges, where n is the number of nodes. It’s called a tree because there are no cycles.

You can think of the graph as a map, with the nodes being cities, the edges passable terrain, and the weights the distance between the cities.

It’s worth mentioning that a graph can have several minimum spanning trees. Think of the above example, but replace all the weights with 1. The resulting graph will have 8 minimum spanning trees.

Given a graph, find one of its minimum spanning trees.

Prim’s Algorithm

One of the classic algorithms for this problem is that found by Robert C. Prim. It’s a greedy style algorithm and it’s guaranteed to produce a correct result.

In the following discussion, let the distance from a node not in the tree to the tree be the minimum weight of an edge between that node and some node in the tree. If there is no such edge, assume the distance is infinity (this shouldn’t happen).

The algorithm (greedily) builds the minimal spanning tree by iteratively adding nodes into a working tree:

  1. Start with a tree which contains only one node.
  2. Identify the node (outside the tree) closest to the tree, add the minimum weight edge from that node to some node in the tree, and incorporate the additional node into the tree.
  3. If there are fewer than n – 1 edges in the tree, go to 2 (a condensed sketch of this loop follows the list)
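In code, steps 2 and 3 boil down to a pair of loops. Here’s a condensed sketch; it relies on the same arrays (inTree, d, whoTo) and the same updateDistances() helper as the full programme further down:

/* assumes d[i] holds the current distance from node i to the tree and
   that updateDistances() keeps it up to date, as in the programme below */
for (treeSize = 1; treeSize < n; ++treeSize) {
	/* step 2: find the node outside the tree that is closest to it */
	min = -1;
	for (i = 0; i < n; ++i)
		if (!inTree[i] && (min == -1 || d[i] < d[min]))
			min = i;

	/* add the edge whoTo[min] - min, and the node min, to the tree */
	inTree[min] = 1;
	total += d[min];
	updateDistances(min);
}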

For the example graph, here’s how it would run:

Start with only node A in the tree.
g4.png

Find the closest node to the tree, and add it.
g5.png

Repeat until there are n – 1 edges in the tree.
g6.png

g7.png

g8.png

The Programme

The following programme just follows the algorithm. It runs in O(n²) time.

Here’s the code in C (prim.c):

#include <stdio.h>

/*
	The input file (dist.txt) looks something like this
		4
		0 0 0 21
		0 0 8 17
		0 8 0 16
		21 17 16 0

	The first line contains n, the number of nodes.
	Next is an n x n matrix containing the distances between the nodes
	NOTE: The distance between a node and itself should be 0
*/

int n; /* The number of nodes in the graph */

int weight[100][100]; /* weight[i][j] is the distance between node i and node j;
			if there is no path between i and j, weight[i][j] should
			be 0 */

char inTree[100]; /* inTree[i] is 1 if the node i is already in the minimum
			spanning tree; 0 otherwise*/

int d[100]; /* d[i] is the distance between node i and the minimum spanning
		tree; this is initially infinity (100000); if i is already in
		the tree, then d[i] is undefined;
		this is just a temporary variable. It's not necessary but speeds
		up execution considerably (by a factor of n) */

int whoTo[100]; /* whoTo[i] holds the index of the node i would have to be
			linked to in order to get a distance of d[i] */

/* updateDistances(int target)
	should be called immediately after target is added to the tree;
	updates d so that the values are correct (goes through target's
	neighbours making sure that the distances between them and the tree
	are indeed minimum)
*/
void updateDistances(int target) {
	int i;
	for (i = 0; i < n; ++i)
		if ((weight[target][i] != 0) && (d[i] > weight[target][i])) {
			d[i] = weight[target][i];
			whoTo[i] = target;
		}
}

int main(int argc, char *argv[]) {
	FILE *f = fopen("dist.txt", "r");
	fscanf(f, "%d", &n);
	int i, j;
	for (i = 0; i < n; ++i)
		for (j = 0; j < n; ++j)
			fscanf(f, "%d", &weight&#91;i&#93;&#91;j&#93;);
	fclose(f);

	/* Initialise d with infinity */
	for (i = 0; i < n; ++i)
		d[i] = 100000;

	/* Mark all nodes as NOT being in the minimum spanning tree */
	for (i = 0; i < n; ++i)
		inTree[i] = 0;

	/* Add the first node to the tree */
	printf("Adding node %c\n", 0 + 'A');
	inTree[0] = 1;
	updateDistances(0);

	int total = 0;
	int treeSize;
	for (treeSize = 1; treeSize < n; ++treeSize) {
		/* Find the node with the smallest distance to the tree */
		int min = -1;
		for (i = 0; i < n; ++i)
			if (!inTree[i])
				if ((min == -1) || (d[min] > d[i]))
					min = i;

		/* And add it */
		printf("Adding edge %c-%c\n", whoTo[min] + 'A', min + 'A');
		inTree[min] = 1;
		total += d[min];

		updateDistances(min);
	}

	printf("Total distance: %d\n", total);

	return 0;
}

And here’s a sample input file (dist.txt). It’s the example graph:
5
0 10 0 5 0
10 0 5 5 10
0 5 0 0 0
5 5 0 0 20
0 10 0 20 0

The code’s commented and there shouldn’t be any problems.

Good luck. Always open to comments.

In a previous article, I described the basics of binary arithmetic and gave a function to display the binary representation of a number. Here, we’ll look at several ways to count the set (1) bits in a number.

First of all, why would you want to count bits? Bitsets. If you use bitsets as a fast set implementation, you might want to find out how many elements there are in the set. I used this in a sudoku programme to memorise which digits can’t be placed in a particular cell. Bit counting functions are also used extensively in graphics (bitmap manipulations).
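As a tiny illustration of the bitset use case (the encoding below is made up for this example, not taken from that sudoku programme): if bit d of a mask records that digit d + 1 can no longer be placed in a cell, then finding out how many digits are ruled out is a single bit count.

/* hypothetical example: bit d set means digit d + 1 is already ruled out */
int ruled_out = 0x1F0;                     /* digits 5-9 can no longer be placed */
int num_ruled_out = bitcount(ruled_out);   /* 5, using one of the functions below */
int candidates_left = 9 - num_ruled_out;   /* 4 digits still possible */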

Here’s 22 in binary:
00010110

From the binary representation, it’s obvious that there are 3 set bits and 5 unset bits. How do you count them? I’ll give three methods.

Classic, let’s iterate through every bit, solution

The idea behind this is simple: take every bit in the number (n) and check if it’s 1. To do this, you can simply use a variable (i):

  1. initialise i with 1
  2. check if n AND i is greater than zero and if so, increment a counter
  3. multiply i by 2
  4. if i ≤ n, go to 2

Here’s the code:

/* Count the ON bits in n using an iterative algorithm */
int bitcount(int n) {
	int tot = 0;

	int i;
	for (i = 1; i <= n; i = i<<1)
		if (n & i)
			++tot;

	return tot;
}

This isn't bad and works in O(lg(n)) time, but if you know (and you probably will) whether the number is made up mostly of ones or zeros, use one of the following algorithms.

Sparse ones algorithm
This solution relies on the following observation:
22₁₀ = 00010110₂
22 - 1 = 21₁₀ = 00010101₂
22 AND 21 = 00010100₂

Notice what happened: by logically ANDing 22 and 21, you get a number whose binary representation is the same as 22 but with the last 1 flipped to 0.

The idea behind this algorithm is to logically AND n and n - 1 until n is 0. The number of times necessary to do this will be the number of 1 bits.

Here's the code:

/* Counts the ON bits in n. Use this if you know n is mostly 0s */
int bitcount_sparse_ones(int n) {
	int tot = 0;

	while (n) {
		++tot;
		n &= n - 1;
	}

	return tot;
}

Why call it sparse ones? Look at the algorithm, and apply it to a few numbers on a piece of paper. You’ll notice that you go through the inner loop x times, where x is the number of 1s in the number. So the time is O(x), and it’s best to use it if there are few ones in the number.
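For example, 22 takes exactly three passes through the loop, one per set bit:
22 AND 21 = 10110 AND 10101 = 10100 (= 20)
20 AND 19 = 10100 AND 10011 = 10000 (= 16)
16 AND 15 = 10000 AND 01111 = 00000 (done: 3 set bits)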

Dense ones algorithm

But what if your number is mostly 1? The solution is obvious: flip every bit in n, then apply the sparse ones algorithm:

/* Counts the ON bits in n. Use this if you know n is mostly 1s */
int bitcount_dense_ones(int n) {
	int tot = 0;

	n ^= (unsigned int)-1;

	while (n) {
		++tot;
		n &= n - 1;
	}

	return sizeof(int) * 8 - tot;
}
Full source

Here’s the full C source to the programme (bit.c):

#include <stdio.h>

/* Print n as a binary number */
void printbits(int n) {
	unsigned int i, step;

	if (0 == n) { /* For simplicity's sake, I treat 0 as a special case */
		printf("0000");
		return;
	}

	i = 1u << (sizeof(n) * 8 - 1); /* start at the highest bit */
	step = i >> 4; /* In groups of 4 */
	while (step >= n) {
		i >>= 4;
		step >>= 4;
	}

	/* At this point, i is the top bit of the smallest group of 4 bits
	   that can hold n */
	while (i > 0) {
		if (n & i)
			printf("1");
		else
			printf("0");
		i >>= 1;
	}
}

/* Count the ON bits in n using an iterative algorithm */
int bitcount(int n) {
	int tot = 0;

	int i;
	for (i = 1; i <= n; i = i<<1)
		if (n & i)
			++tot;

	return tot;
}

/* Counts the ON bits in n. Use this if you know n is mostly 0s */
int bitcount_sparse_ones(int n) {
	int tot = 0;

	while (n) {
		++tot;
		n &= n - 1;
	}

	return tot;
}

/* Counts the ON bits in n. Use this if you know n is mostly 1s */
int bitcount_dense_ones(int n) {
	int tot = 0;

	n ^= (unsigned int)-1;

	while (n) {
		++tot;
		n &= n - 1;
	}

	return sizeof(int) * 8 - tot;
}

int main(int argc, char *argv[]) {
	int i;

	for (i = 0; i < 23; ++i) {
		printf("%d = ", i);
		printbits(i);
		printf("\tON bits: %d %d %d", bitcount(i), bitcount_sparse_ones(i), bitcount_dense_ones(i));
		printf("\n");
	}

	return 0;
}

That's it. If you're interested in a few more (slightly esoteric) algorithms, see this article.

Good luck. Always open to comments.