Combinatorics


Wikipedia defines combinations as:

In combinatorial mathematics, a combination is an un-ordered collection of unique elements. (An ordered collection is called a permutation.) Given S, the set of all possible unique elements, a combination is a subset of the elements of S. The order of the elements in a combination is not important (two lists with the same elements in different orders are considered to be the same combination). Also, the elements cannot be repeated in a combination (every element appears uniquely once); this is often referred to as “without replacement/repetition”. This is because combinations are defined by the elements contained in them, so the set {1, 1, 1} is the same as {1}. For example, from a 52-card deck any 5 cards can form a valid combination (a hand). The order of the cards doesn’t matter and there can be no repetition of cards.

Mathworld provides a more terse definition:

The number of ways of picking k unordered outcomes from n possibilities.

The combinations of n elements chosen as k is the number of unique ways of selecting k elements from a set of n.

From now on, by set of n I always mean one of the form {1, 2, 3, …, n}.

So, what are the ways of choosing 2 elements from a set of 4, {1, 2, 3, 4}?
{1, 2}
{1, 3}
{1, 4}
{2, 3}
{2, 4}
{3, 4}

That’s 6 ways, but what is the general formula?
Formula for combinations of n chosen as k:
C(n, k) = (n * (n – 1) * … * (n – k + 1)) / (1 * 2 * … * k)

This is easily proved: for a set of n, there are n ways of choosing the first element, n * (n – 1) ways of choosing the first two elements, …, n * (n – 1) * … * (n – k + 1) ways of choosing the first k elements. Unfortunately, this counts every subset more than once: each subset of k elements is chosen once for each of its k! orderings. So, we have to divide the number of ordered choices (n * (n – 1) * … * (n – k + 1)) by the number of repetitions (k!). This yields exactly the formula noted above.

Combinations are an astoundingly wide-spread concept, and are used in every branch of mathematics and especially in the analysis of algorithms. This said, there’s only one thing you really need to know: how to apply the formula.

Look at the formula above and notice that there are exactly k factors in the numerator and k factors in the denominator. So, to remember the formula and easily apply it:
P1. Draw the fraction line.
P2. Above the line, write k terms of the form: n, n - 1, n - 2, ...
P3. Below the line, write k terms of the form: 1, 2, 3, ...

Here are a few examples:
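Combinations of 4 chosen as 1, 2, 3 and 4:
C(4, 1) = 4 / 1 = 4
C(4, 2) = (4 * 3) / (1 * 2) = 6
C(4, 3) = (4 * 3 * 2) / (1 * 2 * 3) = 4
C(4, 4) = (4 * 3 * 2 * 1) / (1 * 2 * 3 * 4) = 1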
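
The three-step recipe translates directly into a loop. This is only a minimal sketch: the function name choose is my own and is not part of comb1.c, and it assumes the result fits in an unsigned long long.

```c
/* Computes the number of combinations of n chosen as k.
   Step i multiplies by the i-th factor above the line (n - i + 1) and
   divides by the i-th factor below the line (i). After step i the value
   is exactly C(n, i), so every division leaves no remainder. */
unsigned long long choose(unsigned n, unsigned k) {
	unsigned long long result = 1;
	unsigned i;

	if (k > n)
		return 0; /* cannot pick more elements than the set has */

	for (i = 1; i <= k; ++i)
		result = result * (n - i + 1) / i;

	return result;
}
```

Calling choose(4, 2) gives 6, the number of pairs listed earlier, and choose(52, 5) gives the number of 5-card hands.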

And now for the fun part. How do you generate combinations? Look closely at the example above. First thing to note is that every combination is an array of k elements. Next, the first digit in every set is, basically, every digit between 1 and n. What about the other digits? They’re always between 1 and n and they’re always in ascending order. Now it should be obvious what the algorithm is:
P1. Start off with (1, 2, ..., k); this is the first combination.
P2. Print it.
P3. Given the combination (c1, c2, ..., ck), start from the back with i = k and increment ci. While i > 1 and ci is larger than n - k + i, the largest value allowed at position i, move to the previous index i - 1 and increment that element instead. After this, if c1 > n - k + 1, then this is not a valid combination so we stop. Otherwise give ci+1, ci+2, ... the values of ci + 1, ci + 2, .... Jump to P2.

Here’s the source code in C (comb1.c):

#include <stdio.h>

/* Prints out a combination like {1, 2} */
void printc(int comb[], int k) {
	printf("{");
	int i;
	for (i = 0; i < k; ++i)
		printf("%d, ", comb[i] + 1);
	printf("\b\b}\n");
}

/*
	next_comb(int comb[], int k, int n)
		Generates the next combination of n elements as k after comb

	comb => the previous combination ( use (0, 1, 2, ..., k - 1) for first)
	k => the size of the subsets to generate
	n => the size of the original set

	Returns: 1 if a valid combination was found
		0, otherwise
*/
int next_comb(int comb[], int k, int n) {
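	int i = k - 1;
	++comb[i];
	/* Carry to the left while comb[i] exceeds its maximum n - k + i.
	   Stopping at i == 0 keeps the loop from touching comb[-1]. */
	while ((i > 0) && (comb[i] >= n - k + 1 + i)) {
		--i;
		++comb[i];
	}

	if (comb[0] > n - k) /* Combination (n-k, n-k+1, ..., n-1) was the last one */
		return 0; /* No more combinations can be generated */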

	/* comb now looks like (..., x, n, n, n, ..., n).
	Turn it into (..., x, x + 1, x + 2, ...) */
	for (i = i + 1; i < k; ++i)
		comb[i] = comb[i - 1] + 1;

	return 1;
}

int main(int argc, char *argv[]) {
	int n = 5; /* The size of the set; for {1, 2, 3, 4} it's 4 */
	int k = 3; /* The size of the subsets; for {1, 2}, {1, 3}, ... it's 2 */
	int comb[16]; /* comb[i] is the index of the i-th element in the
			combination */

	/* Setup comb for the initial combination */
	int i;
	for (i = 0; i < k; ++i)
		comb[i] = i;

	/* Print the first combination */
	printc(comb, k);

	/* Generate and print all the other combinations */
	while (next_comb(comb, k, n))
		printc(comb, k);

	return 0;
}

Always open to comments. Have fun.


Wikipedia defines the partition of a set as:

In mathematics, a partition of a set X is a division of X into non-overlapping “parts” or “blocks” or “cells” that cover all of X. More formally, these “cells” are both collectively exhaustive and mutually exclusive with respect to the set being partitioned.

A more succinct definition is given by Mathworld:

A set partition of a set S is a collection of disjoint subsets of S whose union is S.

Simply put, the partitions of a set S are all the ways in which you can choose disjoint, non-empty subsets of S whose union is S.

From now on, when I say a set of n elements, I mean {1, 2, …, n}. So, what are the partitions of {1, 2, 3}?
{1, 2, 3}
{2, 3} {1}
{1, 3} {2}
{3} {1, 2}
{3} {2} {1}

It’s obvious that these satisfy the definition: {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3} and {1, 2, 3} are all subsets of {1, 2, 3}. They’re all non-empty and, in any partition, the same element never appears twice. Finally, in a partitioning, the union of the subsets is the original set.

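In how many ways can you partition a set of n elements? The answer is the nth Bell number, B(n), which can be calculated with the recurrence:
B(0) = 1
B(n + 1) = C(n, 0) * B(0) + C(n, 1) * B(1) + … + C(n, n) * B(n)

Here C(n, k) is the number of combinations of n chosen as k. If you check the formula for 3 you’ll see that it does give the correct answer: B(3) = 1 * 1 + 2 * 1 + 1 * 2 = 5.

It is tempting to use Catalan numbers here, because the 3rd Catalan number is also 5. A reader pointed out that what we need are not Catalan numbers, but Bell numbers, and Wikipedia’s definition agrees with him. The two sequences match up to n = 3, but for n = 4 there are 15 partitions while the 4th Catalan number is 14.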
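
Bell numbers are easy to compute with the Bell triangle, in which every row starts with the last entry of the previous row and every other entry is the sum of its left neighbour and the entry above that neighbour. The following is a sketch under my own assumptions: the names bell and MAX_N are made up for this example, and n is limited to 20 so the result fits in 64 bits.

```c
#define MAX_N 20

/* Returns the Bell number B(n), the number of partitions of a set of n
   elements, for 0 <= n <= MAX_N. Returns 0 if n is out of range. */
unsigned long long bell(int n) {
	unsigned long long row[MAX_N + 1], next[MAX_N + 1];
	int len = 1; /* number of entries in the current row */
	int j;

	if (n < 0 || n > MAX_N)
		return 0;

	row[0] = 1; /* the first row of the triangle is just: 1 */
	while (len < n) {
		/* A new row starts with the last entry of the previous one. */
		next[0] = row[len - 1];
		for (j = 1; j <= len; ++j)
			next[j] = next[j - 1] + row[j - 1];
		++len;
		for (j = 0; j < len; ++j)
			row[j] = next[j];
	}

	/* The last entry of row n is B(n); for n = 0 this is row[0] = 1. */
	return row[len - 1];
}
```

The rows it builds are 1, then 1 2, then 2 3 5, then 5 7 10 15, so bell(3) is 5 and bell(4) is 15.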

Ok. We know what a partitioning is, we know how many there are, but how do you generate them? This is the first algorithm I could think of. It may not be clear from the explanation why it works but try it on a piece of paper for n=3 and it will become obvious. Here’s how I came up with it:

First of all, how do you represent a partitioning of a set of n elements? The straight-forward way would be using a vector of n integers, each integer representing the number of the subset in which the corresponding element is in. If the corresponding element of 3 is 2, that means that 3 is in the 2nd subset. So, given the set {1, 2, 3}:
Partitioning -> Encoding
{1, 2, 3} -> (1, 1, 1)
{1} {2, 3} -> (2, 1, 1)
{2} {1, 3} -> (1, 2, 1)
{1, 2} {3} -> (2, 2, 1)
{1} {2} {3} -> (3, 2, 1)

Notice that the encodings, written backwards are: 111, 112, 121, 122 and 123. From this you can guess how the generator works: more or less, generate all the numbers between 111 and 123 using only the digits 1, 2 and 3:

111
112
113
121
122
123

That’s almost right. The encodings (1, 1, 2) and (1, 1, 3) translate into the same partitioning: {1} {2, 3}. If you do the same thing for a larger n you’ll notice this happening again and again. Fortunately, there’s an easy solution: never use a digit that’s more than 1 larger than the largest digit before it. i.e. You can’t use (1, 1, 3) because 3 is larger by 2 than the largest of the digits before it (1 and 1).
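
The rule can be checked mechanically. Here is a minimal sketch, assuming the encoding is passed in the order it is written in the list above (111, 112, ...); the name is_canonical is mine and does not appear in part.c.

```c
/* Returns 1 if digits[0..n-1] is an encoding that should be kept: every
   digit is at least 1 and at most 1 larger than the largest digit before
   it (so the first digit must be 1). Returns 0 otherwise. */
int is_canonical(const int digits[], int n) {
	int max = 0; /* largest digit seen so far */
	int i;

	for (i = 0; i < n; ++i) {
		if (digits[i] < 1 || digits[i] > max + 1)
			return 0;
		if (digits[i] > max)
			max = digits[i];
	}
	return 1;
}
```

With this check, (1, 1, 2) is accepted and (1, 1, 3) is rejected, so each partitioning keeps exactly one encoding.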

To do this, I use another vector m with the following significance: m[i] is the largest of the digits that come before digit i when the encoding s is read backwards, i.e. the largest of s[i + 1], ..., s[n - 1]. This makes it very easy not to generate any duplicate partitionings.

Here’s the code in C (part.c):

#include <stdio.h>

/*
	printp
		- print out the partitioning scheme s of n elements 
		as: {1, 2, 4} {3}
*/
void printp(int *s, int n) {
	/* Get the total number of partitions. In the example above, 2.*/
	int part_num = 1;
	int i;
	for (i = 0; i < n; ++i)
		if (s[i] > part_num)
			part_num = s[i];

	/* Print the p partitions. */
	int p;
	for (p = part_num; p >= 1; --p) {
		printf("{");
		/* If s[i] == p, then i + 1 is part of the pth partition. */
		for (i = 0; i < n; ++i)
			if (s[i] == p)
				printf("%d, ", i + 1);
		printf("\b\b} ");
	}
	printf("\n");
}

/*
	next
		- given the partitioning scheme represented by s and m, generate
		the next

	Returns: 1, if a valid partitioning was found
		0, otherwise
*/
int next(int *s, int *m, int n) {
	/* Update s: 1 1 1 1 -> 2 1 1 1 -> 1 2 1 1 -> 2 2 1 1 -> 3 2 1 1 ->
	1 1 2 1 ... */
	/*int j;
	printf(" -> (");
	for (j = 0; j < n; ++j)
		printf("%d, ", s[j]);
	printf("\b\b)\n");*/
	int i = 0;
	++s[i];
	while ((i < n - 1) && (s[i] > m[i] + 1)) {
		s[i] = 1;
		++i;
		++s[i];
	}

	/* If i has reached the (n-1)th element, then the last unique partitioning
	has been found */
	if (i == n - 1)
		return 0;

	/* Because all the first i elements are now 1, the largest value among
	s[i], ..., s[n - 1] is either s[i] or the old maximum m[i]. So we update
	max by copying the larger of the two to all the first i positions in m.*/
	int max = (s[i] > m[i]) ? s[i] : m[i];
	for (i = i - 1; i >= 0; --i)
		m[i] = max;

/*	for (i = 0; i < n; ++i)
		printf("%d ", m[i]);
	getchar();*/
	return 1;
}

int main(int argc, char *argv[]) {
	int s[16]; /* s[i] is the number of the set in which the ith element
			should go */
	int m[16]; /* m[i] is the largest of s[i + 1], ..., s[n - 1] */

	int n = 3;
	int i;
	/* The first way to partition a set is to put all the elements in the same
	   subset. */
	for (i = 0; i < n; ++i) {
		s[i] = 1;
		m[i] = 1;
	}

	/* Print the first partitioning. */
	printp(s, n);

	/* Print the other partitioning schemes. */
	while (next(s, m, n))
		printp(s, n);

	return 0;
}

The code is heavily commented, but I’ll happily respond to any questions. This is also what I used to generate all the above listings. Try uncommenting some of the code to see how the programme works. Good luck!

P.S. Every encoding after (3, 2, 1) yields a duplicate partitioning. For fun, try proving this mathematically.

There are quite a few definitions of what a set is, but it all boils down to this:

A set is defined as a collection of distinct elements, in which order is not important.

So {1, 2, 3}, {3, 4}, {} and {5, 99, -1} are all sets. Because the order of the elements is ignored, {1, 2, 3} and {3, 2, 1} are the same set. In case you’re wondering, there are exactly n! different ways to write a set of n elements.

For the rest of the discussion, I’ll use sets of the form {1, 2, …, n}, so when I say a set of 3 elements, I mean {1, 2, 3}. Just remember that this is not a property of sets. They can contain anything as elements, not necessarily consecutive numbers.

The set S1 is said to be a subset of the set S2 if all the elements of S1 also belong to S2.
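
The definition can be turned into a direct check. This is just an illustrative sketch: the name is_subset and the choice to store sets as plain int arrays are my assumptions, not something used later in sub.c.

```c
/* Returns 1 if every element of s1 (n1 elements) also belongs to s2
   (n2 elements), 0 otherwise. The empty set (n1 == 0) is a subset of
   every set. */
int is_subset(const int s1[], int n1, const int s2[], int n2) {
	int i, j;

	for (i = 0; i < n1; ++i) {
		int found = 0;
		for (j = 0; j < n2 && !found; ++j)
			if (s1[i] == s2[j])
				found = 1;
		if (!found)
			return 0; /* s1[i] is missing from s2 */
	}
	return 1;
}
```

It compares every pair of elements, which is fine for small sets like the ones in this article.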

Knowing this, it’s easy to figure out the subsets of {1, 2, 3}:
{ }
{ 1 }
{ 2 }
{ 1, 2 }
{ 3 }
{ 1, 3 }
{ 2, 3 }
{ 1, 2, 3 }

How many subsets are there? For a set of one element, there are 2 subsets: {} and {1}. For a set of 2 elements, there are 4 subsets: {}, {1}, {2}, {1, 2}. For a set of 3 elements, there are 8 subsets. Notice the pattern?
n = 1: 2^1 = 2
n = 2: 2^2 = 4
n = 3: 2^3 = 8

For a set of n there are 2^n subsets. This is easily proved: any subset of the set can either contain or not contain an element; so, for a subset, there are 2 states for the first element, 2 for the second element, …, 2 for the nth element; so, there are 2 states for the first element, 2 * 2 = 2^2 states for the first two, 2 * 2 * 2 = 2^3 states for the first three, …, 2 * 2 * 2 * … * 2 = 2^n states for all the n elements.

The problem here is how to generate all the subsets of a given set. There are a few algorithms for doing this, but in the end, only two are worth considering.

The first is this: given all the subsets of S and the element y, you can generate all the subsets of S ∪ {y} by taking each subset of S, once adding to it y and once leaving it as it is. i.e. Knowing that {1, 3} is a subset of S, you obtain the following two subsets of S ∪ {y}: {1, 3, y} and {1, 3}.

This does what it’s supposed to – it generates all the subsets of S, and it wastes no time. It can also be used as another way to prove that there are 2^n subsets for any set of n elements. The only problem is that you need the subsets from the previous step to generate those of this step. This means that just before the end, you must have 2^(n – 1) subsets in memory. Considering how much memory computers have these days, it’s not particularly wasteful, but still, there’s a better way.
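
Here is a rough sketch of this first method. To keep it short I store every subset as the bits of an unsigned integer (bit y - 1 is set when y is in the subset); that representation, the name all_subsets and the requirement that the caller provides room for 2^n entries are all my own assumptions.

```c
/* Fills out[] with all the subsets of {1, ..., n}, each stored as a
   bit pattern, and returns how many were written (2^n). out must have
   room for 2^n entries and n must be smaller than the number of bits in
   an unsigned int. */
unsigned all_subsets(unsigned n, unsigned out[]) {
	unsigned count = 1;
	unsigned y, i;

	out[0] = 0; /* the empty set is the only subset of {} */
	for (y = 1; y <= n; ++y) {
		/* Keep each existing subset as it is and append a copy of
		   it that also contains y. */
		for (i = 0; i < count; ++i)
			out[count + i] = out[i] | (1u << (y - 1));
		count *= 2;
	}
	return count;
}
```

For n = 3 it writes 8 entries. Note how the whole list has to stay in memory, which is exactly the drawback described above.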

The better way involves using a mask. If you have a set of n elements, a valid mask would be an array of n boolean (true/false; 1/0) elements. When you apply a mask to a set, you check each element (e) in the set and the corresponding one in the mask (m): if m is true(1), you add e to the result, otherwise, you ignore it. After applying the mask (0, 1, 0, 0, 1) to {1, 2, 3, 4, 5}, you get {2, 5}.
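
Applying a mask takes only a few lines. A minimal sketch, with a function name (apply_mask) and an output array that I made up for the example:

```c
/* Copies into out[] the elements of set[] whose entry in mask[] is
   non-zero. Both arrays have n entries and out must have room for n.
   Returns the number of elements copied. */
int apply_mask(const int set[], const int mask[], int n, int out[]) {
	int count = 0;
	int i;

	for (i = 0; i < n; ++i)
		if (mask[i])
			out[count++] = set[i];
	return count;
}
```

Called with the set {1, 2, 3, 4, 5} and the mask (0, 1, 0, 0, 1), it copies 2 and 5 into out and returns 2.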

So, to generate all the subsets of a set of n elements, you first have to generate all the possible 2^n masks of the set and then apply them.

Generating the masks is a simple problem. Basically, you just have to implement a binary counter, i.e. something that generates:
000
001
010
011
100
101
110
111

Here’s the code in C (sub.c):

#include <stdio.h>

/* Applies the mask to a set like {1, 2, ..., n} and prints it */
void printv(int mask[], int n) {
	int i;
	printf("{ ");
	for (i = 0; i < n; ++i)
		if (mask[i])
			printf("%d ", i + 1); /*i+1 is part of the subset*/
	printf("\b }\n");
}

/* Generates the next mask*/
int next(int mask[], int n) {
	int i;
	for (i = 0; (i < n) && mask[i]; ++i)
		mask[i] = 0;

	if (i < n) {
		mask[i] = 1;
		return 1;
	}
	return 0;
}

int main(int argc, char *argv[]) {
	int n = 3;

	int mask[16]; /* Guess what this is */
	int i;
	for (i = 0; i < n; ++i)
		mask[i] = 0;

	/* Print the first set */
	printv(mask, n);

	/* Print all the others */
	while (next(mask, n))
		printv(mask, n);

	return 0;
}

Note: The next() function generates the bits in reverse order.

Always open to comments.

Last time, we defined what a permutation is and gave a few basic properties.

In a few minutes we’ll see another algorithm for generating them, but first a little theory.

Lexicographical order is defined by Wikipedia as:

In mathematics, the lexicographic or lexicographical order, (also known as dictionary order, alphabetic order or lexicographic(al) product), is a natural order structure of the Cartesian product of two ordered sets.

Given two partially ordered sets A and B, the lexicographical order on the Cartesian product A × B is defined as
(a,b) ≤ (a′,b′) if and only if a < a′ or (a = a′ and b ≤ b′).

The result is a partial order. If A and B are totally ordered, then the result is a total order also.

More generally, one can define the lexicographic order on the Cartesian product of n ordered sets, on the Cartesian product of a countably infinite family of ordered sets, and on the union of such sets.

Mathworld adds the following regarding permutations and sets:

When applied to permutations, lexicographic order is increasing numerical order (or equivalently, alphabetic order for lists of symbols; Skiena 1990, p. 4). For example, the permutations of {1,2,3} in lexicographic order are 123, 132, 213, 231, 312, and 321.

When applied to subsets, two subsets are ordered by their smallest elements (Skiena 1990, p. 44). For example, the subsets of {1,2,3} in lexicographic order are {}, {1}, {1,2}, {1,2,3}, {1,3}, {2}, {2,3}, {3}.

An easy way to determine if a sequence is lexicographically after another is to interpret them as numbers in base n + 1, where n is the largest element they contain. So, (2, 1, 3) is after (1, 2, 3) because 213 > 123. Note: You may also choose any base greater than n + 1. This is particularly convenient as most would rather use numbers in base 10 and not base 4.
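
In code it is simpler to skip the conversion to a number and compare the two sequences element by element, which gives the same answer. A small sketch; the name lex_compare is mine and both sequences are assumed to have the same length n.

```c
/* Compares a and b, two sequences of n elements each.
   Returns -1 if a comes before b in lexicographic order, 1 if a comes
   after b, and 0 if the two sequences are equal. */
int lex_compare(const int a[], const int b[], int n) {
	int i;

	for (i = 0; i < n; ++i)
		if (a[i] != b[i])
			return (a[i] < b[i]) ? -1 : 1; /* first difference decides */
	return 0;
}
```

The first position where the sequences differ decides the order, just like the most significant differing digit decides which of two numbers is larger.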

Ok, but what does this have to do with permutations? Well, generating permutations in any order isn’t enough; you must generate them in lexicographic order.

Now, if you run last time’s algorithm, you find that, for n = 3, it prints:

1 2 3
1 3 2
2 1 3
2 3 1
3 1 2
3 2 1

Now, 123 < 132 < 213 < 231 < 312 < 321. So, the permutations are in lexicographic order!

The worst algorithm for any problem is usually called naive, and the last algorithm fully earns the name. It’s the slowest one I can think of, but it’s extraordinarily easy to explain.

This algorithm is faster than the last one (about twice as fast). It’s quite complex and harder to understand. It does the same thing as the last one, but where the naive algorithm just generated all possible sequences and filtered them, this one generates only valid permutations.

Here it is:

P1. Given n, we start with the first imaginable permutation p = (1, 2, ..., n) from the lexicographic point of view.

P2. Print the permutation p or use it for something else.

P3. Let's say we have already built the permutation p = (p1, p2, ..., pn). In order to obtain the next permutation, we must first find the largest index i so that pi < pi+1. If there is no such index, p is the last permutation and we stop. Then, the element pi will be swapped with the smallest of the elements after pi that is larger than pi. Finally, the last n - i elements will be reversed so that they appear in ascending order. Then, jump to P2. For example, for p = (1, 3, 2) we get i = 1, so 1 is swapped with 2 giving (2, 3, 1), and reversing the last 2 elements gives (2, 1, 3).

That’s it for the algorithm, here’s the code in C (lexicoPerm.c):

#include <stdio.h>

void printv(int v[], int n) {
	int i;

	for (i = 0; i < n; i++)
		printf("%d ", v[i]);
	printf("\n");
}

/*!
	This just swaps the values of a and b

	i.e if a = 1 and b = 2, after

		SWAP(a, b);

	a = 2 and b = 1
*/
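/* A temporary is used because reading and assigning b in one expression,
   as in a = a + b - (b = a), is undefined behaviour in C. */
#define SWAP(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)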

/*!
	Generates the next permutation of the vector v of length n.

	@return 1, if there are no more permutations to be generated

	@return 0, otherwise
*/
int next(int v[], int n) {
	/* P3 */
	/* Find the largest i */
	int i = n - 2;
	while ((i >= 0) && (v[i] > v[i + 1]))
		--i;

	/* If i is smaller than 0, then there are no more permutations. */
	if (i < 0)
		return 1;

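	/* Find the smallest element after v[i] that is larger than v[i]. The
	   elements after v[i] are in descending order, so it is the rightmost
	   one that is larger than v[i]. */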
	int k = n - 1;
	while (v[i] > v[k])
		--k;
	SWAP(v[i], v[k]);

	/* Reverse the elements after position i. */
	int j;
	for (j = i + 1, k = n - 1; j < k; ++j, --k)
		SWAP(v[j], v[k]);

	return 0;
}

int main(int argc, char *argv[]) {
	int v[128];
	int n = 3;

	/* The initial permutation is 1 2 3 ...*/
	/* P1 */
	int i;
	for (i = 0; i < n; ++i)
		v[i] = i + 1;
	printv(v, n);

	int done = 1;
	do {
		if (!(done = next(v, n)))
			printv(v, n); /* P2 */
	} while (!done);

	return 0;
}



The code is commented and it does nothing but implement the algorithm. Have fun!

A permutation of n objects is an arrangement of n distinct objects.

Wikipedia gives a slightly more detailed definition:

Permutation is the rearrangement of objects or symbols into distinguishable sequences. Each unique ordering is called a permutation. (For cases wherein the ordering of elements is irrelevant, compare combination and set.) For example, with the numerals one to six, each possible ordering consists of a complete list of the numerals, without repetitions. There are 720 total permutations of these numerals, one of which is: “4, 5, 6, 1, 2, 3”.

And Mathworld gives the standard mathematical definition:

A permutation, also called an “arrangement number” or “order,” is a rearrangement of the elements of an ordered list S into a one-to-one correspondence with S itself.

Permutations are crucial to studying the behaviour of many algorithms and we’ll find a lot of interesting things about them.

For starters, what are the permutations of {1, 2, 3}? The definition says a permutation is a rearrangement of the list’s elements. So, the permutations (plural) are all the possible rearrangements of the list’s elements. This gives us six permutations:


123, 132, 213, 231, 312, 321

For convenience, we’ll only work with sets like {1, 2, 3, …, n}. In computer science, the permutations of this set are called the permutations of n. In mathematics, the permutations of n means the number of permutations of the given set.

How many permutations of n are there? This is easily solved by creating a permutation one element at a time: there are n ways in which to choose the first element; then, there are n – 1 ways in which to choose the second element, so that no element repeats itself; then, there are n – 2 ways to choose the third element; …; finally, there is only one way to choose the nth element. How many possibilities does this give us? There are n ways to choose the 1st element, n(n – 1) ways for the first 2 elements, n(n – 1)(n – 2) for the first 3 elements, …, n(n – 1)(n – 2)…(n – k + 1) for the first k elements, …, n(n – 1)(n – 2)…(1) ways for all the n elements.

Now we can calculate that there are 1 * 2 * 3 = 6 permutations of 3. And … that’s right! By the way, the value 1 * 2 * 3 * … * n is usually written as n! and is called n factorial.
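
The count is a one-loop function. A minimal sketch; the name factorial is my own choice, and the result only fits in an unsigned long long for n up to 20.

```c
/* Returns n! = 1 * 2 * ... * n, the number of permutations of n.
   By convention 0! is 1. The result overflows for n > 20. */
unsigned long long factorial(unsigned n) {
	unsigned long long result = 1;
	unsigned i;

	for (i = 2; i <= n; ++i)
		result *= i;
	return result;
}
```

factorial(3) gives the 6 permutations listed above, and factorial(6) gives the 720 orderings of the numerals one to six from Wikipedia’s example.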

Great. We know what a permutation is. We know how many there are for a given set. But how do we generate them?

There are quite a few (more like dozens) methods, and I’ll describe a few here. The simplest one I can think of is this:

Let’s say you want to generate all the permutations of 3. So, you want to generate the permutations of the set {1, 2, 3}. We’ll generate the list:


123, 131, 132, 133,
211, 212, 213, 221, 222, 223, 231, 232, 233
311, 312, 313, 321, 322, 323, 331, 332, 333

That’s all the numbers you can make of length 3 using only the digits 1, 2 and 3. I start from 123 and not from 111 because there’s no permutation between 111 and 123.

Then we’ll filter the results using the rule: “A valid permutation cannot contain the same digit twice“.

Then we’ll print out what’s left.

Here’s the code in C (naiveperm.c):

#include <stdio.h>

/*!
    Generates the next try.

    If v is 1 2 1 2, after calls to

        next(v, 4);

    v will be     1 2 1 3

                1 2 1 4

                1 2 2 1

                1 2 2 2

    @return 0, if there are no more valid tries

    @return 1, otherwise
*/
int next(int v[], int n) {
    int i = n - 1;
    v[i] = v[i] + 1;
    while ((i >= 0) && (v[i] > n)) {
        v[i] = 1;
        i--;
        if(i >= 0)
            v[i]++;
    }

    if (i < 0)
        return 0;
    return 1;
}

void printv(int v[], int n) {
    int i;

    for (i = 0; i < n; i++)
        printf("%d ", v[i]);
    printf("\n");
}

/*!
    @return 1, if v is a valid permutation (no digits repeat)

    @return 0, otherwise
*/
int is_perm(int v[], int n) {
    int i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (v[i] == v[j])
                return 0;

    return 1;
}

int main(int argc, char *argv[]) {
    int v[128];
    int n = 8;

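    /* The first try is 1 2 3 ..., which is also the first permutation */
    int i;
    for (i = 0; i < n; i++)
        v[i] = i + 1;

    /* Test the current try before moving on, so that 1 2 3 ... is printed */
    do {
        if (is_perm(v, n))
            printv(v, n);
    } while (next(v, n));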

    return 0;
}

The code’s commented and it’s fairly simple, so there shouldn’t be any problems understanding it. Of course, I’m open to suggestions.

The next article in this series is Generating permutations: 2.