Choose the Axiom II - 3 Quarks Daily

by Carl Pierer

6a01a3fbe69b11970b01bb091a80ea970d-250wi

M.C. Escher: Reptiles.

In the first part of this essay, the axiom of choice was introduced and a rather counterintuitive consequence was shown: the Banach-Tarski Paradox. To recapitulate: the axiom of choice states that, given any collection of non-empty sets, it is possible to choose exactly one element from each of them. This is uncontroversial in the case where the collection is finite. Simply list all the sets and then pick an element from each. Yet, as soon as we consider infinite collections, matters get more complicated. We cannot explicitly write down which element to pick, so we need to give a principled method of choosing. In some cases, this might be straightforward. For example, take an infinite collection of non-empty subsets of the natural numbers. Any such set will contain a least element. Thus, if we pick the least element from each of these sets, we have given a principled method. However, with an infinite collection of non-empty subsets of the real numbers, this particular method does not work. Moreover, there is no obvious alternative principled method. The axiom of choice then states that nonetheless such a method exists, although we do not know it.

The axiom of choice entails the Banach-Tarski Paradox, which states that we can break up a ball into 8 pieces, take 4 of them, rotate them around and put them back together to get back the original ball. We can do the same thing with the remaining 4 pieces and get another ball of exactly the same size. This allows us to duplicate the ball. This beautifully paradoxical result casts some doubt about the status of the axiom. Perhaps, if the axiom leads to something as counter-intuitive as the Banach-Tarski Paradox, it should be rejected?

This second part will look at a famous and important lemma, known as Zorn's Lemma, which is logically equivalent to the axiom[i]. Due to its relevance in proofs of highly useful mathematical propositions, this will give some support against the counterintuitive consequence of the axiom.

One statement of Zorn's Lemma is:

Suppose that ≤ is a partial order in a set S and that every chain in S has an upper bound. Then S has a maximal element.

In order to make the lemma more understandable, it is necessary to define the terminology involved. A partial ordering on a set is a relation, denoted by ≤, such that:

For every element s ∊ S, s ≤ s.

Notice that this does not say that every element has to be less or equal or bigger or equal to some other element. In other words, this relation might not apply to some of the elements, so that there are elements that cannot be ordered. If all the elements can be ordered in this way, then this is called a total ordering of the set. Further we need to have:

2. If s ≤ s' and s' ≤ s, then s=s'.

3. If s ≤ s' and s' ≤ s'', then s ≤ s''.

Of course, an easy example of such a partial ordering (which, as it happens, is a total ordering) is the notion of less or equal with respect to the real numbers, as for instance 2 ≤ 3. A partial ordering on the natural numbers {1,2,3,…} that is not a total ordering would be divisibility: We say that for any two natural numbers a and b, a ≤ b if a divides b. So, for instance 2≤4 – indeed, this gives a total ordering on the subset of all even numbers. Note, however, that this is not a total ordering on the whole set of natural numbers; since 2 does not divide 5 nor does 5 divide 2, these two numbers cannot be brought into a relation on this notion.

A chain C is a subset of S that is totally ordered by the relation ≤. So, if we continue with the divisibility ordering, the even numbers would be a chain in the natural numbers. An upper bound on a subset T of a partially ordered set S is an element s ∊ S such that for all t ∊ T, t ≤ s. For example, if we consider the interval [0,1] of the real numbers, then any real number greater or equal to 1 is an upper bound. In particular, 2 is an upper bound to this interval. In contrast, the chain of even numbers in the previous example does not have an upper bound. Finally, a maximal element is a little bit more subtle: an element m of a partially ordered set S is maximal if for all s ∊ S to which m is comparable, s ≤ m. The subtlety is that S is only partially ordered, hence there might be elements to which m is not comparable and so several (different) maximal elements. Consider as an example the following set S={1,2,3,4,5} and let ≤ again be the relation of divisibility. Then 3,4,5 are all maximal elements.

Having clarified the terms involved in the statement of the lemma, its proof turns out to be relatively simple[ii], given the axiom of choice. The proof works by contradiction in four clever steps:

Suppose that S does not have a maximal element and use the axiom of choice to define a function that assigns to each chain C in S an element s ∊ S that is bigger than any element of C.
Show that if we have two chains A,B in S, where each element (in the respective chain) is the strict upper bound to the chain of preceding elements chosen by the function defined in (i) (and a further condition to be specified later), then one of the two chains is contained in the other.
Observe that the union U of all chains as in (ii) meets these two conditions.
Deduce a contradiction by showing that U cannot have a strict upper bound.

Suppose for contradiction that S does not have a maximal element. By hypothesis, every chain C in S has an upper bound. So, if C is a chain and u an upper bound, we can find an s ∊ S, such that u < s and hence c < s for all c ∊ C (since we've assumed that S does not contain a maximal element, such an s can always be found). Such an s is called a strict upper bound. By the axiom of choice, we have a function f that chooses for every chain C in S such a strict upper bound.

We need two more definitions:

If C is a chain in S and c ∊ C, we define an initial segment as the set: P(C,c) = {y ∊ C│y < c }. In words, it is the set of all elements strictly less than c.
A subset A of S is said to be conforming if the following two conditions hold:
1. The set A is totally ordered under ≤ and every non-empty subset of A has a least element. (A is well-ordered under ≤).
2. Every element a in A is the strict upper bound chosen by the function f for the initial segment P(A,a). That is: a=f(P(A,a)).

The condition 2.b is somewhat unclear – but because there are possibly many different strict upper bounds, it ensures that A contains those that are actually picked by the function f.

The next step in the proof is (ii), which rephrased in the new terminology says: If A and B are conforming subsets of S and A≠B, then one of these two sets is an initial segment of the other.
We have that A≠B, so if we consider all elements in A that are not in B (AB), we know that this set is non-empty. Hence, if v is the least element of AB, then the initial segment P(A,v) is a subset of B, because both A and B are conforming sets. To show that indeed P(A,v)=B, suppose for contradiction that there is at least one element in B that is not in P(A,v). Let u be the least of these elements. We show the contradiction by showing that u is in fact in P(A,v). To do so, consider the initial segment P(B,u): for any x ∊ A and y ∊ P(B,u) such that x < y, we find that x ∊ P(B,u). This is because such an x is necessarily in the initial segment P(A,v). So we have that any element in A that is less than an element in the initial segment P(B,u) (which is in B) is also in that initial segment. Therefore, if z is the least element of the set A without the set P(B,u), then we have P(A,z) = P(B,u). But also, z ≤ v, because v is the least member of AB, and clearly P(A,z)=P(B,u) is contained in B. But, since each of these are conformal, we also have that:

z = f(P(A,z))= f(P(B,u)) = u.

Now, u is in B, hence z ≠ v and so z < v. But then, this means that z is in P(A,v), and, since z=u, so is u. This contradicts our choice of u and thus completes the proof of step (ii).

For step (iii), it is important to observe that if A is a conforming subset of S and x ∊ A, then whenever y < x, either y ∊ A or y does not belong to any conforming subset. This, indeed just follows from step (ii). Now step (iii), that the union U of all conforming subsets of S is conforming, follows.

Finally, to deduce a contradiction (and complete step (iv)), consider x=f(U), i.e. x is a strict upper bound to the union of all conforming subsets. Then, the union of U and x is a conforming set, hence x is in U and so cannot be a strict upper bound.

This last step might come a little bit surprising – how does the Lemma follow from this contradiction? The union of all conforming subsets U is a chain. We have supposed (for contradiction), that all chains have an upper bound, but that the set S does not contain a maximal element. This means, that all chains do not only have an upper bound, but also a strict upper bound. We have seen that U cannot have a strict upper bound, hence S contains a maximal element[iii].

The Lemma, as inconspicuous as it may seem, has very useful consequences. It is, for instance, needed to show that infinite dimensional vector spaces always have a basis. That is, in other words, there are vectors (an infinite collection of them) that span the vector space. Indeed, it has been shown that the proposition that every vector space has a basis is equivalent to Zorn's Lemma (and hence also to the axiom of choice)[iv]. There are many more applications to look at, but those would require further algebraic notions that cannot be introduced in the space of this article. A different, curious application, is to prove the well-ordering theorem.

We've already defined what it means for a set to be well-ordered. Zorn's Lemma can be used to show that any set can be well-ordered. This might be a bit surprising, especially to people familiar with open sets. Consider for example the real interval (0,1). This notation means that the set contains all real numbers that are strictly greater than 0 and strictly less than 1. Now, to give a well-ordering, we need to find a relation between the elements, such that there is a least element, one to begin with. Certainly, the usual notion of bigger (or less) than does not work here. For if we pick any real number a, no matter how small, a/2 is still smaller and contained in the set. The well-ordering theorem then says that nonetheless such an ordering exists.

To give the full argument is beyond the scope of this article, but the basic idea is the following[v]. We order the set S by picking an element s₁ in S at random and declare it to be the minimal element. We pick another element from the remaining set, and say it is bigger or equal to s₁. To apply Zorn's lemma, we need to rephrase the problem in terms so that the lemma applies. We may define an attempt at ordering the set S as a subset W of S which is well-ordered. Then, we can impose a partial order on this set of attempts by saying that for any two subsets V,W, V≤W, if V is an initial segment of W, in the sense defined earlier. Each of the attempts, being an initial segment, is well-ordered. The idea then is to say that an upper bound to these chains is the well-ordered entire set. We can then apply Zorn's Lemma and deduce that there is a maximal element, which is the longest well-ordered set and thus coincides with a well-ordering of the entire set[vi].

The second part of this essay demonstrated a useful consequence (or indeed, an equivalent) of the axiom of choice, known as Zorn's Lemma and looked at a few applications of this Lemma. Two positions have been mapped out in the course of this essay. On the one hand, the axiom has very counterintuitive consequences, so much so that they've received the name of a paradox. On the other hand, the axiom proves to be very useful in deducing mathematical propositions. These considerations lead back to the question that has already been raised at the end of the first part: how are we to decide on the status of an axiom, on whether to accept it or reject it?

These philosophical aspects of the axiom of choice shall be discussed in the third part of this essay.

***

References:

Lewin, J. (1991). A Simple Proof of Zorn's Lemma. The American Mathematical Monthly, 353-354.

Moore, G. H. (1982). Zermelo's Axiom of Choice. Heidelberg: Springer Verlag.

[i] See for instance in Moore (1982)

[ii] This proof is given in Lewin (1991)

[iii] There surely is something a little bit dodgy about this proof. If indeed the axiom of choice and Zorn's lemma are logically equivalent, then invoking the axiom to ‘prove' the Lemma is a bit of a sham – for by definition, an axiom cannot be proven.

[iv] More precisely: if a subset A of a vector space V generates V, then A includes a basis. (Moore, 1982)

[v] The argument is fleshed out and given in detail, with many more interesting aspects of Zorn's Lemma, here: https://gowers.wordpress.com/2008/08/12/how-to-use-zorns-lemma/

[vi] There are a few things to worry about in this sketch – do these initial segments meet the chain condition? Can we be sure that the maximal element (which exists by the lemma) coincides with the full set? These questions are answered in the article cited above.