## Monday, May 02, 2011

### Asymptotic analysis

**by Rishidev Chaudhuri**

Asymptotic reasoning is ubiquitous in mathematics and the natural sciences, representing both a general approach to problems as well as a collection of techniques. It is one way of answering the question, “Which features of the world are relevant for our understanding?” Often, the behavior of a system gets simpler as it gets larger: only some of the features that are relevant for understanding a small system are necessary for understanding a very large one. The averaging out of fluctuations in the long-term is perhaps the most obvious example of this. Asymptotic analysis attempts to describe the behavior of an object (function, physical system, algorithm) as some quantity gets very large or very small. It is thus fundamentally the study of particular sorts of approximations, albeit approximations that can be made as precise as one wants, and a guide to which features of an object can safely be ignored.

To start with a simple example, look at what happens to the square of a number and to its cube as the number gets bigger and bigger. Both the square and the cube race off to infinity, but one does so faster than the other. 1^{3} and 1^{2} are the same; 2^{3} is twice as big as 2^{2}; 3^{3} is thrice as big as 3^{2} and so on. For any number N, N^{3} is N times as big as N^{2}, and as N gets really big the function N^{2} is dwarfed by N^{3}. So to see how the behavior of a function could become simpler as it approaches infinity, look at the behavior of

N^{3}+N^{2}

If we are thinking asymptotically, we would say that this function behaves like N^{3}: for any degree of approximation we choose the contribution of N^{2} will be irrelevant for large enough N. Here “large enough” depends on the degree of approximation we want. If we decide that irrelevant means “contributes less than 1% to the value of the function”, then N^{2} is irrelevant once we reach N=100; if instead we decide that it means less than 0.01%, then we must wait till N=10,000, and so on. The crucial point here is that we can satisfy any desired degree of approximation, no matter how stringent.

We could even decide to give N^{2} a handicap, and start it off at a much higher value. So look at

N^{3}+10000*N^{2}

Here * means multiplication. In the long run this doesn’t help at all. For small values of N, 10000*N^{2} is much bigger than N^{3} (10000 times bigger for N=1, for example). But we have an infinitely long race-track and N^{3} eventually catches up and then dominates. Once we reach N=10000 they’re the same and, as we keep going, N^{2} is once again irrelevant. This is true no matter how large the number we initially multiply N^{2} by; it takes longer for N^{3} to catch up but, to the infinitely patient asymptotic gaze, nothing has changed. You can see this illustrated in the figure to the right.

Another way of looking at these different rates of growth is to notice that if we double our number N, N^{3} increases by a factor of 8 but N^{2} only increases by a factor of 4. Given enough doublings, the extra factor of 2 will eventually win out. On the other hand 10*N^{3} and N^{3} behave in fundamentally the same way. Both increase by a factor of 8 when N is doubled and so both will eventually overtake N^{2}. 10*N^{3} will always be bigger than N^{3} but this is only by a factor of 10 and that doesn’t change. The 10 doesn’t affect the way the function approaches infinity, and so we describe them both as growing on the order of N^{3}.

Just as N^{3} grows faster than N^{2}, we find that N^{4} grows faster than N^{3} and so on. So far, everything we’ve looked at looks like N^{something} (where that something is 1,2, 3 and so on), and N^{something} grows faster than N^{something-1}. Can we grow faster than this? Let’s flip things around and look at something^{N}. For example, look at 2^{N}. This is the function you’d get if you placed one grain of rice on the first square of a chessboard, two on the second, four on the third and so on. It grows much much faster than N^{2}; by the end of the squares on a chessboard, we end up with about a million billion times as many grains of rice when compared to just squaring. Less obviously, it also eventually grows faster than N^{3}, N^{4}, N^{5} and, in general, N^{anything}, as long as that anything is fixed (i.e. doesn’t get bigger as N gets bigger). We’ve jumped a level in the hierarchy and found growth of a completely new order: growth like N^{something} is called polynomial, while growth like something^{N} is exponential. We needn’t stop here though. 2^{N^2} grows faster than 2^{N}, 2^{(N^3)} grows faster than 2^{N^2} and so on. And 2^{2^N} grows faster than any of these[1].

The asymptotic behavior of functions is often encountered in computer science in comparisons of how hard different problems are or how efficient various algorithms for solving them are. Typically, we compare how the number of steps required to solve a problem grows as the problem gets larger and, for large enough problems, this is often simply described by a single factor. Here some measure of the size of the problem is N and the number of steps (for suitably defined steps) asymptotically grows as N^{2} or N^{3} or 2^{N}.

Take the simple algorithms for addition and multiplication we learn in school. These convert the problem of adding or multiplying N digit numbers to a series of additions and multiplications on single digit numbers. So here’s an example of adding two three digit numbers:

157

+232

-------

389

-------

For simplicity, let’s ignore places where you need to carry digits (the analysis is effectively unchanged). For 3-digit numbers (N=3) we need 3 additions (in the problem above, these are 7+2, 5+3 and 1+2). For two 4-digit numbers (N=4) we need 4 additions, and so on. So the problem is said to grow as N (or N^{1}).

Now look at multiplication:

157

x 232

-------

00314

04710

31400

--------

36424

---------

Again ignoring the numbers that need to be carried, we have 3 multiplications for each of the 3 digits in the lower number, and then we need to add 5 or 6 columns of three numbers each (assuming that adding the 0s is the same as adding any other number). So we have N^{2} multiplications and approximately 2*N^{2} additions (I’ve been a little sloppy about counting the additions, but if you do it more carefully you’ll see that it doesn’t matter asymptotically). If we assume that single-digit multiplication and addition take about the same time[2], we have 3*N^{2} operations and so this multiplication algorithm grows as N^{2}. Note that there’ll be some cost to setting up the problem (taking out your notebook, writing the numbers down, and so on) but these will all grow at most at order N and will not change asymptotic behavior.

What counts as an operation in the addition algorithm and what counts as one in the multiplication algorithm are not quite the same, so this method doesn’t tell you how to compare the absolute time cost of these problems. What this does tell you is how the cost grows for large problems: if you double the length of the numbers you are using, addition takes twice as long but multiplication takes four times as long. And, as long as the single operations don’t take radically different amounts of time, the contribution from the growth will very quickly dominate. So even if the individual operations in the addition algorithm are hard and the ones in the multiplication algorithm are easy (not the case here, of course), as N gets large there will be so many more operations for multiplication that the cost difference of individual operations will be much less relevant.

Also note that we’ve been speaking of the time cost and complexity of the addition and multiplication **algorithms**. The complexity of a problem might be lower than the complexity of a particular algorithm for the problem; this means that the algorithm is asymptotically sub-optimal. For example, there exist multiplication algorithms that grow as N^{1.58}. Again, for small N the asymptotically sub-optimal algorithm might still do better – it might have a smaller constant out in front or the terms that depend on lower powers of N might still be significant. But this advantage will vanish in the long run.

When the asymptotic behavior is a good guide, differences in scaling can make the gains from improved algorithms quite stunning. A good example is the much celebrated FFT algorithm which tackles the problem of representing a signal as the sum of waves of different frequencies (imagine someone playing several notes together on a piano and then asking you to identify how much of each note went into the resulting sound). This sounds like an obscure problem, but it crops up often in practice (and can even be used to speed up multiplication). The standard way of doing this used to take on the order of N^{2} operations and the FFT speeds this up to N*log(N). This may not seem like much, but it has had an enormous impact. The speed-up starts to become relevant quite quickly (for N=1000 it’s already over a 100 times faster) and becomes more and more useful as N gets bigger. In practice, it makes the problem solvable in many cases where it was unsolvable before, and the FFT is now used everywhere in modern electronics.

A fundamental distinction in computer science is between problems whose complexity is polynomial (remember, this means it grows as N^{something}) and those whose complexity is exponential (something^{N}). The first set is, roughly, considered tractable and the second isn’t. The P=NP problem asks whether there exist polynomial-time algorithms for a certain class of problems for which we currently only have exponential time algorithms. Apart from its theoretical interest, the existence of such polynomial time algorithms would have enormous practical consequences for a range of problems from protein folding to flight scheduling[3].

Since exponentials grow faster than polynomials, something divided by an exponential becomes small quicker than something divided by a polynomial. Both 1/(something^{N}) and 1/(N^{something}) fall off to 0 as N gets bigger and bigger but 1/(something^{N}) does so quicker. This is the distinction that gives rise to “heavy-tailed distributions”. If we have a function describing the probability of seeing a fluctuation of size N, we expect it to go to 0 as N gets bigger and bigger (we do not see infinitely large fluctuations in, say, the stock market). In most cases, the functions decrease quite quickly and behave like the inverse of an exponential. In a few cases they behave like the inverse of a polynomial (1/N^{2}, for example) [4]. Modeling the latter case by the former leads one to drastically underestimate the probability of large fluctuations; for example, one might find that events that models say should only happen once in a thousand years actually happen every fifty years or so. This seems to happen frequently in financial modeling, among other places.

Asymptotic analysis also shows up as quantities get smaller and smaller. If I have a function that looks like x+x^{2}, then as x becomes smaller and smaller, x^{2} becomes irrelevant. As before, how small is “small enough” depends on what you decide “irrelevant” should mean. If x is 1 then x and x^{2} contribute equally; if x is 0.1 then x^{2} only contributes 1/100^{th} as much as x; and so on.

A form of this idea underlies calculus. If you want to approximate changes in a function around a point and you can write the function as a polynomial, then near enough to the point the only term that matters looks like (some number) multiplied by x. The (some number) is the derivative of the function at that point. Again, what “near enough” means depends on how good you want your approximation to be.

In physics, you can often approximately solve hard problems provided some quantity is small enough or big enough. For example, to fully describe the motion of a pendulum you need to solve a complicated differential equation. But, as long as the pendulum isn’t swinging too far, the force on it is well approximated by (something)*x and the equation is solvable. At the other end of the spectrum, much of thermodynamics relies on the fact that most of the microscopic details of a large system are irrelevant for understanding its macroscopic behavior.

There are many more examples of asymptotic reasoning at work[5]. But one of the most useful insights that emerges from all this is a refinement of our thinking about infinity. There’s a difference between thinking of infinity as a number and thinking of it as shorthand for a process. For any finite N, N^{2} is also finite. When we say that N^{2} grows to infinity we are really saying that given any large but finite number, N^{2} is eventually bigger than this number[6]. Now both N^{2} and N^{3} are eventually larger than any finite number and so we say that they both grow to infinity. But when we start to pay attention to what makes these two infinity-approaching processes different and whether a further classification beyond remaining-finite/growing-to-infinity is possible we are led quite naturally to classifying objects by their asymptotic behavior. In this asymptotic reasoning is another grand step in the long struggle to tame infinity and to make the gods nervous.

[1] For an excellent discussion of big numbers and how this hierarchy can keep going, see this article by Scott Aaronson: http://www.scottaaronson.com/writings/bignumbers.html

[2] If additions are much faster than multiplications, you can ignore them. You still get the algorithm growing as N^{2}.

[3] Again, with the caveat that this is relevant for large enough problem sizes

[4] The exponential decrease is often a consequence of the Central Limit Theorem, which applies when independent quantities are averaged together. If the quantities are correlated the probability of large fluctuations can decrease much more slowly.

[5] I haven’t mentioned any purely mathematical examples, but there are many. Asymptotic analysis is often used to decide if certain functions are well-defined. For example, if a function is made up of one part that grows to infinity and another that shrinks to 0, and if we require the function as a whole to go to 0 at infinity in order to be well-defined, then we need to compare the rate at which the two parts grow and shrink as the argument of the function goes off to infinity.

[6] Similarly, when we say that 1/N^{2} falls off to 0 we mean that given any small but non-zero number, 1/N^{2} is eventually smaller than this number.

Posted by Rishidev Chaudhuri at 12:40 AM | Permalink