# Entropy — a primer

by Rishidev Chaudhuri and Jason Merrill

C.P. Snow famously said that not knowing the second law of thermodynamics is like never having read Shakespeare. Whatever the particular merits of this comparison, it does speak to the centrality of the idea of entropy (and its increase) to the physical sciences. Entropy is one of the most important and fundamental physical concepts and, because of its generality, is frequently encountered outside physics. The pop conception of entropy is as a measure of the disorder in a system. This characterization is not so much false as misleading (especially if we think of order and information as being similar). What follows is a brief explanation of entropy, highlighting its origin in the particular ways we describe the world, and an explanation of why it tends to increase. We've made some simplifying assumptions, but they leave the spirit of things unchanged.

The fundamental distinction that gives rise to entropy is the separation between different levels of description. Small systems, systems with only a few components, can be described by giving the state of each of their components. For a large system, say a gas with billions of molecules, describing the state of each molecule is impossible, both because it would be tedious and because we don't know the state of each molecule. And, as we'll point out again later, for many purposes knowing the exact state of the system isn't useful. In theory we can predict how a system evolves by knowing its exact state, but in practice this is much too complicated to do unless the system is very small. So we instead build probabilistic predictions taking into account only a few parameters of the system, which gives us a coarser but more relevant level of description, and we seek to describe changes in the world at this level.

There is nothing that makes this uniquely part of physics, of course, and there are many other cases where we need to investigate the relationship between levels of description. Let's consider a toy example. Imagine we have a deck of cards in some order. We can describe the ordering in many different ways. The most complete way is to give an ordered list, like so: King of Hearts, Two of Clubs, Ace of Diamonds, and so on. But we can also use a coarser description, which is what we do when we describe a pack of cards as shuffled or not. So just for concreteness, let's say that we can only distinguish two states: one in which the cards are arranged in order (Ace of Clubs, Two of Clubs, …) and the other being everything else. Let's call these state A and state B. This is a less informative level of description, of course.

The description in terms of states A and B is a macroscopic one, and the description in terms of the exact ordering is a microscopic one. This is a matter of difference in degree rather than kind, and there are many intermediate levels of description. The states in the macroscopic level of description are the “macrostates”; in this case we have macrostates A and B. Similarly, the states in the microscopic level of description are the “microstates”; in this case we have a gigantic number of different ones, each corresponding to a particular ordering of the cards.

Now let's start shuffling the cards. If we start in state A (cards arranged in order), we'll quickly end up in state B (cards not in order). On the other hand, if we start in state B we'll almost certainly remain in state B. So the system doesn't seem to be reversible: state A almost always leads to state B, and state B almost never leads to state A. However, if we were describing the system using the microscopic level of description, we'd always see one arrangement of cards lead to another, and the chances of transitioning between the various arrangements are the same, so everything is reversible.
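
A small simulation makes this asymmetry concrete. It is only a sketch, and it rests on assumptions that are not part of the discussion above: a tiny four-card deck (so that state A turns up often enough to measure), a perfectly uniform shuffle from Python's `random` module, and illustrative names such as `fraction_in_A`.

```python
import random

DECK_SIZE = 4          # assumption: a tiny deck so state A shows up occasionally
NUM_SHUFFLES = 20_000  # assumption: how many shuffles we simulate

rng = random.Random(0)            # fixed seed so the run is repeatable
ordered = list(range(DECK_SIZE))  # the single arrangement we call state A
deck = ordered.copy()             # start in state A

visits_to_A = 0
for _ in range(NUM_SHUFFLES):
    rng.shuffle(deck)             # mix the deck at the microscopic level
    if deck == ordered:           # macroscopic check: are we in state A?
        visits_to_A += 1

fraction_in_A = visits_to_A / NUM_SHUFFLES
print(f"fraction of shuffles ending in state A: {fraction_in_A:.4f}")
print(f"one arrangement out of 4! = 24 would give: {1 / 24:.4f}")
```

Every individual arrangement is equally likely after a shuffle, yet the deck spends almost all of its time in state B simply because state B contains 23 of the 24 arrangements.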

So what happened? Well, from the microscopic point of view, our macrostates are asymmetric and the asymmetry comes from the particular representation we chose. State B includes a large number of microscopic states, so most arrangements of cards will belong to state B. State A includes very few microstates; there are only a few ways for the cards to be in order. And when we shuffle the cards we naturally end up in the state which includes more microstates. So to explain what's happening we need the number of microstates compatible with a given macrostate. This is called the multiplicity. In this picture, we'd associate a small number with A and a large number with B, and we'd say that the system tends to go from states with a small number of compatible microstates (low multiplicity) to those with a larger number of compatible states or higher multiplicity.
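
To put numbers on the two multiplicities, here is a minimal sketch. It assumes a standard 52-card deck and, for simplicity, that exactly one arrangement counts as "in order"; the variable names are ours.

```python
import math

DECK_SIZE = 52  # assumption: a standard deck

# Every distinct ordering of the deck is one microstate.
total_microstates = math.factorial(DECK_SIZE)

# Assumption: exactly one ordering counts as "in order" (state A).
multiplicity_A = 1
# State B is everything else.
multiplicity_B = total_microstates - multiplicity_A

# Chance that a uniformly random arrangement lands in state A.
prob_A = multiplicity_A / total_microstates

print(f"multiplicity of A: {multiplicity_A}")
print(f"multiplicity of B: {multiplicity_B:.3e}")
print(f"probability of A after a thorough shuffle: {prob_A:.3e}")
```

If we allowed a handful of arrangements to count as ordered, `multiplicity_A` would grow from 1 to a few, which changes nothing about the conclusion.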

Entropy is a measure of this number (it is the log of this number for reasons that are interesting but not critical). And so the entropy is a property of the particular macrostate (macrostate A has low entropy; macrostate B has high entropy). Entropy is also a property of the description. If we choose a different set of macrostates, we'll have a different set of associated entropies. But as long as the system is being mixed up at the microscopic level, which is what happens when we shuffle, we'll see the system move from states with low entropy to states with high entropy. In the card example, we can call state B “disordered” and state A “ordered”, but entropy is not measuring disorder. The high entropy of state B just tells us that there are many more states we call disordered than there are states we call ordered. We could have instead chosen macrostates C and D where state C contained three disordered arrangements of cards and state D contained everything else (including the ordered arrangements). Here state C would have low entropy even though the microstates it contains are disordered.
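
The entropies of these macrostates can be computed directly. This sketch assumes a 52-card deck, takes state A to be a single ordered arrangement, uses the natural log, and sets the usual physical constant in front of the log to 1; the function name `entropy` is just for illustration.

```python
import math

def entropy(multiplicity: int) -> float:
    """Entropy as the natural log of the number of compatible microstates."""
    return math.log(multiplicity)

total = math.factorial(52)  # assumption: all orderings of a 52-card deck

# First description: A is one ordered arrangement, B is everything else.
entropy_A = entropy(1)
entropy_B = entropy(total - 1)

# Second description: C holds three disordered arrangements, D everything else.
entropy_C = entropy(3)
entropy_D = entropy(total - 3)

for name, value in [("A", entropy_A), ("B", entropy_B),
                    ("C", entropy_C), ("D", entropy_D)]:
    print(f"entropy of state {name}: {value:.2f}")
```

State C comes out with an entropy close to that of state A, even though its three arrangements look thoroughly shuffled. The number depends on how many microstates we lumped together, not on how messy they look.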

The entropy increase is probabilistic, in that it happens on average. There's nothing to prevent the mixed up set of cards from being shuffled back into the ordered state. But this is massively unlikely for anything but the smallest systems. It's a fun digression to look at the numbers involved to get a sense of how large they are and to see why these statistical laws, like the law that entropy increases on average, are in practice exact. The number of ways of arranging things grows very, very fast. If we have a deck of 2 cards, there are two possible arrangements. If there are 3 cards there are 6 possible arrangements, and there are 24 possible arrangements of 4 cards. These numbers are small but they rapidly become much bigger than astronomical. The number of ways of arranging half a deck of cards (26 cards) is already about a billion times the age of the universe in seconds, and the rate at which the numbers grow keeps increasing. With numbers like these, “almost always” and “almost never” are “always” and “never” on the timescales that we experience.
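
These numbers are easy to check. The sketch below assumes an age of the universe of roughly 13.8 billion years, a figure that is not given above; the deck sizes in the loop are arbitrary choices.

```python
import math

# Assumption: the universe is roughly 13.8 billion years old.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
age_of_universe_s = 13.8e9 * SECONDS_PER_YEAR

# The number of arrangements of n cards is n factorial.
for n in (2, 3, 4, 13, 26, 52):
    print(f"{n:2d} cards: {math.factorial(n):.3e} arrangements")

# Compare half a deck with the age of the universe in seconds.
ratio = math.factorial(26) / age_of_universe_s
print(f"26! / (age of the universe in seconds) = {ratio:.2e}")
```

The ratio comes out a little under a billion, in line with the estimate in the text.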

Physicists are fond of gases in boxes and a classic physics example for entropy increase is the expansion of a gas in a box. Say we have a box with a partition dividing it into two halves and we fill one of the halves with a gas. We then remove the partition and watch what happens. The gas molecules start off in one half of the box and we'll always observe that the gas expands to fill the other half. So now consider two macrostates. State A will have the gas in one half of the box and state B will have the gas spread out everywhere. If the gas molecules are wandering around freely, they will wander through all the possible arrangements of gas molecules in the box (the microstates). Very few of these correspond to state A; most correspond to state B. And so our system moves from a state of low entropy to a state of high entropy. Again, we might call state A more ordered than state B, to reflect the fact that it would take an unusual conspiracy to see the system in state A. But entropy is not measuring this putative order or disorder.
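
We can count microstates for a cartoon version of this gas. The sketch makes several assumptions that go beyond the description above: each molecule is tracked only by which half of the box it is in, there are just 1000 molecules, and "spread out" is taken to mean that between 45% and 55% of the molecules are in the left half.

```python
import math

N = 1000  # assumption: number of molecules (a real gas has vastly more)

# Assumption: a microstate just records "left" or "right" for each molecule.
total_microstates = 2 ** N

# State A: every molecule in the left half. Exactly one such assignment.
prob_all_left = 1 / total_microstates

# State B (assumed definition): 45% to 55% of the molecules on the left.
low, high = 45 * N // 100, 55 * N // 100
spread_out_microstates = sum(math.comb(N, k) for k in range(low, high + 1))
prob_spread_out = spread_out_microstates / total_microstates

print(f"probability of finding all molecules on the left: {prob_all_left:.2e}")
print(f"probability of a roughly even split: {prob_spread_out:.4f}")
```

Even with only 1000 molecules, nearly every microstate belongs to the spread-out macrostate, and the imbalance becomes far more extreme as the number of molecules grows.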

Now note a couple more things. First, we were able to make this prediction without knowing the detailed state of the system. We used our two macrostates and the entropies associated with them to predict the transition. And, as pointed out before, even if we did know the detailed state of the system we'd find it useless for prediction. In fact, given any particular microstate we'd find it practically impossible to predict how the system evolved, but knowing that the microstate is chosen randomly from a collection of microstates allows us to make a probabilistic prediction, which is exact because of the large numbers involved. So this has the interesting consequence that not only can we make predictions from a higher, incomplete level of description, it actually seems to help. Different levels of description make different things possible.

In these simple examples, the macrostates are fairly obviously a product of our description. In general, are the macroscopic variables we use to describe systems purely subjective or do our theories and the universe give us preferred macroscopic variables and preferred levels of description? Can just anything be a macroscopic variable or are there particular criteria that make for a good macroscopic variable? Can we really just lump together a few arbitrarily chosen states and call that a macrostate? This is a matter of vigorous debate, and is perhaps a subject for a separate article.

Now given a system we can ask how various changes to the system affect the number of states accessible to it or, equivalently, the entropy. In particular, how does adding energy to a system change the entropy? Adding more energy to a system usually increases the number of states available to it. This is both because with more packets of energy there are more ways to distribute them between the members of a system, and because more energy makes high energy states accessible in addition to lower energy ones. Trying to formalize this relationship leads naturally to temperature, which is the factor that tells us how to convert changes in energy into changes in entropy. Adding a given quantity of energy to a system at low temperature increases the entropy more than adding the same quantity of energy to a system at high temperature.
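
One way to see this is with a standard toy model that is not discussed above and is brought in here purely as an assumption: a system of `N` identical oscillators sharing `q` indistinguishable packets of energy. The number of ways to distribute the packets is a binomial coefficient, and at fixed `N` a system with fewer packets is colder. The function names are ours.

```python
import math

def multiplicity(num_oscillators: int, num_packets: int) -> int:
    """Ways to share identical energy packets among oscillators (stars and bars)."""
    return math.comb(num_packets + num_oscillators - 1, num_packets)

def entropy(num_oscillators: int, num_packets: int) -> float:
    """Entropy as the natural log of the multiplicity."""
    return math.log(multiplicity(num_oscillators, num_packets))

def entropy_gain_from_one_packet(num_oscillators: int, num_packets: int) -> float:
    """Change in entropy when a single packet of energy is added."""
    return (entropy(num_oscillators, num_packets + 1)
            - entropy(num_oscillators, num_packets))

N = 100  # assumption: size of the toy system

gain_cold = entropy_gain_from_one_packet(N, 10)    # little energy: cold
gain_hot = entropy_gain_from_one_packet(N, 1000)   # lots of energy: hot

print(f"entropy gain from one packet, cold system: {gain_cold:.3f}")
print(f"entropy gain from one packet, hot system:  {gain_hot:.3f}")
```

In this model the same packet of energy buys far more entropy in the cold system than in the hot one, which is the relationship that temperature summarizes.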

So imagine we have two systems at different temperatures connected together. Packets of energy are being exchanged back and forth, and the joint system wanders through a number of possible states, just like the cards being shuffled. Let's say we switch a packet of energy from the hotter system to the colder one. Taking away the energy from the hotter system and giving it to the colder system reduces the entropy of the first system but increases the entropy of the second. Crucially, the increase in entropy of the low temperature system is greater than the decrease of the high temperature system. So there will be more states where the energy packet has moved from the high temperature to the low temperature system than vice versa, and energy will flow from the hotter system to the colder one.
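
The counting argument can be sketched with a standard toy model, introduced here as an assumption rather than something taken from the text: each system is a set of 100 identical oscillators sharing indistinguishable energy packets, with the hot system holding 1000 packets and the cold one holding 10. All names and numbers are illustrative.

```python
import math

def multiplicity(num_oscillators: int, num_packets: int) -> int:
    """Ways to share identical energy packets among oscillators (stars and bars)."""
    return math.comb(num_packets + num_oscillators - 1, num_packets)

N = 100                    # assumption: oscillators in each system
q_hot, q_cold = 1000, 10   # assumption: packets held by the hot and cold systems

def joint_multiplicity(packets_hot: int, packets_cold: int) -> int:
    """Microstates of the combined system for a given split of the energy."""
    return multiplicity(N, packets_hot) * multiplicity(N, packets_cold)

before = joint_multiplicity(q_hot, q_cold)
after_hot_to_cold = joint_multiplicity(q_hot - 1, q_cold + 1)
after_cold_to_hot = joint_multiplicity(q_hot + 1, q_cold - 1)

ratio_hot_to_cold = after_hot_to_cold / before
ratio_cold_to_hot = after_cold_to_hot / before

print(f"microstates after hot -> cold, relative to before: {ratio_hot_to_cold:.3f}")
print(f"microstates after cold -> hot, relative to before: {ratio_cold_to_hot:.3f}")
```

Moving a packet from hot to cold multiplies the number of available microstates, while moving it the other way shrinks that number, so a system wandering at random among its microstates is far more likely to be found with the energy having moved toward the colder side.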

There are many interesting directions to explore from here; we've really only scratched the surface. For one thing, we've left out some subtleties. Apart from the issue of what makes a good macrostate or level of description, we've often invoked a process that mixes things, like shuffling cards. We haven't explored the details of this process or been explicit about what we require from this process. For example, what happens if the shuffling process depends on which macrostate we are in? We've also assumed that the system explores all its possible microstates with equal probability. What happens when this is not the case?

We also haven't talked about attempts to understand the direction of time using entropy increase. The laws of physics seem to be time symmetric at a microscopic level, in much the same way that our card shuffling doesn't pick out a preferred direction, and so it's puzzling where the direction of time comes from. Entropy increase at the macroscopic level does seem to give us an asymmetry (entropy increases towards the future), and some people have argued that this can be used to ground the direction of time.

But the origins of the current low entropy state of the universe and its consequences are far from clear. To put this in the context of our card example, if I come across a pack of ordered cards there are two primary possibilities. Perhaps the cards started in an ordered configuration (maybe someone put them that way, or they were manufactured that way) and they haven't been shuffled very much. In this case, the cards were in a low entropy state in the past and will be in a higher entropy state in the future and we have an asymmetry that comes from the initial conditions. But another possibility is that we've been shuffling the cards for a long time and, just by random fluctuation, they've ended up in an ordered state. In this case, the cards were probably in a high entropy state in the past and will be in a high entropy state in the future.

Similar to the first scenario, most explanations of why entropy seems to increase in one direction require that the universe started in a state of low entropy. This may seem like question begging, since it just pushes the asymmetry back to where the universe started. But it might be the best we have at the moment. And it may turn out that low-entropy initial conditions will emerge from cosmology and the next generation of physics as we better understand the initial state of the universe. Alternatively, there is the grimmer Boltzmann brain hypothesis: we are just random low entropy fluctuations in a high entropy universe, much like an ordered pack of cards emerging from repeated shuffling after a very long time. In this view, a part of the universe (or a particular universe) has just briefly fluctuated into an ordered state. Since one brain fluctuating into existence is much more probable than an entire world doing so, according to this view the rest of the world is probably an unstable illusion and will wink out of existence in the next moment as the system fluctuates back to a high entropy state. Thankfully, few working physicists seem to actually believe this.
