Over the past several days, there has been considerable debate at UD on thermodynamics, information, order vs disorder etc. In a clarifying note to Mung (who was in turn responding to Sal C) I have commented as follows. (My note also follows up from an earlier note that was put up early in the life of the recent exchanges here, and a much earlier ID Foundations series post on counter-flow and the thermodynamics FSCO/I link.)
I think it convenient to scoop the below out for record and reference, as across time comments in threads are much harder to find than original posts:
_____________________
>>One more time [cf. 56 above, which clips elsewhere . . . ], let me clip Shannon, 1950/1:
The entropy is a statistical parameter which measures, in a certain sense, how much information is produced on the average for each letter of a text in the language. If the language is translated into binary digits (0 or 1) in the most efficient way, the entropy is the average number of binary digits required per letter of the original language. The redundancy, on the other hand, measures the amount of constraint imposed on a text in the language due to its statistical structure, e.g., in English the high frequency of the letter E, the strong tendency of H to follow T or of V to follow Q [sic, probably V is a typo for U]. It was estimated that when statistical effects extending over not more than eight letters are considered the entropy is roughly 2.3 bits per letter, the redundancy about 50 per cent.
Going back to my longstanding, always linked note, which I have clipped several times over the past few days, here is how we measure info and avg info per symbol:
To quantify the above definition of what is perhaps best descriptively termed information-carrying capacity, but has long been simply termed information (in the “Shannon sense” – never mind his disclaimers . . .), let us consider a source that emits symbols from a vocabulary: s1,s2, s3, . . . sn, with probabilities p1, p2, p3, . . . pn. That is, in a “typical” long string of symbols, of size M [say this web page], the average number that are some sj, J, will be such that the ratio J/M –> pj, and in the limit attains equality. We term pj the a priori — before the fact — probability of symbol sj. Then, when a receiver detects sj, the question arises as to whether this was sent. [That is, the mixing in of noise means that received messages are prone to misidentification.] If on average, sj will be detected correctly a fraction, dj of the time, the a posteriori — after the fact — probability of sj is by a similar calculation, dj. So, we now define the information content of symbol sj as, in effect how much it surprises us on average when it shows up in our receiver:
I = log [dj/pj], in bits [if the log is base 2, log2] . . . Eqn 1
This immediately means that the question of receiving information arises AFTER an apparent symbol sj has been detected and decoded. That is, the issue of information inherently implies an inference to having received an intentional signal in the face of the possibility that noise could be present. Second, logs are used in the definition of I, as they give an additive property: for, the amount of information in independent signals, si + sj, using the above definition, is such that:
I total = Ii + Ij . . . Eqn 2
For example, assume that dj for the moment is 1, i.e. we have a noiseless channel so what is transmitted is just what is received. Then, the information in sj is:
I = log [1/pj] = – log pj . . . Eqn 3
This case illustrates the additive property as well, assuming that symbols si and sj are independent. That means that the probability of receiving both messages is the product of the probability of the individual messages (pi *pj); so:
Itot = log [1/(pi*pj)] = [-log pi] + [-log pj] = Ii + Ij . . . Eqn 4
So if there are two symbols, say 1 and 0, and each has probability 0.5, then for each, I is – log [1/2], on a base of 2, which is 1 bit. (If the symbols were not equiprobable, the less probable binary digit-state would convey more than, and the more probable, less than, one bit of information. Moving over to English text, we can easily see that E is as a rule far more probable than X, and that Q is most often followed by U. So, X conveys more information than E, and U conveys very little, though it is useful as redundancy, which gives us a chance to catch errors and fix them: if we see “wueen” it is most likely to have been “queen.”)
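[Aside, added for illustration and not part of the original note: a minimal Python sketch of Eqns 3 and 4 for the noiseless-channel case. The letter frequencies used are rough, assumed values for English, given only to show how the surprisal metric behaves:]

```python
from math import log2

def info_bits(p):
    """Surprisal -log2(p), in bits, of a symbol with a priori probability p (Eqn 3)."""
    return -log2(p)

# Equiprobable binary symbols: each carries exactly 1 bit.
print(info_bits(0.5))                      # 1.0

# A rarer symbol carries more information than a common one.
p_E, p_X = 0.127, 0.0015                   # rough, assumed English letter frequencies
print(info_bits(p_E), info_bits(p_X))      # ~3.0 bits for E vs ~9.4 bits for X

# Additivity for independent symbols (Eqn 4): I(E and X together) = I(E) + I(X).
assert abs(info_bits(p_E * p_X) - (info_bits(p_E) + info_bits(p_X))) < 1e-9
```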
Further to this, we may average the information per symbol in the communication system thus (writing it first in terms of -H to make the additive relationships clearer):
– H = p1 log p1 + p2 log p2 + . . . + pn log pn
or, H = – SUM [pi log pi] . . . Eqn 5
H, the average information per symbol transmitted [usually measured as bits/symbol], is often termed the Entropy; first, historically, because it resembles one of the expressions for entropy in statistical thermodynamics. As Connor notes: “it is often referred to as the entropy of the source.” [p.81, emphasis added.] Also, while this is a somewhat controversial view in Physics, as is briefly discussed in Appendix 1 below, there is in fact an informational interpretation of thermodynamics that shows that informational and thermodynamic entropy can be linked conceptually as well as in mere mathematical form . . .
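[Again by way of illustration only: a short Python sketch of Eqn 5, the average information per symbol H, for a few assumed source distributions:]

```python
from math import log2

def shannon_H(probs):
    """Average information per symbol, -SUM pi*log2(pi), in bits (Eqn 5)."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(shannon_H([0.5, 0.5]))      # 1.0 bit/symbol: a fair binary source
print(shannon_H([0.9, 0.1]))      # ~0.47 bits/symbol: a biased source averages less info per symbol
print(shannon_H([0.25] * 4))      # 2.0 bits/symbol: four equiprobable symbols
```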
What this last refers to is the Gibbs formulation of entropy for statistical mechanics, and its implications when the relationship between probability and information is brought to bear in light of the Macro-micro views of a body of matter. That is, when we have a body, we can characterise its state per lab-level thermodynamically significant variables, that are reflective of many possible ultramicroscopic states of constituent particles.
Thus, clipping again from my always linked discussion that uses Robertson’s Statistical Thermophysics, Ch. 1 (and do recall my strong recommendation that we all acquire and read L K Nash’s Elements of Statistical Thermodynamics as introductory reading):
Summarising Harry Robertson’s Statistical Thermophysics (Prentice-Hall International, 1993) . . . .
For, as he astutely observes on pp. vii – viii:
. . . the standard assertion that molecular chaos exists is nothing more than a poorly disguised admission of ignorance, or lack of detailed information about the dynamic state of a system . . . . If I am able to perceive order, I may be able to use it to extract work from the system, but if I am unaware of internal correlations, I cannot use them for macroscopic dynamical purposes. On this basis, I shall distinguish heat from work, and thermal energy from other forms . . .
And, in more detail (pp. 3 – 6, 7, 36; cf. Appendix 1 below for a more detailed development of thermodynamics issues and their tie-in with the inference to design; also see recent arXiv papers by Duncan and Samura here and here):
. . . It has long been recognized that the assignment of probabilities to a set represents information, and that some probability sets represent more information than others . . . if one of the probabilities say p2 is unity and therefore the others are zero, then we know that the outcome of the experiment . . . will give [event] y2. Thus we have complete information . . . if we have no basis . . . for believing that event yi is more or less likely than any other [we] have the least possible information about the outcome of the experiment . . . . A remarkably simple and clear analysis by Shannon [1948] has provided us with a quantitative measure of the uncertainty, or missing pertinent information, inherent in a set of probabilities [NB: i.e. a probability different from 1 or 0 should be seen as, in part, an index of ignorance] . . . .
[deriving informational entropy, cf. discussions here, here, here, here and here; also Sarfati’s discussion of debates and the issue of open systems here . . . ]
H({pi}) = – C [SUM over i] pi*ln pi, [. . . “my” Eqn 6]
[–> This is essentially the same as Gibbs Entropy, once C is properly interpreted and the pi’s relate to the probabilities of microstates consistent with the given lab-observable macrostate of a system at a given Temp, with a volume V, under pressure P, degree of magnetisation, etc etc . . . ]
[where [SUM over i] pi = 1, and we can also define parameters alpha and beta such that: (1) pi = e^-[alpha + beta*yi]; (2) exp [alpha] = [SUM over i] exp (- beta*yi) = Z, with Z being in effect the partition function across microstates, the “Holy Grail” of statistical thermodynamics.] . . . .
[H], called the information entropy, . . . correspond[s] to the thermodynamic entropy [i.e. s, where also it was shown by Boltzmann that s = k ln w], with C = k, the Boltzmann constant, and yi an energy level, usually ei, while [BETA] becomes 1/kT, with T the thermodynamic temperature . . . A thermodynamic system is characterized by a microscopic structure that is not observed in detail . . . We attempt to develop a theoretical description of the macroscopic properties in terms of its underlying microscopic properties, which are not precisely known. We attempt to assign probabilities to the various microscopic states . . . based on a few . . . macroscopic observations that can be related to averages of microscopic parameters. Evidently the problem that we attempt to solve in statistical thermophysics is exactly the one just treated in terms of information theory. It should not be surprising, then, that the uncertainty of information theory becomes a thermodynamic variable when used in proper context . . . .
Jaynes’ [summary rebuttal to a typical objection] is “. . . The entropy of a thermodynamic system is a measure of the degree of ignorance of a person whose sole knowledge about its microstate consists of the values of the macroscopic quantities . . . which define its thermodynamic state. This is a perfectly ‘objective’ quantity . . . it is a function of [those variables] and does not depend on anybody’s personality. There is no reason why it cannot be measured in the laboratory.” . . . . [pp. 3 – 6, 7, 36; replacing Robertson’s use of S for Informational Entropy with the more standard H.]
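[To make the point concrete, here is a small Python sketch, added for illustration with assumed toy energy levels (none of which come from Robertson): Boltzmann probabilities pi = exp(-beta*yi)/Z are assigned over microstates, then S = -k_B*SUM pi*ln pi is evaluated and checked against the standard identity S = k_B*(ln Z + beta*<E>):]

```python
from math import exp, log

k_B = 1.380649e-23                         # Boltzmann constant, J/K
T = 300.0                                  # assumed temperature, K
beta = 1.0 / (k_B * T)

e = [0.0, 1e-21, 2e-21, 4e-21]             # assumed (toy) microstate energy levels, J
Z = sum(exp(-beta * ei) for ei in e)       # partition function
p = [exp(-beta * ei) / Z for ei in e]      # Boltzmann probabilities; SUM pi = 1

S = -k_B * sum(pi * log(pi) for pi in p)   # informational / Gibbs entropy
E_avg = sum(pi * ei for pi, ei in zip(p, e))
# Consistency check against the standard identity S = k_B*(ln Z + beta*<E>):
assert abs(S - k_B * (log(Z) + beta * E_avg)) < 1e-30
print(S)                                   # entropy in J/K for this toy four-level system
```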
As is discussed briefly in Appendix 1, Thaxton, Bradley and Olsen [TBO], following Brillouin et al, in the 1984 foundational work for the modern Design Theory, The Mystery of Life’s Origin [TMLO], exploit this information-entropy link through the idea of moving from a random to a known microscopic configuration in the creation of the bio-functional polymers of life, and then — again following Brillouin — identify a quantitative information metric for the information of polymer molecules. For, in moving from a random to a functional molecule, we have in effect an objective, observable increment in information about the molecule. This leads to energy constraints, thence to a calculable concentration of such molecules in suggested, generously “plausible” primordial “soups.” In effect, so unfavourable is the resulting thermodynamic balance that the concentrations of the individual functional molecules in such a prebiotic soup are arguably so small as to be negligibly different from zero on a planet-wide scale.
By many orders of magnitude, we don’t get to even one molecule each of the required polymers per planet, much less bringing them together in the required proximity for them to work together as the molecular machinery of life. The linked chapter gives the details. More modern analyses [e.g. Trevors and Abel, here and here], however, tend to speak directly in terms of information and probabilities rather than the more arcane world of classical and statistical thermodynamics . . .
Now, of course, as Wiki summarises, the classic formulation of the Gibbs entropy is:
The macroscopic state of the system is defined by a distribution on the microstates that are accessible to a system in the course of its thermal fluctuations. So the entropy is defined over two different levels of description of the given system. The entropy is given by the Gibbs entropy formula, named after J. Willard Gibbs. For a classical system (i.e., a collection of classical particles) with a discrete set of microstates, if E_i is the energy of microstate i, and p_i is the probability that it occurs during the system’s fluctuations, then the entropy of the system is:
S = -k_B * [sum_i] p_i * ln p_i
This definition remains valid even when the system is far away from equilibrium. Other definitions assume that the system is in thermal equilibrium, either as an isolated system, or as a system in exchange with its surroundings. The set of microstates on which the sum is to be done is called a statistical ensemble. Each statistical ensemble (micro-canonical, canonical, grand-canonical, etc.) describes a different configuration of the system’s exchanges with the outside, from an isolated system to a system that can exchange one or more quantities with a reservoir, like energy, volume or molecules. In every ensemble, the equilibrium configuration of the system is dictated by the maximization of the entropy of the union of the system and its reservoir, according to the second law of thermodynamics (see the statistical mechanics article).
Neglecting correlations between the different possible states (or, more generally, neglecting statistical dependencies between states) will lead to an overestimate of the entropy[1]. These correlations occur in systems of interacting particles, that is, in all systems more complex than an ideal gas.
This S is almost universally called simply the entropy. It can also be called the statistical entropy or the thermodynamic entropy without changing the meaning. Note the above expression of the statistical entropy is a discretized version of Shannon entropy. The von Neumann entropy formula is an extension of the Gibbs entropy formula to the quantum mechanical case.
It has been shown that the Gibbs entropy is numerically equal to the experimental entropy[2]: dS = delta_Q / T . . .
Looks to me that this is one time Wiki has it just about dead right. Let’s deduce a relationship that shows physical meaning in info terms, where (- log p_i) is an info metric, I_i, here for microstate i, and noting that a sum over i of p_i * log p_i is in effect a frequency/probability weighted average, i.e. the expected value, of the log p_i expression; we also move from natural logs (ln) to generic logs, absorbing the change of base into the constant:
S_Gibbs = -k_B * [sum_i] p_i * log p_i
But, I_i = – log p_i
So, S_Gibbs = k_B * [sum_i] p_i * I_i
i.e. S_Gibbs is a constant times the average information required to specify the particular microstate of the system, given its macrostate: the MmIG (macro-micro info gap).
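[A quick numerical check, added for illustration with assumed microstate probabilities, of the step just taken, i.e. that S_Gibbs works out to k_B times the average missing micro-level information:]

```python
from math import log

k_B = 1.380649e-23
p = [0.4, 0.3, 0.2, 0.1]                           # assumed microstate probabilities

S_gibbs = -k_B * sum(pi * log(pi) for pi in p)     # Gibbs entropy (natural logs)
avg_missing_info = sum(pi * -log(pi) for pi in p)  # expected value of I_i = -ln pi, in nats
assert abs(S_gibbs - k_B * avg_missing_info) < 1e-35
print(avg_missing_info)                            # ~1.28 nats of missing micro-info (the MmIG)
```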
Or, as Wiki also says elsewhere:
At an everyday practical level the links between information entropy and thermodynamic entropy are not close. Physicists and chemists are apt to be more interested in changes in entropy as a system spontaneously evolves away from its initial conditions, in accordance with the second law of thermodynamics, rather than an unchanging probability distribution. And, as the numerical smallness of Boltzmann’s constant kB indicates, the changes in S / kB for even minute amounts of substances in chemical and physical processes represent amounts of entropy which are so large as to be right off the scale compared to anything seen in data compression or signal processing.
But, at a multidisciplinary level, connections can be made between thermodynamic and informational entropy, although it took many years in the development of the theories of statistical mechanics and information theory to make the relationship fully apparent. In fact, in the view of Jaynes (1957), thermodynamics should be seen as an application of Shannon’s information theory: the thermodynamic entropy is interpreted as being an estimate of the amount of further Shannon information needed to define the detailed microscopic state of the system, that remains uncommunicated by a description solely in terms of the macroscopic variables of classical thermodynamics. For example, adding heat to a system increases its thermodynamic entropy because it increases the number of possible microscopic states that it could be in, thus making any complete state description longer. (See article: maximum entropy thermodynamics. [Also, another article remarks: >>in the words of G. N. Lewis writing about chemical entropy in 1930, “Gain in entropy always means loss of information, and nothing more” . . . in the discrete case using base two logarithms, the reduced Gibbs entropy is equal to the minimum number of yes/no questions that need to be answered in order to fully specify the microstate, given that we know the macrostate.>>]) Maxwell’s demon can (hypothetically) reduce the thermodynamic entropy of a system by using information about the states of individual molecules; but, as Landauer (from 1961) and co-workers have shown, to function the demon himself must increase thermodynamic entropy in the process, by at least the amount of Shannon information he proposes to first acquire and store; and so the total entropy does not decrease (which resolves the paradox).
So, immediately, the use of “entropy” in the Shannon context to denote not H but N*H, where N is the number of symbols (and thus the number of step-by-step states involved in emitting those N symbols), is an error of loose reference.
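[For illustration only, using Shannon’s 2.3 bits/letter figure quoted above and an assumed message length: H is a per-symbol average, while an N-symbol message carries on average N*H bits, and it is the latter that is sometimes loosely also called “entropy”:]

```python
H = 2.3             # bits per letter (Shannon's rough estimate for English, quoted above)
N = 1000            # assumed message length, in letters
total_info = N * H  # average information carried by the whole N-letter message
print(H, "bits/letter;", total_info, "bits for the N-letter message")
```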
Similarly, by exploiting parallels in formulation and insights into the macro-micro distinction in thermodynamics, we can develop a reasonable and empirically supportable physical account of how Shannon information is a component of the Gibbs entropy narrative. Where also Gibbs subsumes the Boltzmann formulation and onward links to the lab-measurable quantity. (Nash has a useful, relatively lucid — none of this topic is straightforward — discussion on that.)
Going beyond, once the bridge is there between information and entropy, it is there. It is not going away, regardless of how inconvenient it may be to some schools of thought.
We can easily see that, for example, information is expressed in the configuration of a string, Z, of elements z1, z2, . . . zN in accordance with a given protocol of assignment rules and interpretation & action rules etc.
Where also, this is without loss of generality (WLOG): as AutoCAD etc. show us, by using a nodes-and-arcs representation and a list of structured strings that records it, essentially any object can be described in terms of a suitably configured string or collection of strings.
So now, we can see that string Z (with each zi possibly taking b discrete states) may represent an island of function that expresses functionally specific complex organisation and associated information. Because of the specificity needed to achieve and keep function, leading to a demand for matching, co-ordinated values of zi along the string, relatively few of the b^N possibilities for N elements with b possible states each are permissible for that string. We are at isolated islands of specific function, i.e. cases E from a zone of function T in a space of possibilities W.
(BTW, once the config space b^N exceeds 500 bits [2^500 possibilities] on the gamut of our solar system, or 1,000 bits [2^1,000] on the gamut of our observable cosmos, that brings to bear all the needle-in-the-haystack, monkeys-at-keyboards analysis that has been repeatedly brought forth to show why FSCO/I is a useful sign of IDOW — intelligently directed organising work — as empirically credible cause.)
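[Again purely for illustration, with assumed string sizes (a 300-element, 4-state string and a 300-element, 20-state string): a short Python sketch expressing a config space of W = b^N cells as N*log2(b) bits and comparing it against the 500- and 1,000-bit thresholds just mentioned:]

```python
from math import log2

def config_space_bits(b, N):
    """Size of the configuration space W = b^N, expressed in bits: N*log2(b)."""
    return N * log2(b)

for b, N in [(4, 300), (20, 300)]:     # assumed sizes, e.g. 300 bases or 300 residues
    bits = config_space_bits(b, N)
    print(f"b={b}, N={N}: {bits:.0f} bits",
          "(exceeds the 500-bit solar-system threshold)" if bits > 500 else "(below 500 bits)",
          "(exceeds the 1,000-bit cosmos threshold)" if bits > 1000 else "(below 1,000 bits)")
```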
We see then that we have a complex string to deal with, with sharp restrictions on possible configs that are evident from observable function, relative to the full space of W = b^N possibilities. Z is in a highly informational, tightly constrained state that comes from a special zone specifiable on macro-level observable function (without actually observing Z directly). That constraint on degrees of freedom, contingent on functional, complex organisation, is tantamount to saying that a highly informational state is a low entropy one, in the Gibbs sense.
Going back to the expression, comparatively speaking there is not a lot of MISSING micro-level info to be specified; i.e. simply by knowing the fact of complex, specified, information-rich function, we know that we are in a highly restricted special zone T in W. This immediately applies to R/DNA and proteins, which of course use string structures. It also applies to the complex 3-D arrangement of components in the cell, which are organised in ways that foster function.
And of course it applies to the 747 in a flyable condition.
Such easily explains why a tornado passing through a junkyard in Seattle will not credibly assemble a 747 from the parts it hits; and it explains why the raw energy and forces of a tornado that hits another, formerly flyable 747, tearing it apart, would leave its resulting condition much less specified per function, and in fact result in predictable loss of function.
We should also note that this analysis takes for granted the functional possibilities of a mass of Al etc., but is focussed on the issue of functional config, giving it a specific thermodynamics and information theory context. (Where also, algebraic modelling is a valid mathematical analysis.)
I trust this proves helpful. >>
I also post scripted:
>> The most probable or equilibrium cluster of microstates consistent with a given macrostate, is the cluster that has the least information about it, and the most freedom of variation of mass and energy distribution at micro level. This high entropy state-cluster is strongly correlated with high levels of disorder, for reasons connected to the functionality constraints just above. And in fact — never mind those who are objecting and pretending that this is not so — it is widely known in physics that entropy is a metric of disorder, some would say it quantifies it and gives it structured physical expression in light of energy and randomness or information gap considerations. >>
_____________________
So, I think it reasonable to associate higher and higher entropy states of a given body or collection of material objects with increasingly disordered configurations, and to hold that there is a bridge between Shannon info (and the H-metric of avg info per symbol) and the Gibbs entropy formulation, which is more fundamental than the one used in classical formulations of thermodynamics. In turn, this is connected to the issues of how FSCO/I is an index not merely of order but of organisation, and how such an information-rich state is one in which there is comparatively low uncertainty about the zone of possible configs, i.e. it is a low entropy state relative to the sea of non-functional possible configs of the same components.
As I have said, this area is quite technical, and I strongly recommend the L K Nash book, Elements of Statistical Thermodynamics, as a well put together introduction. In fact, its treatment of the Boltzmann distribution alone (with the associated drawing out of quantities, functions and relationships in physical context) is well worth the price. I don’t know if the “classical books” publisher, Dover, could be induced to do an ebook edition. (U/D Sep 10: Thanks to KD, I suggest Fitzpatrick’s freely downloadable course notes, Thermodynamics and Statistical Mechanics, here. Nash’s presentation is still the more intuitively clear.)
Harry S Robertson’s Statistical Thermophysics is a good modern presentation on the informational school of thought on thermodynamics, but can hardly be said to be an introductory level treatment for the first time reader on this topic, so work through Nash first to get your grounding.
I trust the above will help us all clarify our discussion on this important though admittedly difficult topic. END