A friend alerts me to this PhysOrg article about Zipf’s law, according to which,
… the same patterns emerge in a wide variety of situations. The linguist George Kingsley Zipf first proposed the law in 1949, when he noticed that the distribution of words in a newspaper, book, or other literary article always followed the same pattern.
Zipf counted how many times each word appeared, and found that the probability of the occurrence of words starts high and tapers off. Specifically, the most frequent word occurs about twice as often as the second most frequent word, which occurs about twice as often as the fourth most frequent word, and so on. Mathematically, this means that the frequency of any word is inversely proportional to its rank. When the Zipf curve is plotted on a log-log scale, it appears as a straight line with a slope of -1.
Some enterprising researchers tested Zipf’s law on the growth of Linux:
“Linux Debian gave us the opportunity to verify the ‘proportional mechanism,’ thanks to an important dataset and a huge investigation potential,” Maillart said. “All changes (evolution) in open source software are freely available and therefore can be tracked in detail. However, model verification has brought one answer and many resulting questions we intend to give an answer to. We think particularly of mechanisms of success/failure of projects in relation with their management.
“Remember that we still do not clearly understand the reasons of the success of the open source, since it’s free and based on altruist contributions by programmers,” he said. “Additionally, one can bet that further research in this direction (open source and proportional growth) may raise useful questions for other systems (cities, economy, etc.) that would bring new insights to explain their evolution.”
Re the appearance of words in a document, one factor may be that, to discuss a given subject, some words are essential and others useful but not essential. A third group are optional and thus may or may not appear. Here, for example, is a Wordle of this post:

Incidentally, a friend writes to say,
There you have it: altruism raising its ugly head once more. An “evolutionary” software project, probably run by people who wouldn’t question evolutionary assumptions in any other context, relying upon the altruism of its contributors.
Actually, it’s not that they don’t understand the reasons, it’s that they can’t accept the evidence. The evidence is that altruism is normal enough among humans not to need an explanation as an aberration – but not universal and therefore not governed by a law. Call it part of the design, if you like.
Here’s the abstract info: