Chapter 8
Candidate Laws for the Coconstruction of a Biosphere
For the purpose of our discussion here, grant that molecular autonomous agents propagate organization and evolve under the roughly familiar Darwinian aegis of mutation and selection. These agents, coevolving with one another, discovering displacements from equilibrium that can be used to accomplish work, making records of such sources of energy, then linking those exergonic reactions to endergonic reactions, are the means by which our biosphere has come into being, actually coconstructed by the activities, accidents, strivings, and failures of these autonomous agents, exapting persistently into their adjacent possible.
Yes. But how does a biosphere get itself constructed? Are there laws? Are there laws that might hold for any biosphere? Laws of a general biology, wherever autonomous agents swirl into existence and change forever the begetting of the universe?
No one knows. Yet it seems reasonable to expect such laws and honorable to begin, even now, to seek them. At worst we will be wrong. Rather more stunningly, we may be right. It is surely enough if at this early stage we can even begin to formulate candidate general laws. Our efforts will only improve over time.
In the present chapter, I consider four candidate general laws for any biosphere. Because the science is more advanced than some of the material in the previous chapters, I will be able to describe it in somewhat more detail. That which is more worked out is, I hope, a signature of how the glimmered science of the past seven chapters may develop.
Coevolutionarily constructible communities of molecular autonomous agents may evolve to four apparently different phase transitions:
Law 1. Communities of autonomous agents will evolve to the dynamical “edge of chaos” within and between members of the community, thereby simultaneously achieving an optimal coarse graining of each agent’s world that maximizes the capacity of each agent to discriminate and act without trembling hands.
Law 2. A coassembling community of agents, on a short timescale with respect to coevolution, will assemble to a self-organized critical state with some maximum number of species per community. In the vicinity of that maximum, a power law distribution of avalanches of local extinction events will occur. As the maximum is approached, the net rate of entry of new species slows, then halts.
Law 3. On a coevolutionary timescale, coevolving autonomous agents as a community attain a self-organized critical state by tuning landscape structure (ways of making a living) and coupling between landscapes, yielding a global power law distribution of extinction and speciation events and a power law distribution of species lifetimes.
Law 4. Autonomous agents will evolve such that causally local communities are on a generalized “subcritical-supracritical boundary” exhibiting a generalized self-organized critical average for the sustained expansion of the adjacent possible of the effective phase space of the community.
Candidate Law 1: The Dynamical Edge of Chaos
Molecular autonomous agents, for example, free-living cells, are parallel-processing molecular dynamical systems. A bacterium such as E. coli has on the order of three thousand structural genes. The diversity of molecular species in E. coli includes perhaps a thousand small molecules in metabolism, the genes, RNA and protein species, lipids, large carbohydrates, and so forth. For the sake of argument, let’s say there are about five thousand molecular species in E. coli. Perhaps the number is larger.
A cell is a parallel-processing dynamical system. That is to say, the cell carries out a wide variety of molecular activities, including the turning on and off of transcription of genes into RNA; the processing of that RNA into mature messenger RNA; the translation of that RNA into proteins; the activities of many of those proteins as enzymes to catalyze reactions; the modification of the activities of enzymes by chemical events such as phosphorylation and dephosphorylation; the building of structural components such as the bilipid membrane, and microtubule assembly and disassembly; and the construction of proteins and other receptors. These receptors are located transmembrane at the cell boundary and elsewhere in the cell, including on nuclear membranes of eukaryotes, such that signal molecules can be detected and responded to.
So, lots of activities are going on all the time, in parallel, in your typical E. coli, yeast, or your own pancreatic cells. The proper conceptual framework to think about all this activity is the “state space” of the system. If we ignore geometry, that is, the locations of molecules relative to one another in the cell (which is a big idealization), then the state space of a cell consists of a list of all the molecular species and their “activities” or “concentrations.”
In the following, I will focus on the behavior of the genetic regulatory network. It has been known since the seminal work of Jacob and Monod in the early 1960s, work for which they won the Nobel Prize, that genes can turn one another on and off. In more detail, the protein made by one gene can diffuse in the cell and bind to a DNA site, called a “cis acting site,” near a second gene. Genes that encode proteins are the structural genes; the binding of the protein, or several proteins at a set of nearby cis sites, can turn the second structural gene on or off. More generally, the binding of diffusible factors, called “trans acting factors,” to cis sites can tune graded rates of transcription of the nearby structural gene.
The human cell is estimated to have about eighty to a hundred thousand structural genes and between ten thousand and perhaps a hundred thousand cis acting sites. In general, any structural gene may be regulated by zero to ten different trans acting factors that may bind at one or more nearby cis acting sites. Therefore, the human genomic system is a highly complex web of regulatory connections and interactions by which the activities of genes turn one another on and off, or more generally tune one another’s activity.
It is the joint dynamical behavior of such genetic networks, plus the remaining cellular network of proteins and other molecular interactions, that controls cell behavior, including development from the fertilized egg to the adult.
In the past three decades, considerable theoretical insight has been achieved with respect to the expected behaviors of such large genetic regulatory networks. I discuss the relations to experimental evidence briefly below.
Boolean Networks
To take a very simple case, we consider the N genes of a cell and idealize further to imagine that at any moment in time a gene is either actively transcribing into RNA, with active = 1, or it is not transcribing, so is inactive, hence 0. Thus, the genes are treated as binary, or “Boolean,” variables. A Boolean network is a model genetic network with N binary, or Boolean, genes, each receiving regulatory inputs from some among the N genes and each governed by a Boolean function on its inputs telling it, for each combination of activities of its inputs, whether it should turn on or off.
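The idealization just described is easy to render concrete. The sketch below is a minimal, assumed implementation (not any published one): N binary genes, each wired at random to K inputs and assigned a random Boolean function stored as a lookup table over its 2^K input patterns, all updated synchronously.

```python
import random

def make_network(n, k, seed=0):
    """Random Boolean network: each of n binary genes gets k randomly
    chosen input genes and a random Boolean function (a lookup table
    over the 2**k possible input patterns)."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronous update: every gene reads its inputs at time T and
    takes the value its Boolean function dictates at time T + 1."""
    nxt = []
    for inp, table in zip(inputs, tables):
        idx = 0
        for i in inp:              # encode the input pattern as a table index
            idx = (idx << 1) | state[i]
        nxt.append(table[idx])
    return tuple(nxt)

inputs, tables = make_network(n=8, k=2)
state = (0, 1, 0, 1, 1, 0, 0, 1)
print(step(state, inputs, tables))   # the network's next state
```

Because the update rule is deterministic, repeated calls from the same state always yield the same successor, which is what makes the trajectories and state cycles discussed below possible.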
The Boolean idealization is severe, but it is a very useful place to start. If the human genome has 80,000 structural genes and each can be on or off, then the number of possible patterns, or “states,” of gene activity in the human genome is a staggering 2^80,000, or about 10^24,000. That is, a human cell could, in principle, be in any one of 10^24,000 states of gene activity. Its state space is 10^24,000. There have only been about 10^17 seconds since the big bang. If it took merely a second to turn a gene on or off, then no human cell could have explored more than an infinitesimally tiny fraction of its 10^24,000-state space, even if it had been chugging along since the big bang.
That cannot be what cells do. Something must confine their “flow” in their state space. And, indeed, what confines their flow is precisely the genetic regulatory network by which genes turn one another on and off.
Very good work shows that such networks can exist in three broad regimes: an ordered regime, a chaotic regime, and near a phase transition between order and chaos. All the evidence suggests that cells have evolved to lie in the ordered regime, fairly near the edge of chaos. Communities of cells may lie even closer to the edge of chaos. The hypothesis that cells and communities of cells lie in the ordered regime near the phase transition to chaos is candidate law 1.
To understand this candidate law, I need to describe to you the structure and behaviors of model genetic networks. Thereby we can characterize the ordered, chaotic, and edge-of-chaos regimes.
To take a very simple case, we consider a cell with three genes, A, B, and C. In this simplest Boolean idealization, there are three genes, each of which can be on or off; hence, there are two raised to the third, or eight, possible states of gene activities: (000), (001), (010), (011), (100), (101), (110), (111), where the ordering of the three symbols stands for the activity states of ABC, respectively.
The most general description of a dynamical system consists in specifying its state space, then identifying for each state which state or states it changes into. For a deterministic dynamical system, each state changes into a unique successor state. For a nondeterministic system, single states can change to two or more successor states. Which of the successor states is chosen in the nondeterministic system is given by some random process such as flipping a coin.
Figure 8.1a shows an arbitrary deterministic state space among the eight states of three genes, A, B, C. For each state, I have chosen its successor state at random.
Figure 8.1a shows several characteristic features of these very simple Boolean dynamical systems. First, note that the system is parallel processing. More than a single gene changes its activity value from 1 to 0 or 0 to 1 on many of the state transitions. Next, there is a finite number of states, here, eight. Each state has a unique successor. Over time, if the system is released from any initial state, it will follow a trajectory of states through state space. Since there is a finite number of states, eventually the trajectory must hit a state previously encountered on the trajectory. But the system is deterministic; thus, once the trajectory reenters a state previously encountered, it will follow a recurrent loop of states in state space, called a “state cycle.”
In general, a state cycle can be as short as a single state that reenters itself, a “steady state”; as long as a single cycle traversing all the states of the state space; or any length between these two limits.
A second typical property of such a parallel-processing Boolean system is that more than one state cycle may exist. In the present example, three state cycles exist in the state space.
State cycles are called “attractors” because they typically attract the flow of other states into themselves. This is shown in the first state cycle, where two states flow into the state cycle but are not on it. These two states are called “transients.” Transient states are encountered on trajectories flowing to state cycle attractors, but are not encountered again once the attractor is reached, assuming no perturbations occur to the system.
The set of states flowing into a state cycle attractor plus that state cycle is called the “basin of attraction” of the attractor.
The attractors are jointly the asymptotic long-term alternative behaviors of the network. If released from any initial state, the system ultimately winds up cycling on one of its attractors.
Thus, in discrete-valued deterministic networks (here, binary ones), each state lies in a single basin of attraction, so the basins of attraction partition the state space into disjoint sets of states.
The simple example of Figure 8.1b allows us to show another feature of such synchronous Boolean networks (here, “synchronous” means that all the binary variables change value at the same clocked moment). Since each state has a unique successor state, we can write a table of all the states and, for each, its unique successor.
Figure 8.1b shows the state transitions for each state, at time T, to the state it transforms into one clocked moment later, at time T + 1.
But Figure 8.1b also shows, for each gene, in order (ABC), the Boolean rule, or Boolean function, of the three genes, A, B, and C, that turns each on and off as a function of the values of itself and the other two genes.
As it happens, genes A and B have Boolean functions that depend on all three genes, A, B, and C. By examination, however, gene C has a Boolean function that depends only on genes A and C, not on gene B. To say that the activity of gene C depends only on A and C and not B means that once the combination of activities of A and C is defined at a moment T, the next activity of C at T + 1 is independent of whether B is on or off at time T. Indeed, the Boolean function is C = (not A or not C); that is, gene C will turn on at the next moment if at the current moment either A is not active or C is not active or both are not active.
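The claim that gene C’s next activity ignores B can be checked mechanically by enumerating the truth table, as in this quick sketch:

```python
def next_c(a, b, c):
    """The rule stated above: C turns on next if A is off or C is off (or both)."""
    return int((not a) or (not c))

# For every combination of A and C, flipping B never changes C's next value,
# so C genuinely has only two inputs: A and itself.
for a in (0, 1):
    for c in (0, 1):
        assert next_c(a, 0, c) == next_c(a, 1, c)

print([next_c(a, 0, c) for a in (0, 1) for c in (0, 1)])  # prints [1, 1, 1, 0]
```

The same simplification, applied gene by gene, is how one recovers the true wiring diagram from a raw table of Boolean rules.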
From Figure 8.1c, after simplifying Boolean expressions as we just did for gene C, we can write down the “wiring diagram” of inputs among the three genes. Since A and B depend upon all three genes, each receives a regulatory input from all three genes. Gene C, however, receives inputs only from itself and gene A.
The combination of Figures 8.1d and 8.1c, respectively, shows for each gene the inputs to that gene and the logical, or Boolean, function by which it turns on and off. Thus, the combination of Figures 8.1c and 8.1d is the genetic network among the genes.
Order, Chaos, and the Edge of Chaos
As noted above, thirty years of work by many scientists (initially on synchronous Boolean networks, but now generalized to a wider family of model genetic systems, including some in which genes can exhibit continuously graded levels of activity as their inputs turn gradually up or down) all shows the same simple, general results: there are three broad regimes of behavior, and a number of very simple properties of networks, involving the connectivity of the network and simple biases on the Boolean functions, control which regime a network lies in.
The Ordered Regime
Figure 8.2 shows a hypothetical movie of a network in the ordered regime. Let the network be released from an arbitrary initial state and flow along a trajectory toward a state cycle attractor. If a gene is rapidly turning on and off, or twinkling, call it green. If the gene is frozen in the active or the inactive value for a long time, say, fifty state transitions or more, call it red. In particular, once the system is on its state cycle, the twinkling genes that turn on and off on the state cycle would be green, and those genes frozen on or frozen off, red.
Initially, just after release from some random initial state, most genes are twinkling, hence green. As the network approaches its state cycle, more and more genes turn red, hence are frozen on or frozen off. By the time the system has reached its state cycle attractor, the majority of the genes are colored red (Figure 8.2a).
Most critically, if one considers the subset of red genes, they form a giant “percolating cluster” whose size scales linearly with the size of the entire network. In effect, the frozen red component is a “frozen red sea,” which spans the entire network and typically leaves behind isolated twinkling green islands.
The Chaotic Regime
In the chaotic regime, the same movie shows that the majority of the genes remain green, twinkling on and off (Figure 8.2c). So a vast, twinkling green sea spans the network, typically leaving behind isolated frozen red islands.
The Edge of Chaos
As parameters of the network discussed below are tuned from the chaotic regime toward the ordered regime, the green percolating sea becomes smaller and eventually fragments into two or many isolated green islands. The point of fragmentation of the green sea into green islands constitutes a phase transition from the chaotic to the ordered regime. This phase transition is sometimes called the edge of chaos (Figure 8.2b).
Several critical features distinguish the ordered from the chaotic regime. In the ordered regime, the lengths of state cycle attractors scale polynomially with the number of genes. Remarkably, in the ordered regime near the phase transition to chaos, there is evidence for universal scaling in which the number of states on a state cycle scales as the square root of the number of genes. This scaling, which I first discovered over thirty years ago, still staggers me. If the human genome has 80,000 genes, it has a state space of 2^80,000, or 10^24,000, states. Yet if the human genomic system lies in the ordered regime near the phase transition to chaos, it will settle down and cycle among about the square root of 80,000, or about 283, states!
Now, 283 states is very, very small compared to 10^24,000. The overwhelming order of the ordered regime, I believe, keeps cells from wandering all over their state spaces for eternities beyond eternities. In fact, it takes about one to ten minutes to turn a eukaryotic gene on or off, so it would take a cell from 283 to 2,830 minutes, or from about 4.7 hours to about 47 hours, to traverse its state cycle attractor. This is right in the biological ballpark. For example, the cell division cycle of different human cell types is on the order of tens of hours.
By contrast, in the chaotic regime state cycle lengths scale exponentially with the size of the network. The deepest one can go into the chaotic regime is to assign at random the successor state for each state. In that case, in general, each gene is a Boolean function of all N genes. Here the typical state cycle length is the square root of the number of states in the state space. For the human genome, with 10^24,000 states, a typical state cycle length would be the square root of that, hence 10^12,000. Remember, it is only about 10^17 seconds since the big bang. State cycle attractors of length 10^12,000? Not in my body, thank you very much.
In the first two articles I wrote on the subject of random Boolean nets, as long ago as 1969, I plotted cell cycle lengths from organisms as diverse as bacteria, yeast, worms, plants, and simple and complex animals. These progressively more complex organisms have progressively more DNA per cell and more genes per cell. If the Boolean net theory is on the right track, and if cell cycles are a reasonable proxy for expected state cycle times as a function of the number of genes, then cell cycle time should scale as a square root function of the number of genes. This prediction is actually pretty much correct. Indeed, a plot of median cell cycle time versus total DNA per cell is a square root function from bacteria to human cells.
But there are caveats: the number of genes in cells may not be proportional to the amount of DNA per cell. Some DNA is “junk.” A plausible estimate of the number of genes per cell is now available for many organisms. On this basis, cell cycle time scales somewhere between a square root and a linear (that is, directly proportional) function of estimated genes per cell. So without yet invoking natural selection to tune the structure and logic, the theory of random Boolean nets is already quite close to the data.
A second critical feature that distinguishes the ordered from the chaotic regime is what happens when the activity of a single gene is transiently reversed. I would note that such transient reversals happen all the time in normal development. For example, a single hormone enters a cell, then the nucleus, then binds to a nuclear genetic site and transiently changes the activity of some gene. Typically, the results unleash a cascade of alterations of gene activities. These cascades of alterations guide development and cell differentiation. In the ordered regime, these cascades tend to be smallish. In the chaotic regime, the cascades are typically huge. In real cells, the cascades tend to be smallish. More, we can actually predict their size distribution.
Let’s define a gene as “damaged” if, after an initial gene has had its activity reversed for a single moment, the gene in question ever behaves differently than it would have had the perturbed gene been left undisturbed. In effect, damage shows that perturbation of one gene affects the behavior of the damaged gene. Imagine damaged genes colored purple. If a gene has misbehaved once, it is purple, whether it stops misbehaving or keeps misbehaving.
Given this definition of damage, we can consider in detail two identical copies of a network, in the same state, running at the same speed. Now pick a model gene at random. If it is on, flip it off. If it is off, flip it on. Color it purple, since you have damaged it.
Now watch the unperturbed and perturbed copies of the network, and consider purple any gene that ever does something different in the perturbed network compared to the unperturbed network. In general, you will see a purple avalanche spread out from the initially perturbed gene. The purple avalanche will spread out in some way, then must eventually stop; at a maximum, all the genes turn purple. Thus, we can define the size of a given damage avalanche as the number of genes that turned purple, hence misbehaved at least once.
In the ordered regime, something magic happens. If one of the frozen red genes is perturbed, typically no avalanche spreads from that purple gene. If an avalanche spreads at all, it is tiny.
By contrast, if a twinkling green gene in one of the isolated green islands is perturbed and turned purple, a purple avalanche spreads to some or all of the twinkling green genes in that island. But the avalanche stops at the boundaries of the green island since damage avalanches cannot propagate through the frozen red percolating sea.
Because purple avalanches cannot propagate through the frozen red percolating sea, the green islands are functionally isolated from one another. The consequence is that there is a characteristic size distribution of avalanches in the ordered regime and a very different distribution in the chaotic regime.
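The damage-avalanche experiment just described (two identical copies, one transient flip, count the genes that ever differ) can be sketched in a few lines. The network construction is the same assumed random Boolean model as before, and the sizes and parameters here are arbitrary illustrative choices:

```python
import random

def make_network(n, k, rng):
    """Random wiring (k inputs per gene) and random Boolean lookup tables."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    nxt = []
    for inp, table in zip(inputs, tables):
        idx = 0
        for i in inp:
            idx = (idx << 1) | state[i]
        nxt.append(table[idx])
    return tuple(nxt)

def avalanche_size(n=50, k=2, steps=200, seed=0):
    """Flip one gene in one of two identical copies, run both forward,
    and count the 'purple' genes: those that ever behave differently."""
    rng = random.Random(seed)
    inputs, tables = make_network(n, k, rng)
    a = tuple(rng.randint(0, 1) for _ in range(n))
    b = list(a)
    b[0] ^= 1                        # transiently reverse gene 0
    b = tuple(b)
    purple = {0}
    for _ in range(steps):
        a = step(a, inputs, tables)
        b = step(b, inputs, tables)
        purple.update(g for g in range(n) if a[g] != b[g])
    return len(purple)

print(avalanche_size())              # size of one damage avalanche
```

Sampling avalanche_size over many random networks and perturbed genes at K = 2 versus K = 4 or 5 is one way to compare the smallish ordered-regime avalanches against the huge chaotic-regime ones discussed below.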
Figure 8.3a schematizes the distribution of avalanches for networks in the ordered regime very near the phase transition to chaos. The figure plots the logarithm of the size of the avalanche on the x-axis and the logarithm of the number of instances of avalanches of each size on the y-axis. As you can see, the size distribution shows up as a straight line in this log-log plot, sloping down to the right. Hence, the distribution is a power law distribution, with many small and few large avalanches of change propagating through the network. In addition, there is a finite cutoff, and thus a largest-size avalanche, which seems to scale as a square root function of the total number of genes in the network. Deeper in the ordered regime, the size distribution of avalanches remains a power law, but the slope down to the right becomes steeper, so there are fewer big avalanches compared to small avalanches.
If a human genome is in the ordered regime near the phase transition to chaos and harbors some 80,000 genes, then the largest avalanches should be about two times the square root of the number of genes, hence 2 × 283, or about 566 genes. This is probably about right. In the fruit fly, Drosophila melanogaster, with far fewer genes, the largest avalanches should again be about twice the square root of the gene number, hence on the order of a few hundred genes. The largest avalanche I am aware of occurs when the moulting hormone ecdysone acts on the salivary glands and induces changes in about 150 of the “puffs” in the polytene chromosomes. If each puff is a single gene, as most geneticists think, then ecdysone unleashes an avalanche altering the activities of about 150 genes.
No comparative data yet show whether the size distribution of avalanches in real organisms is a power law, nor whether the largest avalanches scale as a square root function of the number of genes in the organism. But these hypotheses are fully testable using today’s experimental techniques.
Nevertheless, these predictions do roughly fit one’s expectation as a biologist. Most genes if perturbed should unleash no avalanches or just small avalanches, fewer genes should unleash larger avalanches, and some modest fraction of the genes at most should be open to alteration by transient alteration in the activity of any single gene. Thus, this typical, or “generic,” behavior of parallel-processing networks in the ordered regime closely fits the known data and our informed intuitions.
The chaotic regime contrasts starkly with the ordered regime. Its expected behaviors are not biologically plausible. The chaotic regime differs from the ordered regime for a simple reason. In the chaotic regime, the twinkling green sea percolates. If a single green gene is perturbed and turned purple, that perturbation usually unleashes a purple avalanche that spreads through much of the percolating green sea. Huge damage avalanches are unleashed by single-gene perturbations.
These huge avalanches are exactly the signature of the famous “butterfly effect” seen in the weather, where a small initial change can have large-scale consequences. In short, the spreading purple avalanches constitute “sensitivity to initial conditions.” On the other hand, it is important to distinguish between low-dimensional chaos, characterized by three or four variables governed by three or four equations, and the high-dimensional chaos shown in large model genetic networks with tens of thousands of gene variables. In high-dimensional networks of genes modeled as binary variables, chaos shows up as the enormous avalanches of damage that spread from one to many of the variables of the model network.
Figure 8.3b schematizes the size distribution of avalanches in the chaotic regime. Unlike the ordered regime, there is a spike of huge avalanches in which a large fraction of the genes are damaged. In a cell, this would correspond to a hormone changing the activity of a single gene and tens of thousands of genes downstream changing their activities. This does not happen.
As in the ordered regime, however, in the chaotic regime there is also a power law distribution of small avalanches, present in addition to the vast avalanches that rocket through the green sea. Presumably, these small avalanches occur when green genes near the filigreed “coasts” of red frozen islands are perturbed and the purple avalanche is trapped on the fingers of the red beaches.
A third feature, a convergence versus divergence along flows in state space that characterizes the ordered versus chaotic regime, is perhaps the most important to our future discussions. In the ordered regime, initially nearby states lie on trajectories that tend to converge in state space. In the chaotic regime, initially nearby states tend to lie on trajectories that diverge in state space. At the edge of chaos, initially nearby states tend to lie on trajectories that neither converge nor diverge in state space.
These behaviors are conveniently shown in a recurrence map, which I call a “Derrida curve,” since the physicist Bernard Derrida of Saclay, France, first showed it to me. In 1986, Derrida and Pomeau were also the first to give analytic proof of the ordered and chaotic regimes and the phase transition between them.
Consider two states of a Boolean network with five genes and the successors to each of those states:

state 1 (10000) → (01100) state 1′
state 2 (00000) → (01111) state 2′

We can define the “Hamming distance” between state 1 and state 2 as the number of binary variables by which the two states differ. Here the Hamming distance is 1, since gene 1 is 1 in state 1 and 0 in state 2. In addition, we can define the normalized Hamming distance, Dt, between these two initial states at time t as the fraction of binary variables by which they differ. Hence, Dt is 1/5 = 0.2. Similarly, we can define the normalized Hamming distance between the two successor states at time t + 1. In the case above, D(t + 1) = 2/5 = 0.4.

Then we can compare Dt and D(t + 1) and ask whether the initial distance at time T decreased or increased at time T + 1. In the current case, the initial distance increased from 0.2 to 0.4 at the next moment, T + 1. In short, the two initially nearby states, differing in the activity of a single gene, spread further apart one moment later. Indeed, this spreading is the first time step of the spreading of a purple avalanche of damage.
To characterize Boolean networks with respect to the ordered and chaotic regimes, it is convenient to take thousands of random pairs of states at different initial distances, Dt, where Dt can vary from 0.0 to 1.0. For each pair of initial states, run the Boolean network forward one moment, discover the two successor states of the two initial states, and compute D(t + 1) for that pair of initial states. Average the D(t + 1) values for the thousands of pairs of initial states at each initial distance, Dt. The typical results are shown in Figure 8.4.
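The sampling procedure just described can be sketched directly. As before, this is an assumed minimal random Boolean model with arbitrary illustrative parameters, computing a single point of the curve:

```python
import random

def make_network(n, k, rng):
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    nxt = []
    for inp, table in zip(inputs, tables):
        idx = 0
        for i in inp:
            idx = (idx << 1) | state[i]
        nxt.append(table[idx])
    return tuple(nxt)

def derrida_point(dt, n=100, k=2, pairs=500, seed=1):
    """Average normalized Hamming distance D(t+1) over random pairs of
    states that start at normalized distance Dt."""
    rng = random.Random(seed)
    inputs, tables = make_network(n, k, rng)
    flips = max(1, round(dt * n))
    total = 0.0
    for _ in range(pairs):
        s1 = [rng.randint(0, 1) for _ in range(n)]
        s2 = list(s1)
        for g in rng.sample(range(n), flips):   # impose the initial distance
            s2[g] ^= 1
        t1 = step(tuple(s1), inputs, tables)
        t2 = step(tuple(s2), inputs, tables)
        total += sum(x != y for x, y in zip(t1, t2)) / n
    return total / pairs

# One point of the Derrida curve: average D(t+1) for pairs starting at Dt = 0.1.
print(derrida_point(0.1))
```

Sweeping dt from near 0.0 to 1.0 and plotting derrida_point(dt) against dt traces out the recurrence map: below the diagonal for an ordered network, above it (at small dt) for a chaotic one.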
Figure 8.4 is a recurrence map, with Dt shown on the x-axis and D(t + 1) shown on the y-axis. Thus, for an initial pair of states at Dt = 0.2, if the successor states have spread apart, say to D(t + 1) = 0.4, a dot in the xy plane at x = 0.2, y = 0.4 records this event. The averaged set of these dots for the thousands of pairs of initial states at all different initial distances is the average recurrence map for the network.
In Figure 8.4, the main diagonal, running at a 45-degree angle from the lower-left corner, which corresponds to Dt = 0 and D(t + 1) = 0, shows the condition where D(t + 1) = Dt. If a dot lies on the main diagonal, then the distance between the initial states is the same as the distance between the successor states. The two initial states lie on trajectories that neither diverge nor converge in state space.
In the chaotic regime, nearby states lie on trajectories that diverge further apart in state space, so the recurrence map lies above the main diagonal for small initial distances. For large initial distances, even networks in the chaotic regime have the property that states tend to lie on trajectories that converge. The degree to which the Derrida recurrence curve lies above the main diagonal for small Dt is a measure of how deeply into the chaotic regime the network lies. Deep into the chaotic regime, nearby states, hence small Dt, diverge swiftly. Thus, the recurrence curve is well above the main diagonal for small values of Dt.
By contrast, in the ordered regime, as shown in Figure 8.4, the Derrida recurrence curve is below the main diagonal for small Dt; that is, initial states that are close lie on trajectories that converge. At the phase transition between order and chaos, the Derrida recurrence curve begins, at Dt = 0, tangent to the main diagonal, then falls below it as Dt increases.
In summary, in the ordered regime, nearby states tend to lie on trajectories that converge in state space. At the phase transition, nearby states tend to lie on trajectories that neither converge nor diverge. In the chaotic regime, nearby states tend to diverge.
As I will shortly discuss below, it is plausible to think that autonomous agents, and communities of autonomous agents, evolve such that they lie in the ordered regime near the phase transition to chaos. A major reason for this intuition is that under such circumstances flow in state space is mildly convergent. In turn, this will allow the autonomous agents to make the maximum number of reliable discriminations and reliable actions, hence, to play the most sophisticated natural games by which to earn their livings.
The three features that characterize the phase transition between order and chaos seem by good numerical evidence to coincide. That is, when parameters discussed below are tuned from the chaotic to the ordered regime such that the green sea is just breaking up into green islands, simultaneously, the Derrida curve changes from the chaotic regime to become tangent with the main diagonal for small values of Dt. And at just this point, state cycle lengths switch from scaling exponentially to scaling polynomially with the number of genes.
At least three simple parameters tune whether networks are in the ordered or the chaotic regime. It is important, therefore, that evolution can readily tune whether genomic systems lie in the ordered or chaotic regime by adjusting any of these three parameters. More important still, evolution seems to have done just that and tuned cells into the ordered regime. Since cells are our only example of evolved autonomous agents, the data support my candidate first law that autonomous agents and communities of autonomous agents will evolve to the ordered regime near the phase transition to chaos.
The simplest parameter to tune is the number of inputs, K, per gene. I showed numerically in 1969, and Derrida and Pomeau showed analytically in 1986, that if K = 2 or less, networks lie in the ordered regime. Derrida and Pomeau showed that K = 2 is the edge-of-chaos phase transition. For K greater than 2, networks lie in the chaotic regime.
Already this is worth the excited attention I gave it so long ago, for the results show that a network with randomly chosen logic nevertheless behaves with exquisite order. Say a network of 100,000 genes is constructed at random, with the simple limitation that each model gene have K = 2 inputs, but with the wiring diagram chosen at random and the Boolean function assigned to each gene, among the 16 possible Boolean functions of K = 2 inputs, also chosen, once and for all, at random. This spaghetti mess of a network, with its tangle of 200,000 wires connecting the genes in some mad scramble, will straighten itself out.
The system settles down to cycle among a few hundred states, about 317, out of 2 raised to the 100,000, or roughly 10 to the 30,000! Order for free, I keep saying. Selection need not struggle against all odds to achieve cells that behave with overwhelming order. That order lies to hand for selection’s further craftings.
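The “order for free” claim can be checked on a toy scale. This sketch is illustrative only, with 16 genes rather than 100,000 and invented parameters. It builds random K = 2 networks and measures the length of the state cycle reached from a random initial state; the cycles are minuscule compared with the 2^16 = 65,536 possible states.

```python
import random

def random_boolean_network(n, k, rng):
    """Random wiring and random truth tables, fixed once and for all."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    """One synchronous update; state is a tuple of 0/1 gene activities."""
    nxt = []
    for ins, tab in zip(inputs, tables):
        idx = 0
        for g in ins:
            idx = (idx << 1) | state[g]
        nxt.append(tab[idx])
    return tuple(nxt)

def cycle_length(n, k, seed):
    """Iterate from a random start until a state repeats; return the cycle's length."""
    rng = random.Random(seed)
    inputs, tables = random_boolean_network(n, k, rng)
    state = tuple(rng.randint(0, 1) for _ in range(n))
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return t - seen[state]

lengths = sorted(cycle_length(n=16, k=2, seed=s) for s in range(25))
median_cycle = lengths[12]   # tiny compared with the 65,536 possible states
```

The deterministic dynamics guarantee the trajectory eventually revisits a state, so the loop always terminates; the dictionary records when each state was first seen, and the difference of timestamps is the attractor’s length.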
There are two further known parameters that can tune networks from the chaotic to the ordered regime if K is greater than 2. Both are biases on the Boolean functions. Remarkably, real cells show dramatic evidence of one of these two biases, the one toward what I call “canalyzing Boolean functions.” I discuss this canalyzing bias second.
The first bias is characterized by a parameter Derrida and colleagues called “P.” Consider, in Figure .c, the Boolean function for gene A. It has five instances of one output value and three of the other. The parameter P is defined as the fraction of instances of the majority value over the full set of cases. Hence, P for gene A = 5/8. For gene B, the Boolean function has seven instances of the majority value and one of the minority value. Its P is 7/8. For gene C there are six instances of the majority value and two of the minority. Its P is 6/8.
By definition, P for a Boolean function can vary from 0.5 to 1.0, as the majority fraction varies from half to all of the possible cases. Derrida and colleagues showed that, in general, when K > 2, P can be tuned upward from 0.5 to some critical value, Pc, at which networks pass from the chaotic to the ordered regime. The critical value of P as a function of K is shown in Figure .a. Universal scaling for cycle lengths as a square root function of the number of genes has been established along the phase transition in the PK plane.
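Computing P for a given rule is nearly a one-liner. A small sketch of my own; the example rules are the standard “or” and “exclusive or” functions, not the particular rules of Figure .c:

```python
from itertools import product

def p_bias(f, k):
    """P = fraction of input states yielding the majority output value."""
    outputs = [f(*bits) for bits in product((0, 1), repeat=k)]
    ones = sum(outputs)
    return max(ones, len(outputs) - ones) / len(outputs)

p_or3 = p_bias(lambda a, b, c: a | b | c, 3)   # seven 1s, one 0 -> P = 7/8
p_xor = p_bias(lambda a, b: a ^ b, 2)          # perfectly balanced -> P = 1/2
```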
The second bias in Boolean functions is toward the canalyzing Boolean functions. Consider gene C in Figure .c. If gene A is 0, gene C will be 0 at the next moment no matter what the activities of genes B and C may be. If gene A is 1, gene C may be 0 or 1 at the next moment, depending on the prior state of gene C itself.
Gene C is governed by a canalyzing Boolean function. Canalyzing Boolean functions have at least one input with at least one value that suffices to guarantee the next state of the regulated gene, regardless of the values of all other inputs. By inspection, if A is 0 now, then C is guaranteed to be 0 a moment later. So the Boolean function is canalyzing, and I call A a canalyzing input to C. Note that gene C is also a canalyzing input to itself.
Look next at the Boolean function for gene B in Figure .c. If gene A is 1, then gene B is sure to be 1 the next moment, regardless of the activities of B and C. So A is a canalyzing input to B. But if B is 1, that too assures that B will be 1 at the next moment. So B is a canalyzing input to itself. And similarly, if gene C is 1, gene B is sure to be 1 the next moment. Gene B has three canalyzing inputs. By contrast, gene A in Figure .c has no canalyzing input, so it is not governed by a canalyzing Boolean function. No value of A alone, B alone, or C alone suffices to guarantee the next activity of gene A.
In general, Boolean functions of K inputs may have 0, 1, 2, . . . , K canalyzing inputs. Numerical evidence shows that, for K > 2 inputs per gene, a sufficient bias toward a high fraction of genes with a sufficient number of canalyzing inputs drives networks from the chaotic into the ordered regime. Figure .b shows the phase transition curve in the CK plane.
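The definition of a canalyzing input translates directly into code. The sketch below is an illustration, not the analysis pipeline we used: for each input, it checks whether some fixed value of that input forces the output regardless of the others.

```python
from itertools import product

def canalyzing_inputs(f, k):
    """Return the inputs i for which some fixed value of input i
    forces the output of f, regardless of the other k-1 inputs."""
    found = []
    for i in range(k):
        for v in (0, 1):
            outs = {f(*bits) for bits in product((0, 1), repeat=k) if bits[i] == v}
            if len(outs) == 1:   # the output never varies: input i canalyzes
                found.append(i)
                break
    return found

or3 = lambda a, b, c: a | b | c   # any single input set to 1 forces output 1
xor2 = lambda a, b: a ^ b         # no single input value fixes the output
```

For the three-input “or,” all three inputs are canalyzing, just as for gene B above; for “exclusive or,” none are, just as for gene A.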
Before turning back to biology, it is essential to stress that the results noted above for synchronous Boolean networks extend to asynchronous Boolean networks and, more critically, extend to a family of model gene networks in which the genes have graded levels of activity. This is important because the on-off Boolean idealization is quite severe. Real genes show graded levels of activity as a function of the concentrations of their trans-acting inputs and the bound states of their cis-regulatory loci. If our results were fragile, in that they depended upon the Boolean, on-off idealization, we could not trust them to inform us about real cells. Glass and Hill have examined a model with continuously graded levels of gene activity, the “piecewise linear model,” and found the same qualitative behaviors. In particular, the same phase transitions occur as a function of K, P, and C. The striking difference, however, is that deep in the ordered regime of the piecewise linear case, the genes of the twinkling green islands settle down to steady states that differ on different attractors. Near the phase transition to chaos, the green islands begin to exhibit sustained “limit cycle” oscillations that become chaotic in the chaotic regime.
Thus, there is now good general evidence that the ordered and chaotic regimes and the phase transition between them are deeply characteristic of some enormous class of parallel-processing nonlinear dynamical systems.
The Biology
But what of real cells? We have no conclusive evidence, yet an abundance of telling hints. Though I am not yet entirely convinced, and though I am, admittedly, biased, I nevertheless become increasingly confident that cells, and probably communities of cells, do live in the ordered regime near the edge of chaos. Not only does the evidence point this way, but cells should live near the edge of chaos. Why? As remarked already, the intuition is simple. Being autonomous agents, cells must, as individuals living in communities, make the maximum number of reliable discriminations possible and act on them reliably, without “trembling hands.” Just inside the edge of chaos seems the ideal place.
Intuitively, slightly convergent flow in state space allows classification, for when two states converge on a single successor state, those two states have been classified as “equivalent” by the network. Slightly convergent flow would seem to allow the maximum number of reliable classifications in the face of a noisy environment. The convergent flow buffers the system against the noise of the environment.
And what of trembling hands? No point making superb discriminations, seeing the stag, drawing your bow, aiming the arrow, then shooting yourself in the foot. Again, slightly convergent flow in state space to buffer external and internal noise seems ideal.
So what about cells? My colleagues Steven Harris, Bruce Sawhill, and Andrew Wuensche and I have carried out work over the past several years analyzing actual gene regulatory rules for eukaryotic genes drawn from a variety of eukaryotic organisms: yeast, Drosophila, maize, mouse, and so forth. The results show a very strong statistical bias in favor of genes governed disproportionately by canalyzing Boolean functions. When we have constructed model networks with the observed bias toward canalyzing functions, such networks lie modestly within the ordered regime by the Derrida curve and other criteria noted above.
It all began at the Santa Fe Institute several years ago. Steve Harris, a molecular biologist from Texas, was visiting. I told him about canalyzing functions. “Never,” said Steve. Seeing my opportunity, I replied, “You have a good genetics library; want to read a bunch of papers and analyze the transcription rules of genes with three or four or five known regulatory inputs?”
I didn’t think Steve would say yes, for reading over a hundred papers in the subsequent years, and cataloging the detailed results, was going to be a substantial task.
“Sure,” he replied.
Some months later, Harris called. “Hey, the results for genes with K = 3 inputs look interesting! There is a bias toward canalyzing functions.”
“Never,” I said.
“I’ll send the data,” was the reply.
Steve had carefully read about sixty papers on regulated genes with K = 3 known inputs, where the data were available at the level of actual binding of trans-acting factors to cis sites and the turning on of transcription. A gene with K = 3 known inputs has, in the Boolean idealization, 2 to the 3rd, or 8, possible on-off states of those inputs, as we have seen. In virtually all the cases used, Steve had good data for all eight input states. He warranted that the Boolean idealization had its problems but found that in many cases the response of a gene to its inputs was sharply nonlinear. Thus, gene A might be turned on only slightly by either factor alone, but strongly by both factors at the same concentration. That looks like the Boolean “and” function, where the regulated gene is “on” at the next moment only if both inputs are “on” now.
We need some mathematical facts. The number of Boolean functions of K inputs is 2 raised to the 2 raised to the K, 2^(2^K). For K = 2, there are 16 Boolean functions. For K = 3, there are 256 Boolean functions. For K = 4, there are 65,536 Boolean functions. For K = 5, there are over four billion Boolean functions.
A Boolean function with K inputs, as noted, can have 0, 1, 2, . . . , K canalyzing inputs. But as K increases, the fraction of Boolean functions that are canalyzing at all, on one or more inputs, declines dramatically, as shown in Figure .. In particular, 87.5 percent of the 16 Boolean functions of K = 2 inputs are canalyzing. But only about 47 percent of the 256 Boolean functions of K = 3 inputs are canalyzing. Only about 5 percent of the 65,536 Boolean functions of K = 4 inputs are canalyzing. And less than 1 percent of the four billion or so Boolean functions of K = 5 inputs are canalyzing.
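Both counts, the 2^(2^K) functions in total and the shrinking canalyzing fraction, can be verified by brute-force enumeration for small K. A sketch; enumeration beyond K = 3 gets expensive, so only the small cases are checked here:

```python
from itertools import product

def is_canalyzing(table, k):
    """table: tuple of 2**k outputs, indexed by the input bits read as binary."""
    states = list(product((0, 1), repeat=k))
    for i in range(k):
        for v in (0, 1):
            outs = {table[int("".join(map(str, s)), 2)]
                    for s in states if s[i] == v}
            if len(outs) == 1:   # fixing input i at v forces the output
                return True
    return False

def canalyzing_fraction(k):
    """Enumerate all 2**(2**k) truth tables; return the canalyzing fraction."""
    tables = list(product((0, 1), repeat=2 ** k))
    return sum(is_canalyzing(t, k) for t in tables) / len(tables)

frac2 = canalyzing_fraction(2)   # 14 of the 16 (all but XOR and its negation)
frac3 = canalyzing_fraction(3)   # a markedly smaller fraction of the 256
```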
This shift means that we can test whether there is a bias in sampled eukaryotic genes. Indeed, in more detail, among the 256 K = 3 Boolean functions, roughly half have no canalyzing inputs, and a decreasing fraction of the Boolean functions have 1, 2, or 3 canalyzing inputs, as you can see in Figure ..
Also plotted in Figure . is the observed fraction of eukaryotic genes with K = 3 inputs. The observed curve is the opposite of the curve expected if K = 3 genes were regulated by Boolean rules drawn at random from among the 256 functions. Indeed, the great majority of the observed cases have canalyzing inputs, far more than the fraction expected if rules were drawn at random.
One does not need fancy statistics, but they readily confirm that the observed distribution is sharply shifted toward large numbers of canalyzing inputs per gene. Figure . shows similar results for genes with K = 4 known inputs. Again, the shift toward genes regulated by a high number of canalyzing inputs is apparent and strongly statistically significant. Data for genes with five or more known inputs show the same bias, but the cases are too few to be statistically significant.
But there remains analysis to be done. Recall the P parameter. Networks with high P values, where genes are mostly turned on or mostly turned off by their inputs, also lie in the ordered regime. Moreover, there is an overlap, but not an identity, between the class of Boolean functions with high P values and the class of Boolean functions with one or more canalyzing inputs. When we analyzed our samples of genes, they also had high P values compared to a random distribution of Boolean functions with K = 3 or K = 4 inputs.
In order to discriminate whether the observed bias was toward high numbers of canalyzing inputs or toward high P values or both, we carried out a “residual analysis.” That is, we classified all 256 K = 3 Boolean functions into different P classes: P = 4/8, P = 5/8, P = 6/8, P = 7/8. Within each P class, some Boolean functions have 0, 1, 2, or 3 canalyzing inputs. Therefore, among all the K = 3 Boolean functions within a given P class, there is some distribution of Boolean functions with 0, 1, 2, or 3 canalyzing inputs. Thus, we asked whether, within a given P class, the real genes showed a residual bias toward a high number of canalyzing inputs per gene compared to what would be expected if real genes were governed by Boolean rules drawn at random with respect to canalization. The answer for K = 3 and K = 4 genes is overwhelmingly yes.
In short, if we control for P classes, there is a very strong and very statistically significant residual bias toward high numbers of canalyzing inputs per gene. Conversely, when we controlled for canalyzing-input classes and tested for a residual bias toward high P values, there was no sign whatsoever of such a bias. Thus, it appears that evolution has, in fact, tuned the choice of Boolean rules governing genes with K = 3 and K = 4 known inputs, as well as the genes with more inputs that we have sampled, sharply in favor of rules that are canalyzing functions.
The main caveat to hold in mind, in addition to misreading the articles or the articles being a nonrandom sample of published data, is that genes governed by canalyzing functions may have more easily detected genetic effects, hence be noticed and studied. Only future work with randomly chosen structural genes will overcome this source of bias. Despite the caveat, I am quite convinced by the data. In particular, genes governed by rules with high P values would also have easily detected genetic effects, yet there is no such bias in the data.
Tentatively, eukaryotic genes are governed by rules biased toward many canalyzing inputs per gene. Why? Either chemical simplicity or natural selection or both, I think.
Now let’s examine the consequences. We know that networks with K = 3, K = 4, or more inputs per gene are generically in the chaotic regime if Boolean functions are chosen at random from the full range of possible functions. We have observed a substantial bias toward canalyzing functions. Does this bias suffice to tune networks with K = 3 or K = 4 or more inputs per gene into the ordered regime?
Our group constructed large networks of genes, using Wuensche’s wonderful DDLab program, available online, to examine model systems with thousands of genes. When we made networks with K = 3 or K = 4 inputs and randomly chosen Boolean functions, their Derrida curves, as expected, were in the chaotic regime, a percolating green sea existed, and vast purple avalanches careened around the system.
When we made networks with K = 3 or K = 4 inputs, tuned to the observed distribution of fractions of genes with 0, 1, 2, 3, or 4 canalyzing inputs, the results (Figure .a) show that such networks are clearly in the ordered regime. The Derrida curve is below the main diagonal. Therefore, in such networks a percolating frozen red sea exists, leaving behind isolated green islands, and the distribution of purple damage avalanches is a power law with a finite cutoff scaling as the square root of the number of genes (Figure .b). This last predicts that the largest avalanches of gene changes unleashed by perturbing any single gene in humans should involve on the order of a few hundred genes. This fits presently known data.
We even have tentative evidence of detailed evolutionary tuning. As the number of inputs per gene increases, a gradually decreasing fraction of the Boolean functions must be canalyzing to cross the phase transition into the ordered regime. Although the data are too few to warrant a firm conclusion, the fraction of canalyzing inputs for K = 3, K = 4, and K = 5 eukaryotic genes trends downward as K increases, along the curve needed to remain just within the ordered regime. This decrease as K increases results in the virtual identity of the K=3 Data, K=4 Data, and K=5 Data curves in Figure 8.9a. If so, only natural selection can have tuned it thus.
There are other clues, reported in Origins of Order and At Home in the Universe in some detail, that support the hypothesis that cells lie in the ordered regime. This interpretation is based on the assumption that the different cell types of a higher eukaryote correspond to the different state cycle attractors of the network. One attractor is a liver cell, another is a kidney cell, and so forth.
•The percolating frozen core that is identical on all attractors of the Boolean network is likely to correspond to the core set of genes whose expression is known to be identical on all cell types, commonly thought to be housekeeping genes.
•The typical differences in gene activity patterns between model cell-type attractors, usually a few percent, mirror the data for real cells.
•The number of state cycle attractors robustly scales as a square root function of the number of genes in the ordered regime. The number of cell types in real organisms scales as roughly a square root to a linear function of the estimated number of genes in that organism, from yeast to sponge to worm to man. Indeed, the square root of 100,000 is about 316, and Bruce Alberts and colleagues quote the number of cell types in humans as 265.
•The expected power law size distribution of avalanches of gene changes after perturbation of a single gene’s activity seems plausible and fits the still sparse data.
•Model cell types are homeostatically stable to most small perturbations. So are real cell types.
•If a state cycle attractor is a cell type, then cellular differentiation from one cell type to another corresponds to a perturbation that causes the cell to leave one attractor and flow to another attractor. In the ordered regime, any cell type can directly reach only a few adjacent cell types and may, by a succession of perturbations, eventually differentiate along branching developmental pathways to a larger number of cell types. Precisely this pattern of branching differentiation is known in all multicelled organisms.
There are other data, but perhaps that will suce. I believe the initial evidence strongly suggests that eukaryotic cells are in the ordered regime, not too far from the phase transition to chaos.
This hypothesis, which I here tentatively adopt as a candidate general law for any biosphere (a very long jump, to be sure), is now open to direct tests. Current technology, based on Affymetrix chips, displays the DNA from thousands of different genes in a two-dimensional array. RNA can be sampled from small tissue fragments, or even single cells, and, via a few steps, caused to bind through Watson–Crick base pairing to the corresponding DNA sequence. In this way, the transcribed RNA abundances of thousands of genes can be sampled simultaneously. Thus, we can now follow the RNA states of cells over time, in normal and diseased states, treated and nontreated states, and so forth. Companies such as Incyte are doing just this and selling the data to the large pharmaceutical companies for analysis.
But then we can clone controllable cis sites, such as promoters, into cells at one or more randomly chosen locations and study the effects of transiently perturbing the activities of one or a few genes. Is the Derrida curve below the main diagonal or not? Does a power law distribution of avalanches of change erupt or not? We can use the data to find the genes in the same isolated green islands, for avalanches should be confined to one island and should overlap if started at different genes in the same island. Moreover, patterns of gene activities will change in correlated ways for genes in the same green island, but not for genes in different green islands.
Remarkably, recent evidence suggests just such correlated patterns of gene activity changes. John Welsh has analyzed the transcription patterns of thousands of different genes in a specific cell type, the human melanocyte, from newborn children, subjected to the eight possible different combinations of three distinct modes of perturbation. Welsh could, in principle, distinguish increases, decreases, or no change in the abundance of each gene’s transcripts. Most of the genes showed no detectable change; a modest number showed changes. Given the eight treatment regimes, the control and the seven other treatments consisting of all combinations of one or more of his three perturbations, there are, in principle, three raised to the seventh power, or 2,187, possible patterns of response. But, surprisingly, the responding genes showed only a small number of patterns. Already this is unexpected.
But the most interesting result is that, of the patterns, sixteen fall into eight mirror-symmetric pairs: under some conditions, one set of genes increases in transcript abundance while a second set of genes decreases in transcript abundance. Under others of the seven perturbing conditions, the roles are reversed, and the first set of genes decreases in transcript abundance while the second set increases. Welsh found eight such mirror-symmetric pairs of sets of genes, suggesting at least eight different coordinated sets of genes, each coregulated, yet each buffered from the other sets.
It may be that Welsh has found the first evidence of genes lying in eight different green islands, buffered from one another by the percolating red frozen structure. If the green islands exist, they are the paragraph structure of the genome. They are the midsize decision-taking subcircuits of the genome. For each such island, cut off from influence by other islands by the frozen red structure, has its own alternative attractors; say, two for this island, five for that island, seven for a third island. The total number of attractors for the entire network would then be 2 x 5 x 7 = 70. And if so, cell types are a kind of combinatorial code of the choices made by the different islands.
And yet more: my colleague Marc Ballivet, with a minor bit of help from me, has come up with a means to rapidly clone most or all cis sites from cells. If the thousands of cis sites can be cloned, each can be used to affinity-purify the trans factors binding it. Other biologists are learning how to construct small genetic circuits. By our means or others, the medicine of the twenty-first century will learn to control the activities of genes in genetic networks, hence, control tissue regeneration and differentiation. We enter the “postgenomic” era.
I return to my candidate law and remark next that cells typically do not live alone; they live in communities of single-celled organisms or other simple multicellular organisms, or they live in tissues in highly complex multicellular organisms. Thus, any candidate law must be considered with respect to a community of autonomous agents.
Consider an ecosystem with different species of bacteria. Each species may secrete a number, S, of different chemical signals that impinge on a subset, C, of the other cell species. In Figure . I show a three-dimensional coordinate system. One axis shows the order-chaos axis for a single isolated bacterium, measured by Derrida curve criteria, with order on the left, near the origin, and chaos on the right. The remaining axes show C and S.
If a given cell is at the phase transition between order and chaos, and additional molecular inputs, S per cell, come from C different types of cells, the total connectivity in the cell is raised from KN to KN + CS. The result will typically drive that cell into the chaotic regime. Indeed, the entire community will be driven into the chaotic regime.
Hence, in the coordinate system of Figure ., Igor Yakushin at Bios Group has shown that a hyperbolic surface, as shown, separates the ordered regime from the chaotic regime. Cells can buffer themselves from chemical perturbations by other species by retreating deeper into the ordered regime. This can be accomplished by increasing the number of canalyzing inputs per gene. And indeed, as described above, individual eukaryotic cells do appear to lie well within the ordered regime, perhaps as buffering against the fact that such cells, like yeast, live in microbial communities or, like human cells, live in tissues where each cell is bombarded with chemical signals from other cells in the same body.
It is a plausible conjecture that communities of cells, and tissues, come to lie on the phase transition surface between order and chaos. The hypothesis is readily tested. If so, perturbations of a single gene in one type of cell should trigger a power law distribution of avalanches of changes of gene activities that spreads from the perturbed gene to other genes in that cell and species, and to cells of other species. Further, the Derrida curve of the total community should be at the phase transition.
A final numerical test is under way. The aim of the numerical test is to check whether cells and communities that lie at the phase transition would, in fact, make the maximum number of reliable discriminations and act without trembling hands.
Cells that are yammering at one another probably never reach their attractors. Pick a subset, M, of the states in each cell in the community as its “action” states. Now release the community, numerically, somewhere on the edge-of-chaos surface. Over a long period of time, the M action states will be encountered in some order, yielding some probability distribution of transitions between pairs of the M states, as each cell is perturbed by S chemical signals from C other cell types. The transitions among the M action states of each cell can be written in a matrix showing the transition probabilities between any pair of the M states. From this it is possible to calculate the mutual information, MI, between pairs of the M action states.
MI is H(A) + H(B) - H(AB). Here H(A) is the entropy of A, H(B) the entropy of B, and H(AB) the joint entropy of A and B. If A and B occur randomly with respect to one another, then H(AB) equals the sum H(A) + H(B), so MI is 0. If either A or B is unchanging, MI is again 0. But if A and B are changing in correlated ways, MI is positive. Hence, we can ask what value of M maximizes the mutual information among the M states, how that is related to cell and community position on the edge-of-chaos phase transition in Figure ., and how well the mutual information among the M action states correlates with the mutual information in the patterns of CS input signals arriving at each cell type. More broadly, can selection maximize both M and that mutual information correlation and, if so, for what value of M and where on or off the phase transition surface in Figure .?
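The MI formula is straightforward to compute from a joint probability table. A minimal generic sketch, not the community simulation itself:

```python
from math import log2

def entropy(p):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(x * log2(x) for x in p if x > 0)

def mutual_information(joint):
    """joint[a][b] = P(A=a, B=b); MI = H(A) + H(B) - H(A,B)."""
    pa = [sum(row) for row in joint]            # marginal of A
    pb = [sum(col) for col in zip(*joint)]      # marginal of B
    pab = [x for row in joint for x in row]     # flattened joint
    return entropy(pa) + entropy(pb) - entropy(pab)

# Perfectly correlated binary pair: MI = 1 bit; independent pair: MI = 0.
mi_corr = mutual_information([[0.5, 0.0], [0.0, 0.5]])
mi_indep = mutual_information([[0.25, 0.25], [0.25, 0.25]])
```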
I do not know the answers but hope the optimal point lies on the phase transition surface, for such selected mutual information correlation would begin to show that such communities of cells with such regulatory networks can indeed make the maximum number of reliable discriminations and act on them without trembling hands to make a complex living in a complex world.
Candidate Law 2: Community Assembly Reaches a Self-Organized Critical State
First, a foray into self-organized criticality, a concept that will drive the rest of this chapter.
Per Bak and his colleagues in 1987 published a paper concerned with sand piles. One is to take a large, flat table, supply lots of sand, and gently let sand fall from on high onto the table. As the sand piles up, it eventually reaches the rest angle for sand and also extends to the boundaries of the table. You keep adding sand slowly. Sand-slide avalanches begin to form, and sand drops to the floor. Measure the size distribution of the avalanches, and a power law distribution is revealed, with many small avalanches and few large avalanches (Figure .).
Power law distributions can, in fact, arise in many ways. One of those ways is at a phase transition, for example, in a ferromagnet at its phase transition temperature. Above the phase transition temperature, the magnetic spins orient randomly with respect to one another and keep flipping. Below the phase transition temperature, the ferromagnet tends to line up with the magnetic spins all pointing the same way, say, north pole upward; hence, the material is magnetized. At the phase transition temperature, something magic and “universal” occurs: clusters of spins oriented the same way arise, and the clusters have a power law distribution of sizes. The power law distribution implies that there is no preferred size scale in the system. If there were a preferred size scale, clusters would be distributed exponentially in size, setting a size scale at which clusters of that size were half as likely as tiny clusters of spins.
So too for the sand pile. There is no preferred size scale revealed by the power law distribution of sand-slide avalanche sizes. The big difference is that clever physicists tuned the temperature of the ferromagnet to the critical phase transition temperature. Per and company tuned nothing; they merely let sand drop randomly and gently onto the sand pile, and the sand pile tuned itself to criticality. Hence the name: self-organized criticality.
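The Bak-Tang-Wiesenfeld model itself takes only a few dozen lines to simulate. This sketch uses the standard toy choices (a small grid, a toppling threshold of 4); nothing here is tuned, yet small avalanches vastly outnumber large ones.

```python
import random

def sandpile_avalanches(size=20, drops=5000, seed=1):
    """Abelian sandpile: drop grains at random sites; any site holding 4 or
    more grains topples, sending one grain to each neighbor; grains that
    fall off the edge of the table are lost. Returns topplings per drop."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    sizes = []
    for _ in range(drops):
        i, j = rng.randrange(size), rng.randrange(size)
        grid[i][j] += 1
        topples = 0
        unstable = [(i, j)]
        while unstable:
            a, b = unstable.pop()
            if grid[a][b] < 4:
                continue
            grid[a][b] -= 4
            topples += 1
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                na, nb = a + da, b + db
                if 0 <= na < size and 0 <= nb < size:
                    grid[na][nb] += 1
                    unstable.append((na, nb))
        sizes.append(topples)
    return sizes

sizes = sandpile_avalanches()
small = sum(1 for s in sizes if s <= 2)    # many tiny avalanches
large = sum(1 for s in sizes if s >= 50)   # few large ones
```

A log-log histogram of `sizes` (after discarding the transient) approximates the straight line characteristic of a power law.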
Many of us have fallen in love with these results. Many are critical as well. I am on the pro side of the debate. I know the theory does not apply well to real sand, with its rough edges, but works well with short-grain Swedish rice, and I love it anyway. Newton’s law of universal gravitation does not work for bits of paper and cannon balls falling from the Tower of Pisa (wind resistance, you see). The bits of paper do not hit the ground at the same time as do the cannon balls. Most theories have ceteris paribus clauses. But I suspect Per and friends have found a deep truth about how nonequilibrium systems self-organize.
The remaining three of the four candidate laws apply versions of this idea. Even law 1 above is a version: selection tunes communities of cells to the phase transition between order and chaos, where a power law distribution of damage avalanches on all scales propagates across the system.
The claims coming next are not my own work but derive from fine efforts by ecologists Stuart Pimm, Mack Post, and more recently, Bruce Sawhill and Tim Keitt, making use of work by physicist Scott Kirkpatrick and his colleagues.
First, the early work of Stuart Pimm and Mack Post, done in the late 1970s and early 1980s: they were concerned with the assembly of communities of organisms into a local ecosystem (Figure .). They ignored long-term coevolution and made use of the Lotka-Volterra equations. These equations basically say, for any species, what other species it eats, how readily it turns the eaten prey into an extra copy of itself, and how fast it reproduces on its own without eating. Plants, herbivores, and carnivores are readily represented.
Stuart and Mack did a surprising study with an astonishing and still poorly understood result. They made a pile of hypothetical critters in a computer, each governed by some plausible Lotka-Volterra equation whose parameters were drawn at random from some distribution. Then, with fond hopes, Stuart and Mack chose species at random, one after another, and tossed them into the computational equivalent of east Kansas. (“Kansas is a place for people who like subtlety,” my friend Wes Jackson once told a group of us visiting his Land Institute in Kansas in January. We observed subtly different shades of brown grass, trees, dirt, dust, hawks, mice, and agreed.)
What Stuart and Mack found was this: when the first few species were tossed into east Kansas in silico, they tested whether these few could coexist by running the corresponding Lotka-Volterra equations of all the species simultaneously to see if the hypothetical species all sustained abundances above zero. For example, the model community might go to a steady state with all species present, or might enter a limit cycle oscillation, or some chaotic dynamics with a strange attractor that remains above zero for all species.
Dandy, the mock community was stable. They kept adding randomly chosen species. At first, it was easy to add new species, but it became progressively harder until no more species could be added. The deep mystery is, why? It is not a lack of food or energy. Furthermore, on the way to assembling the community, Stuart and Mack began to notice that as the community filled up, addition of one species could make one or more other species go locally extinct. They began to find evidence of power law distributions of such extinction events.
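The assembly experiment is easy to caricature in code. Below is a minimal sketch, in Python, of sequential invasion under randomly parameterized generalized Lotka-Volterra dynamics. The parameter ranges, time step, extinction floor, and abundance cap are my own illustrative choices, not Pimm and Post's; the point is only the mechanics of tossing species in one at a time and counting who persists.

```python
import random

def simulate(r, A, x, steps=1200, dt=0.02, floor=1e-6, cap=10.0):
    """Euler-integrate generalized Lotka-Volterra: dx_i/dt = x_i (r_i + sum_j A_ij x_j).
    Species falling below `floor` are treated as locally extinct (pinned to 0)."""
    n = len(x)
    for _ in range(steps):
        x = [min(cap, max(0.0, xi + dt * xi * (r[i] + sum(A[i][j] * x[j] for j in range(n)))))
             for i, xi in enumerate(x)]
        x = [0.0 if xi < floor else xi for xi in x]
    return x

def assemble(n_attempts=15, seed=0):
    """Toss randomly parameterized species into the community one at a time
    and record how many coexist after each invasion attempt."""
    rng = random.Random(seed)
    r, A, x, history = [], [], [], []
    for _ in range(n_attempts):
        r.append(rng.uniform(-0.5, 1.0))           # intrinsic growth or decay
        for row in A:                              # effect of the newcomer on residents
            row.append(rng.uniform(-1.0, 0.5))
        # effects of residents on the newcomer, plus self-limitation
        A.append([rng.uniform(-1.0, 0.5) for _ in range(len(r) - 1)] + [-1.0])
        x.append(0.05)                             # small founding abundance
        x = simulate(r, A, x)
        history.append(sum(xi > 0 for xi in x))
    return history

print(assemble())   # surviving-species count after each invasion attempt
```

With interactions drawn mostly negative, additions tend to get harder as the community grows, though the saturation level depends entirely on the arbitrary parameter ranges above.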
Why? No one knows for sure, and Stuart has some wonderful ideas discussed in his fine book, The Balance of Nature. But I am going to move on to the recent ideas of Bruce Sawhill and Tim Keitt, which build on the work of physicist Scott Kirkpatrick. Scott is famous for coinventing the Sherrington-Kirkpatrick spin-glass model and hangs out at IBM being smart. Not long ago, he took up what is called the “Ksat” problem.
Here is the Ksat problem. Consider some logical formula, or expression, as in our Boolean net. Any Boolean expression can be cast in “normal disjunctive form” (more commonly called conjunctive normal form); an example is (A1 or A2) and (A2 or A3) and (not A1 or not A4). In such an expression the variables between parentheses constitute a “clause,” so this logical expression has three clauses. The clauses are linked by “and.” Thus, for the entire statement to be true, all three clauses must be true. Within each clause, variables are linked by the logical “or,” symbolized with v. (A1 v A2) is true if A1 is true, if A2 is true, or if both A1 and A2 are true.
Now we can ask if the above expression can be satisfied by some assignment of true or false, 1 or 0, to the four variables, A1, A2, A3, and A4. The answer is yes, since if A1 = true and A2 = true and A4 = false, then all clauses are satisfied. On the other hand, consider (A1) and (not A1). There is no assignment of true or false to A1 that makes both clauses true, since they contradict one another. More generally, an expression in normal disjunctive form, with K variables (A1 v A2 v . . . Ak) in each clause, a total of C clauses, and a total of V variables A1, A2, . . . , Av, may or may not be satisfiable.
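A brute-force satisfiability checker makes the example concrete. This is an illustrative Python sketch (the clause encoding, +i for "variable i true" and -i for "variable i false," is my own convention): it simply tries every truth assignment.

```python
from itertools import product

def satisfiable(clauses, n_vars):
    """Brute-force check: does any True/False assignment make every clause true?
    A clause is a list of literals; +i requires variable i true, -i requires
    variable i false (variables numbered from 1)."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return True
    return False

# The three-clause example: (A1 or A2) and (A2 or A3) and (not A1 or not A4)
expr = [[1, 2], [2, 3], [-1, -4]]
print(satisfiable(expr, 4))        # True: A1 = A2 = true, A4 = false works

# The contradictory example: (A1) and (not A1)
print(satisfiable([[1], [-1]], 1)) # False
```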
The wonderful result that Scott and friends showed is a phase transition from Ksat expressions that are almost certainly satisfiable to Ksat expressions that are almost certainly not satisfiable (Figure .).
In Figure ., the horizontal axis is labeled C/V. Thus, the x-axis shows the ratio of clauses to variables, hence, on average, how many clauses each variable appears in. Obviously, as each variable appears in more and more of the C clauses, with randomly assigned truth requirements, Vi versus not-Vi, jointly satisfying the set of clauses gets harder. On the other hand, as K goes up, there are more variables per clause, any one of which, if satisfied, satisfies the clause, since the variables within a clause are joined by “or.” Thus, as K goes up, the problem gets easier.
Remarkably, there is a phase transition on the C/V axis at ln 2 x 2^K, or about 0.69 x 2^K. As shown in Figure ., for C/V values less than this phase transition value, the expression is almost certainly satisfiable. As C/V passes the 0.69 x 2^K critical value, the probability that the expression can be satisfied plunges to near zero.
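The transition is easy to eyeball numerically. The sketch below is my own Monte Carlo toy, not Kirkpatrick's experiment: it generates random Ksat instances for K = 3 and estimates the probability of satisfiability at clause-to-variable ratios well below and well above the ln 2 x 2^K mark.

```python
import math
import random
from itertools import product

def random_ksat(K, V, C, rng):
    """One random instance: C clauses of K distinct variables, each negated
    with probability 1/2 (+v means v required true, -v required false)."""
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, V + 1), K)] for _ in range(C)]

def satisfiable(clauses, V):
    """Exhaustively try all 2^V truth assignments."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=V))

def p_sat(K, V, ratio, trials, rng):
    """Fraction of random instances with C = ratio * V clauses that are satisfiable."""
    C = round(ratio * V)
    return sum(satisfiable(random_ksat(K, V, C, rng), V)
               for _ in range(trials)) / trials

rng = random.Random(0)
print(round(math.log(2) * 2 ** 3, 2))                    # ln 2 x 2^K for K = 3: 5.55
print(p_sat(K=3, V=10, ratio=2.0, trials=30, rng=rng))   # low C/V: almost always satisfiable
print(p_sat(K=3, V=10, ratio=8.0, trials=30, rng=rng))   # high C/V: almost never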
Now, the idea that Bruce and Tim had was that building a community with random critters having random food and niche requirements is like the Ksat problem. They consider S species. Each species’ niche includes some of the S species that it eats and some that eat or kill it. Thus, species S1 must eat S2 or S3 or S4 to survive, but can survive on its own only in the absence of S5, which poisons the chloroplasts that allow S1 to be an autotroph. In disjunctive form, the requirements for species S1 to survive are (S2 v S3 v S4) and (not S5).
Bruce and Tim did numerical experiments for different values of S and K and C, where again K is the number of alternative species any given species could eat. (S2 v S3 v S4) corresponds to K = 3. They found the same phase transition. As more and more species are added, there are more potential interactions among the species, since the pairwise possibilities increase as the square of the total species diversity. As this occurs, each species appears in an increasing number of clauses, so as C/V increases, at some point the satisfiability of the Ksat system went from easy to hard. It became hard, then impossible, to add new species. The community filled up because the Ksat problem went from easy to impossible.
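In this species-as-clauses picture, checking whether a candidate community meets one species' niche requirements is the same clause evaluation as before. A tiny illustrative sketch (the encoding is mine, using the S1 example from the text):

```python
def niche_ok(clauses, present):
    """clauses: lists of literals (+i means species i must be present,
    -i means species i must be absent); present: set of species numbers.
    Returns True if every clause of the niche is satisfied."""
    return all(any((abs(l) in present) == (l > 0) for l in c) for c in clauses)

# S1 needs (S2 or S3 or S4) as food and the absence of poisonous S5
s1_niche = [[2, 3, 4], [-5]]
print(niche_ok(s1_niche, {2}))      # True: S2 feeds S1, no S5 poison
print(niche_ok(s1_niche, {2, 5}))   # False: S5 is present
print(niche_ok(s1_niche, {6}))      # False: nothing for S1 to eat
```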
I find this line of thinking rather interesting. Given rather general assumptions on the probability per pair of species of who eats or kills whom, assigned more or less randomly, then as the number of species, hence pairs of species, increases, such communities can fill up in the presence of persistent attempts at invasion. Bruce and Tim may have found the underlying reason for Stuart and Mack’s earlier results. Moreover, Bruce and Tim have found reasonable numerical evidence for small and large avalanches of local extinction events upon entry of new species while the community was filling up.
Meanwhile, experimental work assembling communities of real organisms shows much the same results. Communities tend to fill up and do exhibit small and large local extinction events.
Candidate Law 3: Coevolutionary Tuning of Fitness Landscapes and Organisms to a Self-Organized Critical State
Begin with a well-stated claim of Darwin: gradualism. Species evolve, argued Darwin, by the gradual accumulation of useful variations that were gradually sifted by natural selection.
Darwin is correct about contemporary life, and presumably about ancient life, based on the record. In fact, for current life forms, seven decades of hard work by geneticists, working with organisms as disparate as mouse, fruit fly, maize, yeast, and many other eukaryotes, demonstrates conclusively that most mutations are of minor effect. For example, the fruit fly, Drosophila melanogaster, upon which I worked for twelve years, has the abdominal bristles alluded to above. A modest number of mutants exist that slightly increase or slightly decrease the number of abdominal bristles. The flies don’t seem to mind, at least in the odd security of my and other biologists’ laboratories.
More rarely, there are mutants of rather dramatic effect, none more so than the famous homeotic mutants of Drosophila, which fascinated me and many others. Here a single mutant can change an antenna into a leg, an eye into a wing, or a head into genitalia. These survive perfectly well in the laboratory as well, but, one expects, would not fare well in the real world.
Therefore, most of the heritable variation cast up by mutation is of minor effect, and gradual variation is persistent grist for the selection mill. Somehow, current organisms have contrived themselves to be such that most mutations are of minor effect. But is it necessarily the case that all complex systems have the property that most mutations are of minor effect? And if not, where does the gradualism of Darwinian selection come from?
Importantly, it is easy to create systems that are not readily adaptable by mutation and selection. What follows is not quite a theorem (and was discussed in At Home in the Universe), but it will do. Many of the readers of this book are competent programmers. Consider a typical program, say written in C, Java, or some other language. Perhaps it computes something as simple as the square roots of the first two million integers. Perhaps it simulates an ecosystem. Whatever that program does, imagine trying to evolve it by making random mutations in the code.
We all know what would happen. Most mutations of a computer program are of major effect. The program won’t compile. If it compiles, it generates some vast stream of symbolic nonsense or goes into an undetected infinite loop and “hangs.”
And we can make the matter substantially worse by eliminating redundancy in the code. Any computer program can be written as a sequence of binary, 1 and 0, symbols, where that sequence represents the input data to the program and the program itself.
Now, a well-known area of computer science considers how redundant a program is; for example, a simple redundancy would duplicate each binary symbol. Computer scientists talk of eliminating the redundancy of a computer code to achieve the most compressed possible code. A fascinating theorem states that one cannot, in general, prove that a given computer program is maximally compressed, but that if it is maximally compressed, it is in a rigorous sense not detectably different from a random sequence of binary digits.
Figure . shows a four-dimensional Boolean hypercube with all two to the fourth, or sixteen, possible binary sequences of length four, ranging from (0000) to (1111). Each sequence is on one of the sixteen vertices of the hypercube and connected to four 1-mutant neighbors achieved by changing a single binary symbol among the four from 0 to 1 or from 1 to 0. Imagine that the minimal program we were considering were an N-long binary sequence. Then that minimal program could be represented as a single vertex on the N-dimensional Boolean hypercube, with its 2 to the N different sequences, hence vertices. A remarkable theorem due to Gregory Chaitin shows that if there is a minimal program with N binary symbols, there is at most about one such minimal N-bit symbol sequence. In short, only a single vertex on the N-dimensional Boolean hypercube corresponds to the desired minimal program.

Now consider each of the other N-bit sequences on the hypercube as a computer program. (Consider a gedankenexperiment, a thought experiment, for I have not carried out the actual computer experiment, and no one has been able to prove my following plausible, probably true, conjecture.) Run each binary sequence as the input data and program on our universal computer. Measure, in some sense, how far away the printout of that program is from some finite chunk of the correct program’s printout. If one started with the correct program, my bet is that since all redundancy has been removed, any single mutation to the code will randomize what the code does. Distant N-bit binary strings will, on average, be as good or bad an approximation of the correct program as its 1-mutant variants.
If one thinks of the measure of how close the output of a binary string program is to the correct program as the “fitness” of that trial binary string, then the fitness can be thought of as a height. The distribution of heights over the N-dimensional Boolean hypercube therefore creates a fitness landscape. In fact, my conjecture amounts to stating that the resulting fitness landscape is completely random. Neighboring points have fitnesses that have no correlation.
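For a landscape built by assigning independent random fitnesses to every vertex, the claimed absence of correlation between 1-mutant neighbors is easy to check numerically. A sketch (N = 12 is my choice, small enough to enumerate every vertex):

```python
import random

def random_landscape(n, rng):
    """Independent uniform fitness for each of the 2^n genotypes (bit strings)."""
    return [rng.random() for _ in range(2 ** n)]

def neighbor_correlation(n, seed=1):
    """Pearson correlation of fitness across all (vertex, 1-mutant neighbor) edges."""
    rng = random.Random(seed)
    f = random_landscape(n, rng)
    pairs = [(v, v ^ (1 << b)) for v in range(2 ** n) for b in range(n)]
    mean = sum(f) / len(f)
    var = sum((x - mean) ** 2 for x in f) / len(f)
    cov = sum((f[a] - mean) * (f[b] - mean) for a, b in pairs) / len(pairs)
    return cov / var

print(round(neighbor_correlation(12), 3))   # hovers near 0: no neighbor correlation
```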
Assume my conjecture is true. There are theorems stating that there is no way to hill climb to the global peak, the single correct program, by accumulating mutants that gradually improve the program. Indeed, the only way to find the single good program is, effectively, to search the entire N-dimensional Boolean hypercube. You’d have to look at most of the vertices on that cube to find the working program. The problem is, as they say, NP hard, meaning, roughly, that the search effort scales exponentially in the length of the binary symbol sequence. But a single example makes a general point: Not all complex systems can be assembled by an evolutionary process!
It follows that only some complex systems can be assembled by an evolutionary process. And it turns out that an evolutionary process based on mutation, recombination, and selection, the genetic search mechanisms of current life, does very well on a special kind of fitness landscape, where the high peaks tend to cluster near one another and the sides of the peaks are reasonably smooth, rather like the high Alps.
Then, as I first asked in an earlier chapter, where do such correlated fitness landscapes come from? More generally, I recall the no-free-lunch theorem of Bill Macready and David Wolpert. Macready and Wolpert wondered whether, averaged over all possible fitness landscapes, some search algorithms, such as mutation and selection, on average outperform all other search algorithms, such as random search on the landscape, or hill descending, or picking birthdays, taking their square, and jumping that distance in a randomly chosen direction.
The no-free-lunch theorem proves that, averaged over all landscapes, no search algorithm outperforms any other. Well, my goodness! On average, random search and hill descending do just as well as hill climbing in finding peaks of high fitness. And here we organisms are, stuck using mutation, recombination, and selection. Yet organisms and ecosystems seem to be pretty complex. Once again, where did the “good” landscapes come from, the ones that Darwinian gradualism works so well in searching?
In an earlier chapter, I was led by the above to define natural games as ways of making a living. Naturally, ways of making a living have evolved as organisms have evolved. So rather easily one gets to the conclusion that the winning games are the games that winners play.
And, as noted in an earlier chapter, those ways of making a living that set problems that are well searched out by the search mechanisms of organisms (mutation, recombination, and selection) will be well searched out. Many sibling species will arise and lineages will branch. There will be many species making such livings, for those are precisely the livings that the search mechanisms of organisms search out well.
In short, there must be a self-consistent coconstruction of a biosphere in which organisms, ways of making a living, and search mechanisms jointly and self-consistently come into existence. Organisms are not solving arbitrary problems. We are solving the kinds of problems we can solve given our solution procedures. How could it be otherwise?
So, somehow (and we will have to seek plausible mechanisms), organisms are tuning the statistical structure of the fitness landscapes they are searching in evolution. But the problem is very much more complex than merely searching a fixed fitness landscape. Fitness landscapes are not fixed. If the abiotic environment changes, the fitness landscape of organisms changes, buckles, and deforms.
Worse, organisms coevolve. My favorite example remains the frog and the fly. If the frog develops a sticky tongue, the fitness of the fly is altered. But so too is the fitness landscape of the fly, that is, what the fly should do next. It should develop slippery feet, or sticky stuff dissolver, or a better sense of smell to smell sticky stuff before the frog gets too close or . . .
So, due to coevolution, the fitness landscape of each species heaves and deforms as other species make their adaptive moves.
A Sojourn to Coevolution in the NK Model
These results were presented in At Home in the Universe but are needed here. Since they are publicly available, I will be brief.
The NK model is a simple toy world in which an organism has N genes. Each gene comes in two “alleles,” or versions, 1 or 0. Each allele of each gene makes a contribution to the fitness of the organism that depends on the allele of that gene and upon the alleles of K other genes. In genetics, these K other genes are called “epistatic” inputs to the fitness contribution of a given gene. The 2 to the N combinations of alleles of the N genes are therefore located on the vertices of the N-dimensional hypercube, like the hypercube of Figure .. The fitness of each type of organism, or vertex, is written on that vertex and can be thought of as a height. Hence, the NK model creates a fitness landscape over the N-dimensional Boolean hypercube. To keep matters simple, I assume all critters have a single chromosome, that is, are haploids. When they are not feeling sexy, bacteria will do as an example.
Having chosen N and K, the rest is done at random in the hopes that generic features of N and K will show up in the resulting statistical structure of the fitness landscape (Figure .a–c). In one limiting case, the K inputs to each of the N genes are chosen at random from among the N. Each gene has two alleles, 1 and 0. The fitness contribution of that gene is affected by which of its alleles occurs and by which of the 1 or 0 alleles of its K other epistatic input genes occurs. Thus, each gene’s fitness contribution is affected by K + 1 genes.
To study the generic features of such systems, I assign, once and for all, a random “fitness contribution” to each of the 2 to the (K + 1) combinations of allele states affecting each of the N genes. The fitness contribution is drawn from the uniform interval between 0.0 and 1.0. Thus, instead of the Boolean functions described above showing when genes turn on and off, here I obtain, for each gene, a column vector with one position for each of the 2 to the (K + 1) combinations of allele states affecting that gene, and in each position is a random decimal. Once this is done for each of the N genes, it remains to define the fitness of an organism with a specific allele at each of the N genes. I define this as the average of the fitness contributions of the N genes. The results yield a fitness landscape over the N-dimensional hypercube (Figure .c).
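The construction just described fits in a few lines. A minimal Python sketch of the NK fitness function (the indexing scheme for the contribution tables is my own; any fixed scheme will do):

```python
import random

def make_nk(n, k, seed=0):
    """Random NK landscape: each gene i gets k random epistatic inputs and a
    table of uniform fitness contributions, one per 2^(k+1) allele combination."""
    rng = random.Random(seed)
    inputs = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(genotype):          # genotype: tuple of 0/1 alleles
        total = 0.0
        for i in range(n):
            idx = genotype[i]       # pack the k+1 relevant alleles into a table index
            for j in inputs[i]:
                idx = (idx << 1) | genotype[j]
            total += tables[i][idx]
        return total / n            # fitness = average contribution, in (0, 1)

    return fitness

fit = make_nk(n=8, k=2)
print(round(fit((0,) * 8), 3), round(fit((1,) * 8), 3))
```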
I will briefly summarize results for the structure of NK landscapes. When K = 0, each of the N sites is independent. There is an optimal allele at each site and hence a globally optimal genotype. Any other genotype is suboptimal, but can steadily climb to the peak by flipping any gene in a less favorable allele to the opposite, more favorable state, 1 or 0. So the landscape is like Fujiyama, single peaked with smooth sides.
When K is the maximum value, N - 1, as in Figure .a–c, then each gene influences the fitness contribution of every gene. This is the totally interconnected system. Since fitness values are assigned at random for the 2 to the (K + 1), or 2 to the N, input configurations when K = N - 1, it is easy to show that the resulting fitness landscape is fully random.
A main feature of random landscapes is that there are nearly exponentially many local peaks; indeed, the number of local peaks is 2 to the N, divided by (N + 1). For even modestly large N, that is an astronomical number. Finding the global peak by hill climbing is improbable, and the system becomes trapped on a local peak. Other features include the lengths of walks via fitter neighbors to nearby peaks, which scale as the logarithm of N, and the way directions uphill dwindle on walks uphill. At each step uphill, the fraction of directions uphill is cut in half, yielding exponential slowing in the rate of finding fitter variants, hence, rather general laws about the rate of improvement slowing exponentially that we will discuss in the next chapter on “learning curves” in economics.
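That 2 to the N over (N + 1) count is easy to check by enumeration on a fully random landscape (the K = N - 1 case). A small sketch of my own, averaging over 20 random landscapes with N = 10, where the prediction is 1024/11, roughly 93:

```python
import random

def count_peaks(n, rng):
    """Local peaks of a fully random landscape over n-bit genotypes:
    vertices fitter than all n of their 1-mutant neighbors."""
    f = [rng.random() for _ in range(2 ** n)]
    return sum(all(f[v] > f[v ^ (1 << b)] for b in range(n))
               for v in range(2 ** n))

rng = random.Random(0)
n = 10
avg = sum(count_peaks(n, rng) for _ in range(20)) / 20
print(avg, 2 ** n / (n + 1))   # observed average vs the predicted 2^N/(N+1)
```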
Now, on to coevolution and the evolution of the structure of fitness landscapes. Figure . shows a frog and fly, each characterized by an NK landscape, coupled together. Each of the N genes in the frog receives inputs from K genes in the frog and C genes in the fly, and vice versa. Thus, the sticky tongue of the frog affects the fitness of the fly via the presence or absence in the fly of slippery feet, sticky stuff dissolver, or a strong sense of smell for sticky frog tongues. To accommodate the C couplings, each gene in the frog looks at K + C inputs and has its table of random fitness contributions augmented with new random decimals. So too for the fly feeling the effects of the frog.
Now, when the frog population moves by mutation and selection uphill on the frog landscape, those moves distort the fly’s landscape, and vice versa. Coevolution is a game of coupled deforming landscapes. Figure .a–c show coevolution in model ecosystems with four, eight, and sixteen species. Due to landscape deformations as species coevolve, an adaptive move by one species can cause the fitness of other species to decrease. In general, such coevolving systems can behave in two regimes, an ordered regime and a chaotic regime, separated by a phase transition.
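The frog-fly coupling is easy to sketch. Below, each of one species' genes draws fitness contributions that depend on K of its own genes and C of its partner's genes; generating the contribution tables lazily from a hash is just a memory-saving trick of mine, not part of the model. The point of the demo: the frog's genotype stays fixed, yet its fitness changes when the fly mutates; its landscape has deformed.

```python
import random

FROG, FLY = 0, 1

def coupled_fitness(n, k, c, seed=2):
    """NKC sketch: gene i of one species reads k other genes of its own
    genome and c genes of the partner's genome."""
    rng = random.Random(seed)
    own = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    other = [rng.sample(range(n), c) for _ in range(n)]

    def contrib(species, gene, combo):
        # lazily generated, reproducible random fitness contribution in [0, 1)
        return random.Random(hash((seed, species, gene, combo))).random()

    def fitness(species, g_self, g_other):
        total = 0.0
        for i in range(n):
            combo = (g_self[i],
                     tuple(g_self[j] for j in own[i]),
                     tuple(g_other[j] for j in other[i]))
            total += contrib(species, i, combo)
        return total / n

    return fitness

fit = coupled_fitness(n=10, k=2, c=2)
frog = (0,) * 10
fly, mutant_fly = (0,) * 10, (1,) * 10
print(fit(FROG, frog, fly), fit(FROG, frog, mutant_fly))  # same frog, different fitness
```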
In Figure .a the four-species ecosystem eventually settles down to a state where fitnesses stop changing. This corresponds to an ordered regime or unchanging evolutionary stable state in which each species has evolved to a local peak on its fitness landscape that is consistent with the peaks occupied by its ecosystem neighbors. Once attained, each species is better off not changing so long as its neighbors do not change. By contrast, in the eight- and sixteen-species ecosystems, Figures .b and c, fitnesses continue to jostle up and down as species evolve in a chaotic regime, each species chasing the adaptive peaks on its landscape that retreat, due to adaptive moves of other species, faster than each species can attain the peaks on its own landscape.
A major point of the coevolving NK landscape model is that the creatures can tune the structure of their fitness landscapes, each for its own selfish advantage. Yet, as if by an invisible hand, the tuned landscape structure works for the average benefit of all. This toy model is, to date, the only example I know in which creatures tune the structure of their fitness landscapes such that all come to evolve in problem spaces that they can, in some sense, search well, self-consistently with their search mechanisms.
Here is how the toy model works. Each species is represented by a single individual. Hence, the species is assumed to be isogenic, except during the rapid evolution to fitter genotypes that happens as a fitter mutant of a species steps from one point to another point on the landscape. Very rapidly, it outreproduces its less-fit cousins. Hence, in this limit, the entire species can be said to hop between points on the landscape.
At each move of the computer program, any of four events may happen. A given species is chosen at random. First, it may do nothing. Second, it may change its genotype, and if the result is that it is fitter in interacting with its ecosystem neighbors, that innovation will be accepted. The creature has evolved on its landscape and probably deformed the landscapes of its neighbors. Third, the critter can change the ruggedness of its landscape by increasing K or decreasing K. Landscapes become more rugged and multipeaked as K increases. The move altering K is accepted only if that move makes the current genotype of the creature fitter. Hence, altering the ruggedness of the fitness landscape must pay off for the creature immediately and is accepted selfishly. The fourth thing that can happen is rather mean. A random other creature, say, Godzilla, is chosen to attempt to invade the current species’ niche. A copy of Godzilla, Godzilla’, connects to the first species’ ecosystem neighbors and has a go. If Godzilla’ is fitter, when coupled to the first species’ econeighbors, than that species, then that species goes extinct in its niche and is replaced by Godzilla’.
By the fourth mechanism, if Godzilla’ happens to have a beneficial landscape ruggedness due to its K value, that good landscape ruggedness has now replicated from the initial Godzilla to its copy. So good landscape ruggedness can spread, by natural selection, through the model ecosystem, hence, landscape ruggedness can evolve.
The results are shown in Figure .. Indeed, landscape ruggedness does evolve to an intermediate ruggedness. During this evolution, the mean interval between extinction events increases dramatically, hence, the mean number of extinction events decreases. In this sense, all the creatures that remain become fitter due to the coevolutionary tuning of landscape ruggedness. In addition, when Godzilla’ replaces the hapless species that now goes extinct, its econeighbors find themselves interacting with Godzilla’ itself. They may not be as fit interacting with Godzilla’ as with the first species, hence, they too may be invaded and driven extinct in turn.
In short, avalanches of extinction events can propagate. Figure . shows that the distribution is a power law, with many small and few large extinction events. Moreover, once a new species comes into existence, say, Godzilla’, it may not fare well in its new niche, hence, may go extinct soon. But if it lasts in its niche, it may be well adapted, hence, resistant to being driven extinct. The results (Figure .a) are a power law distribution of species lifetimes.
Thus, this model shows an invisible hand in which natural selection, acting on individuals only, tunes landscape ruggedness. All players are, on average, fitter in the sense of surviving as species for much longer periods, yet the ecosystem appears self-organized critical with a power law distribution of extinction events. Species lifetime distributions are also power laws.
Does this model apply to the real world? There is now considerable evidence that over the past several hundred million years the size distribution of extinction events in the fossil record, in terms of the number of species going extinct per unit interval of geological time, is best understood as a power law, with many small and few large extinction events (Figure .). Furthermore, the lifetime distribution of species, as well as genera and families, is, indeed, a power law (Figure .a,b).
I have discussed my own model primarily to focus on the fact that coevolution by natural selection alone, acting on individuals alone, can tune landscape ruggedness so that those landscapes are self-consistently well searched by the creatures searching them with their search mechanisms. We can, and presumably do, self-consistently coconstruct ourselves, our niches (and hence problem spaces), and our search mechanisms such that, on average, we propagate ourselves and our descendants. It has worked for 3.8 billion years. Were our fitness landscapes such that we could not search them, we would not be here. We coconstruct our ways of making a living and search out better ways of making a living while we jiggle one another.
The NK model is but one crude model of coevolving organisms and their coupled deforming landscapes. More generally, each organism has traits that are affected by many genes, the polygeny discussed above, and each gene affects many traits, the pleiotropy alluded to above. It is interesting to note that were organisms to evolve to a position below but near the biological analogue of the Ksat phase transition, such a location might well achieve the gradualism and the capacity to persistently evolve that Darwin noted and that we observe. Both the gradualism and the capacity to evolve are related to the number of alternative assignments of true or false to the V variables that satisfy the Ksat normal disjunctive form. If there are connected pathways from one such assignment, via 1-Hamming-mutant neighboring assignments, to others that all satisfy the normal disjunctive form, then adaptive walks via alternative genotypes are available, all of which roughly generate the same organism. Gradualism is achieved. Polygeny and pleiotropy tune landscape ruggedness and deformability, which tune coevolutionary dynamics, perhaps to a self-organized critical state of an ecosystem.
There are, in short, dimly understood laws that allow the coevolutionary construction of accumulating complexity. And it appears that such coevolution typically is self-organized critical. The NK coevolutionary model is not the only example of a model exhibiting self-organized critical behavior of model ecosystems. Bak and colleagues, Ricard Solé, and others have created elegant models aiming in the same direction. In particular, Solé’s model comes closest to fitting the actual slopes of the observed power laws.
Candidate Law 4: Expanding the Adjacent Possible in a Self-Organized Critical Way
How does the biosphere, collectively, broach and persistently invade the adjacent possible at the chemical, morphological, and behavioral levels?
I suspect there is a general law and mentioned it in the last section of the previous chapter. It is my hoped-for fourth law of thermodynamics for self-constructing biospheres. We enter the adjacent possible, hence expand the workspace of our biosphere, on average, as fast as we can.
Recall the Noah’s Vessel experiment, with two of every species ground up in a blender, breaking all cell membranes, comingling the trillion or so proteins of the hundred million species with the thousands of small molecule metabolites. A supracritical explosion of chemical diversity would presumably ensue. As I noted, life has learned to avoid that fate. Cells are subcritical. Were they not, then any new chemical that chanced to enter the cells of Fredricka the fern would unleash a cascade of synthesis of novel molecular species, some of which would presumably kill poor Fredricka. Best defense? Stay subcritical. Why mess with that mess?
But recall that a mixed microbial community should be able to be driven to the subcritical-supracritical boundary by increasing the diversity of microbial species present and/or hitting the community with a sufficient diversity of novel small molecule species. My argument runs as follows: if the community is supracritical, the novel cascading molecular species will kill off some of the microbial species, thereby lowering the community toward the subcritical regime. On the other hand, mutation and immigration should drive the community toward the supracritical regime. Do mixed microbial communities hover on the subcritical-supracritical boundary? We have seen other reasons that bound a community’s complexity, including the Ksat problems noted in this chapter. So a better question is: Can microbial communities be driven supracritical? And if so, are they often near that boundary?
Of course, I do not know, but the hypothesis is testable. Increase the diversity of species and test for the diversity of synthesized small molecules, say, by gas chromatography. A colleague and I once devised such an experiment with mixtures of increasingly diverse moss species, planning to measure the molecular diversity of gas species evolved as a function of community diversity and the diversity in gas species introduced to the community. We went skiing instead. I still like the lines of the experiment. It could be carried out directly with mixed microbial communities as well.
As noted in the previous chapter, there must be some interplay that gates the entry into the adjacent possible by the capacity of natural selection to trim away the losers. I described the Manfred Eigen–Peter Schuster error catastrophe. If the mutation rate in a population of viruses is low, the population climbs steadily uphill by successive rare successful mutations, then becomes trapped on or in the near vicinity of a local peak.
But let the mutation rate be increased. The population on the peak is deformed by the rapid accumulation of mutations and diffuses away from the peak into the lowlands of poor fitness. The ratio of the mutation rate to the selective advantage of the fitness peak over nearby values on the fitness landscape tunes this error catastrophe. Above a critical ratio of the selective advantage at the peak to the mutation rate, the population remains near the peak. If the mutation rate is slightly higher, the population diffuses into the high-dimensional hinterlands, lost adrift in sequence space.
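The error catastrophe is simple to reproduce in a toy quasispecies simulation. In the sketch below (all parameter values are arbitrary illustrative choices of mine), the all-ones "master" sequence replicates twice as fast as everything else; below the error threshold the population clings to the peak, while far above it the population diffuses away into sequence space.

```python
import random

def master_fraction(L=12, N=300, s=1.0, mu=0.005, gens=200, seed=0):
    """Wright-Fisher quasispecies sketch: genomes are L-bit tuples, the
    all-ones 'master' replicates (1 + s)-fold faster, and each bit flips
    with probability mu per generation. Returns the final master share."""
    rng = random.Random(seed)
    master = (1,) * L
    pop = [master] * N
    for _ in range(gens):
        weights = [1.0 + s if g == master else 1.0 for g in pop]
        pop = rng.choices(pop, weights=weights, k=N)                    # selection
        pop = [tuple(b ^ (rng.random() < mu) for b in g) for g in pop]  # mutation
    return sum(g == master for g in pop) / N

low_mu = master_fraction(mu=0.005)   # mutation rate below the error threshold
high_mu = master_fraction(mu=0.12)   # mutation rate far above it
print(low_mu, high_mu)               # population holds the peak, then loses it
```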
The error catastrophe is, of course, rather general. Eigen and Schuster consider a fixed high-dimensional sequence space, like our N-dimensional Boolean hypercube, which can be regarded as a sequence space in which molecules have only two “nucleotides,” 1 and 0, or C and G.
But what about an ever enlarging space of possibilities, expanding into an ever larger adjacent possible? Here too the rate of exploration of novel possibilities must be gradual enough that natural selection can weed out the losers. If not, the biosphere as a whole would diffuse into new ways of making a living so rapidly that selection would not be able to control the exploration, and we would soon falter.
So a general balance must be struck. We broach the adjacent possible by those exaptations that are not, I hold, finitely describable beforehand and do so at a rate that manages to work. We gate our entry into the adjacent possible.
Globally, for the entire biosphere, this suggests that we enter the adjacent possible about as fast as we can get away with it. On average, the global diversity of the biosphere has increased secularly. Indeed, it would be fascinating to know what has happened to microbial diversity over the billions of years since life began. There are bacteria eating rocks down there two miles below the surface, in hot thermal vents, in the cold of Antarctic frozen tundra and lake edges, all over the place.
I do not know the form of the law that governs this exploration. Perhaps it is locally self-organized critical in communities, multiplied by the number of effectively independent local communities in the biosphere. But I can make out a law in which the adjacent possible is invaded, such that diversity and coconstructed, coevolved complexity accumulate, on average, as fast as they can.
I can sense a fourth law of thermodynamics for self-constructing systems of autonomous agents. Biospheres enlarge their workspace, the diversity of what can happen next, the actual and adjacent possible, on average, as fast as they can. Clues include the fact, noted above, that for the adjacent possible of an N-dimensional phase space to increase as the biosphere's trajectory travels among microstates, a secular increase in the symmetry splittings of microstate volumes must occur, such that different subvolumes go to different adjacent possible microstates. Eventually, such subvolumes hit the Heisenberg uncertainty limit. As I noted, organisms do touch that limit all over the place in the subtle distinctions that we make, turning genes on and off, smell sensors on and off, and eyes this way and that.
The whole of this chapter suggests that autonomous agents coevolve to be as capable as possible of making the most diverse discriminations and actions, of taking advantage of the most unexpected exaptations, and of coevolving as readily as possible to coconstruct the blossoming diversity that is, and remains, Darwin's "tangled bank." I sense a fourth law in which the workspace of the biosphere expands, on average, as fast as it can in this coconstructing biosphere.
A fourth law for any biosphere? I hope so.