Why is the sky blue? Any scientist will answer this question with a statement of mechanism: Atmospheric gas scatters some wavelengths of light more than others. To answer with a statement of purpose—e.g., to say the sky is blue in order to make people happy—would not cross the scientific mind. Yet in biology we often pose “why” questions in which it is purpose, not mechanism, that interests us. The question “Why does the eye have a lens?” most often calls for the answer that the lens is there to focus light rays, and only rarely for the answer that the lens is there because lens cells are induced by the retina from overlying ectoderm.
It is a legacy of evolution that teleology—the tendency to explain natural phenomena in terms of purposes—is deeply ingrained in biology, and not in other fields (Ayala 1999). Natural selection has so molded biological entities that nearly everything one looks at, from molecules to cells, from organ systems to ecosystems, has (at one time at least) been retained because it carries out a function that enhances fitness. It is natural to equate such functions with purposes. Even if we can't actually know why something evolved, we care about the useful things it does that could account for its evolution.
As a group, molecular biologists shy away from teleological matters, perhaps because early attitudes in molecular biology were shaped by physicists and chemists. Even geneticists rigorously define function not in terms of the useful things a gene does, but by what happens when the gene is altered. Molecular biology and molecular genetics might continue to dodge teleological issues were it not for their fields' remarkable recent successes. Mechanistic information about how a multitude of genes and gene products act and interact is now being gathered so rapidly that our inability to synthesize such information into a coherent whole is becoming more and more frustrating. Gene regulation, intracellular signaling pathways, metabolic networks, developmental programs—the current information deluge is revealing these systems to be so complex that molecular biologists are forced to wrestle with an overtly teleological question: What purpose does all this complexity serve?
In response to this situation, two strains have emerged in molecular biology, both of which are sometimes lumped under the heading “systems biology.” One strain, bioinformatics, champions the gathering of even larger amounts of new data, both descriptive and mechanistic, followed by computer-based data “mining” to identify correlations from which insightful hypotheses are likely to emerge. The other strain, computational biology, begins with the complex interactions we already know about, and uses computer-aided mathematics to explore the consequences of those interactions. Of course, bioinformatics and computational biology are not entirely separable entities; they represent ends of a spectrum, differing in the degree of emphasis placed on large versus small data sets, and statistical versus deterministic analyses.
Computational biology, in the sense used above, arouses some skepticism among scientists. To some, it recalls the “mathematical biology” that, from its heyday in the 1960s onward, provided some interesting insights, but also succeeded in elevating the term “modeling” to near-pejorative status among many biologists. For the most part, mathematical biologists sought to fit biological data to relatively simple mathematical models, in the hope that fundamental laws might be recognized (Fox Keller 2002). This strategy works well in physics and chemistry, but in biology it is stymied by two problems. First, biological data are usually incomplete and extremely imprecise. As new measurements are made, today's models rapidly join tomorrow's trash heaps. Second, because biological phenomena are generated by large, complex networks of elements, there is little reason to expect to discern fundamental laws in them. To do so would be like expecting to discern the fundamental laws of electromagnetism in the output of a personal computer.
Nowadays, many computational biologists avoid modeling-as-data-fitting, opting instead to create models in which networks are specified in terms of elements and interactions (the network “topology”), but the numerical values that quantify those interactions (the parameters) are deliberately varied over wide ranges. As a result, the study of such networks focuses not on the exact values of outputs, but rather on qualitative behavior, e.g., whether the network acts as a “switch,” “filter,” “oscillator,” “dynamic range adjuster,” “producer of stripes,” etc. By investigating how such behaviors change for different parameter sets—an exercise referred to as “exploring the parameter space”—one starts to assemble a comprehensive picture of all the kinds of behaviors a network can produce. If one such behavior seems useful (to the organism), it becomes a candidate for explaining why the network itself was selected, i.e., it is seen as a potential purpose for the network. If experiments subsequently support assignments of actual parameter values to the range of parameter space that produces such behavior, then the potential purpose becomes a likely one.
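
To make this concrete, here is a minimal Python sketch of what “exploring the parameter space” can look like in practice. The one-gene positive-feedback circuit, the parameter ranges, and the switch/graded classification are illustrative assumptions of ours, not a model taken from any of the studies discussed here.

# Exploring the parameter space of a toy one-gene positive-feedback circuit,
#   dx/dt = b + v * x^n / (K^n + x^n) - g * x,
# by drawing random parameter sets, counting stable steady states for each,
# and classifying the circuit as "switch" (two stable states) or "graded".
import numpy as np

rng = np.random.default_rng(0)

def rate(x, b, v, K, n, g):
    """Net production rate dx/dt for the toy circuit."""
    return b + v * x**n / (K**n + x**n) - g * x

def count_stable_states(params, x_max=100.0, points=8000):
    """Count stable fixed points by locating sign changes of dx/dt.
    A crossing from + to - as x increases is a stable steady state."""
    x = np.linspace(1e-6, x_max, points)
    f = rate(x, *params)
    return int(np.sum((f[:-1] > 0) & (f[1:] < 0)))

n_trials = 5000
tally = {"switch": 0, "graded": 0}
for _ in range(n_trials):
    params = (rng.uniform(0.0, 0.5),   # b: basal synthesis
              rng.uniform(0.5, 5.0),   # v: maximal feedback synthesis
              rng.uniform(0.1, 5.0),   # K: feedback threshold
              rng.integers(1, 5),      # n: feedback cooperativity
              rng.uniform(0.1, 2.0))   # g: first-order decay
    behavior = "switch" if count_stable_states(params) >= 2 else "graded"
    tally[behavior] += 1

for behavior, count in tally.items():
    print(f"{behavior:>7}: {count / n_trials:.1%} of random parameter sets")

Tallying the fraction of random parameter sets that yields each qualitative behavior is, in miniature, the same strategy used in the studies discussed below.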
For very simple networks (e.g., linear pathways with no delays or feedback and with constant inputs), possible global behaviors are usually limited, and computation rarely reveals more than one could have gleaned through intuition alone. In contrast, when networks become even slightly complex, intuition often fails, sometimes spectacularly so, and computation becomes essential.
For example, intuitive thinking about MAP kinase pathways led to the long-held view that the obligatory cascade of three sequential kinases serves to provide signal amplification. In contrast, computational studies have suggested that the purpose of such a network is to achieve extreme positive cooperativity, so that the pathway behaves in a switch-like, rather than a graded, fashion (Huang and Ferrell 1996). Another example comes from the study of morphogen gradient formation in animal development. Whereas intuitive interpretations of experiments led to the conclusion that simple diffusion is not adequate to transport most morphogens, computational analysis of the same experimental data yields the opposite conclusion (Lander et al. 2002).
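
The graded-versus-switch-like distinction is easy to see numerically. The sketch below contrasts Hill curves of low and high cooperativity; it is an illustration of the concept only, not the detailed kinetic model of Huang and Ferrell, and the input values are arbitrary.

# Graded versus switch-like dose-response, illustrated with bare Hill curves.
def hill(signal, n, K=1.0):
    """Fractional pathway output for a given input signal (Hill curve)."""
    return signal**n / (K**n + signal**n)

# Classic property of Hill curves: driving the output from 10% to 90%
# of maximum requires an 81**(1/n)-fold increase in input.
for n in (1, 2, 5):
    print(f"n={n}: 10%-to-90% input range = {81 ** (1 / n):.1f}-fold")

# The same 2.4-fold input step (arbitrary illustrative values) barely
# moves a graded (n=1) response but flips a cooperative (n=5) one.
for n in (1, 5):
    lo, hi = 0.5, 1.2
    print(f"n={n}: output {hill(lo, n):.2f} -> {hill(hi, n):.2f}")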
As the power of computation to identify possible functions of complex biological networks is increasingly recognized, purely (or largely) computational studies are becoming more common in biological journals. This raises an interesting question for the biology community: In a field in which scientific contributions have long been judged in terms of the amount of new experimental data they contain, how does one judge work that is primarily focused on interpreting (albeit with great effort and sophistication) the experimental data of others? At the simplest level, this question poses a conundrum for journal editors. At a deeper level, it calls attention to the biology community's difficulty in defining what, exactly, constitutes “insight” (Fox Keller 2002).
In yesterday's mathematical biology, a model's utility could always be equated with its ability to generate testable predictions about new experimental outcomes. This approach works fine when one's ambition is to build models that faithfully mimic particular biological phenomena. But when the goal is to identify all possible classes of biological phenomena that could arise from a given network topology, the connection to experimental verification becomes blurred. This does not mean that computational studies of biological networks are disconnected from experimental reality, but rather that they tend, nowadays, to address questions of a higher level than simply whether a particular model fits particular data.
The problem this creates for those of us who read computational biology papers is knowing how to judge when a study has made a contribution that is deep, comprehensive, or enduring enough to be worth our attention. We can observe the field trying to sort out this issue in the recent literature. A good example can be found in an article by Nicholas Ingolia in this issue of PLoS Biology (Ingolia 2004), and an earlier study from Garrett Odell's group, upon which Ingolia draws heavily (von Dassow et al. 2000).
Both articles deal with a classical problem in developmental biology, namely, how repeating patterns (such as stripes and segments) are laid down. In the early fruit fly embryo, it is known that a network involving cell-to-cell signaling via the Wingless (Wg) and Hedgehog (Hh) pathways specifies the formation and maintenance of alternating stripes of gene expression and cell identity. This network is clearly complex, in that Wg and Hh signals affect not only downstream genes, but also the expression and/or activity of the components of each other's signaling machinery.
Von Dassow et al. (2000) calculated the behaviors of various embodiments of this network over a wide range of parameter values and starting conditions. This was done by expressing the network in terms of coupled differential equations, picking parameters at random from within prespecified ranges, solving the equation set numerically, then picking another random set of parameters and obtaining a new numerical solution, and so forth, until 240,000 cases were tried. The solutions were then sorted into groups based on the predicted output—in this case, spatial patterns of gene expression.
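
In outline, that pipeline (random draws, numerical integration, sorting by output pattern) can be sketched in a few dozen lines. The toy model below, a single gene on a ring of cells repressed by its neighbors' expression, is our own stand-in and is far simpler than the segment-polarity network; the parameter ranges and the stripe classifier are likewise assumptions, chosen only to show the shape of the computation.

# Random-parameter sampling of a toy lateral-inhibition circuit on a ring
# of cells, sorting solutions by whether they form alternating stripes.
import numpy as np

rng = np.random.default_rng(1)
N_CELLS = 12

def simulate(v, K, n, g, t_end=100.0, dt=0.05):
    """Euler-integrate dx_i/dt = v / (1 + (nbr_i/K)**n) - g*x_i on a ring,
    where nbr_i is the mean expression of cell i's two neighbors."""
    x = rng.uniform(0.0, 0.1, N_CELLS)   # small random initial state
    for _ in range(int(t_end / dt)):
        nbr = 0.5 * (np.roll(x, 1) + np.roll(x, -1))
        x = x + dt * (v / (1.0 + (nbr / K)**n) - g * x)
    return x

def is_striped(x):
    """Threshold cells into high/low and require strict alternation."""
    if x.max() - x.min() < 0.1 * max(x.max(), 1e-9):
        return False                     # essentially uniform: no pattern
    high = x > 0.5 * (x.max() + x.min())
    return all(high[i] != high[(i + 1) % N_CELLS] for i in range(N_CELLS))

n_trials = 1000
hits = 0
for _ in range(n_trials):
    params = (rng.uniform(0.5, 5.0),      # v: maximal synthesis rate
              rng.uniform(0.1, 2.0),      # K: repression threshold
              float(rng.integers(2, 6)),  # n: repression cooperativity
              rng.uniform(0.1, 1.0))      # g: first-order decay
    hits += is_striped(simulate(*params))
print(f"stable stripes in {hits / n_trials:.1%} of random parameter sets")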
When they used a network topology based only upon molecular and gene-regulatory interactions that were firmly known to take place in the embryo, they were unable to produce the necessary output (stable stripes), but upon inclusion of two molecular events that were strongly suspected of taking place in the embryo, they produced the desired pattern easily. In fact, they produced it much more easily than expected. It appeared that a remarkably large fraction of random parameter values produced the very same stable stripes. This implied that the output of the network is extraordinarily robust, where robustness is meant in the engineering sense of the word, namely, a relative insensitivity of output to variations in parameter values.
Because real organisms face changing parameter values constantly—whether as a result of unstable environmental conditions, or mutations leading to the inactivation of a single allele of a gene—robustness is an extremely valuable feature of biological networks, so much so that some have elevated it to a sort of sine qua non (Morohashi et al. 2002). Indeed, the major message of the von Dassow article was that the authors had uncovered a “robust developmental module,” which could ensure the formation of an appropriate pattern even across distantly related insect species whose earliest steps of embryogenesis are quite different from one another (von Dassow et al. 2000).
There is little doubt that von Dassow's computational study extracted an extremely valuable insight from what might otherwise seem like a messy and ill-specified system. But Ingolia now argues that something further is needed. He proposes that it is not enough to show that a network performs in a certain way; one should also find out why it does so.
Ingolia throws down the gauntlet with a simple hypothesis about why the von Dassow network is so robust. He argues that it can be ascribed entirely to the ability of two positive feedback loops within the system to make the network bistable. Bistability is the tendency for a system's output to be drawn toward either one or the other of two stable states. For example, in excitable cells such as neurons, depolarization elicits sodium entry, which in turn elicits depolarization—a positive feedback loop. As a result, large depolarizations drive neurons to fully discharge their membrane potential, whereas small depolarizations decay back to a resting state. Thus, the neuron tends strongly toward one or the other of these two states. The stability of each state brings with it a sort of intrinsic robustness—i.e., once a cell is in one state, it takes a fairly large disturbance to move it into the other. This is the same principle that makes electronic equipment based on digital (i.e., binary) signals so much more resistant to noise than equipment based on analog circuitry.
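
Bistability, and the intrinsic robustness of each state, are easy to demonstrate numerically. The following sketch uses a generic one-variable positive-feedback equation with a parameter set we assume to lie in the bistable regime; it is not Ingolia's model, only an illustration of the principle.

# A minimal numerical illustration of bistability:
#   dx/dt = b + v*x**2/(K**2 + x**2) - g*x
# with a parameter set assumed to lie in the bistable regime.
B, V, K, G = 0.05, 1.0, 0.5, 1.0

def settle(x, t_end=100.0, dt=0.01):
    """Euler-integrate to steady state from initial condition x."""
    for _ in range(int(t_end / dt)):
        x += dt * (B + V * x**2 / (K**2 + x**2) - G * x)
    return x

low, high = settle(0.01), settle(2.0)
print(f"low stable state ~ {low:.3f}, high stable state ~ {high:.3f}")

# Intrinsic robustness: a modest kick to the low state decays back,
# while a large one pushes the system over into the high state.
print(f"low + small kick -> {settle(low + 0.15):.3f}")
print(f"low + large kick -> {settle(low + 0.60):.3f}")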
Ingolia not only argues that robustness in the von Dassow model arises because positive feedback leads to network bistability, he further claims that such network bistability is a consequence of bistability at the single-cell level. He strongly supports these claims through computational explorations of parameter space that are similar to those done by von Dassow et al., but which also use stripped-down network topologies (to focus on individual cell behaviors), test specifically for bistability, correlate results with the patterns formed, and ultimately generate a set of mathematical rules that strongly predict those cases that succeed or fail at producing an appropriate pattern.
At first glance, such a contribution might seem no more than a footnote to von Dassow's paper, but a closer look shows that this is not the case. Without mechanistic information about why the von Dassow network does what it does, it is difficult to relate it to other work, or to modify it to accommodate new information or new demands. Ingolia demonstrates this by deftly improving on the network topology. He inserts some new data from the literature about the product of an additional gene, sloppy-paired, in Hh signaling, removes some of the more tenuous connections, and promptly recovers a biologically essential behavior that the original von Dassow network lacked: the ability to maintain a fixed pattern of gene expression even in the face of cell division and growth.
Taken as a pair, the von Dassow and Ingolia papers illustrate the value of complementary approaches in the analysis of complex biological systems. Whereas one emphasizes simulation (as embodied in the numerical solution of differential equations), the other emphasizes analysis (the mathematical analysis of the behavior of a set of equations). Whereas one emphasizes exploration (exploring a parameter space), the other emphasizes the testing of hypotheses (about the origins of robustness). The same themes can be seen in sets of papers on other topics. For example, in their analysis of bacterial chemotaxis, Leibler and colleagues (Barkai and Leibler 1997) found a particular model to be extremely robust in the production of an important behavior (exact signal adaptation), and subsequently showed that bacteria do indeed exhibit such robust adaptation (Alon et al. 1999). Although Leibler and colleagues took significant steps toward identifying and explaining how such robustness came about, it took a subsequent group (Yi et al. 2000) to show that robustness emerged as a consequence of a simple engineering design principle known as “integral feedback control.” That group also showed, through mathematical analysis, that integral feedback control is the only feedback strategy capable of achieving the requisite degree of robustness.
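
The logic of integral feedback control is simple enough to show in a few lines. In the sketch below, an abstract controller accumulates the integral of the error between the output and its setpoint, so the system can only come to rest when the error is exactly zero, whatever the size of the disturbance. This is the exact adaptation property at issue, though the model is a generic textbook one (with assumed gain and removal-rate values), not the chemotaxis network itself.

# A generic sketch of integral feedback control: the controller u
# accumulates the integral of the error (y - Y_SET), so at steady state
# y must equal the setpoint exactly, regardless of the disturbance.
Y_SET = 1.0   # desired output level (setpoint)
K_I   = 2.0   # integral gain (assumed value)
G     = 1.0   # first-order removal of y (assumed value)

def adapt(disturbance, t_end=40.0, dt=0.001):
    """Euler-integrate dy/dt = disturbance - G*y - u, du/dt = K_I*(y - Y_SET)."""
    y, u = Y_SET, 0.0
    for _ in range(int(t_end / dt)):
        dy = disturbance - G * y - u
        du = K_I * (y - Y_SET)
        y, u = y + dt * dy, u + dt * du
    return y, u

for d in (1.0, 3.0, 10.0):
    y, u = adapt(d)
    print(f"disturbance {d:4.1f}: final y = {y:.3f} (the setpoint), u = {u:.3f}")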
From these and many other examples in the literature, one can begin to discern several of the elements that, when present together, elevate investigations in computational biology to a level at which ordinary biologists take serious notice. Such elements include network topologies anchored in experimental data, fine-grained explorations of large parameter spaces, identification of “useful” network behaviors, and hypothesis-driven analyses of the mathematical or statistical bases for such behaviors. These elements can be seen as the foundations of a new calculus of purpose, enabling biologists to take on the much-neglected teleological side of molecular biology. “What purpose does all this complexity serve?” may soon go from a question few biologists dare to pose, to one on everyone's lips.