Background
Signal transduction is the primary means by which cells
coordinate their metabolic, morphologic, and genetic
responses to environmental cues such as growth factors,
hormones, nutrients, osmolarity, and other chemical and
tactile stimuli. Traditionally, the discovery of molecular
components of signaling networks in yeast and mammals has
relied upon the use of gene knockouts and epistasis
analysis. Although these methods have been highly effective
in generating detailed descriptions of specific linear
signaling pathways, our knowledge of complex signaling
networks and their interactions remains incomplete. New
computational methods that capture molecular details from
high-throughput genomic data in an automated fashion are
desirable and can help direct the established techniques of
molecular biology and genetics.

DNA microarray technology has evolved to the point where
one can simultaneously measure the transcript abundance of
thousands of genes under hundreds of conditions, producing
hundreds of thousands of individual data points. Similarly,
high-throughput yeast two-hybrid experiments have
identified thousands of pairwise protein-protein
interactions. Once a core pathway is established, these
data can readily be integrated into model refinements, as a
recent study in systems biology elegantly demonstrates [1].
However, synthesizing these data de novo into models of
pathways and networks remains a significant challenge.

How can one bridge the gap from transcript abundances
and protein-protein interaction data to pathway models?
Clustering expression data into groups of genes that share
profiles is a proven method for grouping functionally
related genes, but does not order pathway components
according to physical or regulatory relationships. Here we
present an automated approach for modelling signal
transduction networks in S. cerevisiae by integrating
protein-protein interaction [2, 3, 4] and gene expression
data. Our program, NetSearch, draws all possible linear
paths of a specified length through the interaction map
starting at any membrane protein and ending on any
DNA-binding protein. Microarray expression data [5, 6, 7]
is then used to rank all paths according to the degree of
similarity in the expression profiles of pathway members.
Linear pathways that have common starting points and
endpoints and the highest ranks are then combined into the
final model of the branched networks.
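
The path-drawing step described above can be illustrated with a short sketch. This is not the NetSearch source code: the function names, the toy protein labels, and the choice to keep extending a path after it has touched a DNA-binding protein are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(interactions):
    """Undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:  # self-interactions cannot extend a linear path
            graph[a].add(b)
            graph[b].add(a)
    return graph


def enumerate_paths(graph, starts, ends, max_len):
    """Yield every simple path of at most max_len proteins that begins at a
    protein in `starts` and finishes on a protein in `ends`.

    Assumption: a path may pass through an end protein and continue.
    """
    ends = set(ends)
    for start in starts:
        stack = [(start, (start,))]
        while stack:
            node, path = stack.pop()
            if node in ends and len(path) > 1:
                yield path
            if len(path) == max_len:
                continue  # length limit reached, do not extend further
            for nxt in graph.get(node, ()):
                if nxt not in path:  # keep the path simple (no revisits)
                    stack.append((nxt, path + (nxt,)))


# Hypothetical toy map: R is a membrane protein, TF a DNA-binding protein.
toy = build_graph([("R", "A"), ("A", "B"), ("B", "TF"), ("A", "TF"), ("R", "X")])
paths = sorted(enumerate_paths(toy, ["R"], ["TF"], max_len=4))
```

Because the number of simple paths grows quickly with the length limit, an exhaustive search of this kind is only practical for short paths, which is one reason the maximum path length is treated as a tunable parameter.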
Our approach is calibrated using the yeast MAPK
(mitogen-activated protein kinase) pathways involved in
pheromone response, filamentous growth, and maintenance of
cell wall integrity (Fig. 1). These pathways are activated
by G protein-coupled receptors and characterized by a core
cascade of MAP kinases that activate each other through
sequential binding and phosphorylation reactions; they are
among the most thoroughly studied networks in yeast and are
therefore excellent benchmarks against which to test our
approach.
Results
Input data and parameters
Recent papers [2, 3, 4] have used the yeast two-hybrid
technique and literature surveys to identify and assemble
over 7000 non-redundant protein-protein interactions
among more than 4000 proteins. While two-hybrid screens
efficiently identify fusion proteins that are able to
interact, the biological significance of the interaction
for native proteins acting in vivo generally requires
verification, because the technique is susceptible to a
high rate of false positives [8]. To assess the
possible contribution of false-positive protein-protein
interactions to the combined interaction dataset, we
analyzed the connectivity of each protein and found that
a small fraction of proteins had a very high number of
interactions (highlighted in red, Fig. 2). With these
highly connected proteins included in our data set,
NetSearch generates 17 million candidate signaling
pathways of length seven or less, 95% of which involve
at least one of the twenty-two highly connected proteins.
We excluded the highly interacting proteins from the
interaction dataset based on their nonspecific inclusion
in the predicted pathways and evidence of their
susceptibility to systematic error. This yielded an
interaction map that contains 5560 interactions among
3725 proteins, an average of three interactions per
protein.
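
A minimal sketch of the connectivity filter follows. The degree threshold (`max_degree`) is a hypothetical parameter, since the text does not state the cutoff used to designate the twenty-two highly connected proteins, and the toy protein labels are invented.

```python
from collections import Counter


def degree_counts(interactions):
    """Number of interactions per protein (pairs assumed non-redundant)."""
    degree = Counter()
    for a, b in interactions:
        degree[a] += 1
        if b != a:
            degree[b] += 1
    return degree


def remove_hubs(interactions, max_degree):
    """Drop every interaction that involves a protein whose degree exceeds
    max_degree (a hypothetical cutoff). Returns (kept_pairs, hub_set)."""
    degree = degree_counts(interactions)
    hubs = {p for p, d in degree.items() if d > max_degree}
    kept = [(a, b) for a, b in interactions if a not in hubs and b not in hubs]
    return kept, hubs


def mean_degree(interactions):
    """Average number of interactions per protein."""
    degree = degree_counts(interactions)
    return sum(degree.values()) / len(degree) if degree else 0.0


# Hypothetical toy map in which H is a promiscuous interactor.
toy_pairs = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B"), ("C", "D")]
kept, hubs = remove_hubs(toy_pairs, max_degree=3)
```

As a check on the figures quoted above, each interaction contributes to the degree of two proteins, so the mean degree of the filtered map is 2 × 5560 / 3725 ≈ 3.0, in agreement with the stated average of three interactions per protein.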
Using the NetSearch algorithm, this protein
interaction network was queried for paths up to length
eight that begin at membrane proteins and end at
transcription factors. The search generated approximately
4.4 million candidate pathways of length eight or less
whose biological plausibility was assessed using gene
expression data.

To score the pathways, we first used a k-means algorithm
to cluster all yeast genes into clusters based on their
expression profiles. NetSearch then assigned each pathway a
statistical score [10] according to the number of
pathway members that clustered together. For example, a
path with six members in one cluster would score higher
than a path that only had five members in that cluster.
Cluster size influenced path scoring such that a path
that had three members from a cluster of 30 elements
would score higher than a path that had three members
from a cluster of 100 elements. Also, a path with four
elements in one cluster and three elements in a second
cluster would score higher than a path that had four
elements in cluster one, but no more than two elements in
cluster two.
Pathways were scored using NetSearch's 'sumprob'
scoring metric: Assuming N proteins total and a
partitioning of proteins into k clusters C_1, C_2, ..., C_k
with N_1, N_2, ..., N_k members, respectively, and a pathway
p of L proteins p_1 → p_2 → ... → p_L, where c_p(i) = number
of proteins in p in cluster C_i, the sumprob score is
computed as follows:

prob_p(i) scores a pathway for a cluster C_i such that
pathways which are more concentrated in C_i have higher
scores. The summation in prob_p(i) computes the cumulative
hypergeometric probability of pathway p containing c_p(i)
or more members of C_i. prob_p(i) assesses co-clustering of
pathway members in the single cluster C_i. sumprob(p), the
sum of prob_p(i) values over all clusters for which
c_p(i) >= 2, is a simple measure of co-clustering across the
entire collection of available clusters. The rationale for
the restriction c_p(i) >= 2 is that without it a pathway
could get a high score simply from having single members in
one or more rare clusters, in which case the score would no
longer reflect co-clustering.
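
The displayed sumprob equation is not legible in this copy, so the sketch below rests on an explicit assumption: prob_p(i) is taken to be the negative base-10 logarithm of the cumulative hypergeometric probability of drawing c_p(i) or more members of C_i among L proteins. That transform is one choice that makes more concentrated pathways score higher, as the text requires; the logarithm and its base are not confirmed by the source. The function names and toy cluster labels are illustrative.

```python
from collections import Counter
from math import comb, log10


def hypergeom_tail(N, N_i, L, c):
    """P(X >= c) for X = number of cluster members among L proteins drawn
    without replacement from N proteins, N_i of which are in the cluster."""
    upper = min(L, N_i)
    favourable = sum(comb(N_i, j) * comb(N - N_i, L - j) for j in range(c, upper + 1))
    return favourable / comb(N, L)


def sumprob(path, cluster_of, cluster_sizes):
    """Sum of prob_p(i) over clusters holding at least two pathway members.

    ASSUMPTION: prob_p(i) = -log10(hypergeometric tail probability).
    """
    N = sum(cluster_sizes.values())
    L = len(path)
    counts = Counter(cluster_of[protein] for protein in path)
    score = 0.0
    for cluster, c in counts.items():
        if c >= 2:  # restriction c_p(i) >= 2 from the text
            score += -log10(hypergeom_tail(N, cluster_sizes[cluster], L, c))
    return score


# Hypothetical clusters: "a" has 30 genes, "b" has 100, "c" has 870.
sizes = {"a": 30, "b": 100, "c": 870}
cluster_of = {f"{name}{n}": name for name in sizes for n in (1, 2, 3)}
path_small = ["a1", "a2", "a3", "c1", "c2", "c3"]  # three members of the 30-gene cluster
path_large = ["b1", "b2", "b3", "c1", "c2", "c3"]  # three members of the 100-gene cluster
```

Under this assumed form, `sumprob(path_small, cluster_of, sizes)` exceeds `sumprob(path_large, cluster_of, sizes)`, reproducing the stated behaviour that three members drawn from a cluster of 30 outscore three members drawn from a cluster of 100.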
The exact composition of paths discovered using
NetSearch depends on the parameters used in path drawing
and path scoring. To ensure that NetSearch reproducibly
generates statistically significant, biologically
plausible paths, we combinatorially varied every
parameter value in the path-drawing and path-scoring
algorithms, and selected parameter combinations that
generate the most statistically significant pathways.
Statistical significance was measured by drawing pathways
from membrane proteins to DNA-binding proteins through
the experimentally determined protein-interaction map
(henceforth called "real pathways") and comparing these
pathways with pathways drawn through control interaction
maps that were created by randomizing all pairwise
interactions in the original dataset. (The randomization
procedure was performed three times and statistics were
calculated on the average output of these runs. Paths
produced using these interaction maps are henceforth
referred to as "random pathways".) We ultimately chose
parameters that maximized the number of high-scoring
pathways produced with real interactions, while
minimizing high-scoring pathways from the randomized
interactions.
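
The text does not specify how the pairwise interactions were randomized. One common choice, shown here purely as an assumed stand-in, is degree-preserving edge swapping: two interactions (a, b) and (c, d) are repeatedly rewired to (a, d) and (c, b), so every protein keeps its number of partners while the identity of those partners is scrambled. The function name and default swap count are illustrative.

```python
import random


def randomize_interactions(interactions, n_swaps=None, seed=0):
    """Return a rewired copy of the interaction list.

    ASSUMPTION: degree-preserving edge swaps; the procedure actually used
    to build the control maps is not described in the text.
    Input pairs are assumed to be non-redundant.
    """
    rng = random.Random(seed)
    edges = [tuple(pair) for pair in interactions]
    if len(edges) < 2:
        return edges
    present = {frozenset(pair) for pair in edges}
    if n_swaps is None:
        n_swaps = 10 * len(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-interaction
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would duplicate an existing interaction
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


# Three independent control maps, mirroring the three randomization runs.
ring = [(f"p{i}", f"p{(i + 1) % 10}") for i in range(10)]
controls = [randomize_interactions(ring, seed=s) for s in (1, 2, 3)]
```

Path statistics would then be computed on each control map and averaged, and a parameter combination judged by how many high-scoring pathways it yields on the real map relative to that average.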
The parameters we varied included the number of
clusters into which the genes were grouped, the
microarray expression datasets used in clustering, the
maximum path length, and the scoring metric. Expression
data were clustered into 12, 25, 50, 100 and 250
clusters, and NetSearch best discriminated between real
pathways and random pathways when genes were grouped into
25 clusters. Three S. cerevisiae expression datasets
were examined individually, including the "Compendium"
set, composed of expression profiles in response to 300
diverse mutations and chemical treatments [5]; the
"MAPK" set, composed of 56 conditions chosen to probe the
behavior of MAPK signal transduction [6]; and the
"Cell Cycle" set, composed of 77 conditions relevant to
the cell cycle [7]. Combinations of these datasets
were also examined, for a total of five different sets
that allowed us to compare the utility of data that
probes specific biological processes, such as MAPK
signaling or the cell cycle, and that which probes the
state of the cell more broadly, such as the Compendium
set and the combined sets. The composite data set that
combined all three individual sets (for a total of 433
conditions) provided the best discrimination between real
pathways and random pathways, although the other sets
performed comparably.

The final input parameter that required evaluation was
the maximum path length allowable for NetSearch paths.
While short path lengths risk omission of key path
members, longer path lengths increase the likelihood of
including false-positive interactions. As a first step
towards determining the optimal maximum path length, we
examined the path lengths connecting every possible pair
of the 3725 proteins in the interaction dataset,
regardless of subcellular localization. The minimal path
length between any two proteins chosen at random contains
on average 7.4 members. Secondly, we examined the
fraction of pathways with high coclustering ratios for
various path lengths. Consistent with our finding that
the average path length between any two proteins is 7.4,
this fraction peaks at eight, which we set as our
maximum, unless otherwise noted.
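
The average minimal path length can be computed with a breadth-first search from every protein, as in the sketch below. Two assumptions are made that the text does not settle: path length is counted in proteins (both endpoints included, matching the phrase "contains on average 7.4 members"), and pairs with no connecting path are left out of the average. The graph is assumed to be a dict mapping each protein to the set of its interaction partners.

```python
from collections import deque


def shortest_path_members(graph, source):
    """Breadth-first search: for each protein reachable from source, the
    number of proteins on a shortest path (the source itself counts as 1)."""
    members = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in members:
                members[nxt] = members[node] + 1
                queue.append(nxt)
    return members


def mean_shortest_path_members(graph):
    """Mean number of proteins on a shortest path, over all connected
    ordered pairs of distinct proteins (unconnected pairs are skipped)."""
    total = 0
    pairs = 0
    for source in graph:
        for target, n in shortest_path_members(graph, source).items():
            if target != source:
                total += n
                pairs += 1
    return total / pairs if pairs else float("nan")


# Hypothetical three-protein chain A - B - C.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
chain_mean = mean_shortest_path_members(chain)
```

For the toy chain the shortest paths contain 2, 3, 2, 2, 3 and 2 proteins over the six ordered pairs, giving a mean of 14/6.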
NetSearch output
Using a maximum path length of eight, and 25 gene
clusters from 433 conditions [5, 6, 7], NetSearch
generated ~4.4 million pathways each for the real and
randomized protein interaction datasets. From the
experimental ("real") data, 4059 pathways had a
coclustering score ≥ 16 (Fig. 3). At this cutoff,
randomized interaction data produced on average only ~1%
this number of pathways (32 pathways, P = 7 × 10^-6).
However, we emphasize that NetSearch selects paths based
on their rank relative to all paths between selected
starting and endpoints. The absolute score depends on the
particular expression data set used, and varies from
network to network depending on the degree of
coregulation in the cell under the conditions tested in
the expression data.
The signaling network models generated by NetSearch
for the pheromone response, cell wall integrity and
filamentation pathways are depicted in Fig. 4. In each
case, the starting protein (receptor, depicted in blue)
and ending protein (transcription factor, depicted in
red) were selected as inputs, and NetSearch drew all
possible paths between these points. The size of each
vertex is proportional to the sum of scores of the paths
in which that protein is found, providing a useful visual
clue to the potential importance of a protein in the
given network. Comparison with Fig. 1 shows that NetSearch
reproduced many of the essential elements of these MAPK
pathways, while providing a detailed account of the
experimentally determined interconnections among network
elements. Of the three network models, the one generated
for the pheromone response pathway originating at Ste3p
(Fig. 4A) exhibited the highest co-clustering scores.
Every protein NetSearch included in this network model
has a description in the Yeast Proteome Database (YPD)
[9] consistent with a known or plausible role in mating.
Of the nineteen proteins we have included in our
depiction of the pheromone response network, eighteen are
annotated by MIPS [10] as playing a role in fungal cell
differentiation. The probability that this selection
would have occurred by chance, calculated with the
hypergeometric distribution, was found to be
P = 5 × 10^-24. Our model does differ in several
respects from the canonical pheromone response pathway
depicted in Fig. 1. It includes more members of the
heterotrimeric G protein complex, including the alpha,
beta, and gamma subunits, the GDP-GTP exchange factor,
and the GTPase-activating protein (Gpa1p, Ste4p, Ste18p,
Cdc24p, and Sst2p, respectively). It includes Far1p, a
protein necessary for pheromone-induced cell cycle arrest
in G1 [11], Mpt5p, a protein necessary for recovery
from cell cycle arrest [12], and Bem1p and Sph1p, both
of which are necessary for establishment of cell polarity
during shmooing and budding [13, 14]. In our
protein-interaction map there is no direct interaction
between a pheromone receptor (Ste2p or Ste3p) and any
component of the heterotrimeric G protein complex
(Ste4p/Ste18p/Gpa1p), so NetSearch drew indirect paths
through Akr1p, a known inhibitor of signaling in the
pheromone pathway [15]. The predicted network does not
include the GTPase Cdc42p (paths were instead drawn
preferentially through its cofactor Cdc24p, which
physically interacts with Ste4p) or Ste20p, because of
missing interactions in the protein-interaction map.
Fig. 4B depicts the pheromone response network at
several different score cutoffs, and demonstrates how
higher co-clustering score cutoffs reduce the complexity
of the protein-interaction map. NetSearch detects 354
paths of length eight from Ste3p to Ste12p, and
incorporates 70 different proteins into those paths. The
top graph in Fig. 4B shows the network constructed from
all 354 paths (with each protein arranged on the
perimeter of an ellipse for clarity). In the middle
graph, all paths that scored below the median have been
eliminated, leaving only 27 proteins. At the bottom of
Fig. 4B, only the highest scoring paths (those used to
construct the network in Fig. 4A), with 19 proteins, are
depicted. Comparison of these networks indicates that
most proteins are eliminated by simply excluding the
pathways that score in the bottom half; further
modifications to the cutoff affect the results
incrementally. In setting a precise cutoff for pathway
inclusion in the final network models, one seeks to
strike a balance between the inclusion of false positives
and the omission of true positives. We set the cutoff
such that the top fifteen paths for each network were
included.
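
The assembly of a branched network from the top-ranked linear paths, with vertex size taken as the sum of path scores, can be sketched as follows. The function name, the tie-breaking behaviour of the sort, and the decision to sum scores over the retained paths only (rather than over all candidate paths) are assumptions; the protein labels and scores are invented.

```python
from collections import defaultdict


def assemble_network(scored_paths, top_k=15):
    """Merge the top_k highest-scoring paths into one network.

    scored_paths: iterable of (path, score), where path is a sequence of proteins.
    Returns (edges, node_weight): a set of undirected edges and, per protein,
    the sum of the scores of the retained paths that contain it.
    """
    ranked = sorted(scored_paths, key=lambda item: item[1], reverse=True)[:top_k]
    edges = set()
    node_weight = defaultdict(float)
    for path, score in ranked:
        for protein in path:
            node_weight[protein] += score  # vertex size ~ sum of path scores
        for a, b in zip(path, path[1:]):
            edges.add(frozenset((a, b)))  # consecutive path members interact
    return edges, dict(node_weight)


# Hypothetical scored paths between receptor R and transcription factor TF.
toy_scored = [
    (("R", "A", "TF"), 5.0),
    (("R", "B", "TF"), 3.0),
    (("R", "C", "TF"), 1.0),
]
edges, weights = assemble_network(toy_scored, top_k=2)
```

With `top_k=2` the lowest-scoring path is discarded, so protein C drops out of the network while R and TF, which sit on both retained paths, receive the largest weights.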
The network model generated for the cell wall
integrity pathway is depicted in Fig. 4C. Membrane
proteins in particular may fail to produce interactions
when forced into the nucleus by the requirements of the
standard two-hybrid technique. We observed this to be the
case for the cell wall integrity pathway, as none of
Wsc1p, Wsc2p, Wsc3p, or Mid2p was observed to interact in
any of the high-throughput screens. To reconstruct this
network, we therefore started with the monomeric GTPase
Rho1p, and restricted our search to a length of seven
because of the omission of the initial signal sensor. Of
the 18 proteins included in this network model, all but
Smd3p have descriptions consistent with a role in cell
wall maintenance. NetSearch included both GTPase
constituents of this pathway, Rho1p and Cdc42p, as well
as associated GAPs and other interactors, including
Rdi1p, Rga1p, and Gic2p. Other included network elements
are Fks1p, the 1,3-β-glucan synthase of which Rho1p is a
subunit [16], the actin protein Act1p, and the
proteins Bni1p, Bud6p, and Sph1p, which are associated
with Rho-mediated signal transduction, actin filament
organization, cell polarity establishment, and bud
growth. Smd3p forms a complex with the Sm core
spliceosomal proteins [17], and we are not aware of
any role it may play in maintaining cell wall integrity.
Its inclusion is most likely a result of its expression
correlation with BUD6 in one of the microarray
datasets, but it seems unlikely that the observed
interactions of Smd3p with Spa2p and Slt2p have
biological significance. In the NetSearch-generated
model, Bck1p is downstream of Mkk1p because, although it
interacts with both Mkk1p and Mkk2p, it has been shown
specifically not to interact with Pkc1p in two-hybrid
assays [18].
The network model for filamentous growth (Fig. 4D)
involves 21 proteins, 20 of which are known to play a
role in filamentous growth or have functions consistent
with that role; the exception is Fus1p. As in the
pheromone response and cell wall integrity network
models, key components of the Ras GTPase are included,
such as Cdc25p (the Ras guanine nucleotide exchange
factor), Cyr1p (the Ras-associated adenylate cyclase),
and Srv2p, which enables the activation of adenylate
cyclase by Ras2p. Several proteins with roles in actin
filament organization, cell polarity establishment, bud
growth, and GTPase-mediated signal transduction are
shared with the cell wall integrity pathway, including
Bni1p, Spa2p, Bud6p, and Act1p. NetSearch depicts
interactions between Abp1p and both Srv2p and Act1p,
consistent with the function of Abp1p in tethering Srv2p
to the cytoskeleton. The adenylate cyclase and associated
proteins mentioned above, along with Hsp82p and Hsc82p,
activate the cAMP pathway [19], a pathway that acts in
parallel with the MAPK pathway to promote filamentation.
Hsp82p is a chaperone protein known to interact with a
number of signaling pathway components [20]. It is
required for activation of the pheromone signaling
pathway [21], and for the general response to amino
acid starvation [22]. It may play a similar role in
response to nitrogen (ammonia) starvation, a trigger for
filamentation. Fus1p, included in our predicted network,
does not have a documented role in filamentation; it is
required for cell fusion during pheromone-initiated
mating. Its transcript levels are significantly
upregulated in response to pheromone, but are unchanged
in tec1Δ strains [6]; that study notes, however, that in
dig1Δ dig2Δ cells, FUS1 is constitutively activated, and
both mating and invasive growth are observed. Tec1p,
conspicuously absent in our model, has not been observed
to interact with any proteins in high-throughput
two-hybrid screens.
Discussion
The utility of yeast protein-protein interaction maps
for generating signaling network models has previously been
suggested [23], and they have been used to predict
metabolic pathways [24]. Expression data has been used
to generate and refine models for genetic regulatory
networks without the benefit of protein-protein interaction
data [25]. In this study, we have used expression data
to rank candidate pathways of interacting proteins. This
approach has a strong biological and experimental
rationale: proteins used in the same signaling network must
exist simultaneously with its activation. The genes
encoding these proteins must be transcribed at
approximately the same time, and under the same
environmental conditions in which the signaling network is
required. Furthermore, experimental evidence suggests that
when a signaling network is activated, positive feedback
mechanisms upregulate the expression of genes that encode
pathway proteins [26], implying that this rationale is
also applicable to "surveillance" pathways, whose protein
components may need to be constitutively present in small
quantities, but whose concentration increases with
activation. This biological rationale is borne out by
evidence that interacting proteins have more highly
correlated expression profiles than do non-interacting
proteins [27]. However, if a single component of a
signaling network is independently (and differentially)
regulated, it would not necessarily be excluded using our
approach if, for instance, it connected two halves of a
pathway which had similar average expression profiles.

NetSearch can be used to predict new signaling pathways,
identify previously unknown members of documented pathways,
or identify smaller clusters of interacting proteins. Until
we have a more complete protein-interaction set, a user who
wishes to explore a particular pathway
(http://arep.med.harvard.edu/NetSearch) needs to specify
pathway starting points and ending points (such as membrane
and DNA-binding proteins, respectively). This selection can
be based on a known genetic interaction, a shared mutant
phenotype, a shared functional classification, or signature
expression profile. This is the approach we have followed
in constructing the networks depicted in Fig. 4. Those
networks comprise the highest-ranking linear paths
connecting the receptors and transcription factors for each
pathway.
The pheromone response pathway is commonly depicted as a
simple, linear transmission of the mating signal from the
membrane receptor, Ste2p (for alpha-factor) or Ste3p (for
a-factor), to the nuclear effectors, Ste12p and Mcm1p, via
a MAPK cascade. However, mating pheromone exposure also
induces other cellular processes such as those required for
polarized growth, cell cycle arrest, and recovery from cell
cycle arrest. Furthermore, the topology of the protein
interactions required for these processes is considerably
more complicated than a series of pairwise interactions. In
addition to accurately depicting the MAPK cascade, our
predicted pheromone response network identifies many
proteins necessary to execute the coordinated processes of
growth polarization and cell cycle arrest, and reflects the
complex topology of the interaction network.

The complexity of these interactions is observed in
large, multifunctional complexes of possibly dynamic
composition. For example, products of Ste18, Ste4, Cdc42,
Cdc24, Far1, Bem1, Ste20, Ste5, and other proteins are
thought to constitute a complex that has numerous
interactions among components, and that mediates many
different cellular processes [14, 28]. The complex may
coordinate mating pheromone detection with (1) cell cycle
arrest via Far1p, (2) MAPK signal transduction via Ste5p
and Ste20p, and (3) cell polarity via Bem1p and Far1p
(among others) [29, 30].
Given that several of these networks share components of
the MAPK cascade, the mechanism by which input-output
specificity is maintained remains one of the most important
questions in the field of molecular signal transduction.
One well-accepted hypothesis is that scaffolding proteins
such as Ste5p and Pbs2p tether the MAPK module to the
appropriate input and output components [31]. The recent
identification of numerous Ste5p analogs in yeast and
mammals makes this hypothesis even more intriguing [26].
Beyond scaffolding proteins, higher-order protein complexes
have been hypothesized to play a role in maintaining signal
specificity [32]. Our computational results suggest that
this may indeed be the case. When comparing the minimal
pathways for pheromone response and filamentation as
depicted in Fig. 1, it appears that maintaining signal
specificity would be a considerable challenge. But when
comparing the two network predictions depicted in Fig. 4,
one notes many differences, all of which may help ensure
specificity. The network perspective suggests not a single
scaffolding protein, but many scaffolding proteins - in
fact, a "scaffolding network." The possibility exists that
relatively nonspecific kinases function simply as
"phosphorylation modules," operating inside insulating
networks that are the primary determinant of signaling
specificity.

Because our protein-protein interaction data is only a
small fraction of a truly complete interaction map, one
finds portions of a network that cannot be connected using
available protein-protein interaction data. This was the
case in our attempts to model the HOG network. While
NetSearch correctly identified the upstream elements of
this pathway (Sln1p → Ypd1p → Ssk1p → Ssk22p), it was
unable to form any connections to Pbs2p or Hog1p that ended
in a transcription factor. In some cases, a missing
interaction can be circumvented, however. In the model for
the pheromone response network, NetSearch inserted Akr1p, a
known inhibitor of the pheromone pathway [15], between
Ste3p and the G protein complex (Ste4p/Ste18p/Gpa1p).
Although the protein-interaction dataset we used contained
no direct interaction between Ste2p/Ste3p and Ste4p,
Ste2p-Ste4p has been shown to interact in a targeted yeast
two-hybrid study [33].
Our failure to model the HOG pathway underscores the
fact that, for the purposes of this algorithm, missing
interactions (false negatives) are a more significant
obstacle than are false-positive interactions. Missing
interactions cannot be "created" by the algorithm, but
false-positive interactions are de-emphasized as a result
of the bias imposed by ranking paths according to the
similarity of expression profiles. Bearing this out, of the
fifty-eight proteins included in our networks, only Smd3p
seems to be included as a result of false-positive
interactions. (This is distinct from the case of Fus1p,
which may be misplaced in the filamentation pathway, but
whose interactions with Act1p and Ste7p are real.) This
highlights a general observation on the integration of
genomic technologies. Two-hybrid and microarray expression
studies are both known to have a sizable fraction of
systematic errors (for instance, self-activators in
two-hybrid experiments, and cross-hybridization in
microarrays), but when looking at the intersection of the
two, the true signals tend to reinforce one another,
whereas the systematic errors in the two tend to be
different and are reduced further into the noise. These
effects may help explain why we observe so few
false-positive proteins inserted into our predicted
networks.

In addition to using more complete interaction datasets,
such as those found in Ho [35] and Gavin [36], one
could improve this approach by integrating more types of
data. Homology modelling could be used to differentially
weight the inclusion of molecules likely to be involved in
signal transduction (e.g. kinases), and genetic
interactions could weight the inclusion of the two proteins
in the same path. Signaling motif identification [36] and
data from protein kinase chips [37] could also easily be
incorporated into this framework. Based on the interaction
data available, the networks depicted in Fig. 4 are static,
with all interactions given equal weight, and without
information on the direction of information transfer. In
reality, signaling networks are dynamic and vectorial
complexes, with interactions of varying strengths among
component proteins [38]. The technologies necessary to
generate data which will allow modelling of these network
properties are beginning to emerge. Kinase chips [37]
will allow one to incorporate information about the
direction of information flow. The strength of protein
interactions (with DNA) has been measured on chips in a
highly parallel manner [39] and the same could be done
for protein-protein interactions [40]. Data on the
spatial and temporal co-localization of signaling
components is being generated by new imaging techniques
[41], which will yield insight into the mechanism by
which the cellular response to a signal is modulated by the
intensity and the duration of the signal [42], and the
interplay with parallel pathways.
Conclusions
The approach we have presented allows one to query the
intersection of two enormous sets of functional
genomic-derived molecular data. One can, in effect,
simultaneously browse protein-protein interaction and gene
expression data. It allows one to extract a group of
highly-connected, highly-correlated proteins from global
data to isolate a sub-network of particular interest.
Significantly, this approach does not require prior
knowledge of pathway intermediates. The interaction data
determines the pathways that are considered, and gene
expression data is used to rank the pathways. Although we
have focused on signaling pathways, this approach should be
applicable to modelling the relationships among any group
of interacting proteins that cooperate to perform a given
function within a cell, and the web version of the software
allows for these queries. As many genomic techniques are
generating increasingly large amounts of molecular data,
new tools such as this will be required for the synthesis
of "parts into pathways" in order that we may understand
how cells regulate the many processes necessary for growth
and development.
Authors' contributions
M.S. conceived of the study, performed the network
modelling and drafted the manuscript. A.P. wrote program
code, analyzed the network models and drafted the
manuscript. J.A. devised algorithms, wrote and refined
program code and constructed the associated web pages. P.D.
performed statistical analyses and examined the protein
interaction maps. G.C. guided the study and coordinated the
project. All authors read and approved the final
manuscript.
Supplementary website
Supplementary website - http://arep.med.harvard.edu/NetSearch
Web interface for NetSearch: http://arep.med.harvard.edu/NetSearch/runprog.html