Background
Alizadeh et al. [1] carried out a large-scale, long-term study of diffuse large B-cell lymphoma (DLBCL) using microarray chips. By performing cluster analysis on these data, they were able to diagnose 96 donors with an accuracy of 93% for this specific lymphoma; however, they were not able to predict which individual patients would survive to the end of the long-term study. The International Prognostic Index for this disease was incorrect for 30% of these patients.
Cluster analysis, together with other statistical methods for identifying and correlating minimal gene lists with outcome, has become established as the primary tool for the analysis of microarray data in cancer studies. We wished to test a different approach: artificial neural networks (ANN).
These two approaches to the analysis of microarray data differ substantially in their mode of operation. In the first, clustering, as applied in numerous recent cancer studies, is an unsupervised mapping of the input data examples based on the overall pairwise similarity of those examples to each other (here, similarity with respect to the expression levels of thousands of genes); the method is unsupervised in that no information about the desired outcome is provided. Subsequent analysis of the clusters in these studies generally attempts to reduce the gene set to the subset of genes that are most informative for the problem at hand. This is a supervised step, since there is an explicit effort to find correlations in the pattern of gene expression that match the classification one is attempting to make among the input examples (see Discussion for specific examples). The input for this supervised step is thus the product of an unsupervised step. As this subselection is not routinely subjected to an independent test using input examples originally withheld from the subselection process, it is generally not possible to judge how specifically the subselection choices relate to this particular set of examples as opposed to the general population of potential examples. To the extent that the gene set employed is much larger than the gene set that really determines the classification, it is possible that much of the clustering result will be based on irrelevant similarities.
Backpropagation neural networks, on the other hand, are a supervised learning method with an excellent reputation for classification problems. During the training phase, the ANN are supplied with both the input data and the answer and are specifically tasked to make the classification of interest, given a training set of examples from all classes. That is, the ANN are constantly checking whether they have gotten the 'correct' answer, the answer being the actual classification, not just the overall similarity of the inputs.
Networks accomplish this by continually adjusting their internal weighted connections to reduce the observed error in matching input to output. When the network has achieved a solution that correctly identifies all training examples, the weights are fixed; it is then tested on input examples that were not part of the training set to see if the solution is a general one. It is only in this independent test that the quality of the network is judged.
Investigators are not limited to a single network. It is feasible to train a series of networks using, say, 90% of the examples for training and holding back 10% for testing. A different 10% can be tested in a second network, and so on. In this way, with the training of ten networks, each input appears in a test set exactly once and can, therefore, be independently evaluated. The data presented below, with the exception of a few cases, are the output of ten slightly different trained networks, operating in test mode, which collectively evaluate the entire donor pool. This 'round-robin' procedure was employed, in duplicate, in every trial described throughout this work. The fact that one ends up with 10 networks is not an impediment to analysis, since any future examples could be submitted to all 10 networks for evaluation, with a majority poll deciding the classification. That is, six networks in agreement on a particular input datum would determine the classification of that input. These networks are, of course, likely to be very similar, in that their training sets differ only slightly.
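This round-robin is essentially what is now called k-fold cross-validation, combined with a majority poll over the resulting networks. The sketch below illustrates the scheme in Python; it is an illustration only, using scikit-learn's MLPClassifier rather than the NeuralWorks package employed in this study, and the gene matrix X and survival labels y are hypothetical placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

# Hypothetical stand-ins: one encoded expression vector per donor,
# one label per donor (1 = non-survivor, 0 = survivor).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 200)).astype(float)
y = rng.integers(0, 2, size=40)

networks = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    # Train one network per fold; each donor lands in exactly one test set.
    net = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500)
    net.fit(X[train_idx], y[train_idx])
    print("fold accuracy:", net.score(X[test_idx], y[test_idx]))
    networks.append(net)

# A future example is submitted to all 10 networks; a majority poll
# (six or more in agreement) decides its classification.
new_example = X[:1]
votes = sum(int(net.predict(new_example)[0]) for net in networks)
print("majority classification:", int(votes >= 6))
```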
A second major advantage of backpropagation networks follows from the first. Not only are neural networks trained to the specific question, rather than a loose derivative of that question, and tested for generality, but they can also be asked for a quantitative assessment of how they got the correct answer. Numerical partial differentiation of the network with respect to a given test input example [2,3] allows one to see the network's evaluation of the relative impact of each gene in arriving at the correct answer for that particular input. Cluster analysis, including the statistical correlations, has no comparably focused means of targeting specific, as opposed to non-specific, similarities. To the extent that this is true, neural networks should be able to identify relatively small gene subsets which will significantly outperform the initial gene sets in classification and which will also significantly outperform the gene subsets suggested by cluster analysis.
Results
Determining patient prognosis from microarray data
Cluster analysis [1,4] had shown that the 4026-gene expression panels for 40 DLBCL patients contained some information relevant to the question of prognosis, but these authors did not attempt to provide survival predictions for individual patients.
We wished to see if the neural network strategy of train, test, differentiate, retrain on the reduced gene set, and retest could produce any useful result with respect to prognosis on an individual basis. The approach would be: (1) use the entire gene set, without preprocessing, to train a network, testing to confirm that it had at least a good fit to the problem; and (2) use the network's definition of the problem, by differentiating the network, to focus on those genes most essential to the classification. These genes would then form the basis for training new networks with, hopefully, improved performance. Over 130 networks were trained for this study. Figure 1 shows a workflow schematic for this study. Table 1 provides a summary overview of the data, including data not shown.
Initially, a network was trained to accept microarray data on the complete panel of 4026 genes from 40 patients. This network had 12078 input neurons carrying a semi-quantitative assessment of each gene, 100 middle-layer neurons, and a single output neuron. The networks were originally designed with 3 input bits per datum: one for sign ('-' = 1) and 2 for the quantitative degree of signal, with 00 being 0 to 0.5, 01 being >0.5 to 1.0, 10 being >1.0 to 2.0, and 11 being >2.0. Thus '011' would indicate a particular gene whose expression, relative to control, was increased at a magnitude >2.
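As an illustration of this coding (a minimal sketch we supply; the study's own conversion program is not reproduced here), a Python encoder might look like the following:

```python
def encode_3bit(value: float) -> str:
    """Encode a log-ratio expression value as a sign bit plus 2 magnitude
    bits, per the coding above: 00 for 0-0.5, 01 for >0.5-1.0,
    10 for >1.0-2.0, 11 for >2.0; a sign bit of 1 means negative."""
    sign = '1' if value < 0 else '0'
    m = abs(value)
    if m <= 0.5:
        bits = '00'
    elif m <= 1.0:
        bits = '01'
    elif m <= 2.0:
        bits = '10'
    else:
        bits = '11'
    return sign + bits

# '011': expression increased at a magnitude > 2, as in the text.
assert encode_3bit(2.5) == '011'
assert encode_3bit(-0.7) == '101'
```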
The training set included 30 donors, with 10 additional donors being held back as test data. The network was trained by processing 12 iterations of the complete training set. The test set, drawn from a mixture of survivors and non-survivors, was then run. The entire process was then repeated with a different choice of test data each time. In this round-robin fashion, all donors serve as test data for one of the networks, and each training set is necessarily slightly different. A round-robin series of 4 networks was generated. Data underlying Figure 5 of the earlier report (http://llmpp.nih.gov/lymphoma/data.shtml) were used for training. The networks were asked to predict, based on the 4026 gene set, which of the 40 DLBCL patients would survive to the end of the study (longest point = 10.8 yrs). Networks initially varied from 1 to 3 errors on 10 test patients each, for a total of 31 of 40 patients correctly predicted (data not shown).^1
However, a trained neural network can be numerically differentiated [2,3] to show the relative dependence of the output (classification) on each active input neuron within an input vector. Briefly stated, the differentiation process involves slightly perturbing the activation (down from 1.0 to 0.85) of each active input neuron, one at a time, to note the specific change in the output value. In that there is one gene for each active node, the largest change in the output points to the most influential gene.
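A minimal sketch of this perturbation procedure, assuming a trained network exposed through a generic predict function mapping an input vector to the output activation (the function and variable names are illustrative, not from the software of [2]):

```python
import numpy as np

def rank_inputs_by_influence(predict, x, perturbed_activation=0.85):
    """Perturb each active input neuron from 1.0 down to 0.85, one at a
    time, and record the change in the network's output; the largest
    change points to the most influential gene."""
    baseline = predict(x)
    deltas = np.zeros(len(x))
    for i in np.flatnonzero(x == 1.0):   # only active input neurons
        x_pert = x.copy()
        x_pert[i] = perturbed_activation
        deltas[i] = abs(predict(x_pert) - baseline)
    return np.argsort(deltas)[::-1]      # indices, most influential first
```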
We then trained qualitative networks, with 2 bits per gene, on the 4026 gene set in order to differentiate them ('10' for expression greater than or equal to the control, '01' for less than the control). These networks had 67 middle-layer neurons. This coding has the effect that there is an active neuron for each gene in the set regardless of expression level, and the total number of active input neurons is constant from input to input. By taking the top 25% of genes in each of 12 differentiations and requiring agreement of at least 4 of the 12 patients in choosing each gene, we obtained a set of 34 genes. (These cutoff criteria are necessarily arbitrary and are justified only by the subsequent proof that they produced gene subsets having the desired information.)
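A sketch of this voting criterion, assuming one influence ranking per differentiated patient as produced by the perturbation sketch above (names are illustrative):

```python
import numpy as np

def select_genes_by_vote(rankings, top_fraction=0.25, min_votes=4):
    """Given one influence ranking per differentiated patient (most
    influential gene first), keep genes falling in the top fraction of
    at least min_votes of the rankings."""
    n_genes = rankings.shape[1]
    cutoff = int(n_genes * top_fraction)
    votes = np.zeros(n_genes, dtype=int)
    for ranking in rankings:
        votes[ranking[:cutoff]] += 1
    return np.flatnonzero(votes >= min_votes)
```

The analogous criterion used later for diagnosis (a contribution of at least 10% of the maximum, agreed on by 3 or more donors) can be expressed the same way, with a threshold on the influence values in place of a rank cutoff.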
A round-robin series of 10 networks, with 4 test donors each, produced a single error (DLCL0018) in survival predictions when trained on these 34 genes (data not shown).^1 A second round-robin training with the same gene set produced no errors, correctly evaluating all 40 patients in a series of 10 test sets (Table 2).
For a second study, we took 20 patients and held them in reserve to model information from a "follow-up" study. Twenty networks were trained, on the 34 gene set, using the remaining 20 patients; each had 19 patients in the training set and 1 in the test set. Collectively, these networks made no errors in the prognosis of 20 patients. The data for the 20 reserve patients were then tested on all 20 trained networks to emulate follow-up data. Out of 400 individual scores, there were 5 errors, distributed over 2 patients. A poll of the 20 networks therefore produced no errors by majority, correctly classifying all 20 members of the follow-up group (data not shown).
The 34 genes are given in Table 3. In 5 of 12 cases, the gene chosen as most influential in determining the correct prognosis was 18593, a tyrosine kinase receptor gene. While this gene set may not be the absolute best possible, it clearly does contain sufficient information for error-free predictions on these patients. The identification of this gene set will, we hope, eventually lead, through future studies, to a better understanding of the interaction of these genes in this disease.
Diagnosing lymphoma from microarray data
The diagnosis of DLBCL by biopsy is not trivial. Even with gene expression data, clustering techniques produced a misreading of 7 out of 96 donors [1], a result unimproved in those authors' hands by further analysis of reduced gene panels. We wished to see if backpropagation neural networks could do better using the same data set. Figure 2 shows a workflow schematic for this study. Testing over the whole donor set with 4026 genes produced 6 errors in diagnosis (data not shown).
Thus, in the first round, the ANN merely matched cluster analysis. In preparation for differentiation, a network was trained with the same donor sets as the first network above, but coded qualitatively. This network correctly classified the 10 members of the test set (data not shown). The 5 positive donors from the test set were each used, in turn, to differentiate the network. In these cases, the first criterion for selection was broad: the gene had to contribute at least 10% as much as the gene making the maximum contribution to the correct classification; the second criterion was that 3 or more of the donors had to agree on the selection. This produced a subset of 292 genes. The number of genes referenced by a given donor under identical criteria ranged from 45 to 1448. Only 38% of these genes overlapped the 670-gene subset identified by cluster analysis. It was of interest to see if these genes were sufficient for correct classification of the donors. Ten different networks were trained with the 292-gene subset. Three errors (OCI Ly1, DLBCL0009, and tonsil) were produced over 96 donors in 2 separate series (data not shown).
At this point, the neural networks were producing a much-improved diagnosis; it remained to be seen whether the gene set could be further refined. The set of 292 genes was then treated in two different ways: (1) it was arbitrarily split into even and odd halves, with each half being used to train ten new networks; (2) it was used whole to train ten qualitative networks for further differentiation.
Twenty different networks were then trained using a 146-gene (odd- or even-numbered) subset of the 292-gene set, in 2 series of 10. The odd set again produced 3 errors (data not shown). With the even set, a single error was made over 96 donors in ten different test sets, identifying the 'tonsil' inlier from the earlier cluster analysis [1] as positive (Table 4). Ten additional networks were trained on the even set with the same result (data not shown).
The differentiation of the networks trained on the 292-gene set pointed to 8 genes. Given the high accuracy of the even 146-gene set, we also trained networks on this set for differentiation. These pointed to 11 additional genes. In these cases, only genes in the top 20% in influence, chosen in common by at least 25% of the differentiated examples, were considered. Networks trained on these 19 genes produced 2 errors over 96 donors in 10 test sets (Table 5). The 19 genes, using the designations from the initial report, are given in Table 6.
We also wished to test this gene set in the context of a follow-up study. For this purpose, we set aside 50 donors as "follow-up" data, using the remaining 46 donors in the usual training/testing round robin. Eleven networks were trained: 9 with 42 training vectors and 4 test vectors, and 2 with 41 training vectors and 5 test vectors. Collectively, these produced 3 errors over 46 donors, or 93% correct. The follow-up donors were then tested on the 11 networks. A poll of these networks showed a majority vote yielding 1 error, or 98% correct.
Discussion
The rather remarkable conclusion of this analysis is that there is sufficient information in a single gene expression time point of fewer than 5 dozen genes to provide perfect prognosis (out to ten years) and near-perfect diagnosis for this set of donors. Furthermore, neural networks, through a strategy of train and differentiate, bring that information to the fore by progressively focusing on the genes within the larger set which are most responsible for the correct classifications, providing at once a reduction in the noise level and specific donor profiles. This focus on the specific classification problem led to a set of 34 genes for prognosis and a second set of 19 genes for diagnosis. These sets are mutually exclusive. The gene subsets suggested by cluster analysis [1] are not supersets of these sets; the 670-gene set of the initial report captured only 7 of the 19-gene set used for diagnosis, and the 148-gene staging set captured only 2 of the 34-gene set used for prognosis. The 234-gene subset proposed by Hastie et al. [4] for prognosis contains 6 of the 34-gene set. There was no overlap with the 13-gene set identified by Shipp et al. [5] as correlating with their cured/fatal classes for this disease. At first it might seem surprising that the gene subsets identified here do not appear to be subsets of those identified earlier by Alizadeh et al., but this surprise is based on a naive intuition. The fact is that we do not know the level of information redundancy that exists in these large arrays. Apropos of this point, Alon et al. [6] discarded the 1500 genes indicated by cluster analysis as most discriminatory in their study of colon cancer and, upon reclustering, found their diagnosis unimpaired. Likewise, it may be that while the top 10% of relevant genes might be sufficient for perfect classification, so might the next 10%; these sets are by definition mutually exclusive. By extension, it is not difficult to believe that some other large gene set might be able to get 75% of the classifications correct with little or no overlap with those genes in the top 10%.
We have been careful to avoid any claim that the gene sets extracted by this procedure are the "best" gene sets. Only in one, highly qualified, sense can they be said to be best: in classifying this data set, there are no other gene sets which offer a statistically significant improvement in classification accuracy. That is not to say that there may not be other sets which could do as well. Nor is there any implication that these genes are seminal in the etiology of this disease. They may not be necessary, but they are sufficient for this classification. They may not be sufficient for the classification of a much larger patient set; forty patients are unlikely to be fully representative of the general patient population with this disease. It should be noted, however, that the same caveats apply to the analysis of these data by any other method.
There have been a number of additional studies of cancer using microarray data for either prognostic or diagnostic purposes. The following listing includes a brief discussion of 7 of these studies:
(1) Shipp et al. [5] did a study of 58 DLBCL patients and 19 follicular lymphoma (FL) patients. They first sought to classify DLBCL and FL patients. They clustered 6817 genes. Using their own weighted combination of informative gene markers, they picked out 30 genes whose expression levels would be used to do a 2-way classification. They correctly classified 71/77 patients, for a diagnostic accuracy of 92%. They then attempted to develop high-risk and low-risk groups with respect to 5-year prognosis. They used several different methods for associating particular gene clusters with survival outcome: Kaplan-Meier analysis, Support Vector Machine, and K-nearest-neighbor analysis. They selected 13 genes as most informative and achieved their best result with SVM modeling. They did not explicitly state how many patients initially sorted into the high-risk and low-risk groups, but other data suggest 17 and 41, respectively. The only way in which these survival probability plots can be compared to the patient-by-patient predictions presented above is to associate low risk with survival and high risk with non-survival (please note: this equivalence was not asserted by any of the authors, with the exception of (3) below, in discussing risk groups). If one makes this association, their best result is 14/58 errors, for a 5-yr survival accuracy of 76%.
(2) Rosenwald et al. [7] did what they termed a follow-up study on the original Alizadeh et al. study of DLBCL patients. However, it was not really a follow-up study, because a different chip was used for the microarray data. The Alizadeh study had identified 2 groups based on an analysis weighting the gene cluster groups: germinal center B cell-like tumors, which correlated with low risk, and activated B cell-like tumors, which correlated with high risk. If these groups were made survivors and non-survivors, the prognosis accuracy would have been 75%. In the follow-up, the authors found it necessary to introduce a third group, consisting of patients who did not fit either of the previous 2 categories. Although lacking the associated gene profile, this third group had a survival pattern much like that of the activated B cell-like group. The authors used Cox proportional-hazards modeling to assign groups on the basis of the expression of 100 genes. The 5-yr survival was 60% for the low-risk group, 35% for the activated B cell-like group, and 39% for the 3rd group. An improved result was obtained using 16 genes drawn from 4 signature gene groupings, plus a score for BMP6 expression. Kaplan-Meier estimates of survival were determined for 4 quartiles, for which the 5-yr survival rates were 73%, 71%, 34%, and 15%. If these 4 are collapsed into 2 categories of survivor and non-survivor, this would produce 62/240 errors, for a prognosis accuracy of 74%.
(3) van't Veer et al. [8] did a study of 78 patients with breast cancer. Starting with 5000 signature genes, they narrowed the gene pool down to 231 genes by examining the correlation coefficient of each gene with the prognostic outcome. They then rank-ordered these genes and added them 5 at a time to a leave-one-out test of their 77 patients for predicted outcome. This was repeated until an optimum outcome classification was reached, which occurred at 70 genes. A patient-by-patient classification based on the weighting of these 70 genes was able to produce a survival classification with 13/78 errors, for an accuracy of 83%.
(4) Beer et al. [9] used clustering and Cox hazard analysis to generate a list of 50 genes to be used in Kaplan-Meier 5-yr projections of survival. They had 86 patients with lung cancer in the study. With 22 patients originally assigned to the low-risk group and 19 to the high-risk group, the corresponding 5-yr survival rates were 83% and 40%. If treated as survival categories, this would produce 12/41 errors, for a prognosis classification accuracy of 71%. Although these authors had complete 5-yr survival data on 41 of the patients in the study, they at no point attempted to analyze this group specifically for direct comparison with predictions.
(5) Khan et al. [10] used linear neural networks to analyze microarray data from patients with small round blue-cell tumors. They wished to classify the 4 subcategories of this tumor. Principal Component Analysis was used to reduce 2308 genes to 10 components. Neural networks were trained using 2/3 of a 63-patient pool to train and 1/3 to test, in a fully cross-validated fashion. The groups were shuffled 1250 times to produce 3750 networks. These networks correctly classified all 63 patients in a 4-way classification. The networks were analyzed for the most influential inputs to produce a list of 96 genes. New networks were calibrated with just these 96 genes; these again correctly classified the 63 patients and also correctly classified the 25 patients who had been withheld from the whole process.
(6) Dhanasekaran et al. [11] did a study of 60 prostate biopsy samples: 24 non-tumorous, 14 tumor in situ, and 20 metastatic tumor. Cluster analysis of microarray data from nearly 10,000 genes misplaced 2 samples out of 26, for a diagnostic accuracy of 92%. The authors did not state why they limited the clustering result to 26 samples when they had 60. Although they performed additional analyses, these did not involve using the array data for either diagnosis or prognosis.
(7) Golub et al. [12] wished to be able to distinguish acute myeloid leukemia (AML) from acute lymphoblastic leukemia (ALL). Starting with the expression of 6817 genes from 38 patients, they did a 2-class clustering. They then did a neighbor analysis to identify 1100 genes, occurring above chance levels, which related to the AML/ALL distinction. They chose an informative subset of 50 genes to weight for class assignment of the patients. They were able to correctly classify 29/34 patients, for a diagnostic accuracy of 85%. They next attempted to use self-organizing maps (SOM) for 2 classes in place of the initial clustering. This produced only 4/38 errors, for 89% diagnostic accuracy. Drawing a 20-gene predictor from these SOM classes, they again produced 4/38 errors, maintaining 89% accuracy. These authors also attempted to use array data to predict clinical outcome for 15 AML patients, but without success.
The identification of specific genes associated with a particular biological characteristic, such as malignant phenotype, would be useful in many settings. (1) Precise classification and staging of tumors is critical for the selection of the appropriate therapy. At present, classification is accomplished by morphologic, immunohistochemical, and limited biological analyses. Neural net analysis, in the form of specific donor profiles, could provide a fine-structure analysis of tumors, characterizing them by a precise weighting of the genes which they express differentially. (2) At present, only subsets of patients with a given type of tumor respond to therapy. Networks trained to distinguish responders from non-responders would allow a comparison of tumor-expressed genes in responders and non-responders to find those genes most predictive of response. Recently we have used neural networks on the data of Perou et al. [12] for classifying breast tumors as hormonally responsive or non-responsive. Networks that gave a perfect classification with 496 genes pointed to a subset of 12 genes. Retraining on these 12 genes produced no error in classifying 62 tissue samples from their study (unpublished data). We have also analyzed the data of Dhanasekaran et al. [11]. Here the original set of 9984 genes was reduced to 34 genes. Retraining on these 34 genes gave no errors in a 3-way (normal, early tumor, metastatic disease) classification of 53 patients (unpublished data). Given the significant impairment in the quality of life for many patients undergoing chemotherapy and/or radiation therapy, such prospective information would be extremely beneficial. (3) T cell and antibody-mediated immunotherapy may be efficacious approaches for limiting tumor growth in cancer patients. At present there is a paucity of known tumor rejection antigens that can be targeted. Neural net analysis may identify a panel of tumor-encoded genes shared by many patients with the same type of cancer and thereby provide a repertoire of potentially novel tumor rejection antigens. (4) For many patients with autoimmune disease, the target antigen(s) is unknown. Enhanced identification of cell-type-specific markers of the target organ through neural net profiling could identify potential target antigens as candidate molecules for testing and tolerance induction.
Conclusions
We believe neural networks will be an ideal tool to assimilate the vast amount of information contained in microarrays. The artificial networks presented here were not selected from a large number of attempts. The networks described here are the first or second attempts with the data and format stated; the longest training session lasted less than 5 minutes. Indeed, the trained neural network may, in the form of its weight matrix, have the best possible "understanding" of the very broad statement being made in the microarray, a view that is accessible via differentiation of the network. In this study, that viewpoint suggested a small subset of genes which proved sufficient to give a near-perfect classification in each of two problems. This approach should be suitable for any microarray study and, indeed, for other global studies, such as 2-D gels and mass-spec data, which contain sufficient information for training.
Methods
The data from microarray experiments are stored in spreadsheet form, representing the positive or negative level of expression, relative to some control state, of thousands of genes for two or more experimental conditions. A short software program is sufficient to translate these data directly into a binary representation suitable as input vectors for a neural network.
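A minimal sketch of such a conversion program, assuming a hypothetical tab-delimited spreadsheet with genes in rows and donors in columns (the file name and layout are assumptions; the 2-bit qualitative coding it emits is described below):

```python
import csv
import numpy as np

def load_input_vectors(path):
    """Read a tab-delimited expression spreadsheet (genes in rows, donors
    in columns, log-ratio values relative to control) and return one
    2-bit-per-gene binary input vector per donor: '10' for expression at
    or above the control, '01' for below; open fields are set to zero."""
    with open(path, newline='') as f:
        rows = list(csv.reader(f, delimiter='\t'))
    header, data = rows[0], rows[1:]
    donors = header[1:]
    vectors = {d: [] for d in donors}
    for row in data:
        for donor, field in zip(donors, row[1:]):
            value = float(field) if field.strip() else 0.0  # open field -> 0
            vectors[donor].extend([1.0, 0.0] if value >= 0 else [0.0, 1.0])
    return donors, np.array([vectors[d] for d in donors])

# donors, X = load_input_vectors("expression.tsv")  # hypothetical file
```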
The neural network software used throughout this study was NeuralWorks Professional II Plus v5.3. Neural networks were trained on the corresponding data sets, with a fraction of the data, typically 10%, withheld for testing purposes. All open fields in the data array were set to zero. The trained networks were then asked to classify new test data as to donor type. Since the gene expression levels are read directly from the spreadsheet, their order and names are provided by the spreadsheet. Given the large amount of input data, these networks generally converge to a low error level very quickly during training, often in a few minutes or less.
Subsequently, additional networks were trained with a simplified input that contained only qualitative information, in the form of a plus or minus sign characterizing the expression of each gene in the panel. This reduced the input size to 2 bits per gene: 01 for below the control and 10 for above, or equal to, the control. The output neuron was trained to output 1.0 for a positive donor and 0.0 for a negative donor in the diagnostic networks; for the prognostic networks, 1.0 indicated a non-survivor and 0.0 a survivor. The 4026-gene panel network was provided 100 or 67 middle-layer neurons for the 3-bit or 2-bit per gene inputs, respectively. With a very large number of input neurons it is possible to overload the middle-layer neurons, effectively operating them always at one extreme limit or the other; this can have the undesirable effect of reducing their sigmoid transfer function to a step function, with the loss of the network's non-linearity. This is clearly indicated if multiple output values are found to be exactly identical. Networks were trained to an error level below 0.05, after which they were tested with previously unseen data. A possible disadvantage of neural networks, especially with a large input space and a relatively small sample number, is overtraining. In overtraining, a network learns the specifics of each training example as opposed to finding a global solution for the entire training set. This behavior is characterized by a degradation in test scores as training sessions are extended. Although we saw no evidence of this in this study, we did look to see how much additional training would be necessary to degrade the test results in the case of the initial diagnosis networks with 4026 genes. It was not until we doubled the training iterations dictated by the 0.05 output error cutoff that we saw some increased test error. At double the normal training interval, 8 networks were unchanged, but 2 networks showed an increased error of 1. This is suggestive, but not proof, of the onset of overtraining. The networks trained on the reduced 34- or 19-gene sets had 6 or 4 middle-layer neurons, respectively.
To differentiate a trained network with respect to specific inputs, a network was trained on the 4026-gene panel with 2 bits per gene. The 5 positive donors from the test set were each differentiated, using software that we designed for that purpose [2]. The selected genes were then compared among the 5 sets, with genes occurring in 3 or more instances being included in the final subset. This requirement generated a subset of 292 genes from the original 4026 genes. Networks were trained on this 292-gene subset and on two 146-gene subsets representing every other gene from the 292 set. All were coded with 3 bits per gene and employed networks with 25 or 12 middle-layer neurons, respectively. Other networks were trained on the 292-gene set and the 146 'even' set, coded with 2 bits per gene, for subsequent differentiation.
The differentiation of the large-panel networks trained for prognosis arbitrarily employed more selective criteria (see text) for subset determination, with the result that a single differentiation reduced the gene set from 4026 genes to 34 genes. Subsequent networks demonstrated that this was a highly effective selection.
All networks in this study were three-layer backpropagation networks trained with a learning coefficient of 0.3 and a momentum coefficient of 0.4, using the generalized delta learning rule and the standard sigmoidal transfer function. The cutoff, in all cases, between positive and negative scoring was taken to be 0.05 RMS error at the output neuron. No network required more than 4 minutes of training time on a 650 MHz PC; in the majority of cases, the network was fully trained in less than a minute. Training and testing a 10-network round-robin series could generally be done in less than 20 minutes. Training was deliberately kept to a minimum to avoid overtraining. The networks represented here were in each case the first or second attempt for the given problem. There was no "data trolling."
Note
^1 All data not shown can be found at the site http://research.umbc.edu/~moneill/GBMS