Background

Expression profiling is an emerging experimental method whereby RNA accumulation in cells and tissues can be assayed for many thousands of genes simultaneously in a single experiment. There are two common experimental platforms for expression profiling: redundant oligonucleotide arrays (Affymetrix GeneChips) [1], and spotted cDNA microarrays [2-4]. The Affymetrix GeneChips have the inherent advantages of redundancy, specificity, and transportability; there are typically 30-40 oligonucleotide probes (features) designed against each gene tested by the array, with paired perfect-match and mismatch probes, and the arrays are produced by standardized factory synthesis [5, 6]. The uniform nature of the arrays permits databasing of individual profiles, which facilitates comparison of data generated by different laboratories.

Expression profiling has led to dramatic advances in the understanding of yeast biology, where homogeneous cultures can be grown and exposed to timed environmental variables [7-12]. Such studies have led to the rapid assignment of function to a large number of anonymous gene sequences. Large-scale expression profiling studies of tissues from higher vertebrates are more challenging, due to the greater complexity of the genome, larger families of related genes, and incomplete genomic resources. Nevertheless, DNA microarrays have been successfully applied in the analysis of aging and caloric restriction [13] and pulmonary fibrosis [14], and many publications, particularly on cancer, have appeared [14-19]. Affymetrix has recently announced the availability of the U133 GeneChip series with 33,000 well-characterized human genes mined from genomic sequence. The nearly complete ascertainment of genes in the human genome should make expression-profiling studies of human tissues particularly powerful. However, identifying the sources of experimental variability, and knowing the relative contribution of each source, is critical for the appropriate design of expression profiling experiments.

Mills and Gordon recently studied the relative contribution of probe production to the reproducibility of microarray results, using mixed murine tissue RNA on Affymetrix Mu11K GeneChips [20]. In their study, the same RNA preparation was used as a template for distinct cDNA/cRNA amplifications and hybridizations. An additional variable studied was the effect of different laboratories processing the same RNAs. The authors found relatively poor concordance between duplicate arrays, with an average of 12% increase/decrease calls between the same RNA processed in parallel and hybridized to two Mu11K-A microarrays. The authors concluded that there was substantial variability in the experimental procedure, necessitating extensive filtering and large numbers of arrays to detect accurate gene expression changes (LUT: look-up tables) [20]. In our laboratory, we have processed over 1,200 Affymetrix arrays, and have found considerably higher experimental reproducibility (R² = 0.979 for the new-generation U74A version 2 murine arrays or the human U95 series; see Results and discussion). In addition, a recent study of a single human patient, in which RNA was prepared from two distinct breast tumors and hybridized to duplicate U95A GeneChips (four chips total), found a very low degree of experimental variability between microarrays (R² = 0.995) and between the two tumors (R² = 0.987) [21]. The marked differences in experimental variability between laboratories could be due to different quality control protocols (see http://microarray.cnmcresearch.org), the newer, more robust Affymetrix arrays now available (murine Mu11K versus U74A version 2, and the new-generation human U95 series), the use of more recent algorithms for data interpretation, or more consistent processing of RNA, cDNA, and cRNA within the same laboratory.

The previous studies did not systematically address the reproducibility of GeneChip hybridization (e.g. the same biotinylated cRNA on two different microarrays). In addition to lingering questions concerning variability due to specific experimental procedures, there are other possible sources of variability that have not yet been investigated, specifically tissue heterogeneity and inter-individual variation. These two sources of variability are particularly important in human expression profiling studies. First, the study of human tissues often involves the use of tissue biopsies, in which a relatively limited region of an organ is sampled; tissue heterogeneity and sampling error might be expected to introduce significant variability in expression profiles. Second, tissues may derive from individuals of different ethnic backgrounds; humans are highly outbred, leading to the potential for significant polymorphic noise (herein called "SNP noise") between individuals that is unrelated to the disease or variable under study. SNP noise also exists between different inbred mouse strains, and some experiments have normalized this effect by breeding the same mutation onto different strains and profiling each individually [22]. Knowledge of the relative effect of each experimental, tissue, and patient variable on expression profiling results in humans is important, so that appropriate experimental designs can be employed.

We recently reported the design and production of a highly redundant oligonucleotide microarray for the analysis of human muscle biopsies (Borup et al., submitted). This MuscleChip contains 4,601 probe sets corresponding to 3,369 distinct genes and ESTs expressed in human muscle. Each probe set contains between 16 and 40 oligonucleotides, for a total of 138,000 specific oligonucleotide probes on the array.

Here, we use this MuscleChip to investigate the relative significance of variables affecting expression profiling data and interpretation. Specifically, we studied the correlation coefficients of profiles considering the following variables: 1. variation due to probe production (same RNA); 2. variation due to the microarray itself (same cRNA on different GeneChips); 3. tissue heterogeneity (different regions of the same muscle biopsy); 4. inter-patient variability (SNP noise); 5. diagnosis (underlying pathological variable); and 6. patient age.

We have recently reported expression profiling results generated using mixed patient samples [23]. Our hypothesis was that mixing RNA samples from multiple regions of muscle biopsies, and from multiple patients matched for most variables (disease, age, sex), would effectively normalize both intra-patient variability (tissue heterogeneity) and inter-patient variability (SNP noise, i.e. normal human polymorphic variation unrelated to the primary defect). Here, we test this hypothesis directly, and show that sample mixing does indeed detect, with relatively high sensitivity and specificity, the gene expression changes that would be detected by many individual expression profiles. Thus, sample mixing appears to be an appropriate first-pass method for obtaining the most significant expression changes while using small numbers of arrays.

Results and discussion

Fifty-six (56) different RNA samples were prepared from different regions of muscle biopsies from 28 individuals (15 Duchenne muscular dystrophy (DMD) patients, 13 normal controls). The profiles of five of the DMD patients and the five controls have been previously reported using the Affymetrix HuFL microarray [23]; however, we re-tested these same samples on the custom MuscleChip (Borup et al., submitted) for comparison to the other patients here. All RNAs were converted to double-stranded cDNA, and then to biotinylated cRNA. The cRNAs were then hybridized to the MuscleChip either singly, in mixed groups, or both, as described below. In total, 34 hybridizations were performed and scanned, and the data were statistically analyzed using Affymetrix Microarray Suite and Excel. Quality control criteria were as described on our web site (http://microarray.cnmcresearch.org, link to "programs in genomic applications"), and included sufficient cRNA amplification and adequate post-hybridization scaling factors. Scaling factors (the normalization needed to reach a common target intensity) ranged from 0.46 to 3.28 (Table 1). All raw image files, processed image files, and difference analyses are posted on a web-queried SQL database interface to our Affymetrix LIMS Oracle warehouse (see http://microarray.cnmcresearch.org: link to "programs in genomic applications", "data", "human").
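
Global scaling multiplies every probe-set intensity on an array by a single factor so that a summary intensity matches a common target. The following is a minimal sketch, assuming a trimmed-mean summary (2% trimmed from each end) and a target of 1500; both values are illustrative assumptions rather than the settings used in this study, and the function names are hypothetical.

```python
def scaling_factor(intensities, target=1500.0, trim=0.02):
    """Return target / trimmed mean of the probe-set intensities.

    `target` and `trim` are illustrative defaults, not the study's settings.
    """
    values = sorted(intensities)
    k = int(len(values) * trim)          # number of values dropped from each end
    kept = values[k:len(values) - k]     # trimmed set (all values if k == 0)
    trimmed_mean = sum(kept) / len(kept)
    return target / trimmed_mean


def apply_scaling(intensities, factor):
    """Multiply every probe-set intensity by the global scaling factor."""
    return [x * factor for x in intensities]


# Hypothetical array: 96 typical probe sets plus two very low and two very high outliers.
toy_array = [1.0, 1.0] + [500.0] * 96 + [1e6, 1e6]
sf = scaling_factor(toy_array)
print(sf)  # 3.0, because the outliers are trimmed before averaging
```

A factor above 1 means the array was dimmer than the target; the range reported above (0.46 to 3.28) corresponds to roughly a sevenfold spread in raw array brightness before scaling.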

Among the 4,601 probe sets on the Affymetrix custom muscle microarray, we found a consistent percentage of "present" calls for each of the 34 cRNA samples tested (Duchenne dystrophy, 28 arrays, 48.2% ± 6.1%; controls, 6 arrays, 53.3% ± 1.4%). To test for inter-array variability, two different hybridization solutions were each applied to duplicate arrays, and correlation coefficients were determined. A high correlation coefficient was found in this analysis, suggesting that inter-array variability of the MuscleChip was a relatively minor variable (Patient 3a and 3a-duplicate: R² = 0.96, and the percentage of shared No Change (NC) calls by Microarray Suite software was 99%; Patient 3b and 3b-duplicate: R² = 0.98, and the percentage of NC calls was 98%; Table 1). The high reproducibility of Affymetrix array results is consistent with other data in our laboratory and with previously published data [6, 21, 23, 24], and shows that hybridization and scanning of highly redundant oligonucleotide GeneChips is not a major source of experimental variability.
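
The inter-array comparison above rests on two summary numbers: R², and the share of probe sets given a No Change call. A minimal sketch of both follows, assuming R² is the squared Pearson correlation of the two arrays' probe-set intensities computed on untransformed values (the text does not say whether a log transform was applied). Function names and toy values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def percent_no_change(calls):
    """Percentage of probe sets given an 'NC' (No Change) difference call."""
    return 100.0 * sum(1 for c in calls if c == "NC") / len(calls)


# Toy duplicate arrays hybridized with the same cRNA (hypothetical values).
array_3a = [120.0, 850.0, 40.0, 2300.0, 610.0]
array_3a_dup = [130.0, 820.0, 55.0, 2250.0, 640.0]
print(round(r_squared(array_3a, array_3a_dup), 3))
print(percent_no_change(["NC", "NC", "I", "NC"]))  # 75.0
```

Because the correlation is squared, R² alone does not distinguish positively from negatively correlated profiles; for duplicate arrays this is harmless, since the expected relationship is positive.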

Given the previous report suggesting that the conversion of RNA to biotinylated cRNA probe was a major source of variability in murine array experiments [20], we tested a series of murine RNAs from different sources, using the newer-generation U74Av2 GeneChips. One series of samples was from murine spleens: spleens from multiple animals for each variable under study were mixed, RNA was isolated, the RNA samples were split, and duplicate cDNA, cRNA, and hybridizations were processed in parallel for each RNA (Fig. 1, "KNagaraju" samples). We also compared RNAs processed from parallel murine myogenic cell cultures (Fig. 1, "VSM" samples), where each profile was from a different cell culture. Finally, we used a series of murine muscle tissues from normal and dystrophin-deficient mice, where each profile was from a different series of complete gastrocnemius muscles (Fig. 1, "FBooth" samples). The data from these 42 murine U74Av2 profiles were then analyzed by unsupervised clustering [25] to determine which profiles were most closely related to each other (Fig. 1). This analysis shows that the different sources of RNA cluster together, as expected. Importantly, the same RNA used as a template for two distinct cDNA/cRNA preparations and hybridizations showed a high correlation coefficient (R² = 0.99 for five of the six samples, with an average R² = 0.978) (Fig. 1). The large muscle group profiles (FBooth samples) showed excellent correlation with respect to diagnosis; however, here there was no sampling error, as the entire muscle group was used rather than isolated biopsies. Finally, the parallel tissue culture experiments (VSM samples) showed greater variability between duplicates, suggesting that tissue culture conditions may be more subject to variability than in vivo tissues (Fig. 1). These murine data show that variability from different cDNA-cRNA reactions is very low (R² = 0.978).

To analyze the impact of intra-patient variability (tissue heterogeneity), inter-patient variability (polymorphic noise in outbred populations), and sample mixing on the sensitivity of detection of gene expression differences between patient groups, we conducted a series of individual and mixed profiling experiments (Table 1). Muscle biopsies from five 4-6 yr old DMD patients and five 10-12 yr old patients were selected; each biopsy was split into two parts, and RNA was isolated independently from each of the 20 biopsy fragments. For these ten DMD patients, the two different regions of the same biopsy were expression profiled both individually (20 profiles) and mixed into four pools, each pool originating from distinct RNA samples (Table 1). The resulting profiles were also compared to the previously reported mixed 6-9 yr old DMD patient cRNAs and mixed 6-9 yr old control cRNAs [23], as mentioned above.

As an initial statistical analysis, we used Affymetrix software to define genes that showed changes (Increased, Decreased or Marginal) in expression levels between pairs of profiles (difference analyses). This method of data interpretation showed that some muscle biopsies had very little variance between different regions of the same biopsy, while other patient biopsies showed considerable variability (see Fig. 2 for representative scatter graphs). Expressing this variance as a percentage of "Diff Calls" between the two regions of the same biopsy, as determined by Affymetrix default algorithms, we found considerable variability in the similarity of profiles, with values ranging from 1.5% to 18% of the 4,601 probe sets studied (4.99% ± 4.94%). These data suggest that tissue heterogeneity (intra-patient variability) can be a major source of variation in expression profiling experiments, even when using relatively large pieces (50 mg) of relatively homogeneous tissues (such as muscle).

The most common strategy for interpreting Affymetrix microarray data is to use two-profile comparisons, with an arbitrary threshold for "significant fold-change" in expression levels. Typically, multiple arrays are compared, and the gene expression changes showing the most consistent fold changes are prioritized, although other methods have been reported [13, 22, 26, 27]. To study inter-patient variability, we defined the gene expression changes surviving four pairwise comparisons with mixed control samples, as we have previously described [23]. Briefly, four comparisons were done by Affymetrix software (e.g. DMD 1a versus control 1a; DMD 1a versus control 1b; DMD 1b versus control 1a; DMD 1b versus control 1b). The four data sets were then compared, and only those gene expression changes that showed a >2-fold change in all four comparisons were retained (four comparison survival method). The number of surviving diff calls by this method ranged from 250 to 463 (355 ± 80) (Table 1). Interestingly, those patients showing considerable variation between different regions of the same biopsy did not show a corresponding decrease in the number of gene expression changes surviving the iterative comparisons to controls (Table 1). This suggests (but does not prove) that the most significant changes might be shared, independent of tissue variability (see below).
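
The four comparison survival method can be sketched as a filter over probe-set intensities. This is a simplified, hypothetical illustration: it uses intensity ratios only, whereas the method in [23] starts from Affymetrix difference calls; it assumes that a change must go in the same direction in all four comparisons; and the `floor` guard against zero or negative average-difference values is an assumption of this sketch, not part of the published method.

```python
from itertools import product


def four_comparison_survivors(exp_a, exp_b, ctl_a, ctl_b, min_fold=2.0, floor=1.0):
    """Genes with a > min_fold change, in the same direction, in all four
    experimental-vs-control comparisons.

    Each argument maps gene ID -> intensity for one profile. `floor`
    (hypothetical) replaces values below it so that ratios stay defined.
    """
    survivors = []
    for gene in exp_a:
        # exp_a/ctl_a, exp_a/ctl_b, exp_b/ctl_a, exp_b/ctl_b
        ratios = [
            max(exp[gene], floor) / max(ctl[gene], floor)
            for exp, ctl in product((exp_a, exp_b), (ctl_a, ctl_b))
        ]
        increased = all(r > min_fold for r in ratios)
        decreased = all(r < 1.0 / min_fold for r in ratios)
        if increased or decreased:
            survivors.append(gene)
    return survivors


# Toy duplicate DMD and control profiles (hypothetical values).
dmd_1a = {"g1": 100.0, "g2": 50.0, "g3": 10.0}
dmd_1b = {"g1": 90.0, "g2": 55.0, "g3": 12.0}
ctl_1a = {"g1": 30.0, "g2": 50.0, "g3": 40.0}
ctl_1b = {"g1": 40.0, "g2": 45.0, "g3": 50.0}
print(four_comparison_survivors(dmd_1a, dmd_1b, ctl_1a, ctl_1b))  # ['g1', 'g3']
```

Requiring all four ratios to pass, rather than their average, is what makes the method stringent: a single discordant duplicate removes the gene.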

A different statistical method to determine the effect of the variables under study is to perform hierarchical cluster analysis using nearest-neighbor statistical methods [25]. Here, we subjected all profiles to unsupervised cluster analysis as a means of determining which variables had the greatest effect (e.g. intra-patient variability [different regions of biopsy], versus diagnosis [DMD vs control], versus inter-patient variability [DMD patients in the same age group], versus age of patient). For this analysis, we used the fluorescence intensity of each probe set (Average difference), after data scrubbing to remove genes that showed expression levels near background ("Absent" calls) in all profiles (Fig. 3). This analysis shows that duplicate profiles of the same cRNA hybridization solution are the most highly related (Patient 3a and duplicate (3a-d); 3b and duplicate (3b-d)), consistent with the high correlation found by the comparisons using Affymetrix Microarray Suite software described above. Again, this reflects the low amount of combined experimental variability intrinsic to the laboratory processing of RNA, cDNA, cRNA and hybridization.
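
Unsupervised hierarchical clustering of profiles can be sketched in pure Python. The sketch below assumes a distance of 1 - Pearson r between profiles and nearest-neighbor (single) linkage; the text does not specify the exact distance and linkage settings used in Cluster or GeneSpring, and the function names and toy profiles are hypothetical. It returns the sequence of merges, which is the information a dendrogram displays.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def single_linkage(profiles):
    """Agglomerative clustering with nearest-neighbor (single) linkage.

    profiles: dict of profile name -> list of probe-set intensities.
    Returns a list of (cluster_a, cluster_b, distance) merges in merge order.
    """
    names = list(profiles)
    # Pairwise distance 1 - r between individual profiles.
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[frozenset((a, b))] = 1.0 - pearson(profiles[a], profiles[b])

    clusters = {name: (name,) for name in names}  # label -> member profiles
    merges = []
    while len(clusters) > 1:
        labels = list(clusters)
        best = None
        for i, p in enumerate(labels):
            for q in labels[i + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(dist[frozenset((m, n))]
                        for m in clusters[p] for n in clusters[q])
                if best is None or d < best[2]:
                    best = (p, q, d)
        p, q, d = best
        clusters["(" + p + "," + q + ")"] = clusters.pop(p) + clusters.pop(q)
        merges.append((p, q, d))
    return merges


profiles = {
    "3a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "3a-d": [1.1, 2.0, 3.1, 3.9, 5.0],
    "ctl": [5.0, 3.0, 4.0, 1.0, 2.0],
}
for merge in single_linkage(profiles):
    print(merge)
```

In this toy example the duplicate pair ("3a", "3a-d") merges first at a very small distance, mirroring the very low dendrogram branch expected for duplicate hybridizations.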

When comparing two different regions of the same biopsy [intra-patient variability], we found widely varying results, depending on the patient studied (Fig. 3). For example, some individual patients showed very closely related profiles that approached the similarity of duplicate arrays on the same cRNA (Fig. 2; profiles 6a, 6b; 10a, 10b). On the other hand, some patients showed very distantly related profiles for two regions of the same biopsy (Fig. 3; profiles 1a, 1b; 4a, 4b; 9a, 9b). Importantly, the variation caused by intra-patient tissue heterogeneity often overshadowed all other variables. For example, a profile from DMD patient 9 (9a) clustered with the normal controls, rather than with the other DMD patients (Fig. 3). The histopathology of this patient had been noted as unusually variable in severity prior to expression profiling. Also, unsupervised clustering was unable to group patients of similar ages, despite the progressive clinical course of DMD. We conclude that intra-patient tissue heterogeneity is a major source of experimental variability in expression profiling, and must be considered in experimental design.

The above findings suggested that both intra-patient variability (tissue heterogeneity) and inter-patient variability (polymorphic noise) had major effects on the expression profiles. One method to control for these sources of noise is to analyze large numbers of profiles, both from multiple patients and from multiple regions of tissue from each patient. This would allow determination of p values and statistical significance for a single controlled variable under study (e.g. DMD vs controls). An alternative method is to experimentally normalize these variables by mixing samples from patient groups; such mixing would be expected to average out both intra- and inter-patient variation. The expectation is that the most significant and dramatic gene expression changes would still be identified while using far fewer profiles (and thus at a substantially reduced cost).

To test the relative sensitivity of sample mixing versus individual profiles, we mixed together the 10 cRNAs for each of the two age groups of DMD patients (samples 1a-5b; samples 6a-10b). For this analysis, we also generated expression profiles for two additional groups of control individuals. One was a second set of five normal male biopsies, ages 5-12 yrs (controls 2a, 2b), and the third control set was three normal age-matched female biopsies, ages 4-13 yrs (controls 3a, 3b) (Fig. 4). As with the original male control group (controls 1a, 1b), two different regions of each biopsy were processed independently through the biotinylated cRNA step, and then equimolar amounts of cRNA were mixed for hybridization to the MuscleChip.
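
Pooling equal amounts of cRNA from each sample is simple arithmetic once each cRNA concentration is known. The sketch below assumes that equal masses approximate equimolar amounts (reasonable only if the fragment-size distributions are similar) and uses the 10 µg hybridization amount from Materials and methods as the pool total; the function name, sample names, and concentrations are hypothetical.

```python
def pooling_volumes(concentrations, total_ug=10.0):
    """Volume (uL) of each cRNA so that every sample contributes an equal mass.

    concentrations: dict of sample name -> cRNA concentration in ug/uL.
    Assumes equal mass approximates equimolar (similar fragment sizes).
    """
    share_ug = total_ug / len(concentrations)  # mass contributed per sample
    return {name: share_ug / conc for name, conc in concentrations.items()}


# Two regions of one biopsy (hypothetical concentrations).
print(pooling_volumes({"1a": 0.5, "1b": 1.0}))  # {'1a': 10.0, '1b': 5.0}
```

For a pool of ten cRNAs, each sample contributes 1 µg, so a more dilute sample simply requires a proportionally larger volume.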

All 34 profiles (both individual and mixed samples) were again analyzed by unsupervised hierarchical clustering (Fig. 4) [25]. As described above, we scrubbed the profiles to eliminate all genes showing expression levels consistently at or below background hybridization intensities, by requiring each gene to show a "Present" call in one or more of the 34 profiles.
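
The scrubbing rule above (keep a probe set only if it is called Present in at least one profile) amounts to a one-line filter. A sketch, assuming calls are stored per probe set as a list of "P", "M" or "A" strings with one entry per profile, and treating Marginal ("M") calls as not Present; this data layout and the function name are hypothetical.

```python
def scrub(calls, intensities):
    """Keep probe sets with at least one 'P' (Present) call.

    calls, intensities: dict of probe-set ID -> list with one entry per profile.
    """
    return {ps: intensities[ps] for ps, row in calls.items() if "P" in row}


calls = {"ps1": ["A", "A", "A"], "ps2": ["A", "P", "A"], "ps3": ["M", "A", "A"]}
values = {"ps1": [12.0, 8.0, 15.0], "ps2": [40.0, 610.0, 55.0], "ps3": [90.0, 30.0, 20.0]}
print(scrub(calls, values))  # {'ps2': [40.0, 610.0, 55.0]}
```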

As above, duplicate profiles using the same cRNA hybridization solution on different arrays, whether mixed or individual samples, showed very highly correlated results (very low branch on the dendrogram) (Fig. 4; mix 5-6 yrs, mix 10-12 yrs; patient 3a/3a-d; patient 3b/3b-d). This again indicates that experimental variability from laboratory procedures or different arrays is a relatively minor factor in the interpretation of results. Mixed samples from different regions of the same biopsies showed the same, or only slightly more, variation (mixed controls c1, c2, and c3; mixed DMD 6-9 yrs). This showed that sample mixing does indeed average out tissue heterogeneity (intra-patient variability), as well as inter-patient variability. We noted that all of the controls (both male and female) clustered in the same branch of the dendrogram, while four of the six mixed DMD profiles clustered just one level away from the controls, separately from the other DMD profiles. This analysis suggests that there is considerable variability in the progressive tissue pathology induced by dystrophin deficiency, both within a patient and between patients.

To test the sensitivity and specificity of sample mixing versus individual profiling, we defined differentially expressed genes using a two-group t-test (GeneSpring [28, 29]), comparing all 6 mixed control profiles with the 10 individual 5-6 yr old DMD profiles. Genes were retained if they met specific p value thresholds between the two sets of profiles. In parallel, we compared the two corresponding mixed 5-6 yr old DMD profiles to the same 6 mixed control profiles.
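
A two-group t-test for one probe set can be sketched with the standard library. The text does not state which t-test variant GeneSpring applied, so this sketch assumes Welch's unequal-variance t-test and computes only the t statistic and the Welch-Satterthwaite degrees of freedom; converting these to a p value requires a Student t distribution (for example `scipy.stats.t.sf`), which is omitted to keep the block dependency-free. The function name and group values are hypothetical.

```python
from statistics import mean, variance


def welch_t(group1, group2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    v1 = variance(group1) / n1  # squared standard error of group 1 mean
    v2 = variance(group2) / n2  # squared standard error of group 2 mean
    t = (mean(group1) - mean(group2)) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# One probe set: 6 mixed control profiles vs 10 individual DMD profiles (toy values).
controls = [410.0, 395.0, 430.0, 402.0, 388.0, 420.0]
dmd = [820.0, 760.0, 905.0, 690.0, 980.0, 710.0, 870.0, 640.0, 930.0, 800.0]
t, df = welch_t(controls, dmd)
print(round(t, 2), round(df, 1))
```

In an array study this calculation is repeated for every probe set, which is why the choice of p value threshold matters so much for the number of genes retained.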

Comparison of the 10 individual 5-6 yr Duchenne dystrophy profiles to the 6 mixed controls revealed 1,498 genes showing differential expression with p < 0.05 (Fig. 5). Comparison of the two mixed Duchenne dystrophy profiles to the 6 mixed controls showed 1,350 genes with p < 0.05 (Fig. 5A). Comparison of the two gene lists showed that 61% of the differentially regulated genes detected by the 10 individual profiles were also detected by the two mixed profiles. This suggests that the sensitivity and specificity of using mixed samples are approximately half those of individual profiles. However, there was a rapid shift in specificity and sensitivity as the stringency of the analysis was increased. Raising the statistical threshold to p < 0.0001 for individual profiles, while keeping the threshold for mixed profiles at p < 0.05 as required by the small number of data points (Fig. 5B), resulted in a sensitivity of 86% for mixed samples (351 of the 408 genes with p < 0.0001 detected). In conclusion, mixing detected about two thirds of statistically significant changes (p < 0.05). Mixing was a relatively sensitive method of detecting the most highly significant changes (p < 0.0001; 86% of changes detected); however, it was not very specific, as many as one third of the gene expression changes showing p < 0.05 in mixed samples were not confirmed by individual profiles.
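
The sensitivity and specificity figures above are overlaps between two gene lists. A sketch follows, assuming sensitivity is the fraction of genes called by the individual profiles that are also called by the mixed profiles, and that "specificity" as used in the text is the fraction of mixed-profile calls confirmed by the individual profiles (a quantity often called positive predictive value). Gene IDs and the 100 extra mixed-only genes are placeholders; only the 351-of-408 count comes from the text.

```python
def overlap_stats(reference, test):
    """Compare a test gene list against a reference gene list.

    Returns (sensitivity, confirmed fraction):
      sensitivity = |reference & test| / |reference|
      confirmed   = |reference & test| / |test|
    """
    shared = reference & test
    return len(shared) / len(reference), len(shared) / len(test)


# 408 genes at p < 0.0001 in individual profiles; 351 of them also found by mixing.
individual = {f"gene{i}" for i in range(408)}
mixed = {f"gene{i}" for i in range(57, 408)} | {f"other{i}" for i in range(100)}
sensitivity, confirmed = overlap_stats(individual, mixed)
print(round(sensitivity, 2))  # 0.86
```

The two ratios share a numerator but use different denominators, which is why a method can score well on one and poorly on the other, as the comparisons in this section show.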

The use of t-test measurements is expected to introduce significant amounts of noise, due to the very large number of comparisons involved in array studies: at a threshold of p = 0.05, about 5% of the genes tested that have no true difference are expected to pass by "chance", and these calls do not reflect true differences between samples. We have previously reported a very simple, yet potentially more stringent, method for analysis of small numbers of expression profiles, using duplicate profiles for control and experimental samples and then identifying those genes that show consistent changes of >2-fold in the four possible pair-wise comparisons (four comparison survival method) [23]. A similar pair-wise comparison method, using a less stringent average fold-change analysis, was recently reported for muscle from aging and calorie-restricted mice [13].
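
The multiple-testing concern above can be made concrete. If all 4,601 probe sets on the MuscleChip were tested and none truly differed, a per-gene threshold of p < 0.05 would still be expected to flag about 4,601 × 0.05 ≈ 230 probe sets. The sketch below computes that expectation and, for comparison, a Bonferroni-corrected per-gene threshold; the Bonferroni correction is a standard technique shown for illustration only and is not described in this study. Function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing `alpha` when no true differences exist."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test p value cutoff that bounds the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


n_probe_sets = 4601  # probe sets on the MuscleChip
print(round(expected_false_positives(n_probe_sets, 0.05)))  # 230
print(f"{bonferroni_threshold(n_probe_sets):.2e}")  # 1.09e-05
```

At the stricter p < 0.0001 threshold used above, the expected number of chance calls across 4,601 probe sets drops below one (about 0.46).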

To investigate the validity of this approach, we compared the sensitivity and specificity of t-test detection of expression changes versus the four-pairwise survival method. A two-sample t-test of the 10 individual Duchenne dystrophy profiles compared to the 6 mixed control profiles revealed 1,498 genes showing p < 0.05, as above. In parallel, the mixed DMD duplicate profiles were compared to a single pair of mixed control sample profiles (c1a, c1b), using the pairwise comparison survival method [23]. Briefly, four comparisons were done (DMD 1a versus control 1a; DMD 1b versus control 1a; DMD 1a versus control 1b; DMD 1b versus control 1b), and only those genes that showed a >2-fold change in all four comparisons were retained.

This method was indeed considerably more specific in identifying significant (p < 0.05) gene expression changes (Fig. 6A), with 85% of the gene expression changes in the mixed profiles verified by individual profiles (p < 0.05). The sensitivity of this method depended on the p value threshold for the individual profiles, but reached a maximum of only 49% at p < 0.0001 (Fig. 6B).

The results above suggested that analysis of mixed samples using t-test methods was relatively sensitive but non-specific, while analysis of the same mixed profiles by the 2-fold survival method was relatively specific but insensitive. To confirm this conclusion, we directly compared the sensitivity and specificity of the four pairwise comparison method to more standard t-test methods (Fig. 7A). We found that the pairwise survival method was indeed highly specific, with 97% of the changes identified by this method also detected by t-test. However, as predicted, it was not very sensitive, with only 30% of the expression changes with p < 0.05 identified by t-test being detected by the pairwise survival method. Comparison of all three analysis methods showed that many (349) gene expression changes were detected by all three methods (Fig. 7B).
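
Comparing three gene lists at once, as in Fig. 7B, reduces to set intersections and differences. The sketch below counts each region of a three-set Venn diagram; the function name, gene IDs, and the pairing of each set with an analysis method are hypothetical.

```python
def venn3(a, b, c):
    """Count genes in each of the seven regions of a three-set Venn diagram."""
    return {
        "all three": len(a & b & c),
        "a and b only": len((a & b) - c),
        "a and c only": len((a & c) - b),
        "b and c only": len((b & c) - a),
        "a only": len(a - b - c),
        "b only": len(b - a - c),
        "c only": len(c - a - b),
    }


# Hypothetical gene lists from three analysis methods.
individual_t = {"g1", "g2", "g3", "g4", "g5"}  # t-test, individual DMD profiles
mixed_t = {"g1", "g2", "g3", "g6"}             # t-test, mixed DMD profiles
mixed_fold = {"g1", "g2", "g7"}                # four comparison survival method
print(venn3(individual_t, mixed_t, mixed_fold)["all three"])  # 2
```

The seven regions are disjoint, so their counts sum to the size of the union of the three lists, which is a quick consistency check on any such diagram.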
Conclusions

Microarray data analyses have been criticized as being "quite elusive about measurement reproducibility" [30]. This is largely a consequence of the large number of uncontrolled or unknown variables, and the prohibitive cost of isolating and investigating each variable. Here, we report the systematic isolation and study of most variables in microarray experiments using Affymetrix oligonucleotide arrays and human tissue biopsies. We found that all sources of experimental variability were quite minor (microarray R² = 0.98-0.99; probe synthesis + microarray R² = 0.98-0.99). On the other hand, tissue heterogeneity (intra-patient variation; average R² for 10 patients = 0.92 [0.85 to 0.98]) and differences between individual patients (SNP noise; average R² = 0.76 [0.42 to 0.93]) were major sources of variability in expression profiling. Thus, tissue heterogeneity and SNP noise have a high potential to obscure sought-after condition-specific gene expression changes, particularly in humans, where tissue samples can be limiting (sampling error) and inter-individual variation is often very large. We have shown that mixing of patient samples effectively normalizes much of the intra- and inter-patient noise, while still identifying the majority of the most significant gene expression changes that would have been detected by larger numbers of individual patient profiles. Our results suggest that stringent yet robust data can be generated by mixing samples from a small number of individuals with a defined condition (n = 5), preferably using different regions of tissue for duplicate arrays. Controls should be processed similarly. The resulting four arrays (2 control and 2 experimental datasets) should then be subjected to the >2-fold survival method, as previously described [23]. This will yield a stringent set of expression changes that are likely to be verified by larger studies with individual arrays, but at low cost, as only four arrays are employed. The preliminary data from just four mixed profiles (two experimental and two control) can then be used to generate functional clusters and pathophysiological models. These preliminary models can then direct more hypothesis-driven experiments, or more extensive expression profiling studies.

Materials and methods
Expression profiling

Human muscle biopsy samples were diagnostic specimens flash-frozen immediately after surgery in isopentane cooled in liquid nitrogen, and stored in small, airtight, humidified tubes at -80°C until RNA isolation. Duchenne muscular dystrophy patient samples were all shown to have a complete lack of dystrophin by immunostaining and/or immunoblot analysis, and were shown to have excellent morphology and preservation of tissue. Controls included groups of males and females (ages described in the text) that showed no histopathological abnormality, normal dystrophy proteins, and normal serum creatine kinase levels. Biopsy sizes ranged from 50 mg to 2 grams, with approximately 20-30 mg used for RNA isolation (~10-15 micrograms of total RNA). As described in the text, two different regions of each biopsy were expression profiled separately.

Details concerning the murine profiles will be published elsewhere. In this report, we used the murine profiles simply to test the sources of variation during sample preparation prior to hybridization to the oligonucleotide arrays.

RNA isolation (Trizol, Gibco BRL), RNA purification (RNeasy, Qiagen), cDNA synthesis, and biotinylated cRNA synthesis were all done as per standard protocols provided by Affymetrix Inc. Quality control methods are described on our web site (http://microarray.cnmcresearch.org/pga.htm), with cRNA amplifications of between 5- and 13-fold for each of the samples. Ten micrograms of gel-verified, fragmented, biotinylated cRNA were hybridized to each MuscleChip or U74A v2 array, and scanning was done after biotin/avidin/phycoerythrin amplification. Details on the specific patients studied, and details for each GeneChip (scaling factors, number of present calls, percentage of difference calls between each duplicate sample, number of difference calls surviving four pair-wise comparisons of duplicate chips), are provided in Table 1. All profiling data presented here are available on our web site (http://microarray.CNMCResearch.org; data link), as image (.dat), absolute analysis (.chp), and ASCII text conversions of .chp (.txt) files for each individual profile (see http://microarray.cnmcresearch.org/pga.htm for file descriptions and use).

Bio-informatic methods

Absolute analysis (average difference determinations for each probe set) was done using Affymetrix default parameters. As described in the text, data were analyzed using a variety of methods, including unsupervised nearest-neighbor hierarchical clustering analyses (GeneSpring [28, 29] [Silicon Genetics], and Cluster [25] [Stanford University]), t-test (GeneSpring), and the four-comparison survival method [23]. The Cluster and TreeView software were downloaded from http://rana.lbl.gov and installed on an NT workstation.