Background
Traditionally, techniques for the study of gene expression were significantly limited in both breadth and efficiency since these studies typically allowed investigators to study only one or a few genes at a time. However, the recently developed DNA microarray technique is a powerful method that provides researchers with the opportunity to analyze the expression patterns of tens of thousands of genes in a short time [1]. Presently, several vendors offer these microarray systems, also known as chips, with a variety of technologies available. Currently, DNA microarrays are manufactured using either cDNA or oligonucleotides as gene probes. cDNA microarrays are created by spotting amplified cDNA fragments in a high-density pattern onto a solid substrate such as a glass slide [1, 2]. Oligonucleotide arrays are either spotted or constructed by chemically synthesizing approximately 25-mer oligonucleotide probes directly onto a glass or silicon surface using photolithographic technology [3].
Due to the powerful nature of microarrays, the number of relevant publications in this burgeoning field is increasing exponentially. During the years 1995-1997, fewer than ten reports featured microarray data. However, in 2001 alone approximately 800 publications featured data generated by microarray studies (according to a PubMed search).
Microarray technology certainly has the potential to greatly enhance our knowledge about gene expression, but there are drawbacks that need to be considered. As Knight [4] cautioned, it is possible that errors could be incorporated during the manufacture of the chips. Consequently, the fidelity of the DNA fragments immobilized to the microarray surface may be compromised. However, there are few studies where the majority of the gene sequences spotted on the microarrays were verified [5]. Kuo et al. (2002) compared the data from two high-throughput DNA microarray technologies, cDNA microarray (Stanford type) and oligonucleotide microarray (from Affymetrix), and found very little correlation between these two platforms [6]. Unfortunately, many investigators are reporting microarray data without confirming their results by other traditional gene expression techniques such as PCR, Northern blot analysis and RNase protection assay. Raw microarray data obtained from questionable nucleotide sequences are then often manipulated using cluster and statistical analysis software and subsequently reported in scientific journals. In addition, the quality of the probe sequences and the location of the probes selected for incorporation into the array are also very important. For example, if probes are selected only from the 3' end of a given gene, then there is a strong possibility that different splice variants of that gene will not be identified if the alternative splicing occurs at the 5' region of the gene.
The development of a single chip containing the complete gene set for a given tissue or for a complex organism (30,000 to 60,000 genes) is likely in the near future, so it is paramount that chip manufacturers avoid these problems [7]. In this report, we demonstrate that microarray technology continues to be a dynamic and developing process and highlight potential pitfalls that must be addressed when interpreting data.

Results

Inconsistent sequence fidelity of spotted cDNA microarrays

cDNA microarray analysis was performed using the UniGEM-V chip (IncyteGenomics, Palo Alto, CA) with mRNA isolated from peripheral blood mononuclear cells (PBMC) of a large granular lymphocyte (LGL) leukemia patient and a healthy control. In this microarray, 7075 cDNA fragments (4107 from known genes and 2968 ESTs) were immobilized onto a glass slide. After careful examination of the microarray probes, it was determined that the majority of the spotted cDNA fragments were from the 3' end of the genes. Approximately 80 up-regulated and 12 down-regulated genes were identified in leukemic LGL. We then purchased seventeen clones from IncyteGenomics containing cDNA fragments that represent fourteen of the up-regulated and three of the down-regulated genes. Plasmid DNA was isolated from the clones and the sequences were verified. Unfortunately, we found several problems with the insert DNA sequences in these clones. Four of the seventeen cDNA fragments spotted on the microarray (23.5%) contained incorrect sequences (Table 1).

Variable reliability of differential expression data

The cDNA fragments corresponding to differentially expressed genes spotted on the microarrays were excised from the plasmid DNA and used as probes in Northern blots. Of the seventeen, only eight (47%) gave results that agreed with the microarray data. Although all the sequences for the down-regulated genes were correct, Northern blot analysis with these probes did not show any differential expression of the genes. This is in contrast to the microarray data, which suggested they were down-regulated (Table 1).

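The two proportions reported above (4 of 17 clones with incorrect sequences, 8 of 17 confirmed by Northern blot) come from a small sample, so it may help to see how wide the uncertainty around them is. The following is a minimal sketch, not part of the original analysis: the helper names are hypothetical, and the Wilson score interval is our choice of method for illustration only.

```python
import math


def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def wilson_interval(count: int, total: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a binomial proportion.

    This interval is an illustrative assumption; the paper reports
    only the raw percentages.
    """
    p = count / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Counts taken from the text: 17 clones checked in total.
incorrect_pct = percent(4, 17)   # clones with incorrect sequences
confirmed_pct = percent(8, 17)   # clones confirmed by Northern blot

low, high = wilson_interval(4, 17)
print(f"incorrect: {incorrect_pct}% (approx. 95% interval {low:.2f} to {high:.2f})")
print(f"confirmed: {confirmed_pct}%")
```

With only seventeen clones the interval is wide, which is one reason to read these percentages as indicative and not as precise error rates.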
Low specificity of cDNA microarray probes

By microarray analysis, it is very difficult to distinguish between two genes that share a high degree of sequence similarity. Low specificity of probes is also a frequently encountered problem in oligonucleotide arrays. This problem is especially prevalent in instances where DNA sequences are nearly identical between two genes and the oligonucleotide probes are generated from the 3' end of the genes. For example, the 1.2 kb fragment (GB Accession No. M57888) spotted on the cDNA microarray as granzyme B was not able to distinguish between granzyme B and granzyme H (Fig. 1a). A balanced differential expression of 6.3 was calculated. A probe set was generated by Affymetrix using similar sequence information (GB Accession No. M28879), and according to the oligonucleotide array, granzyme B was up-regulated (fold change 21.5; Fig. 1b). Northern blot analysis (using the same fragment as probe) did not discriminate between the genes for granzyme B and granzyme H (Fig. 1c). However, by using gene-specific probes in an RNase protection assay, we were able to demonstrate the over-expression of granzyme B and granzyme H separately in leukemic LGL cells (Fig. 1d and 1e).

Discrepancy in fold change calculation for a given gene

It is very difficult to compare the exact fold change between two microarray techniques, and no standard value system is currently in place to compare the changes found in one microarray to the next. This fact was clearly demonstrated by Kuo et al. (2002) in their recent publication [6]. In this paper we compared the fold change (Affymetrix) and balanced differential expression (cDNA) with Northern blot expression. For example, our IncyteGenomics cDNA microarray data showed a balanced differential expression of only 3.8 for perforin (Fig. 2a), a pore-forming protein produced by cytolytic lymphocytes [8], in leukemic LGL cells, whereas the oligonucleotide microarray indicated a 103-fold increase (Fig. 2b). Using a probe identical to the one spotted on the cDNA microarray, we performed a Northern blot analysis. The blot demonstrated the up-regulation of the perforin transcript in leukemic LGL cells (Fig. 2c), but the fold increase was neither 103 as indicated by the oligonucleotide array nor 3.8 as determined by the cDNA microarray data. Instead, the actual value was determined to fall between these two extreme values. These observations strongly suggest that results for significantly altered genes should be confirmed with other traditional techniques such as Northern blots or RNase protection assays prior to reporting the fold increase.

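To make the size of the perforin discrepancy concrete, the two reported values can be placed on a common log2 scale. This is a minimal sketch for illustration only: the log2 transform is our assumption of a convenient common scale and is not a method used in this report, and the variable names are hypothetical.

```python
import math

# Values reported in the text for perforin in leukemic LGL cells.
cdna_bde = 3.8      # balanced differential expression, cDNA microarray
oligo_fold = 103.0  # fold change, oligonucleotide array

# Express both on a log2 scale so that equal steps mean equal ratios.
cdna_log2 = math.log2(cdna_bde)
oligo_log2 = math.log2(oligo_fold)

# The difference in log2 units equals log2 of the ratio between platforms.
gap_log2 = oligo_log2 - cdna_log2
platform_ratio = oligo_fold / cdna_bde

print(f"cDNA: {cdna_log2:.2f} log2 units, oligonucleotide: {oligo_log2:.2f} log2 units")
print(f"platforms disagree by a factor of about {platform_ratio:.0f}")
```

The two platforms agree on the direction of change but differ by more than an order of magnitude in its size, which is the point made in the paragraph above.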
Lack of probe specificity for gene isoforms

One of the genes spotted on the cDNA microarray that we are interested in is PAC-1 (Phosphatase in Activated Cells) [9]. The differential expression of PAC-1 by both cDNA microarray (differential expression 4.2) and oligonucleotide arrays (fold change 1.6) is shown in Figures 3a and 3b. Using a cDNA fragment identical to the PAC-1 probe on the cDNA microarray, we performed a Northern blot analysis and confirmed the over-expression of two transcripts in leukemic LGL cells (Fig. 3c). RT-PCR was performed using total RNA from leukemic LGL and specific primers designed to amplify full-length PAC-1. We did not see amplification of any product. In addition, we found no PAC-1 expression using two different monoclonal anti-PAC-1 antibodies in Western blot analysis (data not shown). The monoclonal antibodies obtained from Santa Cruz were based on amino acid sequence information from the N-terminus and C-terminus of PAC-1. Taken together, these experiments did not confirm the over-expression of PAC-1. Therefore, to obtain more information about the structure of the PAC-1-related genes in leukemic LGL, we screened an LGL leukemia cDNA library using a 1.2 kb PAC-1 cDNA fragment and identified similar genes that are different forms of PAC-1 (GenBank Accession #AF331843; the other sequence is not deposited). Similarly, the anti-apoptotic gene A20 is also over-expressed in leukemic LGL, but protein expression was absent when Western blots were performed with monoclonal antibodies raised against the amino acid sequence derived from A20 (data not shown).
Likewise, another gene of interest, NKG2C, showed a balanced differential expression of 5.5 (Fig. 4a). By using a probe derived from an NKG2C clone, we identified a number of transcripts by Northern blot analysis (Fig. 4b). In order to ascertain more structural information, we again screened the LGL leukemia library and identified the presence of several members of the NKG2 gene family including NKG2A, NKG2D, NKG2E and NKG2F (GB Accession Nos. AF461812, AF461811, AF461157) [10]. Therefore, if genes similar to NKG2 family members are spotted on a microarray, it may be difficult to confirm which form of the gene is differentially expressed in a given sample.

Mismatch probe sets mask the perfect match signals in oligonucleotide array (Affymetrix)

In order to accomplish the highest sensitivity and specificity in the presence of a complex background, Affymetrix introduced a system that entails the use of a series of specific and non-specific gene probe sets that are intended to result in a more accurate discrimination between true signal and random hybridization. Each probe set is built from pairs of 25-mer probes: one that represents a perfect match (PM) to the mRNA of interest, and a second probe differing by only one nucleotide, the mismatch (MM). The mismatch in the middle position theoretically provides maximal disruption of hybridization. Unfortunately, the use of the mismatch probe information can interfere with fold change calculations of gene expression. For example, perforin transcripts showed strong hybridization to both PM and MM probe sets. As a consequence, the strong MM signal masked the PM signal, resulting in a low expression readout even though the gene was present in normal PBMC (Fig. 2b). Therefore, the subsequently calculated fold increase from the test sample was extraordinarily high and deemed unreliable. Similarly, the fold change calculation was underestimated for PAC-1 due to the strong signal displayed for the MM probe set (Fig. 3b). The genes for human auto-antigen (GenBank Accession #L26339) and carboxyl ester lipase-like protein (GenBank Accession #L14813) are additional examples: both are present in the LGL sample, but because of the strong signals associated with some of the MM probes, they are considered absent (Fig. 5a and 5b).

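The masking effect described above can be illustrated with a toy calculation. The sketch below is a simplified stand-in for a PM minus MM summary and is not the actual Affymetrix algorithm; the function names, the noise floor, and all intensity values are hypothetical and chosen only to show the mechanism.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of (PM - MM) across probe pairs: a simplified expression summary."""
    return mean([p - m for p, m in zip(pm, mm)])


def fold_change(test_signal, baseline_signal, floor=20.0):
    """Ratio of test to baseline, clamping both to an assumed noise floor."""
    return max(test_signal, floor) / max(baseline_signal, floor)


# Hypothetical intensities for one gene (not data from this study).
normal_pm = [900, 1100, 1000, 950]
normal_mm = [880, 1090, 990, 940]        # MM almost as bright as PM
leukemic_pm = [5000, 5200, 4800, 5100]
leukemic_mm = [900, 1000, 950, 980]

normal_signal = average_difference(normal_pm, normal_mm)
leukemic_signal = average_difference(leukemic_pm, leukemic_mm)

fold_with_mm = fold_change(leukemic_signal, normal_signal)
fold_pm_only = fold_change(mean(leukemic_pm), mean(normal_pm))

print(f"baseline signal after MM subtraction: {normal_signal}")
print(f"fold change using PM - MM: {fold_with_mm:.1f}")
print(f"fold change using PM only: {fold_pm_only:.1f}")
```

In this toy case the gene is clearly detected by the PM probes in the baseline sample, yet subtracting a strong MM signal drives the baseline close to zero and inflates the fold change, which mirrors the behaviour reported for perforin.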
Discussion
In order to identify the differentially expressed genes in large granular lymphocytic (LGL) leukemia, we performed microarray analysis using the UniGEM-V microarray from IncyteGenomics and the HU6800 oligonucleotide array from Affymetrix. In the course of our analysis, we discovered several problems that we feel could occur in other studies and that might lead to false conclusions.
Approximately 80 up-regulated genes and 12 down-regulated genes were identified by cDNA microarray analysis in leukemic LGL cells. Since microarray technology was a new tool at that time, we decided to verify the sequences of all the genes that were differentially expressed. To that end, we purchased approximately 20 clones representing the differentially expressed genes and verified the sequences. We found that only approximately 70% of the genes spotted on the microarray matched the correct sequence of the clones. Other groups reported similar observations. For example, IMAGE mouse cDNA clones (approximately 1200) were purchased from Research Genetics (Huntsville, Alabama) and sequences were verified by Halgren et al. [11]. This group found that only 62% were definitely identified as a pure sample of the correct clones. In another study, PCR amplification products (previously sequence-verified cDNA clones) were re-sequenced and only 79% of the clones matched the original database [12]. In a different study, it was estimated that only 80% of the genes in a set of microarray experiments were correctly identified [5]. Therefore, we advise that when preparing cDNA microarrays (commercial or homemade), it is necessary to sequence-verify each clone at the final stage before printing the microarray. If mistakes are made at this stage, it is not possible to correct them later, even with the most sophisticated analytical tools.
We used cDNA microarray analysis to compare the gene expression profile of leukemic LGL cells obtained from a patient versus the expression profile of PBMC obtained from a normal healthy individual as a control. We decided to verify the microarray results using samples from more patients and other methods such as PCR, Northern blot and RNase protection assay. To our surprise, none of the three down-regulated genes studied exhibited differential expression in Northern blots when the cDNA fragments of these genes were used as probes. Among the up-regulated genes, only 47% supported the results from the microarray data. The rest either displayed no signal, were not detectable in any sample or failed to reveal any differential expression whatsoever. Although some genes such as PAC-1 and A20 showed differential expression in LGL leukemia patients, no product amplification was obtained using RT-PCR with gene-specific primers.
By microarray analysis, it is very difficult to distinguish between two similar genes. The best example in our case is the comparison of granzyme B and granzyme H. These two genes share approximately 80% similarity at the DNA level but have different enzymatic activities [13, 14]. Using either one of the genes as a probe, both cDNA microarray and Northern blot analysis indicated over-expression of both genes indiscriminately (Fig. 1). However, using gene-specific probes in an RNase protection assay, we were able to distinctly identify the over-expression of both granzyme B and granzyme H in leukemic LGL cells (Fig. 1d and 1e). In normal PBMC only trace amounts of both genes were identified, but after activation by PHA and IL-2 only granzyme B was up-regulated. It is very difficult to get this information by microarray analysis alone. Therefore, caution in presenting microarray data without verification and confirmation is advised.
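The notion of two genes sharing approximately 80% similarity at the DNA level can be made concrete with a small calculation over aligned sequences. This is a minimal sketch under stated assumptions: the two 20-base sequences are invented toy examples and are not granzyme B or granzyme H sequence, and the function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def distinguishing_positions(seq_a: str, seq_b: str) -> list:
    """Indices where the two aligned sequences differ.

    A gene-specific probe would need to cover positions like these.
    """
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Toy aligned sequences (hypothetical, for illustration only).
gene_x = "ATGCCAGTTCTGAAGCTGGA"
gene_y = "ATGTCAGTACTGAGGCTGCA"

identity = percent_identity(gene_x, gene_y)
differences = distinguishing_positions(gene_x, gene_y)

print(f"identity: {identity}%")
print(f"differing positions: {differences}")
```

A long probe spanning mostly shared positions will hybridize to both transcripts, whereas a probe designed around the differing positions has a chance of telling them apart, which is the rationale for the gene-specific probes used in the RNase protection assay.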
When the results from two different microarray technologies (cDNA and oligonucleotide arrays) were compared, the differential expression of some of the genes appeared to agree in both cases, but a large variation in expression profiles between the two microarrays was clearly evident. Such systematic differences between the two technologies were reported previously [6]. For example, perforin showed a 103-fold change in the Affymetrix array, whereas the cDNA microarray showed a balanced differential expression of only 3.8. Northern blot results indicate that the gene was over-expressed, but the actual value is in between the values from the two microarrays. This problem may be due to an inaccurate fold change calculation caused by the inclusion of mismatch values in the formula. We observed that many over-expressed genes were not properly identified at times. This may be the result of the introduction of mismatch values in the Affymetrix system. For example, the genes for human autoantigen and human carboxyl ester lipase-like protein would be considered up-regulated in the microarray (according to PM hybridization) if the MM hybridization values were ignored in the fold change calculation.
DNA microarray analysis can be a powerful technique to identify differentially expressed genes, but differentiating between splice variants can be problematic. For example, although the differential expression of several genes such as PAC-1 and A20 was confirmed by Northern blot analysis, we were unable to see any expression of protein corresponding to these genes by Western blot analysis. We were also unable to amplify those genes using gene-specific primers by RT-PCR. After screening the LGL library, we obtained several full-length genes that differed from PAC-1 at both the 5' and 3' ends. Similarly, we screened an LGL leukemia library and obtained several 1.5 kb cDNA fragments using the A20 cDNA as a probe. The deduced amino acid sequences of these genes revealed different proteins.
We found an up-regulation of NKG2C with a balanced differential expression of 5.8 in cDNA microarray (Fig. 4a). When Northern blot analysis was performed using NKG2C cDNA as a probe, we identified multiple transcripts. Screening the LGL leukemia library resulted in the identification of several other members of the NKG2 family such as NKG2A, D, E, and F [10]. Therefore, it can be very difficult to distinguish different forms of genes if they are similar in certain sequence regions.

Conclusions
At the time of writing this report there were approximately 1150 articles published describing microarray results (PubMed). There is no doubt that these results will provide an overall idea of gene expression and contribute to understanding the molecular mechanisms involved in various processes. However, as demonstrated by our findings, the development of a standardized microarray system is needed to obtain more meaningful data from these experiments. The introduction of more uniform systems, combined with consideration of the pitfalls and alternatives described above, will allow better utilization of this powerful technique in an expanding collection of scientific endeavors. It will be very helpful for the scientific community if verified data are deposited in a public database.

Methods

Isolation of PBMC and RNA

PBMC were isolated from whole blood using Ficoll-Hypaque density gradient centrifugation. These cells were suspended in Trizol reagent (GIBCO-BRL, Rockville, MD) and total RNA was isolated immediately according to the manufacturer's instructions. Poly A+ RNA was isolated from total RNA by using the Oligo-Tex mini mRNA kit (Qiagen, Valencia, CA) according to the manufacturer's recommendations.

Activation of PBMC

Normal PBMC were cultured in vitro and activated with PHA (Sigma Chemical Co., St. Louis, MO; 1 μg/ml, 2 days) and interleukin-2 (IL-2; 100 U/ml, 10 days), after which total RNA was isolated.

cDNA microarray analysis

Microarray probing and analysis were performed by IncyteGenomics. Briefly, one μg of poly(A)+ RNA isolated from PBMC of an LGL leukemia patient and a healthy individual was reverse transcribed to generate Cy3 and Cy5 fluorescently labeled cDNA probes. cDNA probes were competitively hybridized to a human UniGEM-V cDNA microarray containing approximately 7075 immobilized cDNA fragments (4107 for known genes and 2968 for ESTs). Microarrays were scanned in both Cy3 and Cy5 channels with an Axon GenePix scanner (Foster City, CA) at 10 μm resolution. The P1 and P2 signals are the intensity readings obtained by the scanner for the Cy3 and Cy5 channels. The balanced differential expression was calculated using the ratio between the P1 signal (intensity reading for probe 1) and the balanced P2 signal (intensity reading for probe 2 adjusted using the balance coefficient). Incyte GEMtools software (Incyte Pharmaceuticals, Inc., Palo Alto, CA) was used for image analysis. A gridding and region detection algorithm determined the elements. The area surrounding each element image was used to calculate a local background, which was subtracted from the total element signal. Background-subtracted element signals were used to calculate the Cy3:Cy5 ratio. The average of the resulting total Cy3 and Cy5 signals gave a ratio that was used to balance or normalize the signals.

Oligonucleotide microarray analysis

The HU6800 microarray was obtained from Affymetrix (Santa Clara, CA). Briefly, total RNA isolated from normal PBMC and leukemic LGL was DNase-treated and purified with a Qiagen kit (Valencia, CA). Approximately 10 μg of purified RNA was used to prepare double-stranded cDNA (SuperScript, GIBCO/BRL, Rockville, MD) using a T7-(dT)24 primer containing a T7 RNA polymerase promoter binding site. Biotinylated complementary RNA was prepared from 10 μg of cDNA and then fragmented to approximately 50 to 100 nucleotides. The in vitro transcribed transcripts were hybridized to the HU6800 microarray for 16 h at 45°C with constant rotation at 60 rpm. Chips were washed and stained by using the Affymetrix fluidics station. Fluorescence intensity was measured for each chip and normalized to the fluorescence intensity for the entire chip.

Verification of the clones

GEM cDNA clones (supplied as a bacterial stab) were purchased from IncyteGenomics and streaked on LB agar plates containing the appropriate antibiotic. Individual colonies were picked and grown in LB medium. Plasmid DNA was isolated and sequenced in order to verify the sequence identity.

Northern blot analysis

Northern blotting was performed as described. Briefly, 10 μg of total RNA from each sample was denatured at 65°C in RNA loading buffer, electrophoresed in a 1% agarose gel containing 2.2 M formaldehyde, then blotted onto a Nytran membrane (Schleicher & Schuell, Inc., Keene, NH). The RNA was fixed to the membrane by UV cross-linking. cDNA was labeled with [32P] and purified using Nick columns (Amersham Pharmacia Biotech AB, Piscataway, NJ). Hybridization and washing of the blots were performed as described by Engler-Blum et al. [15].

RNase protection assay (RPA)

RPAs were performed using the RNA isolated from leukemic LGL, normal PBMC and normal PBMC activated by IL-2 and PHA. Five μg of total RNA was hybridized to the in vitro transcribed hAPO-4 probe set (PharMingen, San Diego, CA), and the RPA was performed according to the manufacturer's protocol. After the assay, the samples were resolved on a 5% polyacrylamide gel. The gel was dried and exposed to X-ray film. After developing the film, the bands were quantitated by using the ImageQuant program and normalized to the housekeeping gene, L32.

Western immunoblot analysis

Cells were lysed in a buffer containing 50 mM Tris-HCl (pH 7.6), 5 mM EDTA, 150 mM NaCl, 0.5% NP-40, and 0.5% Triton X-100 containing 1 μg/ml leupeptin, aprotinin and antipain; 1 mM sodium orthovanadate; and 0.5 mM PMSF (all reagents were obtained from Sigma Chemical Co.). Twenty-five μg of total protein from each sample was subjected to 10% SDS-PAGE. The proteins were then transferred to a membrane and Western blotting was performed using the monoclonal antibodies against PAC-1 and A20, followed by the ECL technique as recommended by the manufacturer (Amersham Biosciences, Piscataway, NJ).

Authors' contributions
RK conceived of the study along with TPL, isolated and purified RNA from the samples for microarray analysis, performed all the experiments to validate the microarray data, analysed the data and drafted the manuscript. SJY verified the microarray data and participated in validation of the microarray. SM performed microarray analysis and analyzed the data. TPL conceived of the study and participated in its design and coordination.