Background
Expression profiling is an emerging experimental method
whereby RNA accumulation in cells and tissues can be
assayed for many thousands of genes simultaneously in a
single experiment. There are two common experimental
platforms for expression profiling; redundant
oligonucleotide arrays (Affymetrix GeneChips) [ 1 ] , and
spotted cDNA microarrays [ 2 3 4 ] . The Affymetrix
GeneChips have the inherent advantages of redundancy,
specificity, and transportability; there are typically
30-40 oligonucleotide probes (features) designed against
each gene tested by the array, with paired perfect-match
and mismatch probes, with standardized factory synthesis of
arrays [ 5 6 ] . The uniform nature of the arrays permits
databasing of individual profiles, which facilitates
comparison of data generated by different laboratories.
Expression profiling has led to dramatic advances in
understanding of yeast biology, where homogeneous cultures
can be grown and exposed to timed environmental variables [
7 8 9 10 11 12 ] . Such studies have led to the rapid
assignment of function to a large number of anonymous gene
sequences. Large-scale expression profiling studies of
tissues from higher vertebrates are more challenging, due
to the higher complexity of the genome, larger related gene
families, and incomplete genomic resources. Nevertheless,
DNA microarrays have been successfully applied in the
analysis of aging and caloric restriction [ 13 ] and
pulmonary fibrosis [ 14 ] . And many publications,
particularly on cancer, have appeared [ 14 15 16 17 18 19 ]
. Affymetrix has recently announced the availability of the
U133 GeneChip series with 33,000 well-characterized human
genes mined from genomic sequence. The nearly complete
ascertainment of genes in the human genome should make
expression-profiling studies of human tissues particularly
powerful. However, identification of the sources of
experimental variability, and knowledge of the relative
contribution of variation from each source, is critical for
appropriate experimental design in expression profiling
experiments.
Mills and Gordon recently studied the relative
contribution of experimental variability of probe
production on the reproducibility of microarray results
using mixed murine tissue RNA on Affymetrix Mu11K GeneChips
[ 20 ] . In their study, the same RNA preparation was used
as a template for distinct cDNA/cRNA amplifications and
hybridizations. An additional variable studied was the
effect of different laboratories processing the same RNAs.
The authors found relatively poor concordance between
duplicate arrays, with an average of 12% increase/decrease
calls between the same RNA processed in parallel and
hybridized to two Mu11K-A microarrays. The authors
concluded that there was substantial experimental
variability in the experimental procedure, necessitating
extensive filtering and large numbers of arrays to detect
accurate gene expression changes (LUT: look-up tables) [ 20
] . In our laboratory, we have processed over 1,200
Affymetrix arrays, and have found significantly higher
experimental reproducibility (R 2= 0.979 for new generation
U74A version 2 murine arrays or human U95 series, see
Result and Discussion). In addition, a recent publication
of a single human patient, where RNA was prepared from two
distinct breast tumors, and placed on duplicate U95A
GeneChips (four chips total) found a very low degree of
experimental variability between microarrays (R 2= 0.995),
and between the two tumors (R 2= 0.987) [ 21 ] . The marked
differences in experimental variability between
laboratories could be due to different quality control
protocols (see http://microarray.cnmcresearch.org), newer
more robust Affymetrix arrays now available (murine Mu11K
versus U74A version 2 and new generation human U95 series),
use of more recent algorithms for data interpretation, or
due to more consistent processing of RNA, cDNA, and cRNA in
the same laboratory.
The previous studies did not systematically address the
reproducibility of GeneChip hybridization (e.g. the same
biotinylated cRNA on two different microarrays). In
addition to lingering questions concerning variability due
to specific experimental procedures, there are other
possible sources of variability that have not yet been
investigated, specifically tissue heterogeneity and
inter-individual variation. The latter two sources of
variability are particularly important in human expression
profiling studies. The study of human tissues often
involves the use of tissue biopsies, where a relatively
limited region of an organ is sampled. Tissue heterogeneity
and sampling error might be expected to introduce
significant variability in expression profiles. Second,
tissues may derive from individuals from different ethnic
backgrounds; humans are highly outbred, leading to the
potential of significant polymorphic noise (herein called
"SNP noise") between individuals unrelated to the disease
or variable under study. SNP noise also exists between
different inbred mouse strains, and some experiments have
normalized this effect by breeding the same mutation on
different strains, and profiling each individually [ 22 ] .
Knowledge of the relative effect of each experimental,
tissue, and patient variable on expression profiling
results in humans is important, so that appropriate
experimental designs can be employed.
We recently reported the design and production of a
highly redundant oligonucleotide microarray for analysis of
human muscle biopsies (Borup et al.
submitted ). This MuscleChip contains
4,601 probe sets corresponding to 3,369 distinct genes and
ESTs expressed in human muscle. Each probe set contains
between 16 to 40 oligonucleotides, such that the number of
specific oligonucleotide probes on the array was
138,000.
Here, we utilize this MuscleChip to investigate the
relative significance of variables affecting expression
profiling data and interpretation. Specifically, we studied
the correlation coefficients of profiles considering the
following variables: 1. variation due to probe production
(same RNA); 2. variation due to the microarray itself (same
cRNA on different GeneChips); 3. tissue heterogeneity
(different regions of the same muscle biopsy); 4.
inter-patient variability (SNP noise); 5. diagnosis
(underlying pathological variable); and 6. patient age.
We have recently reported generation of expression
profiling results using mixed patient samples [ 23 ] . Our
hypothesis was that mixing of RNA samples from multiple
regions of muscle biopsies, and from multiple patients
matched for most variables (disease, age, sex), would
effectively normalize both intra-patient variability
(tissue heterogeneity), and inter-patient variability (SNP
noise; e.g. normal human polymorphic variation unrelated to
the primary defect). Here, we test this hypothesis
directly, and show that sample mixing does indeed result in
relatively high sensitivity and specificity for gene
expression changes that would be detected by many
individual expression profiles. Thus, sample mixing appears
to be an appropriate first-pass method to obtain the most
significant expression changes, while using small numbers
of arrays.
Results and discussion
Fifty six (56) different RNA samples were prepared from
different regions of muscle biopsies from 28 individuals
(15 Duchenne muscular dystrophy (DMD) patients, 13 normal
controls). The profiles of five of the DMD patients and the
five controls have been previously reported using the
Affymetrix HuFL microarray [ 23 ] ; however, we re-tested
these same samples on the custom MuscleChip (Borup et al.
submitted ) for comparison to the
other patients here. All RNAs were converted to
double-stranded cDNA, and then to biotinylated cRNA. The
cRNAs were then hybridized to the MuscleChip either singly,
in mixed groups, or both, as described below. In total, 34
hybridizations were performed, scanned, and the data
statistically analyzed using Affymetrix Microarray Suite
and Excel. Quality control criteria were as described on
our web site ( http://microarray.cnmcresearch.org, link to
"programs in genomic applications"), and included
sufficient cRNA amplification, and adequate
post-hybridization scaling factors. Scaling factors
(normalization needed to reach a common target intensity)
ranged from 0.46 to 3.28 (Table 1). All raw image files,
processed image files, and difference analyses are posted
on a web-queried SQL database interface to our Affymetrix
LIMS Oracle warehouse (see
http://microarray.cnmcresearch.org: link to "programs in
genomic applications", "data", "human").
Among the 4,601 probe sets on the Affymetrix custom
muscle microarray, we found a consistent percentage of
"present" calls for each of the 34 cRNA samples tested
(Duchenne dystrophy, 28 arrays, 48.2% ± 6.1%; controls 6
arrays, 53.3% ± 1.4%). To test for inter-array variability,
two different hybridization solutions were applied to
duplicate arrays, and correlation coefficients determined.
A high correlation coefficient was found in this analysis,
suggesting that inter-array variability of the MuscleChip
used was a relatively minor variable (Patient 3 a and
3a-duplicate R 2= 0.96 and percent shared [No Change (NC)]
calls by Microarray Suite software was 99%; Patient 3b and
3b-duplicate R 2= 0.98 and percent NC was 98%; Table 1).
The high reproducibility of Affymetrix array results is
consistent with other data in our laboratory, and from
previously published data [ 6 21 23 24 ] , and shows that
experimental variability associated with hybridization and
scanning of highly redundant oligonucleotide GeneChips is
not a major source of experimental variability.
Given the previous report suggesting that the conversion
of RNA to biotinylated cRNA probe was a major source of
variability in murine array experiments [ 20 ] , we tested
a series of murine RNA from different sources, using the
newer generation U74Av2 GeneChips. One series of samples
was from murine spleens, where spleens from multiple
animals for each variable under study were mixed, RNA
isolated, RNA samples split, and duplicate cDNA, cRNA, and
hybridizations processed in parallel for each RNA (Fig. 1,
"KNagaraju" samples). We also compared RNAs processed from
parallel murine myogenic cell cultures (Fig. 1, "VSM"
samples), where each profile was from a different cell
culture. Finally, we used a series of murine muscle tissues
from normal and dystrophin-deficient mice, where each
profile was from a different series of complete
gastrocnemius muscles (Fig. 1, "FBooth" samples). The data
from these 42 murine U74Av2 profiles were then analyzed by
unsupervised clustering [ 25 ] to determine which profiles
were most closely related to each other (Fig. 1). This
analysis shows that the different sources of RNA cluster
together, as expected. Importantly, the same RNA used as a
template for two distinct cDNA/cRNA preparations and
hybridizations showed a high correlation coefficient (R 2=
0.99 for five of the six samples, with average R 2= 0.978)
(Fig. 1). The large muscle group profiles (FBooth samples)
showed excellent correlation, both with respect to
diagnosis; however here there was no sampling error as the
entire muscle group was used rather than isolated biopsies.
Finally, the parallel tissue culture experiments (VSM
samples) showed greater variability between duplicates,
suggesting that tissue culture conditions may be more
subject to variability than
in vivo tissues (Fig. 1). This murine
data shows that variability from different cDNA-cRNA
reactions is very low (R 2= 0.978).
To analyze the impact of intra-patient variability
(tissue heterogeneity), inter-patient variability
(polymorphic noise in outbred populations), and the effect
of sample mixing on the sensitivity of detection of gene
expression differences between patient groups, we conducted
a series of individual and mixed profiling (Table 1).
Muscle biopsies from five 4-6 yr old DMD patients, and five
10-12 yr old patients were selected, each biopsy split into
two parts, and RNA isolated independently from each of the
20 biopsy fragments. For these ten DMD patients, the two
different regions of the same biopsy were expression
profiled both individually (20 profiles), and also mixed
into four pools where each pool originated from distinct
RNA samples (Table 1). The resulting profiles were also
compared to previously reported mixed 6-9 yr old DMD
patient cRNAs, and mixed 6-9 yr old control cRNAs [ 23 ] ,
as mentioned above.
As an initial statistical analysis, we used Affymetrix
software to define genes that showed expression changes
(Increased, Decreased or Marginal) in expression levels
between pairs of profiles (difference analyses). This
method of data interpretation showed that some muscle
biopsies showed very little variance between different
regions of the same biopsy, while other patient biopsies
showed considerable variability (see Fig. 2for
representative scatter graphs). Expressing this variance as
a percentage of "Diff Calls" between the two regions of the
same biopsy, as determined by Affymetrix default
algorithms, we found considerable variability in the
similarity of profiles, with values ranging from 1.5% to
18% of the 4,601 probe sets studied (4.99% ± 4.94%). This
data suggests that tissue heterogeneity (intra-patient
variability) can be a major source of variation in
expression profiling experiments, even when using
relatively large pieces (50 mg) of relatively homogeneous
tissues (such as muscle).
The most common strategy for interpreting Affymetrix
microarray data is to use two profile comparisons, with an
arbitrary threshold for "significant fold-change" in
expression levels. Typically, multiple arrays are compared,
with those gene expression changes showing the most
consistent fold changes prioritized, although other methods
have been reported [ 13 22 26 27 ] . To study inter-patient
variability, we defined the gene expression changes
surviving four pairwise comparisons with mixed control
samples, as we have previously described [ 23 ] . Briefly,
four comparisons were done by Affymetrix software (eg. DMD
1a versus control 1a; DMD 1a versus control 1b; DMD1b
versus control 1a; DMD1b versus control 1b). The four data
sets were then compared, with only those gene expression
changes that showed >2-fold change in all four
comparisons (four comparison survival method). The number
of surviving diff calls by this method ranged from 250 to
463 (355 ± 80) (Table 1). Interestingly, those patients
showing considerable variation between different regions of
the same biopsy did not show a corresponding decrease in
the number of gene expression changes surviving the
iterative comparisons to controls (Table 1). This suggests
(but does not prove) the most significant changes might be
shared, independent of tissue variability (see below).
A different statistical method to determine the effect
of the different variables under study is to perform
hierarchical cluster analysis using nearest neighbor
statistical methods [ 25 ] . Here, we subjected all
profiles to unsupervised cluster analysis, as a means of
determining which variables had the greatest effect (e.g.
intra-patient variability [different regions of biopsy],
versus diagnosis [DMD vs control], versus inter-patient
variability [DMD patients in same age group], versus age of
patient). For this analysis, we used the fluorescence
intensity of each probe set (Average difference), after
data scrubbing to remove genes that showed expression
levels near background ("Absent" Calls) for all profiles
(Fig. 3). This analysis shows that duplicate profiles of
the same cRNA hybridization solution are the most highly
related (Patient 3 a and duplicate (3a-d); 3b and duplicate
(3b-d)), consistent with the high correlation found by the
comparisons using Affymetrix Microarray Suite software
described above. Again, this reflects the low amount of
combined experimental variability intrinsic to the
laboratory processing of RNA, cDNA, cRNA and
hybridization.
When comparing two different regions of the same biopsy
[intra-patient variability], we found widely varying
results, depending on the patient studied (Fig. 3). For
example, some individual patients showed very closely
related profiles that approached the similarity of
duplicate arrays on the same cRNA (Fig. 2; profiles 6a, 6b;
10a, 10b). On the other hand, some patients showed very
distantly related profiles for two regions of the same
biopsy (Fig. 3; profiles 1a, 1b; 4a, 4b; 9a, 9b).
Importantly, the variation caused by intra-patient tissue
variation often overshadowed all other variables. For
example, a profile from DMD patient 9 (9a) clustered with
the normal controls, rather than with the other DMD
patients (Fig. 3). The histopathology of this patient was
noted as being unusually variable in severity prior to
expression profiling. Also, unsupervised clustering was
unable to group patients of similar ages, despite DMD
showing a progressive clinical course. We conclude that
intra-patient tissue heterogeneity is a major source of
experimental variability in expression profiling, and must
be considered in experimental design.
The above findings suggested that both intra-patient
variability (tissue heterogeneity) and inter-patient
variability (polymorphic noise) had major effects on the
expression profiles. One method to control for these
sources of noise is to analyze large numbers of profiles,
both on multiple patients, and on multiple regions of
tissue from each patient. This would allow determinations
of p values and statistical significance for a single
controlled variable under study (e.g. DMD vs controls). An
alternative method is to experimentally normalize these
variables through mixing of samples from patient groups;
such mixing would be expected to average out both intra-
and inter-patient variation. The expectation is that the
most significant and dramatic gene expression changes would
still be identified, while using many less profiles (and
thus a substantial reduction in cost of the analyses).
To test for the relative sensitivity of interpretation
of sample mixing versus individual profiles, we mixed
together the 10 cRNAs for the two different age groups of
DMD patients (samples 1a - 5b; samples 6a - 10b). For this
analysis, we also generated expression profiles for two
additional groups of control individuals. One was a second
set of five normal male biopsies ages 5-12 yrs (controls
2a, 2b), and the third control set was three normal
age-matched female biopsies ages 4-13 yrs (controls 3a, 3b)
(Fig. 4). As with the original male control group (control
1a, 1b), two different regions of each biopsy were
processed independently through the biotinylated cRNA step,
and then equimolar amounts of cRNA mixed for hybridization
to the MuscleChip.
All 34 profiles (both individual and mixed samples) were
again analyzed by unsupervised hierarchical clustering
(Fig. 4) [ 25 ] . As described above, we scrubbed the
profiles to eliminate all genes showing expression levels
consistently at or below background hybridization
intensities by requiring each gene to show a "Present Call"
in one or more of the 34 profiles.
As above, duplicate profiles using the same cRNA
hybridization solution on different arrays, whether mixed
or individual samples, showed very highly correlated
results (very low branch on dendrogram) (Fig. 4; mix 5-6
yrs, mix 10-12 yrs; patient 3a/3a-d; patient 3b/3b-d). As
above, this indicates that experimental variability from
laboratory procedures or different arrays is a relatively
minor factor in interpretation of results. Mixed samples
from different regions of the same biopsies showed the
same, or only slightly more variation (mixed controls c1,
c2, and c3, mixed DMD 6-9 yrs). This showed that sample
mixing does indeed average out tissue heterogeneity
(intra-patient variability), as well as inter-patient
variability. We noted that all of the controls (both male
and female) clustered in the same branch of the dendrogram,
while the four of the six mixed DMD profiles clustered just
one level away from the controls, separately from the other
DMD profiles. This analysis suggests that there is
considerable variability in the progressive tissue
pathology induced by dystrophin deficiency, both within a
patient, and between patients.
To test the sensitivity and specificity of sample mixing
versus individual profiling, we defined differentially
expressed genes using a two group t-test (GeneSpring [ 28
29 ] ), comparing all 6 mixed control profiles and the 10
individual 5-6 yr old DMD profiles. Genes were retained
that met specific p value thresholds between the two sets
of profiles. In parallel, we compared the two corresponding
mixed 5-6 yr old DMD profiles to the same 6 mixed control
profiles.
Comparison of 10 individual 5-6 yr Duchenne dystrophy
profiles to 6 mixed controls revealed 1,498 genes showing
differential expression with p < 0.05 (Fig. 5).
Comparison of the two mixed Duchenne dystrophy profiles to
the 6 mixed controls showed 1,350 genes with p < 0.05
(Fig. 5A). Comparison of the two gene lists showed that 61%
of differentially regulated genes detected by the 10
individual profiles were also detected by the two mixed
profiles. This suggests that the sensitivity and
specificity of using mixed samples is approximately half
that of individual profiles. However, there was a rapid
shift in specificity and sensitivity as stringency of the
analysis was increased. Raising the statistical threshold
to p < 0.0001 for individual profiles, while keeping the
threshold for mixed profiles at p < 0.05 as required by
the small number of data points (Fig. 5B), resulted in a
sensitivity of 86% for mixed samples (351 of 408 genes p
< 0.0001 detected). In conclusion, mixing detected about
two thirds of statistically significant changes (p <
0.05). Mixing was a relatively sensitive method of
detecting the most highly significant changes (p <
0.0001) (86% of changes detected), however it was not very
specific; as many as one third of gene expression changes
showing p < 0.05 in mixed samples were not confirmed by
individual profiles.
Use of t-test measurements is expected to contain
significant amounts of noise, due to the very large number
of comparisons involved in array studies; a value of p =
0.05 means that as many as 5% of gene expression changes
are expected to be identified by "chance", and thereby not
reflect true differences between samples. We have
previously reported a very simple, yet potentially more
stringent method for data analysis of small numbers of
expression profiles, using duplicate profiles for control
and experimental samples, and then identifying those genes
that show consistent changes >2-fold in the four
possible pair-wise data comparisons (four comparison
survival method) [ 23 ] . A similar pair-wise comparison
method, using a less stringent average fold-change
analysis, was recently reported for muscle from aging and
calorie-restricted mouse muscle [ 13 ] .
To investigate the validity of this approach we compared
the sensitivity and specificity of t-test detection of
expression changes versus the four-pairwise survival
method. Two sample t-test of the 10 individual Duchenne
dystrophy profiles compared to the 6 mixed control profiles
revealed 1,498 genes showing p < 0.05 as above. In
parallel, the mixed DMD duplicate profiles were compared to
a single pair of mixed control sample profiles (c1a, c1b),
using the pairwise comparison survival method [ 23 ] .
Briefly, four comparisons were done (DMD 1a versus control
1a; DMD1b versus control 1a; DMD 1a versus control 1b;
DMD1b versus control 1b), and only those genes retained
which showed >2-fold change in all four comparisons.
This method was indeed considerably more specific in
identifying significant (p < 0.05) gene expression
changes (Fig. 6A) with 85% of gene expression changes in
the mixed profiles verified by individual profiles (p <
0.05). The sensitivity of this method depended on the
p-value threshold for the individual profiles, but only
reached a maximum of 49% sensitivity at p < 0.0001 (Fig.
6B).
The results above suggested that analysis of mixed
samples using t-test methods was relatively sensitive but
non-specific, while analysis of the same mixed profiles by
2-fold survival method was relatively specific but
insensitive. To confirm this conclusion, we directly
compared the sensitivity and specificity of the four
pairwise comparison method to more standard t-test methods
(Fig. 7A). We found that the pairwise survival method was
indeed highly specific, with 97% of changes identified by
this method also detected by t-test. However, as predicted,
it was not very sensitive, with only 30% of the expression
changes with p < 0.05 identified by t-test being
detected by the pairwise survival method. Comparison of all
three analysis methods showed that many (349) genes
expression changes were detected by all three methods (Fig.
7B).
Conclusions
Microarray data analyses have been criticized as being
"quite elusive about measurement reproducibility" [ 30 ] .
This is largely the consequence of the large number of
uncontrolled or unknown variables, and the prohibitive cost
of isolating and investigating each variable. Here, we
report the systematic isolation and study of most variables
in microarray experiments using Affymetrix oligonucleotide
arrays and human tissue biopsies. We found that all sources
of experimental variability were quite minor (microarray R
2= 0.98-0.99; probe synthesis + microarray R 2= 0.98-0.99).
On the other hand, tissue heterogeneity (intra-patient
variation; Average R 2for 10 patients = 0.92 [0.85 to
0.98]), and differences between individual patients (SNP
noise; Average R 2= 0.76 [0.42 to 0.93]) were major sources
of variability in expression profiling. Thus, tissue
heterogeneity and SNP noise have a high potential to
obscure sought after condition-specific gene expression
changes, particularly in humans, where tissue samples can
be limiting (sampling error), and inter-individual
variation often is very large. We have shown that mixing of
patient samples effectively normalizes much of the intra-
and inter-patient noise, while still identifying the
majority of the most significant gene expression changes
that would have been detected by larger numbers of
individual patient profiles. Our results suggest that
stringent yet robust data can be generated by mixing a
small number of individuals with a defined condition (n =
5), preferably using different regions of tissue for
duplicate arrays. Controls should be similarly processed.
The resulting four arrays (2 controls, 2 experimental
datasets) should then be subjected to the >2-fold
survival method, as previously described [ 23 ] . This will
yield a stringent set of expression changes that are likely
to be verified by larger studies with individual arrays,
but at low cost as only four arrays are employed. The
preliminary data from just four mixed profiles (two
experimental and two control) can then be used to generate
functional clusters and pathophysiological models. These
preliminary models can then direct more hypothesis-driven
experiments, or more extensive expression profiling
studies.
Materials and methods
Expression profiling
Human muscle biopsy samples were diagnostic specimens
flash-frozen immediately after surgery in isopentane
cooled in liquid nitrogen, with storage in small,
airtight, humidified tubes at -80°C until RNA isolation.
Duchenne muscular dystrophy patient samples were all
shown to have complete lack of dystrophin by
immunostaining and/or immunoblot analysis, and were shown
to have excellent morphology and preservation of tissue.
Controls included groups of males and female (age
described in text) that showed no histopathological
abnormality, normal dystrophy proteins, and normal serum
creatine kinase levels. Biopsy sizes ranged from 50 mg to
2 grams, with approximately 20-30 mg used for RNA
isolation (~10-15 micrograms of total RNA). As described
in the text, all biopsies had two different regions of
the same biopsy expression profiled separately.
Details concerning the murine profiles will be
published elsewhere. In this report, we used the murine
profiles simply to test the sources of variation during
sample preparation prior to hybridization to
oligonucleotides.
RNA isolation (Trizol, Gibco BRL), RNA purification
(RNAeasy, Qiagen), cDNA synthesis and biotinylated cRNA
were all done as per standard protocols provided by
Affymetrix Inc. Quality control methods are described on
our web site (
http://microarray.cnmcresearch.org/pga.htm), with cRNA
amplifications of between 5- and 13-fold for each of the
samples. Ten micrograms of gel-verified fragmented
biotinylated cRNA were hybridized to each MuscleChip or
U74A v2 array, and scanning done after
biotin/avidin/phycoerythrin amplification. Details on the
specific patients studied, and details for each GeneChip
(scaling factors, number of present calls, percentage
difference calls between each duplicate sample, number of
difference calls surviving four pair-wise comparisons of
duplicate chips) is provided (Table 1). All profiling
data presented here is available on our web site (
http://microarray.CNMCResearch.org; data link), as image
(.dat), absolute analysis (.chp), and ASCII text
conversions of .chp (.txt) for each individual profile
(see http://microarray.cnmcresearch.org/pga.htmfor file
descriptions and use).
Bio-informatic methods
Absolute analysis (average difference determinations
for each probe set) was done using Affymetrix default
parameters. As described in the text, data was analyzed
using a variety of methods, including unsupervised
nearest-neighbor hierarchical clustering analyses
(GeneSpring [ 28 29 ] [Silicon Genetics], and Cluster [
25 ] [Stanford University]), t-test (GeneSpring) and
four-comparison survival method [ 23 ] . The Cluster and
Tree View software were download from
http://rana.lbl.govand installed on an NT
workstation.