If necessity is the mother of invention, then its father is an inveterate tinkerer, with
a large garage full of spare parts. Innovation (like homicide) requires motive and
opportunity. Clearly, the predominant ‘motive’ during the evolution of a novel gene
function is to gain a selective advantage. To understand why gene duplications represent
the major ‘opportunities’ from which new genes evolve, we must first consider what
constrains genic evolution.

The vast majority of genes in every genome are selectively constrained, in that most
nucleotide changes that alter the fitness of the organism are deleterious. How do we know
this? Comparisons between genomes clearly demonstrate that coding sequences diverge at
slower rates than non-coding regions, largely due to a deficit of mutations at positions
where a base change would cause an amino-acid change. Gene duplication provides
opportunities to explore this forbidden evolutionary space more widely by generating
duplicates of a gene that can ‘wander’ more freely, on condition that between them they
continue to supply the original function.
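
The deficit of amino-acid-changing substitutions described above can be illustrated with a
minimal Python sketch that sorts codon differences between two aligned coding sequences
into synonymous and nonsynonymous classes. The function name `count_codon_differences` and
the toy sequences are hypothetical, chosen only for illustration. The sketch assumes the
standard genetic code and gap-free, in-frame sequences, and it simply counts differing
codons rather than estimating substitution rates as formal methods do.

```python
from itertools import product

# Standard genetic code, with codons enumerated using the base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes two in-frame, gap-free coding sequences of equal length
    (a multiple of three). Hypothetical helper for illustration only.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # base change, same amino acid
        else:
            nonsynonymous += 1   # base change alters the amino acid
    return synonymous, nonsynonymous


# Toy example: GCT -> GCC keeps alanine; AAA -> AGA swaps lysine for arginine.
print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

In real comparisons of constrained genes, the nonsynonymous class is the one expected to be
depleted relative to the synonymous class.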

Susumu Ohno was the first to comprehensively elucidate the potential of gene
duplication, in his book Evolution by Gene Duplication, published more than 30 years ago
(Ohno 1970). The prescience of Ohno's book is highlighted by the fact that it has almost
certainly been cited more times in the past five years than in the first five years after
its publication.

What Is the Evidence for the Importance of Gene Duplication?

The primary evidence that duplication has played a vital role in the evolution of new
gene functions is the widespread existence of gene families. Members of a gene family that
share a common ancestor as a result of a duplication event are denoted as being paralogous,
distinguishing them from orthologous genes in different genomes, which share a common
ancestor as a result of a speciation event. Paralogous genes can often be found clustered
within a genome, although dispersed paralogues, often with more diverse functions, are also
common.

Whole genome sequences of closely related organisms have allowed us to identify changes
in the gene complements of species over relatively short evolutionary distances. These
comparisons typically reveal dramatic expansions and contractions of gene families that can
be related to underlying biological differences. For example, humans and mice differ in
their sensory reliance on sight and smell respectively; colour vision in humans has been
significantly enhanced by the duplication of an Opsin gene that allows us to distinguish
light at three different wavelengths, while mice can distinguish only two. By contrast, a
much higher proportion of the large gene family of olfactory receptors have retained their
functionality in mice, as compared to humans.

Given the apparent importance of gene duplication for the evolution of new biological
functions over all evolutionary timescales, it is of great interest to be able to
comprehensively document the duplicative differences that exist between our own species and
our closest relatives, the great apes. The study by Fortna et al. (2004) in this issue of
PLoS Biology identifies over 3% of around 30,000 genes as having
undergone lineage-specific copy number changes among five hominoid (humans plus the great
apes) species. This is the first time that copy number changes among apes have been assayed
for the vast majority of human genes, and we can expect that the biological consequences of
the 140 human-specific copy number changes identified in this study will be heavily
investigated over the coming years.

How Do Duplications Arise?

The various mechanisms by which genes become duplicated are often classified on the
basis of the size of duplication generated, and whether they involve an RNA intermediate
(Figure 1).

‘Retrotransposition’ describes the integration of reverse transcribed mature RNAs at
random sites in a genome. The resultant duplicated genes (retrogenes) lack introns and have
poly-A tails. Separated from their regulatory elements, these integrated sequences rarely
give rise to expressed full-length coding sequences, although functional retrogenes have
been identified in most genomes.

Tandem duplication of a genomic segment (segmental duplication) is one of the possible
outcomes of ‘unequal crossing over’, which results from homologous recombination between
paralogous sequences. These recombination events can also give rise to the deletion or
inversion of intervening sequences. Recent evidence suggests that the explosion of
segmental duplications in recent primate evolution has been caused in part by the rapid
proliferation of Alu elements about 40 MYA. Alu elements are derived from the 7SL RNA gene
and represent the most frequent dispersed repeat in the human genome, with the
approximately 1 million copies of the 300-bp Alu element representing around 10% of the
entire genome. The striking enrichment of Alu elements at the junctions between duplicated
and single copy sequences implicates unequal crossing over between these repeats in the
generation of segmental duplications (Bailey et al. 2003).
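
The Alu figures quoted above can be checked with simple arithmetic. The sketch below
assumes a haploid human genome of roughly 3 billion base pairs, a round figure that is not
stated in the text; the constant and function names are illustrative only.

```python
# Figures quoted in the text.
ALU_COPIES = 1_000_000      # approximate number of Alu copies
ALU_LENGTH_BP = 300         # approximate length of one Alu element

# Assumption (not from the text): haploid human genome of ~3 billion bp.
GENOME_SIZE_BP = 3_000_000_000


def alu_genome_fraction(copies=ALU_COPIES, length_bp=ALU_LENGTH_BP,
                        genome_bp=GENOME_SIZE_BP):
    """Fraction of the genome occupied by Alu sequence under these round numbers."""
    return copies * length_bp / genome_bp


print(f"Alu elements occupy about {alu_genome_fraction():.0%} of the genome")
```

Under that assumed genome size, 1 million copies of 300 bp amount to about 300 million base
pairs, in line with the figure of around 10% given above.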

The observation of segmental duplication events with no evidence for homology-driven
unequal crossing over suggests that segmental duplications can also arise through
non-homologous mechanisms. A recent screen for spontaneous duplications in yeast suggests
that replication-dependent chromosome breakages also play a significant role in generating
tandem duplications, because duplication breakpoints are enriched at replication
termination sites (Koszul et al. 2004).

Genome duplication events generate a duplicate for every gene in the genome,
representing a huge opportunity for a step-change in organismal complexity. However, genome
duplication presents significant problems for the faithful transmission of a genome from
one generation to the next, and is consequently a rare event, at least in Metazoa. In
principle, genome duplications should be easily identified through the coincident emergence
within a phylogeny of many gene families. Unfortunately, this signal is complicated by
subsequent piecemeal loss and gain of gene family members. Consequently, there is heated
debate over possible ancient genome duplication events in early vertebrate evolution and
more recently in teleost fish, both of which must have occurred hundreds of millions of
years ago (McLysaght et al. 2002; Van de Peer et al. 2003).

So what are the relative contributions of these different mechanisms? Not all
interspersed duplicate genes are generated by retrotransposition. The initially tandem
arrangement of segmental duplications can be broken up by subsequent rearrangements. In
keeping with this hypothesis, duplicated genes in a tandem arrangement typically represent
more recent duplication events (Friedman and Hughes 2003). Recent analyses suggest that 70%
of non-functional duplicated genes (pseudogenes) in the human genome result from
retrotransposition rather than any DNA-based process (Torrents et al. 2003).

What Fates Befall a Recently Duplicated Gene?

A duplicated gene newly arisen in a single genome must overcome substantial hurdles
before it can be observed in evolutionary comparisons. First, it must become fixed in the
population, and second, it must be preserved over time. Population genetics tells us that
for new alleles, fixation is a rare event, even for new mutations that confer an immediate
selective advantage. Nevertheless, it has been estimated that one in a hundred genes is
duplicated and fixed every million years (Lynch and Conery 2000), although it should be
clear from the duplication mechanisms described above that it is highly unlikely that
duplication rates are constant over time. Once a duplicate is fixed, three possible fates
are typically envisaged for it.
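
The claim that fixation is rare even for advantageous mutations can be illustrated with
Kimura's diffusion approximation for the fixation probability of a single new mutant
allele. This formula is not given in the text and is brought in here as a standard
population-genetic result. The sketch assumes a diploid population of constant size N whose
effective size equals N, additive selection with coefficient s (taken to be zero or
positive), and an initial frequency of 1/(2N); the function name and the example values of
s and N are illustrative.

```python
import math


def fixation_probability(s, population_size):
    """Kimura's diffusion approximation for a single new mutant allele.

    Assumes a diploid population of constant size N whose effective size
    equals N, additive (genic) selection with coefficient s >= 0, and an
    initial frequency of 1/(2N). These are modelling assumptions, not
    values from the text.
    """
    n = population_size
    if s == 0:
        return 1.0 / (2 * n)             # neutral case
    # (1 - exp(-2s)) / (1 - exp(-4Ns)); expm1 keeps precision for small s.
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)


for s in (0.0, 0.001, 0.01):
    p = fixation_probability(s, population_size=10_000)
    print(f"s = {s:<6} P(fix) = {p:.6f}")
```

Under these assumptions, an allele with a 1% selective advantage still has only about a 2%
chance of reaching fixation, so the great majority of such mutations are lost by chance.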

Despite the slackened selective constraints, mutations can still destroy the incipient
functionality of a duplicated gene: for example, by introducing a premature stop codon or a
mutation that destroys the structure of a major protein domain. These degenerative
mutations result in the creation of a pseudogene (nonfunctionalization). Over time, the
likelihood of such a mutation being introduced increases. Recent studies suggest that there
is a relatively narrow time window for evolutionary exploration before degradation becomes
the most likely outcome, typically of the order of 4 million years (Lynch and Conery
2000).

During the relatively brief period of relaxed selection following gene duplication, a
new, advantageous allele may arise as a result of one of the gene copies gaining a new
function (neofunctionalization). This can be revealed by an accelerated rate of amino-acid
change after duplication in one of the gene copies. This burst of selection is necessarily
episodic: once a new function is attained by one of the duplicates, selective constraints on
this gene are reasserted. These patterns of selection can be observed in real data: most
recently duplicated gene pairs in the human genome have diverged at different rates from
their ancestral amino-acid sequence (Zhang et al. 2003). A convincing instance of
neofunctionalization is the evolution of antibacterial activity in the ECP gene in Old
World Monkeys and hominoids after a burst of amino-acid changes following the tandem
duplication of the progenitor gene EDN (a ribonuclease) some 30 MYA (Zhang et al. 1998).
The divergence of duplicated genes over time can also be monitored in genome-wide
functional studies. In both yeast and nematodes, the ability of a gene to buffer the loss
of its duplicate declines over time as their functional overlap decreases.

Rather than one gene duplicate retaining the original function, while the other either
degrades or evolves a new function, the original functions of the single-copy gene may be
partitioned between the duplicates (subfunctionalization). Many genes perform a
multiplicity of subtly distinct functions, and selective pressures have resulted in a
compromise between optimal sequences for each role. Partitioning these functions between
the duplicates may increase the fitness of the organism by removing the conflict between
two or more functions. This outcome has become associated with a population genetic model
known as the Duplication–Degeneration–Complementation (DDC) model, which focuses attention
on the regulatory changes after duplication (Force et al. 1999). In this model,
degenerative changes occur in regulatory sequences of both duplicates, such that these
changes complement each other, and the union of the expression patterns of the two
duplicates reconstitutes the expression pattern of the original (Figure 2).
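
The complementation logic of the DDC model can be made concrete with a toy sketch. Here
each gene copy is represented by the set of expression domains (for example, tissues) that
its regulatory sequences still drive. The function name, the outcome labels, and the tissue
names are all hypothetical, and the sketch only classifies an end state; it does not model
the population genetics of Force et al. (1999).

```python
def classify_duplicate_pair(ancestral, copy_a, copy_b):
    """Classify the state of a duplicate pair from retained expression domains.

    Each argument is a collection of expression domains. Toy illustration of
    the complementation idea; the returned labels are hypothetical.
    """
    ancestral, copy_a, copy_b = set(ancestral), set(copy_a), set(copy_b)
    if not (copy_a <= ancestral and copy_b <= ancestral):
        raise ValueError("copies may only lose ancestral domains in this sketch")
    if (copy_a | copy_b) != ancestral:
        return "ancestral function lost"   # some domain is covered by neither copy
    if not copy_a or not copy_b:
        return "nonfunctionalization"      # one copy is fully silenced
    if copy_a == ancestral and copy_b == ancestral:
        return "redundant duplicates"      # no degeneration yet
    if copy_a == ancestral or copy_b == ancestral:
        return "partial degeneration"      # one copy still covers everything
    return "subfunctionalization"          # both copies are needed together


ancestral = {"brain", "liver", "testis"}
print(classify_duplicate_pair(ancestral, {"brain", "liver"}, {"liver", "testis"}))
print(classify_duplicate_pair(ancestral, ancestral, set()))
```

In the first call neither copy alone covers the ancestral pattern but their union does,
which is the complementary outcome the model describes; in the second, one copy has lost
all of its domains.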

A recent study by Dorus and colleagues (Dorus et al. 2003) investigated the
retrotransposition (since the existence of a human–mouse common ancestor) of one of the two
autosomal copies of the CDYL gene to the Y chromosome (forming CDY). In the mouse, both
Cdyl genes produce two distinct transcripts, one of which is expressed ubiquitously while
the other is testis-specific. By contrast, in humans both CDYL genes produce a single
ubiquitously expressed transcript, and CDY exhibits testis-specific expression. As CDY is
a retrogene (see above) that has not been duplicated together with its ancestral
regulatory sequences, it is clear that the DDC model is not the only route by
which to achieve spatial partitioning of ancestral expression patterns.

Subfunctionalization can also lead to the partitioning of temporal as well as spatial
expression patterns. In humans, the β-globin cluster of duplicated genes contains three
genes with coordinated but distinct developmental expression patterns. One gene is
expressed in embryos, another in foetuses, and the third from neonates onwards. In
addition, coding sequence changes have co-evolved with the regulatory changes so that the
O₂ binding affinity of haemoglobin is optimised for each developmental stage. This
coupling between coding and regulatory change is similarly noted at a genomic level when
expression differences between many duplicated gene pairs are correlated with their
coding sequence divergence (Makova and Li 2003).

Other Evolutionary Consequences of Gene Duplication

If duplication results in the formation of a novel function as a result of interaction
between the two diverged duplicates, which of the above categories of evolutionary outcome
does this innovation fall into? Not all new biological functions resulting from gene
duplications can be ascribed to individual genes. Protein–protein interactions often occur
between diverged gene duplicates. This is especially true for ligand–receptor pairs, which
are often supposed to coevolve after a gene duplication event, and thus progress from
homophilic to heterophilic interactions. This emergent function of the new gene pair does
not fit comfortably into any of the scenarios outlined above: both genes are functional yet
neither retains the original function, nor has the original function been partitioned. This
mode of ‘duplicate co-evolution’ is likely to be especially prevalent in signalling
pathways.

Earlier, we saw that homologous recombination between paralogous sequences can result in
rearrangements, including tandem duplications. Such recombination events need not cause
rearrangements, but can also result in the nonreciprocal transfer of sequence from one
paralogue to the other, a process known as gene conversion. Gene conversion homogenizes
paralogous sequences, retarding their divergence, and consequently obscuring their
antiquity. This leads to the observation of ‘concerted evolution’ whereby duplicates within
a species can be highly similar and yet continue to diverge between species (Figure 3).
Once gene duplicates have diverged sufficiently so that they differ in their functionality
(or non-functionality), gene conversion events can become deleterious, for example by
introducing disrupting mutations from a pseudogene into its functional duplicate. A
substantial proportion of disease alleles in Gaucher disease result from the introduction
of mutations into the glucocerebrosidase gene from a tandemly repeated pseudogene (Tayebi
et al. 2003). These kinds of recombinatorial interactions only occur between paralogues
that are minimally diverged. Thus, while selective interactions and functional overlap
between duplicates decline relatively slowly over evolutionary time, the potential for
recombinatorial interactions between paralogues is relatively short-lived.

For some genes, duplication confers an immediate selective advantage by facilitating
elevated expression, or as Ohno put it, ‘duplication for the sake of producing more of the
same’. This has clearly been the case for histones and ribosomal RNA genes. In this
scenario, gene conversion is of potential benefit in maintaining homogeneity between
copies. Certainly both histone and rDNA genes are commonly found in arrays of duplicates:
structures that facilitate array homogenization by both gene conversion and repeated
unequal crossing over.

Mechanisms of segmental duplication are oblivious to where genes begin and end, and so
are additionally capable of duplicating parts of genes or several contiguous genes. The
intragenic duplication of individual exons or enhancer elements also presents new
opportunities for the evolution of new functions or greater regulatory complexity.

Conclusions

The likelihood that newly duplicated genes will both remain functional clearly relates
to their inherent potential to undergo subfunctionalization or neofunctionalization. Under
the DDC model, greater regulatory complexity bestows greater potential for
subfunctionalization (Force et al. 1999), whereas neofunctionalization is more likely to
occur in genes that are necessarily rapidly evolving, such as those involved in
reproduction, immunity, and host defence (Emes et al. 2003). This is not to say that these
biases are deterministic: there are plenty of ‘successful’ gene family clusters that
contain associated pseudogenes.

Duplicate gene evolution has most likely played a substantial role in both the rapid
changes in organismal complexity apparent in deep evolutionary splits and the
diversification of more closely related species. The rapid growth in the number of
available genome sequences presents diverse opportunities to address important outstanding
questions in duplicate gene evolution. For those interested in patterns of selection
following duplication, the transient nature of the evolutionary window of opportunity
following duplication will focus attention on recently duplicated genes. In this regard it
will be important to document copy number variation not only among species, as Fortna et
al. have, but within species as well. In addition, it has been, and will continue to be, a
lot easier to identify copy number changes between genomes than it is to identify their
biological consequences (if any). Extensive functional studies targeted at duplicated genes
are required if we are to more fully understand the range of evolutionary outcomes.
Moreover, collaborations between the proteomics and evolutionary genetics communities would
facilitate investigation of the potential role of gene duplication during the evolution of
the protein–protein and cell–cell interactions that are fundamental to the biology of
multicellular organisms.