Background
Genomic cloning has revealed that most of the enzyme families essential for maintaining cell growth have been conserved throughout evolution [1]. However, mammalian enzymes with different functional activity may have evolved by combining elements from several bacterial ancestral genes. Even small proteins may contain several individual domains that link them to different superfamilies [2]. While many endonucleases share a common active site that is highly conserved across many subfamilies, identifying residues that control substrate specificity requires sophisticated analysis that combines both sequence conservation and structural data [3, 4, 5].
In this paper we use a word-based "molego" approach to distinguish structural elements that control substrate specificity. We postulate here that elements conserved in all the members of related protein families dictate common structures and also common "functions", i.e., individual steps in a complex reaction. Areas that affect substrate specificity will be less conserved in the superfamily than they are in subfamilies of enzymes that catalyze specific activities. We have chosen to illustrate this approach using the multifunctional family of DNA repair proteins, the apurinic/apyrimidinic endonucleases (APEs), which have a clearly defined bacterial ancestor, E. coli exonuclease III (ExoIII), and are distantly related to several enzymes with varying substrate specificity.
APEs are essential for mammalian cell growth and bacterial survival in the presence of ionizing radiation and DNA mutagens [6]. They initiate repair of an abasic DNA site by cleaving the phosphodiester backbone 5' of the phosphodeoxyribose. This generates the necessary 3' hydroxyl group for DNA polymerases (pol β, δ or ε, in eukaryotes) to insert the correct nucleotide in later steps in the base excision repair pathway (BER pathway) [7, 8]. Recent crystal structures of huAPE1 complexed with DNA containing an abasic site [9, 10, 11], combined with sequence analysis and site-directed mutagenesis, have defined the residues that participate in metal ion based cleavage of the phosphate backbone of the DNA [12, 13, 14, 15, 16, 17].
Mutations that greatly diminish the enzymatic activity of huAPE1 do not affect, and may even increase, binding to damaged DNA, while non-specific DNA binding remains low [16, 18]. Further, mutations that have little effect on APE activity in vitro prevent complementation of DNA repair deficient E. coli. As seen with other DNA repair enzymes [19], specificity determining residues, as yet unidentified for APEs, must be distinct from those involved in phosphorolysis.
To better assess which residues determine specificity, we assume that functions unique to APEs will be determined by motifs that are not conserved in a similar fashion in families with a different spectrum of activities. Besides cleaving the phosphate backbone, to achieve specificity APEs must coordinate a series of functions, including: interaction with target DNA in a series of small, possibly repetitive steps (scanning), locating damage sites, establishing the transition state complex, completing the cleavage, re-adjusting the charge status within the active site, and regulating release of product after interaction with the next enzyme in the BER pathway [20, 21, 22, 23, 24, 25]. A finer breakdown of these functions can be achieved at the molecular level once all the residues in the reaction mechanism are known. APEs also have RNase H, 3'-exonuclease, and 3'-phosphodiesterase activities that are particularly high in the bacterial members of the family [26, 13].
Our web-based MASIA program [27] was used to rapidly decompose the sequences of APEs and related protein families into motifs, areas of significant conservation in members of identical function, which could then be correlated structurally using data from crystal structures. Having determined that 12 motifs were common to all APE1s, we compared the structure of the subset of these that occurred in both DNase 1 and synaptojanin, a member of the IPP family. These shared motifs had a similar 3D structure in representatives of these functionally diverse families, and we therefore called these motifs "molegos" (molecular legos). We then demonstrated that the shared molegos served a similar role in substrate binding by comparing the DNA binding profile of huAPE1 with that for the less specific enzyme DNase 1. The molegos present in both enzymes interact with target DNA in a similar fashion, while residues in molegos distinctive for APE1 control specificity by binding primarily to the bases around the apurinic site. Matching of molegos, guided by the degree of conservation of individual residues across the three families, allowed a better alignment of the individual secondary structure elements among the proteins than DALI achieved. This word based, sequence (motif) to structure (molego) to function method has clear implications for genomic analysis and template based homology modeling, as well as immediate application in recognizing specificity determinants in proteins that share active sites common to many enzymes [28].
Results
Total sequence decomposition of human APE1 with MASIA
MASIA identified 12 motifs as conserved in all members of the APE family (Figure 1 and Table 1). As the last column of Table 1 illustrates, these motifs include all the residues known to be essential for DNA cleavage. Most of the highly conserved (greater than 90%) residues have been shown by previous mutagenesis studies to affect activity. The 12 motifs are also structurally conserved, as demonstrated by the low RMSD values between segments in the crystal structures of bacterial ExoIII and of huAPE1. These two proteins are only 26% identical (based on a DALI structure based alignment) and most of the similar segments are contained in the molegos. As the third column of the table demonstrates, the backbone deviation of the segments is overall below 1 Å and, for 5 of the motifs, below 0.5 Å. We have chosen the name "molegos" for the structural units associated with motifs, which are presented pictorially in Figures 2 and 3. Most of the DNA and metal ion binding molegos form individual β-strands at the core of the protein that orient the absolutely conserved residues toward the substrate, but several have a helical or hydrogen bonded coil structure.
The 12 motifs, which account for about half of the protein, are bridged by areas that vary in the different members of the APE family. These connecting regions may account for the differing activities of the bacterial and mammalian proteins. The longest molego, 7, was broken down into two areas, with the contiguous region labeled 7a. The first 7 residues of molego 7a are quite similar in the bacterial and mammalian APEs. However, the end is differently conserved in eukaryotes. The endonuclease activity of DNase 1 is reduced many fold by integrating this loop from E. coli exonuclease III, but the mutant cleaves at abasic sites in DNA with low efficiency [29]. Thus additional residues in the APEs control specificity while still allowing a reasonable rate of phosphorolytic cleavage.
Finding APE molegos in the DNase 1 superfamily
In an effort to functionally annotate the molegos of APE1, we next sought to find them in other proteins that shared some structural similarity to APE. The APEs, DNase 1 and inositol 5'-polyphosphate phosphatases (IPP) are grouped in the SCOP database [30] as the DNase 1-like superfamily. Although DNase 1 has only 18% overall sequence identity to APE1, and the IPP domain of synaptojanin only 14%, we could show that most of the areas of identity were in molegos common to all three proteins. Motifs in other protein families were identified by genomic cross-networking with PSIBLAST (see Materials and Methods for details). Our analysis identified 5 molegos that are common to the DNase 1, IPP and APE families, which roughly correspond to areas of sequence similarity identified previously [31, 32]. The structural similarities of molegos 1, 2, 7, 11 and 12 (i.e., the segmental RMSDs) between APE1 and representatives of the distantly related DNase 1 and IPP families are comparable to those found between members of the APE family (Figure 4 and Tables 1 and 2).
Common molegos form a similar active site in two distant relatives
The 12 conserved molegos form the β-barrel core of huAPE1. The completely conserved residues of huAPE1 concentrate, for the most part, at one end of this framework to form the metal ion binding active site (Figure 4). This core is also common to DNase 1 and synaptojanin (an IPP family member), which share the function of metal ion based cleavage of a phosphate backbone. The shared molegos define an active site architecture conserved in all three proteins, including the orientation of the substrate toward the metal binding site.
186
187
188
Molegos define functional areas common to DNase 1
189
and APE1
190
A contact plot of huAPE1 with the DNA in the 1DE8 crystal structure (Figure 5) shows that motifs 1-3, 5-8, and 10-12 all have residues close to the substrate, an oligonucleotide containing an abasic site (AP-DNA). The N-terminal motifs 1-3 and 5 bind primarily 5' to the apurinic site and to the 3' end of the undamaged strand. The other motifs bind more to the area 3' of the damage site. Motifs 10 and 12 span both strands of the DNA. Although motif 12 contains several highly conserved residues that, according to mutagenesis results (Table 1), contribute to APE1 activity, only His309 is very close to the abasic site in the DNA. Molegos 4 and 9 contain no residues in contact with the DNA or metal ion. Comparing the binding of APE in a substrate complex (Figure 6, left) with its binding after cleavage (Figure 5) suggests that binding to the 5' end of the DNA, especially that mediated by molego 3, is stronger after cleavage, while the distance from the protein to the DNA 3' of the cleavage site increases.
208
The contact plots of APE1 and DNase1 with their
209
respective substrates (Figure 6) documents that the
210
similar molegos in the proteins serve similar functions.
211
The N-terminal 100 residues of both proteins, including
212
molegos 1 and 2, bind 5' of the cleavage site and to the
213
3' end of the opposite DNA strand. Molegos 7, 11 and 12
214
bind to one base 5' and the next base 3' of the cleavage
215
site in both proteins. Overall, the pattern of protein
216
contacts to the cleavage site, the area 5' of the
217
cleavage site, and the 3' end of the opposite strand are
218
common to both proteins, suggesting that the functions of
219
forming the substrate complex and the actual
220
phosphorolysis are similar in both proteins.
While the length of the DNA in both cases is similar, DNase 1 clearly has less binding to bases opposite and 3' of the cleavage site. The extensive contacts that APE1 makes to these positions are mediated by molegos it does not share with DNase 1. Molegos 6, 7a and 10 all have residues within hydrogen bonding distance of the three basepairs 3' of the AP-site. This redundancy of binding to the 3' side is unique to APE, as is its strong binding to the DNA opposite the abasic site.
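The idea behind such a contact analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MOLMOL procedure used for Figures 5 and 6: atoms are assumed to be already parsed into coordinate tuples, the 3.5 Å cutoff is a hypothetical stand-in for "hydrogen bonding distance", and the function name and toy coordinates are invented for the example.

```python
import math

def residues_in_contact(protein_atoms, dna_atoms, cutoff=3.5):
    """Return sorted protein residue numbers having any atom within `cutoff` Å of DNA.

    protein_atoms: iterable of (residue_number, x, y, z)
    dna_atoms: iterable of (x, y, z)
    cutoff: distance threshold in Å (3.5 is an assumed value, not from the paper)
    """
    dna = list(dna_atoms)
    contacts = set()
    for resnum, x, y, z in protein_atoms:
        if resnum in contacts:
            continue  # residue already flagged by another of its atoms
        for dna_xyz in dna:
            if math.dist((x, y, z), dna_xyz) <= cutoff:
                contacts.add(resnum)
                break
    return sorted(contacts)

# Toy coordinates (hypothetical, not taken from the 1DE8 structure)
protein = [(177, 0.0, 0.0, 0.0), (226, 10.0, 0.0, 0.0), (266, 3.0, 0.0, 0.0)]
dna = [(0.0, 3.0, 0.0), (3.0, 0.0, 2.0)]
print(residues_in_contact(protein, dna))  # [177, 266]
```

Grouping the residue numbers returned here by the molego they belong to would give a coarse version of the per-molego contact pattern discussed above.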
230
The importance of such bonds for activity was shown in
231
other work, where huAPE1's binding to the DNA backbone is
232
only inhibited by ethylation of the phosphates two and
233
three positions 3' to an abasic site [ 18 ] . Mutation of
234
R177A, at the end of Molego 6, that binds to this region
235
and to the bases opposite the AP-site had enhanced
236
activity [ 11 ] , while mutations at W280 (Molego 11) and
237
F266 (Molego 10) [ 33 ] reduce activity and, in the
238
latter case, substrate selectivity.
239
In work from this group that will be described
240
separately, we used this analysis to generate mutants of
241
APE1 with altered activity. An alanine substitution
242
mutant, N226A, of a conserved residue at the end of
243
molego 7a that forms a hydrogen bond with the second
244
phosphate group downstream of the abasic site, had
245
enhanced APE activity but increased Km and Kd values,
246
similar to an alanine mutant of R177, which binds to the
247
same site, reported previously [ 34 ] . A combination of
248
the two mutants, N226A and R177A, substantially reduced
249
the ability of APE1 to bind to DNA containing an abasic
250
site (Izumi et al., in preparation). Thus, molegos can
251
effectively guide the redesign of enzymes to alter
252
specificity.
253
254
255
Molegos to improve structural alignment
256
Using molegos may also help in aligning proteins for template based modeling, by determining the end points of secondary structure elements in alignments with many gaps and insertions. According to MASIA analysis, the residues K/R and DI at the N-termini of motifs 1 and 2 are absolutely conserved in the three families, APEs, DNase 1s and IPPs. However, matching these conserved residues between synaptojanin and DNase 1 or APE requires a gapping that would not be consistent with CLUSTALW or a structural (DALI [35]) alignment of these proteins (Table 2). If the local alignment with synaptojanin is gapped to align these residues in the three proteins (Table 2, gapped), the RMSD for the two sections separated by the gap is much lower than that obtained by aligning the whole ungapped segment. As Figure 7 illustrates, the local environments of both conserved residue pairs DI and QE are structurally equivalent in all three proteins, indicating that a motif based alignment with a two residue gap is correct. The first two β-strand molegos in synaptojanin are 2 residues longer than in APE or DNase 1. By regarding these elements as simple lego style blocks, and recognizing the connectivity, molego based alignment correctly defined the changing length of the secondary structure elements.
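As a rough illustration of anchor guided gapping, the following Python sketch pads the shorter of two segments so that two conserved anchor residue pairs fall in the same alignment columns. It is an assumption-laden toy, not the procedure used for Table 2: the function name is hypothetical, the sequences are invented, and placing the gap directly before the second anchor is an arbitrary choice (in practice the gap position would be guided by the structures).

```python
def gap_to_match_anchors(seq_a, seq_b, first_anchor, second_anchor):
    """Cut both sequences from first to second anchor and pad the shorter one.

    Returns the two segments, equal in length, with '-' inserted just before
    the second anchor of the shorter segment (an arbitrary placement).
    """
    def segment(seq):
        start = seq.index(first_anchor)  # raises ValueError if anchor is absent
        end = seq.index(second_anchor, start + len(first_anchor))
        return seq[start:end + len(second_anchor)]

    seg_a, seg_b = segment(seq_a), segment(seq_b)
    diff = len(seg_a) - len(seg_b)
    pad = "-" * abs(diff)
    cut = -len(second_anchor)
    if diff > 0:
        seg_b = seg_b[:cut] + pad + seg_b[cut:]
    elif diff < 0:
        seg_a = seg_a[:cut] + pad + seg_a[cut:]
    return seg_a, seg_b

# Invented sequences: the first has two extra residues between the anchors
longer = "GGDIAVLSTWQEPP"
shorter = "TTDIAVLSQEKK"
print(gap_to_match_anchors(longer, shorter, "DI", "QE"))
# ('DIAVLSTWQE', 'DIAVLS--QE')
```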
281
282
283
284
Discussion
285
286
Is the specificity of APE determined by binding 3'
287
to an abasic site?
288
Crystal structure data, coupled with molego analysis, outlined the areas of APE1 that distinguish its mode of DNA binding from the less specific DNase 1. Contact maps (Figure 6) illustrate how the conserved motifs direct DNA binding in the distributive (i.e., rapidly releasing substrate/product), relatively non-specific DNase 1 as opposed to the processive, highly specific huAPE1. Both enzymes cleave only one DNA strand in a duplex and bacterial Xth cleaves ssDNA containing an abasic site [42]. The additional contacts that huAPE1 makes, compared to DNase 1 (Figure 6), 3' to the damage site and to the opposite strand lower its turnover rate and its potential to cleave normal DNA. The residues contacting the region 3' to the abasic site come from three different uniquely conserved areas of APEs (molegos 6, 7a, 10) as well as 11, a molego that is similar to that in DNase 1. These observations, coupled with DNA ethylation data [18], indicate that 3' binding is a key element in specific recognition by APEs.
307
This is confirmed by site directed mutagenesis
308
studies. Of the four protein areas that bind to the DNA
309
3' of the abasic site, mutating F266 (molego 10) or W280
310
(middle of molego 11) decreases APE activity [ 33 ] . The
311
F266 mutation is particularly interesting, as the mutants
312
at this position had reduced substrate specificity and
313
enhanced 3'-exonuclease activity. However, an R177A
314
mutant had
315
enhanced APE activity [ 11 ] ,
316
as do mutations at N226 (Izumi et al., in preparation).
317
Combining these mutations however greatly decreases
318
substrate binding (Izumi et al., in preparation). The 3'
319
approach to the DNA [ 34 ] and the wide area covered by
320
the protein on both sides of the abasic site [ 14 ] are
321
both consistent with the need to hold the product until
322
the correct polymerase moves in 5' to 3' to complete the
323
repair [ 25 ] . This implies that the mammalian enzyme
324
has evolved to be processive, to facilitate more
325
efficient functioning of the overall BER pathway, and may
326
not be optimized for simple catalysis. Processivity is an
327
important facit of the activity of enzymes that function
328
in complex pathways [ 43 ] . Reduced processivity may
329
explain, for example, the repair deficits in Xeroderma
330
pigmentosum (XPA) cells [ 44 ] . Our molego approach
331
provides a basis for exploring the role of segments of
332
the protein in its functions, rather than relying only on
333
data from missense mutations.
334
335
336
Using molegos to detect structural and functional
337
homologues
338
We have demonstrated here the derivation and uses of molegos for analyzing the specificity of enzymes, based on those derived from the APE family. The methodology can be used to complement searches with programs such as PSIBLAST and PROSITE [41] to detect distantly related functional or structural homologues in sequences revealed by genome sequencing. PSIBLAST searches often reveal areas of local similarity in proteins that have no significant overall sequence identity. Molego analysis could be useful to analyze the significance of such findings. The combined sequence and structure definition makes molegos more flexible for defining shared protein elements than methods such as PROSITE that require a strict one-dimensional definition. An improved motif definition method, based on physical property similarity [42], which has been incorporated into our MASIA tool, also promises to enhance the usefulness of the method. This may eventually lead to a method to find functional relationships between proteins with even lower overall sequence similarity.
358
Another potential area for applying the molego
359
approach is in homology modeling. Molegos may prove
360
useful in to check alignments for template based modeling
361
of homologues with low identity (Table 2and Figure 7), if
362
the "anchoring" residues are conserved in sequence or
363
property across the members of both subfamilies. Our
364
molego approach is closest in principle to that of the
365
ROSETTA program [ 45 ] whereby the latter seeks only to
366
connect structure, not function, to a sequence element.
367
We are currently testing the usefulness of the molego
368
approach in modeling in the CASP5 competition.
369
370
371
372
Conclusions
373
The MASIA program can parse sequences into discrete blocks of significant conservation. The motifs identified in the APE family could be structurally annotated using crystal data to derive molegos, words in the protein sequence that correlate with structural elements. These molegos could in turn be functionally annotated by comparing the DNA binding profile of APE1 with that of the less specific nuclease DNase 1. This analysis indicated that residues binding 3' to the site of phosphorolytic cleavage control the substrate specificity of APE1. These results indicate that molegos can provide a useful basis for identifying specificity determining regions in enzymes with similar active sites but different activity spectra [46, 28, 3]. Site directed mutagenesis based on these results can define the function of the unique elements of the APEs, and aid in the design of enzymes with altered specificity.
Materials and Methods
Sequence alignment
A BLAST [47] (http://www.ncbi.nlm.nih.gov/BLAST/) search of the "non-redundant" protein database using the whole sequence of human APE1 yielded over 100 related sequences. Some sequence entries represented the same protein, called by different names or isolated in different screens, including many entries for huAPE1, Drosophila Rrp1 protein (~40% identical to the mammalian APE1 in the C-terminal third of the protein), Xth from E. coli, and counterparts of this and exodeoxyribonuclease (exo A) sequences from many bacteria, which are about 25% identical to mammalian APE1. The mammalian sequences are highly conserved, with only 6 non-conservative residue variations between the human and murine sequences, 5 of which occur in the apparently unstructured N-terminus. Several proteins with more distant relationship to APE1, such as mammalian and yeast APEIIs, and the CRC protein from Pseudomonas, which has no APE activity [48], were in the BLAST list, but were not used for this analysis. To derive functional motifs, the BLAST list was culled to 37 unique sequences with identity ranging from 25% to 98%. These were aligned using the default parameters of CLUSTALW [49, 50] (http://www2.ebi.ac.uk/clustalw/). Sequences were the APE1 protein from human, bovine, monkey, rat, murine, Arabidopsis thaliana (Mouse-ear cress), Dictyostelium discoideum (Slime mold), Schizosaccharomyces pombe (Fission yeast), Caenorhabditis elegans, Saccharomyces cerevisiae (Baker's yeast), Thermoplasma acidophilum, Neisseria meningitidis, Methanobacterium thermoautotrophicum, Leishmania major, Trypanosoma cruzi, Coxiella burnetii; the Rrp1 protein of Drosophila; exonuclease III from E. coli, Bacillus subtilis, Mycobacterium tuberculosis, Haemophilus influenzae, Salmonella typhimurium, Helicobacter pylori, Rickettsia prowazekii, Archaeoglobus fulgidus, Actinobacillus actinomycetemcomitans, Streptomyces coelicolor, Synechocystis sp. PCC 6803, Haemophilus influenzae; Exonuclease A from Streptococcus pneumoniae, Treponema pallidum, Borrelia burgdorferi, Plasmodium falciparum. Different set conditions and sequence lists were tested in CLUSTALW for their effect on alignment and subsequent motif definition with MASIA. Areas peripheral to the endonuclease domain, such as the 50 amino acid mammalian and the 428 residue Drosophila Rrp1 N-terminal regions, were eliminated to improve the consensus.
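The pairwise identities quoted here can be computed from an alignment in several ways. The Python sketch below shows one common convention (identical residues divided by the number of columns in which neither sequence has a gap) together with a simple redundancy filter. Both functions, the 98% default, and the toy sequences are assumptions for illustration; the text does not state how the BLAST list was actually culled.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (one convention among several)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def cull_redundant(aligned, max_identity=98.0):
    """Keep an entry only if it is at most `max_identity` % identical to all kept ones."""
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) <= max_identity for other in kept.values()):
            kept[name] = seq
    return kept

# Hypothetical toy alignment; seqA_dup mimics a duplicate database entry
toy = {"seqA": "MKDIR-LE", "seqA_dup": "MKDIR-LE", "seqB": "MRDIAQLE"}
print(round(percent_identity(toy["seqA"], toy["seqB"]), 1))  # 71.4
print(sorted(cull_redundant(toy)))  # ['seqA', 'seqB']
```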
465
466
467
Identification of motifs using MASIA
468
Motifs were identified in the aligned sequences using the MASIA consensus macro (http://www.scsb.utmb.edu/masia/masia.html). Motifs start when at least 3 of 4 consecutive positions are more than 40% conserved according to the dominant criterion [51], and extend until at least 2 positions in a row are less than 40% conserved. To allow for mistakes in the alignment of all the sequences, essential residues are those more than 90% conserved by MASIA criteria over all sequences in the alignment.
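The start and extension rules above can be sketched in Python as follows. This is not the MASIA macro itself: conservation is approximated here as the fraction of sequences carrying the most common residue in a column, which is only an assumed reading of the "dominant criterion", and the requirement that a motif opens on a conserved column, the function names, and the toy alignment are likewise assumptions.

```python
from collections import Counter

def column_conservation(alignment):
    """Fraction of sequences carrying the most common residue in each column."""
    n_seqs = len(alignment)
    conservation = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        conservation.append(top / n_seqs)
    return conservation

def find_motifs(conservation, threshold=0.40):
    """Return motifs as half-open (start, end) pairs of 0-based column indices."""
    motifs, n, i = [], len(conservation), 0
    while i < n:
        window = conservation[i:i + 4]
        starts_here = (
            len(window) == 4
            and conservation[i] > threshold  # assumption: motif opens on a conserved column
            and sum(c > threshold for c in window) >= 3
        )
        if not starts_here:
            i += 1
            continue
        j = i
        while j < n:
            below = conservation[j] < threshold
            next_below = j + 1 >= n or conservation[j + 1] < threshold
            if below and next_below:
                break  # two weakly conserved columns in a row (or the end) close the motif
            j += 1
        motifs.append((i, j))
        i = j
    return motifs

def essential_positions(conservation, cutoff=0.90):
    """Columns conserved in more than `cutoff` of the sequences."""
    return [i for i, c in enumerate(conservation) if c > cutoff]

# Hypothetical toy alignment of five sequences
toy = ["AKDIRLTGQW", "AKDIRLSGEF", "CKDIKLTAHY", "DRDIRMTGPH", "EKDIRLTGNM"]
cons = column_conservation(toy)
print(find_motifs(cons))          # [(1, 8)]
print(toy[0][1:8])                # KDIRLTG
print(essential_positions(cons))  # [2, 3]
```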
478
479
480
Genomic cross-networking with PSIBLAST
481
A PSIBLAST search, using huAPE1 as the founder sequence, with an e-value threshold of 0.1 per iteration, did not converge after 6 iterations, but few new sequences were added in the last 2 cycles. Searches with an e-value threshold of 0.01 per iteration had similar results, but members of several families were not included until later cycles. Members of the DNase 1, LINE-1 repeats, inositol 5'-polyphosphate phosphatase, Nocturnin, CCR4, cytolethal distending toxin, neutral sphingomyelin phosphodiesterase, and amino acid methyltransferase families were found with expectation values of 10^-4 or less to be significantly similar to APE1. To determine the presence of motifs in these relatives, a CLUSTALW alignment of at least 5 representatives of a protein family was prepared and analyzed with MASIA for significant areas of conservation. In some cases, alignments taken from literature references (e.g., for IPPs [32]) were used to confirm MASIA results. The motifs common to these families were compared with the APE motifs of Table 1. Criteria for inclusion (presence of motif) included conservation of the residues that are more than 90% conserved (side chains shown in blue in the tables) and patterns of polarity (as determined with a macro included in the MASIA packet as a user specified feature).
505
506
507
Molego building and comparison
508
The drawings of "molego blocks" and structures (Figures 2, 3, 4 and 7), contact plots of protein/DNA (Figures 5, 6), and calculation of the RMSD between similar segments (Table 2) were done with MOLMOL (http://www.mol.biol.ethz.ch/wuthrich/software/molmol/) [52]. The RMSD values in Table 1 were calculated using SwissPDB viewer and the "fit selected residues" option.
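For readers unfamiliar with the quantity, the segmental RMSD is the root mean square distance between matched atoms of two segments. The minimal Python sketch below assumes the two coordinate sets have already been superimposed (the fitting step performed by the programs named above is not reproduced here), and the function name and toy coordinates are invented for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between matched, pre-superimposed atoms.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples in Å,
    where the i-th atom of one segment corresponds to the i-th of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

# Toy backbone fragments (hypothetical): every atom is displaced by 1 Å along z
segment_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
segment_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(segment_1, segment_2))  # 1.0
```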
516
517
518
519
Authors' contributions
520
CHS developed the molego concept, performed the sequence analysis and prepared the manuscript. NO analyzed the structural properties of the molegos and prepared Figures 2, 3, 4, 5, 6 and 7. TI prepared mutants of human APE1 protein to test the conclusions of this paper, and provided background and a list of mutations affecting APE function. WB designed and developed the MASIA program and the modular analysis approach to enzyme functions.
528
Additional File 1
529
530
Sequence alignment and MASIA printout for
531
the APE family The original alignment used for the APE
532
family and the MASIA consensus macro output. Note that the
533
A. thaliana sequence is consistently
534
an outlier in the file, due to the CLUSTAL-W misalignment
535
of the nuclease area of the protein when the N-terminal
536
sequence is not removed. The real position of the motifs in
537
the
538
A. thaliana sequence are
539
underlined.
540
Click here for file
541
542
543
544
545