Background
During the last decade several genes have been found to be interrupted by selfish genetic elements translated in frame with their host proteins. During post-translational processing these elements excise themselves from the host protein (see [1] and [2] for recent reviews). The sequences removed during splicing are called inteins (short for internal protein); the flanking portions of the host protein are termed exteins (external protein) [3, 4, 5].

Inteins mediate their own excision from the host protein without the help of any known host-specific activity. This phenomenon, called protein splicing, was first discovered about a decade ago in the Saccharomyces cerevisiae V-ATPase catalytic subunit A [6, 7]. Intein excision depends on the splicing domain of the intein and the first amino acid residue of the C-extein [8]. The inteins known to date are between 134 and 608 amino acids long, and they have been reported from all three domains of life: eukaryotes, eubacteria and archaea. Pietrokovski's webpage on inteins (http://blocks.fhcrc.org/~pietro/inteins/) currently lists more than 100 inteins in 34 different types of proteins [9]. The host proteins are diverse in function, including metabolic enzymes, DNA and RNA polymerases, gyrases, proteases, ribonucleotide reductases, and vacuolar and archaeal type ATPases. Common features suggested for these proteins are their expression during DNA replication [1] and their low substitution rate during evolution [9].

Most reported inteins are composed of two domains: one is responsible for protein splicing, and the other has endonuclease activity [10, 11, 12, 13]. The function of the endonuclease is to spread the intein to intein-free homologs of the host protein. During this process, called homing, the gene encoding the intein-free homolog is cleaved by the endonuclease at or close to the intein integration site. During the repair of the cleaved gene, the intein is copied into the previously intein-free homolog. Gimble and Thorner [14] demonstrated intein homing in Saccharomyces cerevisiae using engineered V-ATPase genes from which the intein-encoding portion had previously been removed. However, some inteins lack the endonuclease domain. Inteins without this domain still perform autocatalytic splicing [15, 16]. Homing endonucleases and the process of homing have been more intensively studied in self-splicing introns [17], and the process is assumed to be similar for inteins.

Thermoplasma acidophilum is among sixteen archaea for which inteins have been reported to date (Intein database, http://www.neb.com/inteins/int_reg.html [18]). Members of the genus Thermoplasma lack cell walls and possess a cytoskeleton. They live in hot and acidic environments, and are often found adhering to sulfur particles [19]. T. acidophilum grows optimally at 59°C and at an external pH between 1 and 2 [20]. A cytoplasmic pH of 5.5 has been measured indirectly [21].

Proton-pumping ATPases/ATP synthases are found in all groups of present-day organisms [22]. The typical archaeal ATP synthase is homologous to the eukaryotic vacuolar ATPase. Because of the high degree of sequence similarity, the archaeal ATP synthase (A-type ATPase) is sometimes labeled as vacuolar or V-type ATPase. The archaeal and the vacuolar ATPase are both homologous to the bacterial F-ATPases, but the level of sequence similarity with the F-ATPases is much lower than between the V- and the A-ATPases. To date seven species have been identified that harbor inteins in their ATPase catalytic subunits. The first intein was discovered in the vma-1 gene of Saccharomyces [7]. The yeast Candida tropicalis [23] possesses an intein in the same location. Inteins are also present in the A subunit of the A-type ATPases of Thermoplasma acidophilum, T. volcanium, Pyrococcus abyssi, P. horikoshii and P. furiosus.

Here we present data on the cloning and expression of the T. acidophilum A-ATPase A subunit in E. coli, and we discuss implications for the location, propagation and distribution of inteins among organisms.

Results

Sequence analysis of the T. acidophilum intein
The T. acidophilum intein was discovered while sequencing the catalytic subunit of the archaeal ATPase/ATP synthase from T. acidophilum for systematic purposes [24]. More recently, the complete genome sequences of T. acidophilum [25] and T. volcanium [26] have been reported. In both instances, the catalytic subunit of the ATP synthase is the only gene in the entire genome for which an intein was reported. Using PSI-BLAST [27] with different inteins as seeds, divergent inteins with less than ten percent sequence identity to the query sequence are recovered (data not shown); however, using this approach we did not discover any additional intein or homing endonuclease encoding genes in T. acidophilum.

Multiple sequence alignments of diverse intein sequences identified eight motifs composed of moderately conserved residues [18, 28]. A multiple sequence alignment (with manual modification) of the inteins from the A-ATPase A subunits of Thermoplasma and Pyrococcus is shown in Figure 1. The T. acidophilum intein (173 amino acids long) is among the shortest inteins known, and the alignment with other inteins reveals the absence of sequences homologous to the typical endonuclease motifs. Only the motifs characteristic of the self-splicing domain are present in the Thermoplasma intein (see Figure 1).

The significance of the match between the T. acidophilum and the three pyrococcal ATPase inteins was assessed using PRSS (http://fasta.bioch.virginia.edu/fasta/prss.htm) [29]. The P-value for this match, i.e., the probability of obtaining a match of this quality by chance alone, was calculated to be below 10^-10. This indicates that not only the exteins, but also the inteins themselves are recognizably homologous. In contrast, a sequence alignment of all known inteins shows intein sequences to be much more divergent (not shown). For example, the P-value for the comparison between the T. acidophilum and the Saccharomyces cerevisiae intein was 0.28, i.e., no significant similarity between the yeast and the Thermoplasma inteins was detectable using pairwise alignments only.

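The idea behind a shuffling-based significance test can be illustrated with a minimal Python sketch. This is not the PRSS program: the scoring scheme (match 2, mismatch -1, linear gap -2), the function names and the example peptide are all assumptions chosen for illustration. The sketch scores a local alignment, then counts how often a shuffled copy of the second sequence scores at least as well.

```python
import random


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_p_value(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Empirical P-value: fraction of shuffles of b scoring >= the real alignment."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if local_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    peptide = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
    print(shuffle_p_value(peptide, peptide, n_shuffles=100))
```

An empirical estimate of this kind cannot fall below 1/(n_shuffles + 1), so values as small as 10^-10 require fitting a distribution to the shuffled scores, a step this sketch omits.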

Location of the intein within the host protein
The vacuolar and the archaeal ATPases are homologous to the bacterial and organellar F-type ATPases. The structure of the bovine mitochondrial F1-ATPase has been determined by X-ray crystallography [30]. The intein insertion points in the yeasts' V-ATPase and the Thermoplasma and Pyrococcus A-ATPases correspond to the catalytic site where the ATP binds and is hydrolyzed during the catalytic cycle. Figure 2 shows that the inteins are located in the regions of these very conserved proteins [22] that have the lowest substitution rates.


Comparison of A-ATPase catalytic subunit and 16S rRNA phylogeny
The phylogenies of archaeal ATPase catalytic subunits and small subunit ribosomal RNAs are shown in Figure 3. Both phylogenies were calculated for a similar set of species. While the two phylogenies show some significant differences, neither of them groups the molecules from Thermoplasma with the Pyrococcus homologs. In both phylogenies several other Archaea, i.e., M. jannaschii, M. thermolithotrophicus, Thermococcus sp., Halobacterium sp., Methanosarcina and Methanobacterium thermoautotrophicus, branch off between the two groups that carry inteins in their ATPase catalytic subunits. None of these intervening archaeal ATPases carries an intein.

Codon usage comparison
Codon usage varies among organisms. In a given organism, the production of tRNAs corresponds to the frequencies with which the different codons are present in its protein-coding genes. The exact causes for tRNA regulation and codon usage are not completely understood; however, expression of foreign genes in an organism is often prevented by a different codon usage of the foreign gene [31]. Many Archaea have codon usage frequencies and tRNA compositions different from E. coli. The T. acidophilum A-ATPase A subunit has 763 codons. Codon AUA (Ile) is the most frequent (39/1000). In E. coli the same codon (AUA) is a rare codon present at a frequency of only 5.5 per 1000. The other two major differences in codon usage between the T. acidophilum A-ATPase A subunit and E. coli are AGG (Arg) and AGA (Arg), which are present in the T. acidophilum A-ATPase A subunit at frequencies of 41 and 16 per 1000, respectively. In E. coli, however, these codons are considered rare and occur with frequencies of 1.7 and 2.8 per 1000, respectively.

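Per-thousand codon frequencies of this kind can be computed from any in-frame coding sequence. The Python sketch below is illustrative only and is not the CUTG program used in this study; the function names are hypothetical and the short test sequence is made up. The REPORTED table simply restates the frequencies quoted in the text so that the fold differences between the gene and E. coli can be read off.

```python
from collections import Counter


def codon_usage_per_thousand(cds: str) -> dict[str, float]:
    """Return codon frequencies per 1000 codons for an in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")  # report codons in RNA notation
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter(codons)
    return {codon: 1000.0 * n / len(codons) for codon, n in counts.items()}


# Frequencies per 1000 codons as quoted in the text:
# (T. acidophilum A-ATPase A subunit, E. coli)
REPORTED = {"AUA": (39.0, 5.5), "AGG": (41.0, 1.7), "AGA": (16.0, 2.8)}


def fold_difference(reported: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Ratio of the frequency in the gene to the frequency in the host."""
    return {codon: gene / host for codon, (gene, host) in reported.items()}


if __name__ == "__main__":
    print(codon_usage_per_thousand("ATAATAAGGAAA"))  # made-up 4-codon example
    for codon, ratio in fold_difference(REPORTED).items():
        print(f"{codon}: {ratio:.1f}-fold more frequent in the gene")
```

With the quoted values, AGG shows the largest difference (roughly 24-fold), followed by AUA (about 7-fold) and AGA (about 6-fold).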

Expression and intein splicing of T. acidophilum A-ATPase A subunit
The gene encoding the T. acidophilum A-ATPase A subunit was cloned into the expression vector pET-11a (Stratagene) and transformed into the E. coli BL21(DE3) and E. coli BL21-CodonPlus(DE3)-RIL strains for protein expression. When E. coli BL21(DE3), a strain that does not express additional rare tRNAs, was transformed with the cloned T. acidophilum ATPase A subunit, no additional protein bands were observed in extracts of induced cells (not shown). However, the E. coli BL21-CodonPlus(DE3)-RIL strain transformed with the same plasmid expressed two additional proteins of 20 and 65 kDa upon induction (Fig. 4). No additional band at 85 kDa, indicative of an unspliced, intein-containing precursor, was visible after induction. This demonstrates that autocatalytic splicing occurred efficiently in E. coli. Efficient autocatalytic splicing was also observed when the E. coli were grown and induced at 42°C, or when the E. coli were induced at 16°C for 16 hours (Fig. 4A).

During preparation for SDS gel electrophoresis the samples are heated to 72°C in the presence of DTT. With respect to temperature these conditions are more similar to the conditions in the T. acidophilum cytoplasm than to the conditions in E. coli. Intein excision might therefore occur only during this high temperature treatment. To test this possibility, proteins from induced E. coli were also separated under non-denaturing conditions with or without addition of DTT. The induced bands were excised from the non-denaturing gel and separated using denaturing SDS gel electrophoresis (Fig. 5). Both of the slower migrating bands visible after induction (A and B in Fig. 5) revealed only one major band corresponding to the spliced and religated A-subunit upon separation in an SDS denaturing gel. Presumably the slower migrating band was a dimer or higher aggregate of the A-subunit monomer. In none of these experiments did we find any indication of unprocessed or incompletely spliced inteins.

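The band sizes can be checked against the sequence lengths given earlier with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value from this study, and treats all 763 codons as residues (ignoring whether that count includes the stop codon); the names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumption: rough average mass of one amino acid residue


def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from the number of residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


PRECURSOR_LEN = 763  # codons reported for the A subunit gene
INTEIN_LEN = 173     # intein length reported in the sequence analysis
EXTEIN_LEN = PRECURSOR_LEN - INTEIN_LEN  # spliced, religated A subunit

if __name__ == "__main__":
    for label, length in [("unspliced precursor", PRECURSOR_LEN),
                          ("spliced A subunit", EXTEIN_LEN),
                          ("excised intein", INTEIN_LEN)]:
        print(f"{label}: {length} residues, ~{approx_mass_kda(length):.1f} kDa")
```

Under these assumptions the estimates come out near 84, 65 and 19 kDa, close to the 85, 65 and 20 kDa sizes discussed for the precursor, the spliced subunit and the excised intein.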
Discussion
The Daucus carota [32] and Saccharomyces cerevisiae (Alireza Senejani, unpublished) catalytic V-ATPase subunits form inclusion bodies when expressed in E. coli. In contrast, the T. acidophilum subunit is expressed as a soluble protein in the E. coli cytoplasm. Despite the chemical and physical differences between the E. coli cytoplasm and the environment in which the T. acidophilum intein functions in vivo, we found no indications that self-splicing of the intein was inefficient in E. coli. Even when E. coli was grown at lower temperatures, processing of the intein and religation of the exteins appeared 100% efficient. Complete processing was also observed when the E. coli proteins were separated in non-denaturing and non-reducing gels. Autocatalytic splicing of the T. acidophilum A-ATPase catalytic subunit occurs efficiently in the E. coli cytoplasm. The T. acidophilum A-ATPase intein appears to splice out efficiently at very different pHs (7.2 versus 5.5 [21, 33]) and temperatures (16 to 37°C versus 55°C [20]).

The T. acidophilum intein shows significant sequence similarity to the inteins found in the A-ATPase catalytic subunits of Pyrococcus. Moreover, these inteins are inserted into the same highly conserved sequence in the ATP binding site. This indicates that the inteins in Thermoplasma and Pyrococcus are homologous in the evolutionary sense, i.e., they are derived from a common ancestral gene. However, Pyrococcus and Thermoplasma are not considered closely related Archaea. Based on their 16S rRNA they are considered distantly related Euryarchaeotes (see Fig. 3b).

One explanation for the discrepancy between phylogenetic classification and distribution of the A-ATPase intein is horizontal gene transfer: the intein was not present in the last common ancestor of Thermoplasma and Pyrococcus; rather, the intein invaded one of the lineages after their split and was more recently horizontally transferred to the other lineage. The horizontal transfer scenario is more parsimonious than the assumption of presence in the shared common ancestor because the latter requires several independently occurring losses of the intein together with its long-term persistence in the Pyrococcus and Thermoplasma lineages (cf. Fig. 3b). Horizontal transfer of whole genes and operons between divergent species is a frequent event [34, 35, 36, 37]. Even housekeeping genes are transferred between divergent species [37].

Two possibilities exist for the horizontal transfer scenario. Either the whole A-ATPase catalytic subunit was transferred, or the intein alone spread as a selfish genetic element. To discriminate between these two scenarios, we constructed the phylogeny of the host protein (Fig. 3a). The resulting phylogeny is in reasonable agreement with the ribosomal RNA phylogeny, the main exception being the placement of Desulfurococcus. According to its 16S rRNA this organism is clearly classified as a Crenarchaeote; however, its ATPase catalytic subunit groups with Thermococcus sp. The finding that the host protein itself does not group the genus Thermoplasma with the Pyrococci suggests that the intein alone was transferred between Thermoplasma and Pyrococcus, and that the sequence similarity between the Thermoplasma and Pyrococcus catalytic subunits was sufficient to allow homing into the same insertion site.

The dispersion of the intein as a selfish genetic element is consistent with the work of Goddard and Burt [38] on the persistence of an intron with homing endonuclease in yeast mitochondria. These authors studied the distribution of empty target sites, and of introns with and without a functioning endonuclease gene, among different yeasts. They concluded that the long-term persistence of the intron depends on a cycle that begins with invasion of the empty target site by an intron with the help of a homing endonuclease encoded by an open reading frame within the intron. Once the intron-containing allele is fixed in the population, the endonuclease, which itself had been the reason for the rapid spread of the intron in the population, is no longer under selection; it becomes non-functional and is lost, resulting in an intron without homing endonuclease activity. Once the endonuclease is lost, the intron-containing allele becomes more likely to be replaced with an allele that has lost the intron altogether, and the cycle of invasion and successive loss begins anew.

The process of intein homing is likely to depend on a similar cycle; however, the time intervals for loss of the intein are likely to be longer than those for loss of the intron. The A-ATPase and the V-ATPase inteins are located in the most conserved part of the host gene. Any deletion of the intein from the gene itself needs to be precise, because any alteration of the amino acid sequence in the catalytic site is likely to result in a non-functioning enzyme. Comparative sequence analysis (Fig. 1) shows that the T. acidophilum and T. volcanium inteins do not contain an endonuclease domain. Our search of the genomes of these Archaea did not identify any endonucleases that might function in homing. While the failure to identify a homing endonuclease is not proof of absence, the presence of a homing endonuclease acting in trans would be unprecedented and has to be regarded as improbable. The cyclic reinvasion model for long-term persistence of a selfish genetic element through homing [38] suggests that the small Thermoplasma intein evolved through reduction from a large endonuclease-containing intein similar to the one found in the pyrococcal A-ATPase. Apparently, the small intein has persisted in the Thermoplasma A-ATPase since the split between T. acidophilum and T. volcanium without the help of a homing endonuclease.

The cyclic reinvasion model also explains why the insertion site is in a region of very low substitution rates: the high degree of sequence similarity surrounding the integration point facilitates the intein transfer between different populations and species using the homing endonuclease. A more variable sequence surrounding the integration point would restrict homing to members of the same species, and would thus lower the chances for long-term survival of the intein.

Conclusion
The small intein in the Thermoplasma A-ATPase is closely related to the endonuclease-containing intein in the Pyrococcus A-ATPase. Phylogenies constructed with the host protein (A-ATPase catalytic subunit) and with 16S rRNA do not group these two organisms together, suggesting that the A-ATPase intein spread through horizontal gene transfer. The small intein has persisted in Thermoplasma apparently without homing. The T. acidophilum intein retains efficient self-splicing activity when expressed in E. coli. This activity does not depend on the physicochemical conditions in the T. acidophilum cytoplasm.

Materials and Methods
Plasmid constructs
The T. acidophilum A-ATPase A subunit encoding gene was amplified from genomic DNA using primers Ta-4 (ATGGATCCTTCTCAACGAAGAGCAGTG) and Ta-5 (GAGGTGAACATATGGGAAAGATAATCAG). These primers match the coding sequence of the Thermoplasma acidophilum A-ATPase A subunit (gene identification number 9369337) and introduce restriction sites useful in subcloning. Initially, the PCR product was cloned into pCR2.1 (TA cloning vector, Invitrogen). After digestion with NdeI and BamHI the coding sequence was subcloned into the vector pET-11a (Stratagene) for gene expression experiments.

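The restriction sites carried by the two primers can be located with a short Python check. The primer sequences are those given in the text; the recognition sequences for NdeI (CATATG) and BamHI (GGATCC) are the standard ones, and the function name is hypothetical. Because both sites are palindromic, searching one strand is sufficient.

```python
# Standard recognition sequences of the two enzymes used for subcloning.
SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "Ta-4": "ATGGATCCTTCTCAACGAAGAGCAGTG",
    "Ta-5": "GAGGTGAACATATGGGAAAGATAATCAG",
}


def find_sites(primer: str) -> dict[str, int]:
    """Map enzyme name to the 0-based position of its site, for sites present."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

Ta-4 carries the BamHI site and Ta-5 carries the NdeI site, consistent with the NdeI/BamHI digestion used to move the coding sequence into pET-11a.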
Protein expression and determination
E. coli BL21(DE3) and BL21-CodonPlus(DE3)-RIL strains (Stratagene) were used for protein expression. If not stated otherwise, transformed E. coli were grown overnight in a culture wheel at 37°C in Luria-Bertani (LB) broth with ampicillin (100 μg/mL). One mL of the overnight culture was used to inoculate 10 mL of LB broth containing ampicillin, and the culture was incubated at 37°C (200 rpm). After 2 hours, isopropyl thio-β-D-galactoside (IPTG) was added to a final concentration of 1 mM to induce gene expression. Cells were harvested 4 hours after induction by centrifugation at 7000 rpm for 10 min and washed with TEP buffer (0.1 M Tris-HCl, pH 7.4, 0.01 M EDTA and 1 mM phenylmethylsulfonyl fluoride). Cells were resuspended in 500 μL of TEP buffer and sonicated with a Braun-Sonic U with micro-tip for 4 minutes (0.5 duty cycle; power output approximately 120). The disrupted cells were centrifuged at 8000 rpm and the pellet was resuspended in 500 μL of fresh TEP buffer. Both the supernatant and the pellet were diluted by adding 6X sample buffer (10% SDS, 1.2 mg/mL bromophenol blue, 0.6 M DTT, 30% glycerol, 0.1 M Tris/HCl pH 6.8) and heated to 80°C for 15 minutes to denature the protein. Five to fifty microliters of this preparation were run in a 10% denaturing Tris/Tricine SDS polyacrylamide gel electrophoresis system as described by Schägger and von Jagow [42].

Non-denatured protein was prepared with the same procedure except that SDS was omitted from the buffers and that the samples were not heated. Proteins were electroeluted from the non-denaturing gel as described in [24]. Gels were fixed with fixing solution (50% methanol, 10% acetic acid) for 15 minutes, followed by staining in Coomassie staining solution (20% acetic acid, 0.025% Coomassie blue G-250) for one hour, followed by destaining in destaining solution (5% methanol, 10% acetic acid) [42].

DNA sequencing and cloning
DNA was sequenced using the ABI PRISM BigDye Terminator Cycle Sequencing kit (PE Applied Biosystems). Sequencing gels were run and processed in the Biotech Center (University of Connecticut).

Codon usage
The program Codon Usage Tabulated from GenBank (CUTG) at http://www.kazusa.or.jp/codon/ [43] was used to calculate the codon usage of individual genes and genomes.


Sequence retrieval, alignment and phylogenetic reconstruction
The Pyrococcus furiosus A-ATPase sequence was retrieved via blastp from the unfinished genome using the web page http://combdna.umbi.umd.edu/bags.html. The aligned small subunit ribosomal RNA sequences were retrieved from the Ribosomal Database Project II (http://rdp.cme.msu.edu) [44]. Francine Perler (New England Biolabs), the curator of the intein database (http://www.neb.com/inteins/int_reg.html), kindly provided the sequences of all known inteins. All the other sequences were retrieved from the NCBI databank.

Sequences were aligned using CLUSTAL X 1.8 [45]. The number of substitutions per site in the aligned data set was calculated using a Java program written by Olga Zhaxybayeva (University of Connecticut). This program calculates and plots the number of substitutions in a sliding window of 10 aligned positions. The window is moved through the alignment one position at a time. For positions where less than 50% of the sequences have a gap, the average number of substitutions was calculated considering the gap as an additional character. Positions with gaps in more than 50% of the aligned sequences were skipped.

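The sliding-window procedure can be sketched in Python as follows. This is not the Java program used in this study, whose source is not reproduced here. Two details are assumptions made for illustration: the number of substitutions at a position is taken as the number of distinct characters minus one (the minimum number of changes needed), and a position with exactly 50% gaps is retained. The function names and the toy alignment are hypothetical.

```python
from typing import Optional


def column_substitutions(column: str) -> int:
    """Assumed measure: distinct characters minus one, with '-' as a character."""
    return len(set(column)) - 1


def sliding_window_substitutions(alignment: list[str], window: int = 10,
                                 max_gap_fraction: float = 0.5
                                 ) -> list[Optional[float]]:
    """Average substitutions per retained position in each sliding window."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    per_site: list[Optional[int]] = []
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        if column.count("-") / len(alignment) > max_gap_fraction:
            per_site.append(None)  # position skipped: too many gaps
        else:
            per_site.append(column_substitutions(column))
    averages: list[Optional[float]] = []
    for start in range(length - window + 1):  # window moves one position at a time
        kept = [v for v in per_site[start:start + window] if v is not None]
        averages.append(sum(kept) / len(kept) if kept else None)
    return averages


if __name__ == "__main__":
    toy = ["ACDE", "ACDF", "A-DG", "A--G"]  # made-up alignment
    print(sliding_window_substitutions(toy, window=2))
```

Plotting the returned values against the window start position would give a substitution profile of the kind used to locate the intein insertion site relative to conserved regions.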
Phylogenies were reconstructed from amino acid sequences aligned using CLUSTAL X 1.8 [45]. The topologies of the depicted trees were calculated using neighbor joining as implemented in CLUSTAL X with correction for multiple substitutions. Branch lengths and their confidence intervals were calculated with TREE-PUZZLE 5.0 [46] using the JTT or the HKY model for substitution, respectively, and assuming an among-site rate variation described by a gamma distribution. Bootstrap samples were analyzed using parsimony as implemented in PAUP* 4.0 beta 8 [47], treating gaps as missing data. Each bootstrapped sample was analyzed using 10 different starting trees built through random addition and tree-bisection-reconnection (TBR) branch swapping. The following sequences were used for the 16S rRNA phylogeny (the sequences are available under these names from the Ribosomal Database Project II, http://rdp.cme.msu.edu/): Sul.solfa4, Sul.acalda, Ap.pernixl, Mc.thlitho, Mc.janrrnA, Mb.tautot2, Pc.furios2, Pc.abyssi, AP000001, AB016298, Tpl.acidop, Arg.fulgid p, Hf.volcani, Hb.spCh2_2, Hb.salina2, Msr.mazei5, Msr.barke2, Dco.mobili.