Nothing but Net
The Internet is a moving target. Every minute, thousands of Web pages are updated or abandoned. Messages sent to newsgroups replace older postings. All but a fraction of the chat-room conversations and digital images that streak across the Net vanish after they're displayed.
Seeking to preserve the chaos of the Net for posterity is Brewster Kahle, a man with a mission, a server, and a lot of magnetic tape. Kahle, who once designed computers for Thinking Machines Corp., founded the Internet Archive in 1996 to collect and store all the disparate bits of the Internet. From offices in the Presidio, the former Army base adjacent to San Francisco's Golden Gate Bridge, the Internet Archive's powerful computer crawls the Internet at high speeds. Consulting intelligent algorithms about what information to store and how often, the archive's computer copies data to tape cassettes on a Quantum DLT4500 recorder. When each cassette is full, a robotic arm removes it, stores it in a carousel, and replaces it with a blank one.
The Internet may seem impossibly vast to users, but in fact it's quite finite. The entire World Wide Web is currently estimated to contain about 1.5 terabytes (or 1.5 million megabytes) of data. Newsgroups and other Internet subsystems account for another 5.5 or so terabytes. (Compare these numbers with the 20 terabytes of ASCII data contained in the Library of Congress' 20 million books or the 8 terabytes of data at the average video store.) With tape-cassette storage costing only $20 per gigabyte (1 billion bytes), archiving the Internet is practically economical. Already, the archivists have stockpiled more than 2 terabytes of the Net, and currently they're storing about 100 gigabytes of data every month. Faster connections to the Net promise to speed things up, and Kahle estimates that his group will be done by the end of 1997.
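For readers who want to check the arithmetic behind those figures, here is a minimal sketch in Python. It uses only the numbers quoted above and assumes decimal units (1 terabyte = 1,000 gigabytes, matching the "1.5 million megabytes" conversion). It also treats "more than 2 terabytes" as exactly 2. The function names are illustrative, not anything the archive itself uses.

```python
# Figures quoted in the article; decimal units are an assumption.
GB_PER_TB = 1000
WEB_TB = 1.5               # estimated size of the World Wide Web
OTHER_TB = 5.5             # newsgroups and other Internet subsystems
STORED_TB = 2.0            # "more than 2 terabytes", taken as a lower bound
COST_PER_GB_USD = 20       # tape-cassette storage cost
RATE_GB_PER_MONTH = 100    # current archiving rate


def total_size_gb() -> float:
    """Combined size of the Web and the other subsystems, in gigabytes."""
    return (WEB_TB + OTHER_TB) * GB_PER_TB


def tape_cost_usd(size_gb: float) -> float:
    """Cost of holding size_gb gigabytes on tape at the quoted price."""
    return size_gb * COST_PER_GB_USD


def months_remaining(size_gb: float, stored_gb: float, rate_gb_per_month: float) -> float:
    """Months needed to archive the rest at a constant monthly rate."""
    return max(size_gb - stored_gb, 0) / rate_gb_per_month


if __name__ == "__main__":
    size = total_size_gb()
    print("Total size (GB):", size)
    print("Tape cost (USD):", tape_cost_usd(size))
    print("Months left at current rate:",
          months_remaining(size, STORED_TB * GB_PER_TB, RATE_GB_PER_MONTH))
```

Under these assumptions the whole 7 terabytes would cost about $140,000 in tape, and the remaining 5 terabytes would take roughly 50 months at 100 gigabytes a month. On those numbers, an end-of-1997 finish only works if faster connections deliver a much higher rate.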
Storing the Internet once is only the beginning. As experienced Web surfers know, things change rapidly on the Net. The archive doesn't have the computer muscle to store the publicly available Internet every week, but even if it did, a lot of stuff would still fall through the cracks. On sites like MSNBC and CNN, breaking news comes and goes every minute, which means pages disappear faster than they can currently be squirreled away. Slate is updated daily. Shifting faster still are Web sites generated by databases, such as the online bookstore Amazon.com. Because the information these sites produce is specific to a user's experience, they can generate a literally infinite number of different pages. Finally, much of the traffic on the Internet is dynamic--chat rooms, instant messages, and now even phone conversations. To archive the Internet with absolute fidelity would require cloning not only every computer on the Internet, but also every person using every computer.
Many responsible netizens already archive themselves for selfish reasons. Archiving is a no-brainer for publication sites like the San Francisco Chronicle's The Gate, which collects the contents of the daily newspaper and connects them to a good search engine. And other sites like Deja News already assemble postings from the Internet newsgroups.
Where the Internet Archive trumps these archives, of course, is in its sheer comprehensiveness. While it isn't a replica of the Internet, it's a start. And it's not useful just to historians. Suppose your Web browser allowed you to specify not only an address but also a date. Remember that headline you saw on Wired News, but have been unable to find since? The headline was posted for only a day, and you haven't had much luck using the site's search tool to locate the piece. But using the Internet Archive to turn back the hands of time will uncover it for you. And what about your teen-age cousin's Web page, with that cute picture of her Mohawk? Cousin's mother cancelled her ISP account, and now the site is gone. But an intelligent browser could catch the "no such site" error and look it up on the archive instead, displaying the last-known version. Did your favorite politician really just flip-flop on your hot-button issue? Compare last year's campaign Web site with today's. These are just a few of the many valuable services that promise to keep the nonprofit Internet Archive richly endowed.
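The "address plus a date" idea, and the browser that falls back to the archive when a site is gone, can be sketched in a few lines of Python. This is only an illustration under stated assumptions. `SnapshotArchive`, `fetch_with_fallback`, the `live_fetch` callback, and the example URL are all hypothetical, and nothing here describes how the Internet Archive actually stores or serves its data. The sketch keeps each address's snapshots sorted by capture date and uses a binary search to find the latest copy on or before the requested date.

```python
from bisect import bisect_right
from datetime import date


class SnapshotArchive:
    """Hypothetical in-memory archive: each URL maps to dated snapshots."""

    def __init__(self):
        self._dates = {}  # url -> capture dates, kept sorted
        self._pages = {}  # url -> page contents, aligned with _dates

    def store(self, url, captured, content):
        """Record a snapshot, keeping the per-URL lists in date order."""
        dates = self._dates.setdefault(url, [])
        pages = self._pages.setdefault(url, [])
        i = bisect_right(dates, captured)
        dates.insert(i, captured)
        pages.insert(i, content)

    def lookup(self, url, as_of):
        """Return (capture date, content) of the latest snapshot taken on
        or before as_of, or None if nothing that old was archived."""
        dates = self._dates.get(url, [])
        i = bisect_right(dates, as_of)
        if i == 0:
            return None
        return dates[i - 1], self._pages[url][i - 1]


def fetch_with_fallback(url, live_fetch, archive, today):
    """Try the live site first; if live_fetch raises LookupError (standing
    in for a "no such site" error), return the last-known archived copy."""
    try:
        return live_fetch(url)
    except LookupError:
        hit = archive.lookup(url, today)
        return hit[1] if hit else None


if __name__ == "__main__":
    archive = SnapshotArchive()
    url = "http://example.com/cousin"  # hypothetical address
    archive.store(url, date(1997, 1, 10), "page with Mohawk picture")
    archive.store(url, date(1997, 3, 5), "page after redesign")
    print(archive.lookup(url, date(1997, 2, 1)))
```

Comparing last year's campaign site with today's would then be two `lookup` calls on the same address with different dates.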
Useful though it might be, the idea of archiving the Internet is assailed from all sides. David Berreby argued last year in Slate that exhaustive documentation of our world threatens to box us into a corner. The recent "Documenting the Digital Age" conference gathered experts from the computing, telecommunication, and archiving worlds to explore these issues. Corporate executives complained that because their archives are routinely subpoenaed by plaintiffs' attorneys, they have every incentive to shred their data instead of preserving them. Lawyers worried aloud about privacy and copyright concerns. Should you have the right to exclude your public page from the archive? (Consensus opinion: Yes.) Should we be saving usage logs, which detail every page a person sees? (Probably not.) Doesn't this whole thing violate current copyright laws left and right? (Almost certainly.) Should those laws be amended to allow such an archive? (Probably.)
Professional archivists argue that it's a waste of time to store the Internet without providing a proper historical context. Others say that having too much information about the Web at our disposal will be as bad as not having enough. They add that finding things promptly on the Web with a search engine is hard enough, and that using it as a historical research tool would be incredibly painful. They advocate an orderly weeding, assembling, and categorizing of digital records. Microsoft's chief technology officer (and Slate contributor), Nathan Myhrvold, whose "Save the Web" memo last year helped start the archive movement, counters that we don't know now what will be important later. Your cousin might grow up to be president, at which point her teen-age Mohawk Web site will become substantially more important than it is now. Myhrvold adds that it's better to start saving today's Internet now, even if it is badly collected and organized, rather than lose it forever.
And to think that Brewster Kahle believed he was just solving a problem by starting the Internet Archive, not introducing lots of new ones.