Most of the Internet is lost in time.
It’s an amazing information system, most would agree, but for all practical purposes it exists mostly in the present and has a shrouded past.
This simple paradox as it applies to the Internet has been troubling Herbert Van de Sompel for quite a while.
He’s a computer scientist at Los Alamos National Laboratory, team leader of research and prototyping in the lab’s Research Library.
Working with Michael Nelson, a professor of computer science and a colleague from Old Dominion University, Van de Sompel has decided to do something about this missing time dimension on the Internet.
“Log on to the CNN Web page,” he said in an interview this week. “You can get today’s page, but you can’t get to the previous day’s page.”
That’s because when one types an address beginning with the hypertext transfer protocol prefix, “http://”, into the browser’s address bar, the computer talks to a server, which is set up to return only the current page.
“You can’t say, give me six months ago,” Van de Sompel said.
Books, letters, monuments, almost any record system one might think of – they almost always have a time or a date somewhere, which gives them a definite time dimension.
In fact, many web sites keep a record of every page, but those saved pages have a different address now. Some collaborative systems, like Wikipedia and other wikis, allow access to multiple versions of a document in progress, an instructive exception, but not a common arrangement.
Then there is another “pocket” of the Web, as Van de Sompel calls it, which does keep a record in time.
This is the Internet Archive, one feature of which is the Wayback Machine, which allows a browser to recover old pages that were once on the Internet but are no longer there.
Computer “bots” search the web and store copies of pages that can often be retrieved by date.
“Not every single day, but they have a lot,” said Van de Sompel.
“This is a problem also, because for many people to find an old version of the BBC News site, for example, they have to know the Wayback Machine exists and then they have to go there and do a search. Finding old versions is a search activity. You can’t just type an address.”
To deal with this, Van de Sompel and Nelson have developed an architectural solution, called Memento, that adds a time dimension to the web.
It’s a fairly simple piece of code that uses a capability the web already has, called “content negotiation,” which currently allows a browser to express a preference for the kind of page it wants – in English, for example, if the user speaks English.
With a little bit of code and an international system, the surfer of the future can wade into the past and spear a document according to its date or time with no more effort than typing an address. If one needs to go to the Wayback Machine to do that, the protocol takes care of the arrangements.
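To make the idea concrete, here is a minimal sketch in Python of content negotiation extended to time. It is an illustration under stated assumptions, not the researchers’ own code: the header name Accept-Datetime comes from the later Memento specification (RFC 7089) rather than from this article, and the function names and example.org addresses are hypothetical. The browser side states a date preference alongside its language preference; the server or archive side answers with the stored copy closest to that date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def build_headers(language, when):
    """Return request headers stating a language and a date preference.

    Hypothetical helper. "Accept-Language" is ordinary HTTP content
    negotiation; "Accept-Datetime" is the header named in RFC 7089.
    """
    return {
        "Accept-Language": language,
        # HTTP dates are written in GMT, e.g. "... 2010 00:00:00 GMT".
        "Accept-Datetime": format_datetime(
            when.astimezone(timezone.utc), usegmt=True
        ),
    }


def pick_snapshot(snapshots, accept_datetime):
    """Pick the stored copy captured closest to the requested date.

    `snapshots` is a list of (capture_time, address) pairs; this stands
    in for whatever an archive actually keeps (an assumption).
    """
    wanted = parsedate_to_datetime(accept_datetime)
    return min(snapshots, key=lambda snap: abs(snap[0] - wanted))


# Made-up archive contents for one page (placeholder addresses).
snapshots = [
    (datetime(2010, 1, 1, tzinfo=timezone.utc), "http://archive.example.org/2010-01-01/news"),
    (datetime(2010, 5, 20, tzinfo=timezone.utc), "http://archive.example.org/2010-05-20/news"),
    (datetime(2010, 12, 1, tzinfo=timezone.utc), "http://archive.example.org/2010-12-01/news"),
]

headers = build_headers("en", datetime(2010, 6, 1, tzinfo=timezone.utc))
capture_time, address = pick_snapshot(snapshots, headers["Accept-Datetime"])
print(headers["Accept-Datetime"])
print(address)
```

The point of the sketch is that the reader still types only the ordinary address; the date travels in a header, and picking the nearest old copy becomes the server’s job rather than a search the reader has to run.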
The hard part about Memento will be the process of changing the standards, which might seem like a daunting task given the universe of the Internet.
“It’s totally doable,” said Van de Sompel, “but going from our experiment to where the whole world can do it involves politics, discussion and friends in high places.”
Van de Sompel has another advantage. He has already shepherded three standards through the system. Two of them have been widely adopted, he said, and it’s still too early to tell about the third.
He was asked if he would be rewarded.
“Not at all,” he said. “That’s the story of my life, but there are indirect rewards, not dollars but prestige and recognition.”