Monday, January 12, 2009

Where pages go when they die


"Dear AOL Hometown user, We're sorry to inform you that as of Oct. 31, 2008 Hometown was shut down permanently. We sincerely apologize for any inconvenience this may cause."

If you've ever received a message like this, or lost work you've posted on the Web, you can understand the frustration of these Hometown members after the site was shut down:

  • How do I download my Hometown website files? Help. My website is gone.
  • I need my files back. This is crap. You can't just close it and delete our files!!!
  • My question is like those above. Is there anyway still to retrieve my journals and homepages? I tried before the deadline but nothing happened. These are my memories. Things I wanted to remember about my kids. And when I tried to access them before the deadline I was unable to. Otherwise I would have printed it all out. Please help. (more posts here)

Knowing where to look for lost pages can ease some of that frustration. As at least one person noted in response to the notice above, preserved copies of web pages can sometimes be found with the Wayback Machine at archive.org. The Wayback Machine is a search engine that sits atop a massive collection of archived pages. Archive.org, with 85 billion pages, is a much larger database than Google. To retrieve a lost page you need its URL; keyword searches are not effective.
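
Because the lookup is keyed on the URL, a Wayback Machine address can be put together by hand. The sketch below is a minimal illustration, assuming the commonly used address pattern web.archive.org/web/<timestamp>/<original URL>, where * in place of the timestamp lists every capture. The function names and the example.com address are made up for illustration and do not come from archive.org.

```python
# Assumed address pattern for the Wayback Machine; adjust if it differs.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_calendar_url(original_url: str) -> str:
    """Build the address that lists every capture of original_url.

    The '*' stands in for the timestamp and acts as a wildcard.
    """
    return f"{WAYBACK_PREFIX}/*/{original_url}"


def wayback_snapshot_url(original_url: str, timestamp: str) -> str:
    """Build the address of the capture nearest to timestamp.

    timestamp is digits only, in YYYYMMDDhhmmss form or a shorter
    prefix such as YYYYMMDD.
    """
    if not timestamp.isdigit() or len(timestamp) > 14:
        raise ValueError("timestamp must be up to 14 digits, e.g. 20081031")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    # Hypothetical lost page, used only to show the shape of the result.
    lost_page = "http://example.com/mypage.html"
    print(wayback_calendar_url(lost_page))
    print(wayback_snapshot_url(lost_page, "20081031"))
```

Pasting either printed address into a browser is the same as typing the lost URL into the Wayback Machine's search box; whether anything comes back still depends on the page having been archived.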

Not all pages are saved by archive.org, so this doesn't work in all cases. If a page was created and lost recently, it may still be sitting in a queue, since archive.org has amassed a backlog of pages to post to the database. Or it may never have been crawled and fed to the archive at all.

Depending on public interest in the original page, it may be archived elsewhere. Searching Google or other databases for the URL or the name of the document sometimes reveals a trail that leads to the lost pages. Since I'm not sure what URLs those Hometown pages once had, or whether they were ever backed up by an archival site, I'll use a different example that demonstrates this type of search challenge.

About eight years ago, a professor named Mutsumi Suzuki retired and removed his large collection of Magic Squares. The original pages were hosted on a Japanese higher-education site. That valuable collection is not irretrievably lost, however. Archive.org preserved a copy of the complete collection, including the page where Suzuki explains why he's removing it. Here's a link that will explain how to find the pages in archive.org. Also try googling the name of the original document, magicsquares.html, along with the name of the author, Suzuki. It turns out there's at least one other site that thought this material was worth preserving.
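
The Google half of that search can be written down as an address too. This is only an illustration, assuming Google's usual /search?q= address form; the helper name is hypothetical, and the quotation marks simply ask for the file name as an exact phrase.

```python
from urllib.parse import urlencode


def search_url_for_lost_page(filename: str, author: str) -> str:
    """Build a Google search address for a lost document.

    The file name is wrapped in quotation marks so it is matched as an
    exact phrase; the author's name is added as an ordinary keyword.
    """
    query = f'"{filename}" {author}'
    # Assumed search address form; urlencode escapes the quotes and spaces.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(search_url_for_lost_page("magicsquares.html", "Suzuki"))
```

The same query string works in any other search engine that accepts quoted phrases; only the address prefix would change.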

Next time you encounter a "page not found" message, try looking for the page in archive.org or google it to see if it was preserved somewhere else.

And if you want to help AOL's Hometown victims, see what you can do.




1 comment:

Anonymous said...

Internet
|
|
|
\ /
Outernet
|
|
|
\ /
Overnet
|
|
|
\ /
Undernet
|
|
|
\ /
Darknet
|
|
|
\ /
Hidden Net
|
|
|
\ /
???????????????

Don't be afraid