Distributed archiving 2
Friday, 30 Mar 2007 03:57PM
Further notes on keeping your site alive...

A commenter on Dave's site pointed out that nothing stops someone from creating a trust in their will to ensure their legacy remains online.

The Internet Archive points to Archive-It, a subscription service that will keep searchable archives of your site. Again, this is something you could keep paying for via a trust, and you could tell your readers to look there in the event of your demise.

Also, Archive.org will accept URL requests from the public. There is a deliberate 6-month delay on any new pages.

And of course the point we've all missed is that most blogs I read, I read via RSS! If they stopped updating I wouldn't read them anymore. End of story. It wouldn't matter whether they were archived or not.

For every blog but my own, the only reason I'd hit an archived post is because Google sent me there or because the blogger linked to it from their current post.

Perhaps part of the problem is that assuming anyone will care sounds a little presumptuous. Like setting up the "AnAussieMusicFan institute dedicated to the history of AnAussieMusicFan".

Taking control of your own history seems to fit in a similar place to editing your own Wikipedia biography.


Distributed archiving
Friday, 30 Mar 2007 02:32PM
Dave from ScriptingNews will be celebrating 10 years of blogging by offering the entire archive of his site as a single download.

He was prompted to do this while brainstorming some web ideas, one of which was the following:

2. I'd like to be able to pay a web company like Amazon or Google a one-time flat fee to host my content for perpetuity. I'd deposit my writing with them, on the web, and not worry about whether or not my heirs will keep paying the hosting bills to keep it alive. [...]

The Internet Archive serves this purpose well for static sites, assuming it knows to crawl them.

Perhaps the Archive could benefit from clearer instructions on how to ask it to crawl a site for permanent archiving. But the Internet Archive doesn't keep the site in its original location. Once a site is gone you have to know to go looking for it.

What Dave is looking for is a service that will guarantee a site will remain online forever without further instruction from its owner. That's a big call for any company. But I'm not sure it's necessary.

What prompted my thoughts was Dave's offer to provide his website as a single downloadable package. There is no easy way for a user to download the entire contents of a website without crawling it by clicking every link one at a time.

There have been tools to automate this process for over a decade. I remember back in the mid 90s it was one of the first things I thought to try when I discovered the web... "I want this whole website on my local machine, how?"
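For the curious, here is roughly what such a tool does under the hood. This is only a minimal sketch, not how any particular tool works: `crawl_site`, `LinkCollector` and the `fetch` callable are names I've made up for illustration, and `fetch` is assumed to take a URL and return the page's HTML (or None if it can't be retrieved).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start_url, fetch):
    """Breadth-first crawl of every same-host page reachable from start_url.

    `fetch` is a hypothetical callable (URL in, HTML text or None out).
    Returns a dict mapping each successfully fetched URL to its HTML.
    """
    host = urlparse(start_url).netloc
    pages = {}
    seen = {start_url}          # URLs already queued, so nothing is fetched twice
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same host and skip anything already queued.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

In practice `fetch` would wrap an HTTP request, and a well-behaved version would also pause between requests and honour the site's robots.txt, neither of which this sketch attempts.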

Crawling a whole website is bad form for a user. It wastes bandwidth and costs the website owner money. But search engines and archive sites like the Internet Archive do it all the time.

And yet, even for website owners who do want their users to be able to download their whole site as a single package for archiving or knowledge sharing purposes, there is no easy way to do this without manually creating a package.

Personally I'd love to provide a big fat ZIP file of all the HTML on my Faith No More gig database. But all the HTML is generated by code. In order to provide a package I'd have to download everything, zip it, then upload it again.
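The zipping step, at least, is the easy part. A minimal sketch, assuming the generated HTML has already been saved into a local folder: `package_site` is a made-up helper name, and the folder and archive paths are whatever you choose.

```python
import zipfile
from pathlib import Path


def package_site(html_dir, zip_path):
    """Zip every file under html_dir, storing paths relative to html_dir.

    Returns the number of files written. zip_path should live outside
    html_dir, otherwise the archive would try to include itself.
    """
    html_dir = Path(html_dir)
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(html_dir.rglob("*")):
            if path.is_file():
                # Forward-slash relative names keep the archive portable.
                archive.write(path, arcname=path.relative_to(html_dir).as_posix())
                count += 1
    return count
```

The download and the re-upload are still the tedious bits, and the package goes stale the moment the database changes.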

It's far more difficult for those websites (the majority, I expect) that are database driven and have no 100% meaningful "static" version to provide as a download. And I'd expect most commercial websites would prefer you didn't.

Wouldn't it be nice if there was a standard protocol for downloading a whole site that could be easily coded?

But maybe that isn't even necessary. Maybe it's a service Google could offer? A link next to their "cache" button, "Download whole site". Imagine the copyright carnage that would cause?! It should be opt-in of course...

Certainly I don't provide a big download of this site or any other because the care-factor to effort ratio just isn't compelling enough. Who'd want this whole site on their local machine? That's... silly. And of course, it's always going to be here, innit?

But I'm not sure Dave cares so much about the data. I think the ultimate aim of the desired service would be to keep the site running, the URL, the location, for eternity. The ultimate cyber squatting. Cold storage for your brain.

It would be nice to think your efforts remained part of history after your death. If you wrote a book, at the least it might be published and placed in a library for all time.

Your blog could easily die along with you within months. That is what Dave would like to prevent, and as yet there is no direct service available to prevent that happening.