ArchiveTeam Yahoo Messages Followup: Success

The response to my post last week was amazing. As I write this, there are nine items left processing out of what ended up being more than 200,000 – a mixture of groups of threads and forum pages.

For some perspective, when I posted that there were only eight days left, about 5,000 items had been processed in the five preceding days. Within 24 hours, more than 160,000 items had been processed, as various individuals and companies (including AirBnB engineering) spun up hundreds of instances to help break the task.


I had dropped off to sleep, exhausted, after spending all Friday night mucking around with the portion of the code responsible for the Yahoo Messages job, building the AMIs, writing a blog post and submitting it to Hacker News.

By the time I woke up, six hours later, it was on the front page and people were getting involved. I spent the rest of the day helping people get up and running along with others in the IRC channel, eventually making the AMI available in other regions (using the recently released AMI copy functionality).

I also ended up writing a tool, slayer (using boto and ansible), to help people trim down the many spot instances they had launched (several hundred each). There turned out to be a long tail of very large threads, no more than a thousand, that have taken the last five days to slowly work down.
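To give a rough idea of what "trimming down" a fleet involves, here is a minimal sketch of the selection step only. It is not slayer's actual code: the `Instance` class, the `pick_instances_to_terminate` function and the "drop the newest first" policy are all assumptions made up for illustration, and the sketch deliberately makes no AWS calls.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Instance:
    # Hypothetical stand-in for whatever record the EC2 client returns.
    instance_id: str
    launch_time: datetime


def pick_instances_to_terminate(instances: List[Instance], keep: int) -> List[str]:
    """Return the IDs to terminate so that at most `keep` instances remain.

    The oldest instances are kept and the newest are dropped first; that
    ordering is an arbitrary choice for this sketch, not slayer's policy.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    oldest_first = sorted(instances, key=lambda inst: inst.launch_time)
    # Everything past the first `keep` entries is surplus.
    return [inst.instance_id for inst in oldest_first[keep:]]
```

The returned list of IDs would then be handed to whatever terminate call your EC2 client provides. Keeping the selection as a pure function means it can be checked against a fake fleet before anything is actually shut down.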

Parts of the archive are already being pushed to archive.org (though not in an immediately browsable format); you can see updates here.

I hadn’t been involved with the Archive Team before this, just a long-time appreciator of the work they’ve done – it was supremely gratifying to be able to give back, and there are plans for spot instances to play a much bigger role in future Archive Team endeavours.

Thanks to all who got involved or spread the word, and major thanks to Jason Scott and the rest of the Archive Team who have been tirelessly saving our history from before we knew that was what we were losing.

If you’d like to get involved, check out ArchiveTeam.org, the excellent Tracker, and join us in #ArchiveTeam on EFnet.
