Comments on: ArchiveTeam + Yahoo Messages Shuttering + EC2 Spot Instances = MegaCrawl http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/ Move slow and fix things. Sun, 26 May 2013 13:37:18 +0000 hourly 1 By: ArchiveTeam Yahoo Messages Followup: Success | Ross Duggan http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9511 Thu, 28 Mar 2013 23:33:22 +0000 http://rossduggan.ie/?p=11530#comment-9511 […] response to my post last week was amazing. As I write this, there are nine items left processing out of what ended up being more […]

By: TryingToHelp http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9414 Sat, 23 Mar 2013 19:21:54 +0000 http://rossduggan.ie/?p=11530#comment-9414 Oh yeah, the CloudFormation templates I created use the Amazon Linux AMI for each region, so this way the person who created the other AMIs doesn’t need to worry about maintaining them, or paying for access to them. These are standard across regions, are free, and might work a slight bit better too!

By: TryingToHelp http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9413 Sat, 23 Mar 2013 19:20:10 +0000 http://rossduggan.ie/?p=11530#comment-9413 I built 2 CloudFormation templates to allow you to easily spin up a ton of these things across multiple availability zones:

With a keypair (so you can log in to the host):
http://files.wordsaboutbytes.com/yahoo-messages-save.cf.txt
Without a keypair (you can’t log in to the host, but it will run):
http://files.wordsaboutbytes.com/yahoo-messages-save-nokeypair.cf.txt

1. Open the console
2. Go to CloudFormation
3. Give your stack a name.
4. Select the file you downloaded from above
5. Click Next.
6. Fill in the parameters (number of instances, the nick you want to be tracked under on the Archive Team site, the spot price you are willing to pay, and optionally a keypair if you chose that template).
7. Check the box at the bottom acknowledging that the template will create IAM resources (used by the host to bootstrap).
8. Click Continue.
9. Add tags if you want, or just click Continue.
10. Review. Click Continue.
11. Close.

This will launch however many instances you asked for, as t1.micro instances, at the spot price you set. When you want to stop, just delete the stack in this console and everything should go away.

By: Ross http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9412 Sat, 23 Mar 2013 17:55:40 +0000 http://rossduggan.ie/?p=11530#comment-9412 Not 100% sure on the individual cost as I’m only monitoring aggregate bandwidth and not tracking when instances are killed or reappear, but I’m running about 300 right now.

That’s at about $0.003 per hour per instance, and only charged for outgoing bandwidth (about 10 GB for all so far). Ballpark figure is about $23 per day. It’ll fluctuate up occasionally, but I’ve set my payment limit to $0.005, so there’s a roof.
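
As a quick sanity check on that ballpark, here is the instance-hours arithmetic as a small Python sketch. It uses only the figures quoted above (300 instances, $0.003 per hour, a $0.005 bid ceiling) and leaves bandwidth out, since the 10 GB figure is a running total rather than a daily rate; the function name is just for illustration.

```python
def daily_instance_cost(instances, price_per_hour, hours=24.0):
    """Cost of running `instances` spot instances for `hours` hours."""
    return instances * price_per_hour * hours


# Figures quoted in the comment above.
typical = daily_instance_cost(300, 0.003)  # at the usual spot price
ceiling = daily_instance_cost(300, 0.005)  # if the price sat at the bid limit

print(f"typical: ${typical:.2f}/day, ceiling: ${ceiling:.2f}/day")
```

That works out to roughly $21.60 per day for the instances alone, in line with the $23 ballpark once some bandwidth is added, and no more than about $36 per day if the spot price sat at the bid limit all day.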

By: Mark http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9411 Sat, 23 Mar 2013 17:40:00 +0000 http://rossduggan.ie/?p=11530#comment-9411 Including the cost of bandwidth downloading from Yahoo and uploading to Archive Team, how much is each of your instances costing you per day?

By: Ross http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9406 Sat, 23 Mar 2013 15:45:38 +0000 http://rossduggan.ie/?p=11530#comment-9406 Could just be a problem with using HTTP. I’ve changed the gist to use a git URI instead, so try that.

By: TryingToHelp http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9405 Sat, 23 Mar 2013 15:26:54 +0000 http://rossduggan.ie/?p=11530#comment-9405 Trying to follow the steps you have listed in the gist there, but this fails:

# git clone https://gist.github.com/5226491.git setup-config && cd setup-config
Cloning into setup-config…
error: The requested URL returned error: 403 Forbidden while accessing https://gist.github.com/5226491.git/info/refs

fatal: HTTP request failed

Thoughts?

By: Ross http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9402 Sat, 23 Mar 2013 14:34:55 +0000 http://rossduggan.ie/?p=11530#comment-9402 You’ll be doing about 10x as much good/price, yes!

There’s a practical limit, after which you have to contact Amazon to have it increased. That limit is 100 spot instances (or 20 on-demand), according to AWS reps on Quora.

By: Mark http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9401 Sat, 23 Mar 2013 14:13:46 +0000 http://rossduggan.ie/?p=11530#comment-9401 This sounds like a silly question, but what happens if I request 10 instances instead of the default 1? Will I be doing 10x as much good for the project (at 10x the AWS bill)? Assuming I can afford it, is there a practical limit to how many instances I can run?

By: Ross http://rossduggan.ie/blog/technology/archiveteam-yahoo-messages-shuttering-ec2-spot-instances-megacrawl/comment-page-1/#comment-9398 Sat, 23 Mar 2013 13:10:55 +0000 http://rossduggan.ie/?p=11530#comment-9398 Scott: it uploads periodically, so only the last batch it was crawling will be lost.

JB: you may need to search under “all AMIs”

AK/Andyfoo: I’ll see what I can come up with.
