Monday, June 11, 2007

Amazon EC2 For Scientific Processing

[Amazon EC2 is generating a lot of buzz in scientific computing circles because it makes distributed computing projects much easier to implement than federating disparate resources from different management domains. Some also claim that, for a researcher, EC2 is considerably cheaper than what it would cost just to provide power and cooling for most HPC clusters. Thanks to Richard Ackerman for this pointer -- BSA]

http://aws.typepad.com/aws/2007/06/amazon_ec2_for_.html

Amazon EC2 For Scientific Processing

Mike Cariaso was kind enough to set up the Meetup in Bethesda for my upcoming trip to Washington, DC. Mike has done some pretty cool work with Amazon EC2, setting up the mpiBLAST tool to run on EC2.

MPI, short for Message Passing Interface, is a standard for coordinating processing on supercomputer grids. MPICH2 is a popular implementation of MPI.

BLAST is the primary bioinformatics tool used to query genome sequences against an established database, or to match one sequence against another. The primary BLAST tool is run as an online service by the National Institutes of Health.

Running BLAST over MPI lets BLAST run on a processing grid; this variant is called mpiBLAST.
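To make the idea of running a search across a grid a bit more concrete, here is a minimal single-process sketch in Python. It assumes a scatter/gather layout in which the database is split into fragments, each worker searches only its own fragment, and the hits are merged at the end. The k-mer scorer, the function names, and the tiny database are all made up for illustration; this is not BLAST's algorithm or mpiBLAST's code, and no real MPI calls are involved.

```python
from collections import Counter

K = 3  # k-mer length used by the toy scorer (an arbitrary choice)


def kmers(seq, k=K):
    """Return the multiset of overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def toy_score(query, subject):
    """Count shared k-mers; a crude stand-in for a real alignment score."""
    return sum((kmers(query) & kmers(subject)).values())


def split_database(database, n_workers):
    """Deal database entries round-robin into n_workers fragments."""
    return [database[rank::n_workers] for rank in range(n_workers)]


def search_fragment(query, fragment):
    """What each worker would do: score the query against its own fragment."""
    return [(toy_score(query, seq), name) for name, seq in fragment]


def distributed_search(query, database, n_workers, top=3):
    """Scatter fragments, search each one, then gather and rank the hits."""
    fragments = split_database(database, n_workers)
    # Under MPI each fragment would be searched by a different rank;
    # here the "ranks" simply run one after another in a single process.
    partial = [search_fragment(query, frag) for frag in fragments]
    merged = [hit for hits in partial for hit in hits]
    merged.sort(key=lambda hit: (-hit[0], hit[1]))  # best score first, then name
    return merged[:top]


# Hypothetical toy database of (name, sequence) pairs.
DATABASE = [
    ("seq_a", "ACGTACGTGA"),
    ("seq_b", "TTTTGGGGCC"),
    ("seq_c", "ACGTTTGACA"),
    ("seq_d", "GGGACGTACG"),
]

if __name__ == "__main__":
    for score, name in distributed_search("ACGTACG", DATABASE, n_workers=2):
        print(name, score)
```

The point of the sketch is that the merged ranking does not depend on how many workers the database was split across, which is what lets a job like this spread over however many instances happen to be available.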

Mike's work builds on that of Peter Skomoroch, who did the work needed to get MPICH2 running on Amazon EC2. Peter documented his work in a very informative set of blog posts:

* On-Demand MPI Cluster with Python and EC2 (part 1 of 3)
* MPI Cluster with Python and Amazon EC2 (part 2 of 3)
* Amazon EC2 Considered Harmful

That last post doesn't actually reference EC2, but it is entertaining nonetheless. Part 2 ends with a parallel fractal calculation running on 5 EC2 instances!
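For readers wondering how a fractal calculation divides up across five machines, here is a rough Python sketch of one common approach: deal the image rows out to the workers, let each compute its own rows, and reassemble the picture. It is not taken from Peter's posts. The choice of the Mandelbrot set, the grid size, and every function name are assumptions made for illustration, and the five "workers" are simulated in a single process rather than run over MPI.

```python
WIDTH, HEIGHT, MAX_ITER = 24, 10, 30  # deliberately tiny grid
N_WORKERS = 5  # mirrors the five instances mentioned above


def escape_time(c, max_iter=MAX_ITER):
    """Iterations before z -> z*z + c leaves |z| <= 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def compute_row(y):
    """Escape times for one image row, mapped onto [-2, 1] x [-1, 1]."""
    im = -1.0 + 2.0 * y / (HEIGHT - 1)
    return [escape_time(complex(-2.0 + 3.0 * x / (WIDTH - 1), im))
            for x in range(WIDTH)]


def rows_for(rank, n_workers=N_WORKERS):
    """Rows assigned to one worker: every n_workers-th row starting at rank."""
    return range(rank, HEIGHT, n_workers)


def render(n_workers=N_WORKERS):
    """Compute each worker's rows, then reassemble the image in row order."""
    image = [None] * HEIGHT
    for rank in range(n_workers):  # each pass stands in for one worker
        for y in rows_for(rank, n_workers):
            image[y] = compute_row(y)
    return image


if __name__ == "__main__":
    for row in render():
        print("".join("#" if n == MAX_ITER else "." for n in row))
```

Because no row depends on any other, the workers never need to talk to each other until the final gather, which is why fractals make such a popular first demo for a new cluster.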

By the way, I'm very interested in hearing about more academic and scientific uses of EC2. Please feel free to post a comment.

-- Jeff

See also past postings:

http://lists.canarie.ca/pipermail/news/2006/000365.html