Tuesday, June 19, 2007

Amazon S3 for Science Grids: A viable solution?



http://aws.typepad.com/aws/2007/06/amazon_s3_for_s.html

Amazon S3 for Science Grids

A team of researchers from the University of South Florida and the University of British Columbia have written a very interesting paper, Amazon S3 for Science Grids: A Viable Solution?

http://www.csee.usf.edu/~anda/papers/AmazonS3_TR.pdf

In this paper the authors review the features of Amazon S3 in depth, focusing on the core concepts, the security model, and data access protocols. After characterizing science storage grids in terms of data usage characteristics and storage requirements, they proceed to benchmark S3 with respect to data durability, data availability, access performance, and file download via BitTorrent. With this information as a baseline, they evaluate S3's cost, performance, and security functionality.

They conclude by observing that many science grid applications don't actually need all three of S3's most desirable characteristics -- high durability, high availability, and fast access. They also offer some interesting recommendations for additional security functionality and for relaxing some of S3's limitations.

I do have one small update to the information presented in the article! Since the article was written, we have announced that S3 is now storing 5 billion objects, not the 800 million mentioned in section II.