Amazon EC2 and Data Persistence
So, another Amazon Web Services article. Ugh. I’m just trying to figure out the best way to launch an application.
I had commented on Slashdot that EC2 has no persistent storage, and was rebutted by someone who said that you shouldn’t trust mirrored RAID disks combined with a daily backup either, so other providers only give the illusion of persistent storage.
The solution proposed by the commenter was to launch two database server instances and use MySQL’s clustering ability to replicate onto the backup. That way, you have a hot spare ready to take the place of the original should anything go wrong.
That is the correct way to do it if you’re an established site with heavy traffic. I just want to know that my storage will live through a server crash or power blip.
Am I silly for thinking that RAID mirroring provides enough protection? The things it doesn’t protect against are basically catastrophes like a data center burning down. How likely is it that two disks would fail simultaneously? More importantly, if I’m not dealing with RAID storage, how does a single mirrored server protect me against that any better?
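To put a rough number on the "two disks at once" question, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the 3% annual failure rate, the 24-hour rebuild window, the function names, and above all the idea that the two disks fail independently. Disks from the same batch, or a shared power event, would break that assumption.

```python
import math

HOURS_PER_YEAR = 24 * 365


def prob_fail_within(hours, annual_failure_rate):
    """Chance that one disk fails within `hours`, assuming a constant
    failure rate derived from the (assumed) annual failure rate."""
    rate_per_hour = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * hours)


def prob_mirror_loss_per_year(annual_failure_rate, rebuild_hours):
    """Rough chance per year that a two-disk mirror loses both disks:
    at least one disk fails during the year, and its partner then fails
    before the rebuild finishes. Assumes independent failures."""
    p_first = 1.0 - (1.0 - annual_failure_rate) ** 2
    p_second = prob_fail_within(rebuild_hours, annual_failure_rate)
    return p_first * p_second


if __name__ == "__main__":
    # Hypothetical inputs: 3% annual failure rate, 24-hour rebuild window.
    print(prob_mirror_loss_per_year(0.03, 24))
```

With those made-up inputs the result comes out at a few in a million per year, which is why the daily backup feels like enough of a safety net to me. Plug in your own numbers; the point is the shape of the calculation, not the specific figure.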
I’m glad that I was rebutted (even though I still disagree) because, while I don’t need that ability now, I might in the future. Right now, anything I make can deal with being restored from the daily backup in the very rare event that a RAID 10 array fails. If I were running MySpace or something, it would make sense to trust RAID less, since downtime there is a much bigger deal than my downtime.
EC2 may force you to think about the worst-case scenario, but it makes you think about it when you’re too small for it to be meaningful. It’s something that big players need to be forced to think about, not small operations.