
EC2 Hosting Architecture, Two Years Later

Posted by Mike Brittain on July 12, 2010
Cloud Computing

It’s been nearly two years to the day since I wrote my post about the hosting platform I set up on EC2.  That post still gets plenty of traffic, and this week I was asked whether the information in it is still valid.  I think there are now some better ways of accomplishing what we set out to do back then; here is a summary.

1. Instead of round-robin DNS for load balancing, you can now use Amazon’s Elastic Load Balancing (ELB) service, which offers both HTTP and TCP load balancing.  I found that ELB’s HTTP/1.1 support is somewhat incomplete: “100 Continue” responses were not handled properly for large image uploads (a specific case I was using).
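To illustrate the “100 Continue” issue: a client uploading a large file can send its headers first with an `Expect: 100-continue` header, then wait for the server’s interim `HTTP/1.1 100 Continue` response before transmitting the body.  A load balancer that fails to forward or answer that interim response leaves the client hanging.  Here is a small sketch (host, path, and sizes are made up) of the header block such a client sends first:

```python
def build_upload_request(host, path, content_length):
    """Build the header block a client sends ahead of a large upload.

    With "Expect: 100-continue", only these headers go out at first.
    The client then waits for an interim "HTTP/1.1 100 Continue"
    response before sending the body.  A proxy that swallows the
    interim response (the ELB behavior described above) stalls the
    upload until the client times out or gives up.
    """
    lines = [
        "PUT %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Content-Type: image/jpeg",
        "Content-Length: %d" % content_length,
        "Expect: 100-continue",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines)
```

If you depend on this handshake for large uploads, it’s worth testing explicitly through whatever sits in front of your app servers.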

2. I chose Puppet two years ago for configuration management.  Since that time, Opscode has released Chef, which (in my opinion) is a friendlier way to manage your systems, and which we happen to use at Etsy.

3. Our database layer was built on four instances for MySQL, in a fairly paranoid configuration.  We had strong concerns about instances failing and losing data.  There are a couple of new tools available to help with running MySQL on EC2.  You can use the Elastic Block Store (EBS) for more resilient disk storage, or choose the Relational Database Service (RDS) which is MySQL implemented as a native service in AWS.  Disclaimer: I haven’t deployed production databases using either of these tools.  These are only suggestions for possibilities that look better/easier than the setup we used.
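With the same disclaimer as above (I haven’t run this in production), the common EBS pattern is to attach a volume, mount it somewhere like `/vol`, and point MySQL’s data directory at it, so a replacement instance can re-attach the same volume and pick up where the failed one left off.  The mount point and paths here are placeholders:

```ini
# /etc/mysql/my.cnf (fragment)
# Assumes an EBS volume has already been attached to the instance,
# formatted, and mounted at /vol.  If the instance dies, the volume
# survives and can be attached to a new instance.
[mysqld]
datadir = /vol/mysql
```

EBS also supports snapshots to S3, which gives you a backup story that the local instance store never had.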

4. When it comes to monitoring tools, Ganglia is terrific.  What I liked about Munin was the ease of writing plug-ins and the layout of similar services on a single page for quick comparisons between machines.  Ganglia’s plug-ins are also dead-simple to write.  In the five months I’ve been at Etsy, I’ve written at least 15.  In the three years I was using Munin, I probably wrote a total of six.

Additionally, Ganglia has some sweet aggregated graphs for like machines; for example, a couple hundred web servers shown as stacked lines on a single graph.
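To give a feel for how simple these plug-ins are, here is a sketch of a Ganglia (3.1+) Python metric module.  The interface is just `metric_init`, a call-back per metric, and `metric_cleanup`; the metric name and the load-average source here are illustrative:

```python
import os


def load_one(name):
    """Call-back gmond invokes on each collection cycle."""
    return os.getloadavg()[0]


def metric_init(params):
    """Called once by gmond at startup; returns metric descriptors."""
    return [{
        "name": "one_min_load",
        "call_back": load_one,
        "time_max": 90,
        "value_type": "float",
        "units": "procs",
        "slope": "both",
        "format": "%f",
        "description": "One-minute load average",
        "groups": "load",
    }]


def metric_cleanup():
    """Called by gmond at shutdown; nothing to tear down here."""
    pass
```

Drop a module like this into gmond’s Python module directory, add a matching `.pyconf`, and the metric shows up on every host running it.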

All of the points listed in the “successes” section of that original article should still be considered valid and are worth reading again.  But I’ll highlight the last two, specifically:

Fault tolerance: Over the last two years, while I worked at CafeMom, we ran a number of full-time services on EC2 (fewer than 10 instances).  While a handful had been running for years by the time I left (they had been started before I arrived, actually), other instances failed much sooner.  I can’t stress enough the importance of automated configuration for EC2 instances, given that these things have a tendency to fail while you’re busy working on another deadline.  I believe that in technical circles they refer to this as Murphy’s Law.

Portable hosting: I’m a big believer in commodity services.  The more generic your vendor services, the easier it is to switch them out when your blowout preventer fails.  I’ve mentioned a few services in this article that are specific to Amazon Web Services (ELB, RDS, and EBS).  If you go the route of Elastic Load Balancing or the Relational Database Service, you should strongly consider what you would use in their place if you had to move to another cloud vendor.
