
EC2 Hosting Architecture, Two Years Later

Posted by Mike Brittain on July 12, 2010
Cloud Computing

It’s been nearly two years to the day since I wrote my post about the hosting platform I set up on EC2.  The post still gets plenty of traffic, and this week I was asked whether the information is still valid.  I think there are now better ways of accomplishing what we set out to do back then, and here is a summary.

1. Instead of round-robin DNS for load balancing, you can now use Amazon’s Elastic Load Balancing (ELB) service, which offers HTTP or TCP load balancing.  I found the HTTP/1.1 support in ELB somewhat incomplete: “100 Continue” responses were not handled properly for large image uploads (a specific case I was relying on).
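
If you want to test that behavior against your own load balancer, here is a minimal sketch in Python; the ELB hostname is a made-up placeholder, and the probe simply sends the headers of a large upload and reports whatever interim status line comes back before any body bytes are transmitted:

import socket

def probe_100_continue(host, port=80, path="/upload", size=10 * 1024 * 1024):
    # Send only the request headers for a large upload, then read the
    # interim reply.  A compliant HTTP/1.1 server answers with
    # "HTTP/1.1 100 Continue" before the client sends any of the body.
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Type: image/jpeg\r\n"
        f"Content-Length: {size}\r\n"
        f"Expect: 100-continue\r\n\r\n"
    )
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(head.encode("ascii"))
        try:
            reply = sock.recv(1024).decode("ascii", "replace")
        except socket.timeout:
            return "(no interim response before timeout)"
    return reply.splitlines()[0] if reply else "(connection closed)"

print(probe_100_continue("my-app.us-east-1.elb.amazonaws.com"))  # placeholder host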

2. I chose Puppet two years ago for configuration management.  Since that time, Opscode has released Chef, which (in my opinion) is a friendlier way to manage your systems, and which we also happen to use at Etsy.

3. Our database layer was built on four MySQL instances in a fairly paranoid configuration; we had strong concerns about instances failing and losing data.  There are a couple of newer tools available to help with running MySQL on EC2: you can use Elastic Block Store (EBS) volumes for more resilient disk storage, or choose the Relational Database Service (RDS), which is MySQL implemented as a native service in AWS.  Disclaimer: I haven’t deployed production databases using either of these tools; these are only suggestions for possibilities that look better and easier than the setup we used.
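
To give a sense of how little setup RDS asks of you, here is a sketch using today’s boto3 library (which postdates this post); every name and credential in it is a placeholder:

import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.create_db_instance(
    DBInstanceIdentifier="example-db",      # placeholder name
    DBInstanceClass="db.t3.micro",
    Engine="mysql",
    MasterUsername="admin",                 # placeholder credentials
    MasterUserPassword="change-me-please",
    AllocatedStorage=20,                    # GiB of EBS-backed storage
)

Compare that to hand-rolling replication, snapshots, and failover across four self-managed instances.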

4. When it comes to monitoring tools, Ganglia is terrific.  What I liked about Munin was the ease of writing plug-ins and the layout of similar services on a single page for quick comparisons between machines.  Ganglia’s plug-ins are also dead-simple to write: in the five months I’ve been at Etsy, I’ve written at least 15, whereas in the three years I used Munin, I probably wrote a total of six.

Additionally, Ganglia has some sweet aggregated graphs for like machines; one of ours stacks a couple hundred web servers as individual lines on a single graph.
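
To back up the claim that these plug-ins are dead-simple, here is a minimal sketch of a gmond Python module.  The metric name is made up, and the 1-minute load average stands in for whatever you would actually sample; gmond calls metric_init() once and then invokes the callback on each collection cycle:

import os

def load_one(name):
    # Callback: return the 1-minute load average for this host.
    return os.getloadavg()[0]

def metric_init(params):
    return [{
        "name": "example_load_one",   # hypothetical metric name
        "call_back": load_one,
        "time_max": 90,
        "value_type": "float",
        "units": "procs",
        "slope": "both",
        "format": "%f",
        "description": "1-minute load average",
        "groups": "load",
    }]

def metric_cleanup():
    pass

if __name__ == "__main__":
    # Quick local test outside of gmond.
    for d in metric_init({}):
        print(d["name"], "=", d["call_back"](d["name"]))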

All of the points listed in the “successes” section of that original article should still be considered valid and are worth reading again.  But I’ll highlight the last two, specifically:

Fault tolerance: During the two years I worked at CafeMom, we ran a number of full-time services on EC2 (fewer than 10 instances).  While a handful had been running for years by the time I left (they had been started before I arrived, actually), other instances failed much sooner.  I can’t stress enough the importance of automated configuration for EC2 instances, given that these things have a tendency to fail while you’re busy working on another deadline.  I believe that in technical circles they refer to this as Murphy’s Law.

Portable hosting: I’m a big believer in commodity services.  The more generic your vendor services, the easier it is to switch them out when your blowout preventer fails.  I’ve mentioned a few services in this article that are specific to Amazon Web Services (ELB, RDS, and EBS).  If you go the route of Elastic Load Balancing or the Relational Database Service, you should think carefully about which services you would use if you had to move to another cloud vendor.


Twitter’s Photo Storage (from the outside looking in)

Posted by Mike Brittain on May 24, 2010
Cloud Computing, WWW

I’ve been working on some photo storage and serving problems at Etsy, which is exciting work given the number of photos we store for the items being sold on the site.  This sort of project makes you wonder how other sites are handling their photo storage and serving architectures.

Today I spent a few minutes looking at Twitter avatar photos from the outside.  Everything here comes from inspecting URLs and HTTP headers, and is built on completely unofficial and unvalidated assumptions.

Two things I found interesting today were (1) the rate of new avatar photos being added to Twitter, and (2) the architecture for storing and serving images.

Avatars

Avatar photos at Twitter have URLs that look something like the following:

http://a3.twimg.com/profile_images/689887435/my_photo.jpg

I’m assuming the numeric ID increments linearly with each photo that is uploaded… two images uploaded a few minutes apart showed a relatively small increase between these IDs.  I compared one of these IDs with the ID of an older avatar, along with the “Last-Modified” value from its HTTP response headers:

Last-Modified Tue, 26 Feb 2008 03:15:46 GMT

Comparing these numbers shows that Twitter is currently ingesting somewhere over two million avatars per day.
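
The arithmetic behind that estimate is easy to sketch.  Only the first ID below comes from the example URL above; the second ID and the time gap are made-up placeholders that happen to land near the two-million-per-day mark:

id_a = 689_887_435   # ID from the URL above
id_b = 689_894_635   # hypothetical second upload, a few minutes later
minutes_apart = 5    # hypothetical gap between the two uploads
per_day = (id_b - id_a) / minutes_apart * 60 * 24
print(f"~{per_day:,.0f} avatars/day")  # ~2.1 million with these numbers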

Stock, or library, avatars have different URLs, meaning they are not served or stored the same way as custom avatars.  This is good because you get the caching benefits of reusing the same avatar URL for multiple users.

http://s.twimg.com/a/1274144130/images/default_profile_6_normal.png

Storage and Hosting

Running a “host” look up on the hostname of an avatar URL shows a CNAME to Akamai’s cache network:

$ host a3.twimg.com
a3.twimg.com is an alias for a3.twimg.com.edgesuite.net.
a3.twimg.com.edgesuite.net is an alias for a948.l.akamai.net.
a948.l.akamai.net has address 96.6.41.171
a948.l.akamai.net has address 96.6.41.170

If you’re familiar with Akamai’s network, you can dig into response headers that come from their cache servers.  I did a little of that, but the thing I found most interesting is that Akamai plucks avatar images from Amazon’s CloudFront service.

x-amz-id-2: NVloBPkil5u…
x-amz-request-id: 1EAA3DE5516E…
Server: AmazonS3
X-Amz-Cf-Id: 43e9fa481c3dcd79…

It’s not news that Twitter uses S3 for storing their images, but I hadn’t thought about using CloudFront (which is effectively a CDN) as an origin to another CDN.  The benefit here, aside from not pounding the crap out of S3, is that Akamai’s regional cache servers can pull avatars from CloudFront POPs that are relatively close, as opposed to reaching all the way back to a single S3 origin (such as the “US Standard Region”, which I believe has two locations in the US).  CloudFront doesn’t have nearly as many global POPs as Akamai. But using it does speed up image delivery by ensuring that Akamai’s cache servers in Asia are grabbing files from a CloudFront POP in Hong Kong or Singapore, rather than jumping across the Pacific to North America.
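
This sort of inspection is easy to reproduce.  Here is a small sketch that issues a HEAD request for the example avatar above (which may no longer resolve) and prints the headers that hint at what sits behind the cache:

import urllib.request

url = "http://a3.twimg.com/profile_images/689887435/my_photo.jpg"
req = urllib.request.Request(url, method="HEAD")
with urllib.request.urlopen(req) as resp:
    for name, value in resp.getheaders():
        # x-amz-* headers betray the S3/CloudFront origin behind the cache
        if name.lower().startswith(("x-amz", "server", "x-cache", "via")):
            print(f"{name}: {value}")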

I suspect that Twitter racks up a reasonably large bill with Amazon by storing and serving so many files from S3 and CloudFront.  However, it takes away the burden of owning all of the hardware, bandwidth, and manpower required to serve millions upon millions of images… especially when that is not a core feature of their site.


Fragility of the Cloud

Posted by Mike Brittain on June 11, 2009
Cloud Computing

A lightning strike causes EC2 outages and Om Malik blames the “fragility of the cloud,” rather than recognizing that all tech suffers failures.  I’ll say it again: this could have happened to my own servers, or my own data center, and I would have been much further up the creek than I was with the Amazon team taking care of it.  Besides, one of the most important lessons I have learned from working with AWS is to expect that servers and services will fail, and to design them to fail gracefully.  It shouldn’t matter whether that service is “in the cloud” or in your data center.


High Traffic Sites on EC2

Posted by Mike Brittain on April 08, 2009
Cloud Computing

Grig Gheorghiu wrote up a nice article on handling high-traffic sites on EC2.  It’s definitely worth a read for some high-level concepts about multi-tier architectures.  It doesn’t go deep into the details of EC2 (I would have liked to see something about availability zones for MySQL and load balancers).  One thing I really liked was the concept of using multiple load balancers with round-robin DNS pointing at them.  I’ve been considering this as an option and have already played around with HAProxy.  It’s likely a future step for our new image service at CafeMom.
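
For what it’s worth, the round-robin half of that setup is nothing more than publishing several A records for one hostname, one per load balancer.  A quick sketch, using a placeholder hostname, shows what a client would see:

import socket

# One hostname, many A records; each would point at a separate HAProxy box.
addresses = {info[4][0] for info in
             socket.getaddrinfo("lb.example.com", 80, proto=socket.IPPROTO_TCP)}
for ip in sorted(addresses):
    print(ip)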


Manage Amazon Web Services on Your iPhone

Posted by Mike Brittain on October 23, 2008
Cloud Computing

Ylastic is putting a management interface for AWS on the iPhone.  Looks pretty cool.

I am familiar with their name, but don’t have any experience with their product.  I sort of wish these sorts of tools were open source (and some are) so that I could run the management service on my own servers and not hand over my AWS keys.  Like I said, I don’t have experience with their product, so maybe I’m making an unfair assumption.

As I’ve said earlier about AWS, it’s an amazing service, but is very much like a raw material.  It’s like having someone hand you the keys to a datacenter, and you don’t even know how to turn on the lights.  Ylastic fits into the category of management vendors for AWS, and I think that Amazon’s ultimate success will depend on management vendors who extend the web services to the layperson.
