amazon web services

EC2 Hosting Architecture, Two Years Later

Posted by Mike Brittain on July 12, 2010
Cloud Computing

It’s been nearly two years to the day since I wrote my post about the hosting platform I set up on EC2.  The post still gets plenty of traffic, and this week I was asked whether it was still valid info.  I think there are now some better ways of accomplishing what we set out to do back then, so here is a summary.

1. Instead of round-robin DNS for load balancing, you can now use Amazon’s Elastic Load Balancing (ELB) service.  The service allows for HTTP or TCP load balancing options.  I found that the HTTP 1.1 support in ELB is somewhat incomplete and “100 Continue” responses were not handled properly for large image uploads (a specific case I was using).

2. I chose Puppet two years ago for configuration management.  Since that time, Opscode has released Chef, which (in my opinion) is a friendlier way to manage your systems, and which we also happen to use at Etsy.

3. Our database layer was built on four instances for MySQL, in a fairly paranoid configuration.  We had strong concerns about instances failing and losing data.  There are a couple of new tools available to help with running MySQL on EC2.  You can use the Elastic Block Store (EBS) for more resilient disk storage, or choose the Relational Database Service (RDS) which is MySQL implemented as a native service in AWS.  Disclaimer: I haven’t deployed production databases using either of these tools.  These are only suggestions for possibilities that look better/easier than the setup we used.

4. When it comes to monitoring tools, Ganglia is terrific.  What I like about Munin is the ease of writing plug-ins and the layout of similar services on a single page for quick comparisons between machines.  Ganglia’s plug-ins are also dead simple to write.  In the five months I’ve been at Etsy, I’ve written at least 15.  In the three years I was using Munin, I probably wrote a total of six plug-ins.

Additionally, Ganglia has some sweet aggregated graphs for like machines.  This graph looks like a couple hundred web servers (as stacked lines).

All of the points listed in the “successes” section of that original article should still be considered valid and are worth reading again.  But I’ll highlight the last two, specifically:

Fault tolerance: Over the last two years when I worked at CafeMom we ran a number of full-time services on EC2 (fewer than 10 instances).  While a handful had been running for years at the time I left (had been started before I arrived, actually), other instances failed much sooner.  I can’t stress enough the importance of automated configurations for EC2 instances, given that these things have a tendency to fail when you’re busy working on another deadline.  I believe that in technical circles they refer to this as Murphy’s Law.

Portable hosting: I’m a big believer in commodity services.  The more generic your vendor services, the easier it is to switch them out when your blowout preventer fails.  I’ve mentioned a few services in this article that are specific to Amazon Web Services (ELB, RDS, and EBS).  If you go the route of Elastic Load Balancer or Relational Database Service, you should strongly consider what services you would use if you had to move to another cloud vendor.


Good Observation on Cloud Architecture with EC2

Posted by Mike Brittain on December 29, 2008
Cloud Computing

I just read this short article about Soocial’s hosting architecture which runs on Amazon Web Services.  There was one particular line that echoes what I’ve been saying for a while and I think it is worth repeating:

One of the most interesting things is how the architecture isn’t dramatically different than it would be if you were to build an on-premise version.

In my own experience with hosting on EC2, we built our application on a physical dev server that we already had in place and was running Linux.  It was easy for us (with just a little forethought) to deploy the application on EC2 and S3, and the developers working on the application really needed to know very little about the workings of EC2.


Munin Plugin for Testing S3 Speed

Posted by Mike Brittain on September 26, 2008
Cloud Computing

Matt Spinks put together a Munin plugin for monitoring S3 download speeds, which is now available at Google Code.  I mentioned this in a recent post and wanted to provide an update that the plugin is now published.


Download Speeds from Amazon S3

Posted by Mike Brittain on September 25, 2008
Cloud Computing

I’ve been planning to post some details about download speeds that I’ve seen from S3, and why you shouldn’t necessarily use S3 as a CDN, yet.  Granted, Amazon recently announced that they will be providing a content delivery service in front of S3.  This post has nothing to do with the CDN (CDS).

Scott posted the presentation he gave at the AWS Start-Up Tour.  It’s worth a read, and is a good summary of our business case for building the EC2/S3 hosting platform that I led when I worked at Heavy.

His post includes this graphic of how we measured S3 delivery speeds throughout the day.  Matt Spinks wrote the Munin plugin that generated this graph, and he tells me he’s planning to make that available for others to use.  When it is, I’ll add the link here. It’s now available at Google Code.

As we measured it, S3 is fairly variable in its delivery speeds.  Unfortunately, we didn’t measure latency for initial bits, which would be good to know as well.

My own impression is that it is not a good idea to host video directly from S3 if you run a medium to large web site.  The forthcoming CDN service will probably help with this.  If you’re a small to medium site, you might be happy with hosting video on S3.  Hosting images and other static content (say, CSS and JS files) might also be a good idea if you don’t have a lot of your own server capacity.

For my own use, I’m planning on using S3 to host images and static content for some other sites I run, which use a shared hosting provider for serving PHP.  On a low traffic site, I’d be happy to offload images to S3.  And when the CDN service becomes available, the user experience should be even snappier.

One thing to note if you plan to use S3 for image hosting: look into providing the correct cache-control headers on your objects in S3.  You need to do this when putting content onto S3.  You can’t modify headers on existing content.  More on this in a future post.
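As a rough sketch of what those headers might look like, the Python below builds a header set to send along with an upload.  The helper name `cache_headers` and the one-year default lifetime are illustrative assumptions, not anything S3 requires; the resulting dictionary would be handed to whatever S3 client you use when putting the object.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(max_age_seconds=31536000, now=None):
    """Build caching headers to send along with an S3 PUT request.

    Hypothetical helper: the name and the one-year default are assumptions.
    """
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }


# Fixed timestamp so the example is reproducible.
headers = cache_headers(now=datetime(2008, 9, 25, tzinfo=timezone.utc))
print(headers)
```

A long lifetime like this only makes sense for content whose URL changes whenever the content does, since browsers will not come back to check for a newer copy.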


Notes from AWS Start-Up Tour (NYC)

Posted by Mike Brittain on September 18, 2008
Cloud Computing

I attended the Amazon Web Services “Start-Up Tour” in New York today.  Though I’ve been using AWS for some time now, I learned a few little bits that I thought I would share.  Some of these might have been in press releases that I missed, but I still thought they were interesting.

1. 400K registered developers. Surely not all of these are active developers, or even doing anything large.  But this seems like a pretty good developer base for a set of services that are mostly still in “beta”, and most people consider them to be bleeding edge.

2. “Muck”. This is the term that Amazon uses for all of that infrastructure that you shouldn’t need to build when you’re starting a company… because someone else has already done it.  Stop reinventing the wheel and get focused on your real business priorities.  I’ve heard this term before, but I love it.

3. Start-ups shouldn’t need an ops team. Using AWS, a start-up can get a real infrastructure set up without having to hire an operations team, go through capacity planning, purchase equipment, rent a rack in a colo, deal with power, bandwidth, security, etc.  Companies like RightScale can ease the implementation process, and at least one of the NYC panelists who was speaking at the Start-Up Tour made use of RightScale.  The hourly charges are slightly higher, but keep in mind that you won’t need such a heavy-duty developer to manage your infrastructure.

4. S3 data redundancy. I was aware that data stored in S3 was replicated across multiple nodes, but according to Mike Culver (from AWS): 1.) S3 stripes across multiple nodes, 2.) Rebuilding a node doesn’t reduce performance of the striped system, and …drumroll… 3.) Data is replicated across multiple datacenters.  That’s the good part I didn’t know about.

5. 22 billion served. Well, not really “served”, but as of 2008 Q2, S3 has over 22 billion objects stored.

6. Upcoming products and features. Today, AWS announced that they will be providing a content delivery service for objects stored in S3 — high-speed, low-latency delivery.  Additionally, from what I heard talking with one of the evangelists, AWS is working on a long list of features requested by their customers.  When you think about a high-performance web application, there are a number of moving pieces — front-end servers, app servers, databases, storage, caching, load balancing, DNS, etc.  Lots of “muck” in there that isn’t already provided by AWS, but it sounds like they’re working on a number of these problems.  I’m looking forward to what may come out in the next year.


Amazon Is Launching a CDN

Posted by Mike Brittain on September 18, 2008
Cloud Computing

Amazon Web Services is about to launch what appears to be a globally distributed delivery network for content stored in S3.  Details have just been released today about the new service, which looks to work in conjunction with existing S3 content… which is niiiice.  New domain names are handed out through an API, which presumably will CNAME to a node that is geographically close to the end user’s POP.

This should continue to heat up the CDN industry, which has already become quite competitive with a number of small players who have come on the scene over the last 2-3 years to challenge the big fish, Akamai.  I’ve had the pleasure of working with Akamai for delivering Heavy’s content in the past, and can say that they set themselves apart from the rest of the industry by providing a wider range of products than just content delivery.

I’ve used S3 as an origin server to other CDNs in the past, and look forward to seeing how their own delivery network compares in speed, latency, and pricing.  On the surface, this looks amazing for start-ups and other small online businesses who may not be able to afford a large contingent of vendors.  My suspicion is that the pricing will be incredibly competitive, and I look forward to seeing actual numbers.


Some Thoughts on Cloud Computing

Posted by Mike Brittain on August 17, 2008
Cloud Computing / 1 Comment

This post was republished by SYS-CON’s Cloud Computing Journal on Aug. 19, 2008.

What I’ve put together below are my thoughts following a recent panel on cloud computing in New York City.  Thanks to Murat Aktihanoglu at Unype for putting together the panel.  While the discussion was varied, and maybe disjointed, I came away with some new ideas.

This post hasn’t been well edited.  I apologize in advance.  These are mostly random notes I took away from the panel and my own experience with web hosting on the cloud.

What is “cloud computing”?

There are a variety of notions of how cloud computing is defined.  I tend to think that what this really boils down to is the ability to procure hardware or services that you wouldn’t normally have access to in a physical sense.  Rather than buying 20 new servers, you can spin them up on-demand, and also dump them whenever you want.  It’s the “utility” or “pay-as-you-go” model.

I don’t see any difference between spinning up one server to run some prototypes, or spinning up 100 to crunch through a huge data set.  People seem to be getting caught up in the notion that unless you are doing some sort of parallel processing with lots of nodes, you aren’t doing “cloud computing”.  I disagree.

I also don’t believe that virtualization is necessarily the same as cloud computing.  To me, virtualization means that you’re essentially splitting up fixed resources you already have into smaller chunks for other people to use.  This is your accounting and human resources departments sharing space on the same machine, but keeping them logically partitioned.  Providers are now selling virtualization under the cloud label.  But if I have to buy (or rent) 20 physical machines to virtualize into slices, then I’m still committed to 20 machines.  If I need more or fewer resources, I may need to work through a contract or serve out a lease term.  It’s no longer pay-as-you-go, it’s a major expenditure.

Software as a Service

I love the software as a service model.  I like having someone else running a database or mail service so that I don’t have to hire a team or own the plant to support it.  With the service being off-site, I don’t have to worry about local disasters (though be sure to watch out for providers without their own SLA or disaster recovery plans).  Our clients are again becoming thin.  Laptops will have fewer and fewer local applications installed, and simply access various online applications and databases.

Again, pay as you go.

Additionally, fewer staff to manage services in-house.  This means you won’t/can’t strangle your sysadmin when hosted email goes down for six hours.  That can be a good or a bad thing, depending on how you look at it.

The best part about this model, though, is that you focus your own resources on what you’re best at.  Does an online marketing agency need to know how to administer an Exchange server?  Or should that be outsourced to a company that has the expertise to run mail for over a hundred other companies?

Cloud != Scale

This seems like a typical misconception: If I build my application on a cloud computing platform, then it will automatically scale. Environments like EC2 provide the ability to scale your application horizontally.  Your application, however, still needs to be able to benefit from horizontal scaling.  If you can only handle 5 concurrent users per node, then adding more boxes isn’t going to get you to 10,000 users very quickly.  This seems obvious, but many people are still missing this point.
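To put numbers on that, here is a back-of-the-envelope calculation in Python.  It assumes capacity grows perfectly linearly with node count, which is the best case; the 500-users-per-node figure is a made-up comparison point.

```python
import math


def nodes_needed(target_users, users_per_node):
    """Nodes required if capacity scales perfectly linearly with node count."""
    return math.ceil(target_users / users_per_node)


# Figures from the paragraph above: 5 concurrent users per node, 10,000 users.
print(nodes_needed(10_000, 5))    # 2000
# A hypothetical tuned application handling 500 users per node.
print(nodes_needed(10_000, 500))  # 20
```

Two thousand instances versus twenty is the difference between fixing the application and paying for the cloud to paper over it.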

I don’t think there are many case studies yet of companies with applications “in the cloud” who also have suffered large amounts of traffic.  And when we do see more of these applications, they will tend to have been built by early adopters who are probably experts in their fields. These cloud services are not yet open and approachable enough so that you have your average developer poking around and building applications that have the DNA for failure.  Google has done a good job with promoting AppEngine using videos and hack-a-thons.

Decent architecture is always going to be foundational for scale.  Your application has to benefit from the availability of additional nodes.

Redundancy and Planning for Failure

Amazon gets a lot of heat when S3 goes down, as does Google when Gmail is unavailable.  This is all a lot of finger pointing, especially by people who have not started using cloud services (the “I told you so” crowd).  Truth be told, the day after the recent S3 outage, my company had an application that was offline for nearly the same amount of time as S3’s outage.  Are we any better?  No.

It’s incredibly important to have a failover option for your own application.  Before I left Heavy, we designed our storage on S3 so that it could be replicated to physical disks that we have at RackSpace.  When S3 went out, we just flipped over to the physical disks.  Eventually there will be a time when we don’t have enough disk to store what we keep at S3.  That doesn’t mean that it can’t be replicated to another cloud storage service.

Consider having a backup hosting service in place, either physical or using another cloud provider.  Your physical service could be provided by a managed hosting provider, or on some other dedicated hardware outside of your own office.  You don’t need to own your own servers for a backup solution.

If you don’t have much money to spend on physical machines to host your fully operating site or application, think about how you can reduce the site to a version that can be hosted on a minimal number of servers.  Can you maintain a read-only backup?  Can you host a backup of your most popular content (i.e. the top 5%), and temporarily turn off access to the rest of the site?

Abstraction Layers

Something that I have talked a lot about, but haven’t had enough time to spend building, is a good abstraction layer on top of cloud storage. Everyone seems to have slightly different APIs.  On the other hand, about 85% of the features overlap from provider to provider.  Why not write an abstraction layer to handle the 85% and use multiple services?  This could probably work pretty well for flipping back and forth between (or replicating amongst) various cloud storage services like S3, CloudFS, Nirvanix, and also physical disks.
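A minimal sketch of what such a layer could look like in Python follows.  Every class name here is hypothetical, `MemoryStore` merely stands in for a real cloud service (a real backend would call that provider’s API), and only `put` and `get` are covered, which is well short of the full 85%.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """The overlapping subset of operations most storage services offer."""

    @abstractmethod
    def put(self, key, data): ...

    @abstractmethod
    def get(self, key): ...


class MemoryStore(BlobStore):
    """Stand-in for a cloud storage service; a real one would call its API."""

    def __init__(self):
        self.blobs = {}
        self.available = True

    def put(self, key, data):
        if not self.available:
            raise IOError("service unavailable")
        self.blobs[key] = data

    def get(self, key):
        if not self.available:
            raise IOError("service unavailable")
        return self.blobs[key]


class DiskStore(BlobStore):
    """Physical disks: one file per key under a root directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        return os.path.join(self.root, key.replace("/", "_"))

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


class ReplicatedStore(BlobStore):
    """Write to every backend; read from the first one that answers."""

    def __init__(self, backends):
        self.backends = backends

    def put(self, key, data):
        for backend in self.backends:
            backend.put(key, data)

    def get(self, key):
        last_error = None
        for backend in self.backends:
            try:
                return backend.get(key)
            except (IOError, KeyError) as err:
                last_error = err
        raise IOError(f"no backend could serve {key!r}") from last_error


cloud = MemoryStore()
store = ReplicatedStore([cloud, DiskStore(tempfile.mkdtemp())])
store.put("images/logo.png", b"fake image bytes")
cloud.available = False  # simulate an outage of the cloud service
print(store.get("images/logo.png"))
```

Because the application only ever talks to `BlobStore`, flipping between providers, or replicating among them, becomes a matter of changing the list of backends.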

I don’t know many details about SimpleDB and AppEngine’s datastore, but it seems to me that you may be able to apply this 85% rule to those as well.  You could probably even treat MySQL and PostgreSQL the same way.  You couldn’t use all of the joins and transactions you normally would want to use, but then again, writing an application specifically for cloud computing platforms seems to be a different sort of animal.  We’ve basically been doing the same thing for years with the so-called database abstraction layers.  You can say that you’ve got a layer that allows you to flip from one database engine to another, but chances are, you have some engine-specific code that you’ve been using that doesn’t translate well.

Porting an Application to EC2

I ported an application at Heavy that ran on physical machines we had available at RackSpace onto EC2.  How much effort did it take for the application developers?  Almost none.  We didn’t buy into using SimpleDB; we just ran MySQL on EC2 instances.  We split our team so that we had a couple of us building a few tools for managing our EC2 instances, and the other developers went about their business building a web application that could run on a standard LAMP stack.  Additionally, if EC2 ever goes out of commission, we have the code and databases backed up.  They can easily be deployed to physical machines.

It’s worth saying this again… I ported an application from physical machines to the cloud.  This application was not written for a specific cloud service.  We were very concerned about lock-in from the beginning.

Conclusions

What did we gain by hosting our application on EC2?  Initially nothing.  We had the physical machines to run the application.  But as our traffic increases, we can fire up new instances on demand.  If traffic drops off, so does our monthly bill.  It’s variable cost web hosting.

Does hosting your application on EC2 solve scaling problems?  No.  If you can’t improve performance of your application by adding additional servers, then there are bottlenecks to solve.  Running your service on the cloud doesn’t mean it scales.

Furthermore, the cloud is not self-healing.  In other words, it doesn’t automatically monitor your application and grow your infrastructure.  That doesn’t mean, however, that you can’t build your application to do this.  Read Don MacAskill’s SkyNet posting to get some idea of how that can work.
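To make “build your application to do this” a little more concrete, here is a hedged sketch of the decision step such a monitoring loop might run.  It is not how SkyNet works; the function name, the 60% target, and the instance limits are all assumptions chosen for illustration.

```python
def desired_instances(current, load_percent, target_percent=60,
                      minimum=2, maximum=20):
    """Return how many instances to run so average load nears the target.

    load_percent is the measured average utilisation per instance (0-100).
    All thresholds here are illustrative assumptions.
    """
    total_load = current * load_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    wanted = (total_load + target_percent - 1) // target_percent
    return max(minimum, min(maximum, wanted))


print(desired_instances(current=4, load_percent=90))  # 6, scale up
print(desired_instances(current=4, load_percent=20))  # 2, scale down
```

The monitoring, the instance launches, and the terminations around this function are the parts you still have to build yourself.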

I look forward to reading your comments.


Diversity Factor at Amazon Web Services

Posted by Mike Brittain on April 25, 2008
Cloud Computing

Nicholas Carr makes a nice point about “diversity factor” within Amazon’s AWS clients.

We have been looking at a number of services in our own hosting environment that are “spikey”, things like image manipulation, video encoding, marketing email blasts, etc.  By offloading the spikes from our origin servers, we get better efficiency out of those machines, i.e. they are handling consistent loads throughout the year.  We push the spikes onto S3 and EC2.  Amazon’s clients have a wide variety of needs, which helps to even out their loads.  I might be driving high load today at EC2, but tomorrow my traffic might be sleepy during someone else’s peak.
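A small Python illustration of the idea, assuming the electric-utility definition of diversity factor (the sum of each client’s individual peak divided by the peak of the combined load).  The hourly numbers are invented for the example.

```python
def diversity_factor(loads_by_client):
    """Sum of each client's individual peak divided by the combined peak."""
    sum_of_peaks = sum(max(series) for series in loads_by_client)
    combined = [sum(hour) for hour in zip(*loads_by_client)]
    return sum_of_peaks / max(combined)


# Made-up hourly loads for two clients whose peaks fall at different times.
video_encoding = [10, 80, 10, 10]
email_blasts = [10, 10, 10, 70]
print(diversity_factor([video_encoding, email_blasts]))
```

A factor above 1.0 means the provider needs less total capacity than the clients would need separately; if both clients peaked in the same hour, the factor would be exactly 1.0.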
