This is an interesting write-up on how to move enormous numbers of objects within S3. I have a similar post coming up about large-scale batch processing that I’m looking forward to sharing in the next couple of weeks. But give this a read. It’s interesting!
I’ve been working on some photo storage and serving problems at Etsy, which is exciting work given the number of photos we store for the items being sold on the site. This sort of project makes you wonder how other sites are handling their photo storage and serving architectures.
Today I spent a few minutes looking at avatar photos from Twitter from the outside. This is all from inspection of URLs and HTTP headers, and completely unofficial and unvalidated assumptions.
Two things I found interesting today were (1) the rate of new avatar photos being added to Twitter, and (2) the architecture for storing and serving images.
Avatar photos at Twitter have URLs that look something like the following:
I’m assuming the numeric ID increments linearly with each photo that is uploaded… two images uploaded a few minutes apart showed a relatively small increase between these IDs. I compared one of these IDs with the ID of an older avatar, along with the “Last-Modified” header that was included with its HTTP response headers:
Last-Modified: Tue, 26 Feb 2008 03:15:46 GMT
Comparing these numbers shows that Twitter is currently ingesting somewhere over two million avatars per day.
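The arithmetic behind that estimate can be sketched roughly as follows. This is only an illustration: the function name is my own, it relies on the same assumption that IDs grow by one per upload, and the two IDs and the second timestamp are made-up placeholders rather than the values I actually compared.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def uploads_per_day(old_id, old_time, new_id, new_time):
    """Estimate uploads per day, assuming IDs grow by one per upload."""
    elapsed_days = (new_time - old_time).total_seconds() / 86400
    if elapsed_days <= 0:
        raise ValueError("new_time must be later than old_time")
    return (new_id - old_id) / elapsed_days


# The timestamp is the Last-Modified value quoted above. Everything else
# here is a made-up placeholder, NOT real Twitter data.
old_time = parsedate_to_datetime("Tue, 26 Feb 2008 03:15:46 GMT")
new_time = old_time + timedelta(days=2)
rate = uploads_per_day(1_000, old_time, 5_000, new_time)
print(f"{rate:.0f} uploads per day")
```

Plugging in a real pair of IDs and the time between the two observations gives the daily ingest figure.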
Stock, or library, avatars have different URLs, meaning they are not served or stored the same way as custom avatars. This is good because you get the caching benefits of reusing the same avatar URL for multiple users.
Storage and Hosting
Running a “host” look up on the hostname of an avatar URL shows a CNAME to Akamai’s cache network:
$ host a3.twimg.com
a3.twimg.com is an alias for a3.twimg.com.edgesuite.net.
a3.twimg.com.edgesuite.net is an alias for a948.l.akamai.net.
a948.l.akamai.net has address 184.108.40.206
a948.l.akamai.net has address 220.127.116.11
If you’re familiar with Akamai’s network, you can dig into response headers that come from their cache servers. I did a little of that, but the thing I found most interesting is that Akamai plucks avatar images from Amazon’s CloudFront service.
x-amz-id-2: NVloBPkil5u…
x-amz-request-id: 1EAA3DE5516E…
Server: AmazonS3
X-Amz-Cf-Id: 43e9fa481c3dcd79…
It’s not news that Twitter uses S3 for storing their images, but I hadn’t thought about using CloudFront (which is effectively a CDN) as an origin to another CDN. The benefit here, aside from not pounding the crap out of S3, is that Akamai’s regional cache servers can pull avatars from CloudFront POPs that are relatively close, as opposed to reaching all the way back to a single S3 origin (such as the “US Standard Region”, which I believe has two locations in the US). CloudFront doesn’t have nearly as many global POPs as Akamai. But using it does speed up image delivery by ensuring that Akamai’s cache servers in Asia are grabbing files from a CloudFront POP in Hong Kong or Singapore, rather than jumping across the Pacific to North America.
I suspect that Twitter racks up a reasonably large bill with Amazon by storing and serving so many files from S3 and CloudFront. However, it takes away the burden of owning all of the hardware, bandwidth, and manpower required to serve millions upon millions of images… especially when that is not a core feature of their site.
Cyberduck is a nice Mac FTP/SFTP GUI client that I’ve used in the past for moving files around between my desktop and some web servers. Turns out they’ve added support for moving your files directly to Amazon S3 and Mosso (RackSpace) Cloud Files. This means that you can use the same tool that you may previously have used for publishing content to your own web server to instead publish content directly to a self-service CDN. Amazon uses its CloudFront service to distribute files, and Mosso is supposed to be integrated with Limelight Networks for distributing content from the Cloud Files system.
Just wish I had these services available to me three years ago. They would have saved me some serious cash on bandwidth commits for CDNs for those silly little projects I was working on.
Cloud Computing / Comments Off
I’ve been planning to post some details about download speeds that I’ve seen from S3, and why you shouldn’t necessarily use S3 as a CDN, yet. Granted, Amazon recently announced that they will be providing a content delivery service in front of S3. This post has nothing to do with the CDN (CDS).
Scott posted the presentation he gave at the AWS Start-Up Tour. It’s worth a read, and is a good summary of our business case for building the EC2/S3 hosting platform that I led when I worked at Heavy.
His post includes this graphic of how we measured S3 delivery speeds throughout the day. Matt Spinks wrote the Munin plugin that generated this graph, and he tells me he’s planning to make that available for others to use. When it is, I’ll add the link here. It’s now available at Google Code.
As we measured it, S3 is fairly variable in its delivery speeds. Unfortunately, we didn’t measure latency for initial bits, which would be good to know as well.
My own impression is that it is not a good idea to host video directly from S3 if you run a medium to large web site. The forthcoming CDN service will probably help with this. If you’re a small to medium site, you might be happy with hosting video on S3. Hosting images and other static content (say, CSS and JS files) might also be a good idea if you don’t have a lot of your own server capacity.
For my own use, I’m planning on using S3 to host images and static content for some other sites I run, which use a shared hosting provider for serving PHP. On a low traffic site, I’d be happy to offload images to S3. And when the CDN service becomes available, the user experience should be even snappier.
One thing to note if you plan to use S3 for image hosting: look into providing the correct cache-control headers on your objects in S3. You need to do this when putting content onto S3. You can’t modify headers on existing content. More on this in a future post.
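As a rough sketch of what that looks like: S3 stores the Cache-Control and Content-Type headers sent with the PUT request and returns them when the object is served. The helper below only builds those headers. Its name, the one-year default, and the file name are my own assumptions, and the upload itself is left to whatever S3 client you use.

```python
import mimetypes


def upload_headers(filename, max_age_seconds=31536000):
    """Build the HTTP headers to send with the PUT that creates the object."""
    content_type, _ = mimetypes.guess_type(filename)
    return {
        # Fall back to a generic type when the extension is unknown.
        "Content-Type": content_type or "application/octet-stream",
        # Lets browsers and caches keep the object for max_age_seconds.
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }


# "avatar.png" is a hypothetical file name used only for illustration.
headers = upload_headers("avatar.png", max_age_seconds=86400)
print(headers)
```

A long max-age works best when a changed image gets a new object name, so stale copies are never an issue.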
What I’ve put together below are my thoughts following a recent panel on cloud computing in New York City. Thanks to Murat Aktihanoglu at Unype for putting together the panel. While the discussion was varied, and maybe disjointed, I came away with some new ideas.
This post hasn’t been well edited. I apologize in advance. These are mostly random notes I took away from the panel and my own experience with web hosting on the cloud.
What is “cloud computing”?
There are a variety of notions to how cloud computing is defined. I tend to think that what this really boils down to is the ability to procure hardware or services that you wouldn’t normally have access to in a physical sense. Rather than buying 20 new servers, you can spin them up on-demand, and also dump them whenever you want. It’s the “utility” or “pay-as-you-go” model.
I don’t see any difference between spinning up one server to run some prototypes, or spinning up 100 to crunch through a huge data set. People seem to be getting caught up in the notion that unless you are doing some sort of parallel processing with lots of nodes, you aren’t doing “cloud computing”. I disagree.
I also don’t believe that virtualization is necessarily the same as cloud computing. To me, virtualization means that you’re essentially splitting up fixed resources you already have into smaller chunks for other people to use. This is your accounting and human resources departments sharing space on the same machine, but keeping them logically partitioned. Providers are now selling virtualization under the cloud label. But if I have to buy (or rent) 20 physical machines to virtualize into slices, then I’m still committed to 20 machines. If I need more or fewer resources, I may need to work through a contract or serve out a lease term. It’s no longer pay-as-you-go, it’s a major expenditure.
Software as a Service
I love the software as a service model. I like having someone else running a database or mail service so that I don’t have to hire a team or own the plant to support it. With the service being off-site, I don’t have to worry about local disasters (though be sure to watch out for providers without their own SLA or disaster recovery plans). Our clients are again becoming thin. Laptops will have fewer and fewer local applications installed, and simply access various online applications and databases.
Again, pay as you go.
Additionally, you need fewer staff to manage services in-house. This means you won’t/can’t strangle your sysadmin when hosted email goes down for six hours. That can be a good or a bad thing, depending on how you look at it.
The best part about this model, though, is that you focus your own resources on what you’re best at. Does an online marketing agency need to know how to administer an Exchange server? Or should that be outsourced to a company that has the expertise to run mail for over a hundred other companies?
Cloud != Scale
This seems like a typical misconception: If I build my application on a cloud computing platform, then it will automatically scale. Environments like EC2 provide the ability to scale your application horizontally. Your application, however, still needs to be able to benefit from horizontal scaling. If you can only handle 5 concurrent users per node, then adding more boxes isn’t going to get you to 10,000 users very quickly. This seems obvious, but many people are still missing this point.
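To put numbers on the example above, here is a back-of-the-envelope calculation. The function name is mine, and it assumes the best case where capacity grows perfectly linearly with node count, which real applications rarely manage.

```python
import math


def nodes_needed(target_users, users_per_node):
    """Nodes required if capacity scales perfectly linearly (best case)."""
    if users_per_node <= 0:
        raise ValueError("users_per_node must be positive")
    return math.ceil(target_users / users_per_node)


print(nodes_needed(10_000, 5))   # 5 concurrent users per node
print(nodes_needed(10_000, 50))  # same target after a 10x per-node improvement
```

At 5 users per node you would need 2,000 machines for 10,000 users. Improving the application to 50 users per node cuts that to 200, which is why fixing per-node efficiency matters more than the ability to add boxes.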
I don’t think there are many case studies yet of companies with applications “in the cloud” who also have suffered large amounts of traffic. And when we do see more of these applications, they will tend to have been built by early adopters who are probably experts in their fields. These cloud services are not yet open and approachable enough so that you have your average developer poking around and building applications that have the DNA for failure. Google has done a good job with promoting AppEngine using videos and hack-a-thons.
Decent architecture is always going to be foundational for scale. Your application has to benefit from the availability of additional nodes.
Redundancy and Planning for Failure
Amazon gets a lot of heat when S3 goes down, as does Google when Gmail is unavailable. This is all a lot of finger pointing, especially by people who have not started using cloud services: the “I told you so” crowd. Truth be told, the day after the recent S3 outage, my company had an application that was offline for nearly the same amount of time as S3’s outage. Are we any better? No.
It’s incredibly important to have a failover option for your own application. Before I left Heavy, we designed our storage on S3 so that it could be replicated to physical disks that we have at RackSpace. When S3 went out, we just flipped over to the physical disks. Eventually there will be a time when we don’t have enough disk to store what we keep at S3. That doesn’t mean that it can’t be replicated to another cloud storage service.
Consider having a backup hosting service in place, either physical or using another cloud provider. Your physical service could be provided by a managed hosting provider, or on some other dedicated hardware outside of your own office. You don’t need to own your own servers for a backup solution.
If you don’t have much money to spend on physical machines to host your fully operating site or application, think about how you can reduce the site to a version that can be hosted on a minimal number of servers. Can you maintain a read-only backup? Can you host a backup of your most popular content (i.e. the top 5%), and temporarily turn off access to the rest of the site?
Something that I have talked a lot about, but haven’t had enough time to spend building, is a good abstraction layer on top of cloud storage. Everyone seems to have slightly different APIs. On the other hand, about 85% of the features overlap from provider to provider. Why not write an abstraction layer to handle the 85% and use multiple services? This could probably work pretty well for flipping back and forth between (or replicating amongst) various cloud storage services like S3, CloudFS, Nirvanix, and also physical disks.
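Here is a minimal sketch of what such a layer might look like. None of this is code we ran: the class names are invented, the interface covers only put/get/delete as a stand-in for the overlapping feature set, and an in-memory class plays the part of a remote provider so the example runs without credentials.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Store(ABC):
    """The small common interface shared by every backend."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class MemoryStore(Store):
    """Stand-in for a remote provider; flip `available` to fake an outage."""

    def __init__(self):
        self._objects = {}
        self.available = True

    def _check(self):
        if not self.available:
            raise ConnectionError("store unavailable")

    def put(self, key, data):
        self._check()
        self._objects[key] = data

    def get(self, key):
        self._check()
        return self._objects[key]

    def delete(self, key):
        self._check()
        self._objects.pop(key, None)


class DiskStore(Store):
    """Plain files under a root directory (the physical-disk replica)."""

    def __init__(self, root):
        self.root = Path(root)

    def put(self, key, data):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key):
        return (self.root / key).read_bytes()

    def delete(self, key):
        (self.root / key).unlink(missing_ok=True)  # Python 3.8+


class ReplicatedStore(Store):
    """Write to every backend; read from the first one that answers."""

    def __init__(self, *backends):
        self.backends = backends

    def put(self, key, data):
        for backend in self.backends:
            backend.put(key, data)

    def get(self, key):
        last_error = None
        for backend in self.backends:
            try:
                return backend.get(key)
            except (OSError, KeyError) as exc:  # outage or missing object
                last_error = exc
        raise LookupError(key) from last_error

    def delete(self, key):
        for backend in self.backends:
            backend.delete(key)


primary = MemoryStore()
store = ReplicatedStore(primary, DiskStore(tempfile.mkdtemp()))
store.put("photos/1.jpg", b"jpeg bytes")
primary.available = False          # simulate the provider going down
data = store.get("photos/1.jpg")   # served from the disk replica
```

Anything outside the common subset (ACLs, signed URLs, provider-specific metadata) would still need provider-specific code, which is the remaining 15%.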
I don’t know many details about SimpleDB and AppEngine’s datastore, but it seems to me that you may be able to apply this 85% rule to those as well. You could probably even treat MySQL and PostgreSQL the same way. You couldn’t use all of the joins and transactions you normally would want to use, but then again, writing an application specifically for cloud computing platforms seems to be a different sort of animal. We’ve basically been doing the same thing for years with the so-called database abstraction layers. You can say that you’ve got a layer that allows you to flip from one database engine to another, but chances are, you have some engine-specific code that you’ve been using that doesn’t translate well.
Porting an Application to EC2
I ported an application at Heavy that ran on physical machines we had available at RackSpace onto EC2. How much effort did it take for the application developers? Almost none. We didn’t buy into using SimpleDB — we just ran MySQL on EC2 instances. We split our team so that we had a couple of us building a few tools for managing our EC2 instances, and the other developers went about their business building a web application that could run on a standard LAMP stack. Additionally, if EC2 ever goes out of commission, we have the code and databases backed up. They can easily be deployed to physical machines.
It’s worth saying this again… I ported an application from physical machines to the cloud. This application was not written for a specific cloud service. We were very concerned about lock-in from the beginning.
What did we gain by hosting our application on EC2? Initially nothing. We had the physical machines to run the application. But as our traffic increases, we can fire up new instances on demand. If traffic drops off, so does our monthly bill. It’s variable cost web hosting.
Does hosting your application on EC2 solve scaling problems? No. If you can’t improve performance of your application by adding additional servers, then there are bottlenecks to solve. Running your service on the cloud doesn’t mean it scales.
Furthermore, the cloud is not self-healing. In other words, it doesn’t automatically monitor your application and grow your infrastructure. That doesn’t mean, however, that you can’t build your application to do this. Read Don MacAskill’s SkyNet posting to get some idea of how that can work.
I look forward to reading your comments.
In the months prior to leaving Heavy, I led an exciting project to build a hosting platform for our online products on top of Amazon’s Elastic Compute Cloud (EC2). We eventually launched our newest product at Heavy using EC2 as the primary hosting platform.
I’ve been following a lot of what other people have been doing with EC2 for data processing and handling big encoding or rendering jobs. This is not one of those projects.
We set out to build a fairly standard LAMP hosting infrastructure where we could easily and quickly add additional capacity. In fact, we can add new servers to our production pool in under 20 minutes, from the time we call the “run instance” API at EC2, to the time when public traffic begins hitting the new server. This includes machine startup time, adding custom server config files and cron jobs, rolling out application code, running smoke tests, and adding the machine to public DNS.
What follows is a general outline of how we do this.
Wait! But first, you should know that this article is pretty old and a lot has changed. After you’ve read this article, please take a look at my follow up post: EC2 Hosting Architecture, Two Years Later. — Mike
Heavy makes use of a pretty standard LAMP stack. Administration scripts are written in PHP, Perl, or Bash. There is a lot of caching in memory (memcached), file caching, and HTTP caching (Akamai). The new site requires a layer of front-end web servers that double as application servers and a database layer (with replication). The site is built entirely in PHP, making use of Zend Framework. The database is MySQL. We are not using Amazon’s SimpleDB service.
I chose CentOS for the operating system on our machine images. All of the machine images we built are designed specifically for their purpose, and there are a handful of them. For example, web servers run Apache, PHP, and some Perl libraries. Databases are installed with MySQL and PHP (for administration scripts). Memcached nodes are built with memcached and barely anything else.
Thorsten von Eicken at RightScale has written a lot of great material about their use of EC2. I took a lot of ideas from their blog, including the use of RightScripts. After banging around with some publicly available images, I started building my own by modifying their scripts for 32-bit and 64-bit CentOS images.
It took a little getting used to the manner of building images with these scripts, and getting the right software packages installed. Eventually it clicked and what I ended up with was a really simple script for building each type of machine that we would need. Even better, it was simple to go back and re-configure any of the images and roll them into new machines.
Many thanks to Thorsten for providing these scripts.
Amazon provides some fantastic command line tools for managing EC2 instances and getting status on the service. Unfortunately, these tools don’t really help much in terms of documenting what each server is doing. To keep track of this, I went to work building a control panel for our EC2 account that documents the roles for each machine and what products are running on it. Our plan was not to run a single web site, but multiple sites/products each with their own database and web server clusters.
The control panel, by the way, lived on a physical machine of ours (at RackSpace), and not on EC2.
We realized early that all of these machines would need to know how to find each other. Our control panel manages a global configuration file that lives in our S3 account that documents all of our servers’ roles. Every server is set up to inspect this file and adjust its own application environment. For example, when a new web server comes online it grabs the configuration file and figures out which databases belong to the web site running on the instance. If a database fails and a slave takes over as the new master, web servers can figure that out on their own without anyone manually logging in to change configs or host files.
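To give a feel for the lookup a web server performs, here is a small sketch. The JSON layout, field names, and host names are all invented for illustration (our real file was not necessarily structured this way), and the fetch from S3 is replaced by an inline string.

```python
import json

# Hypothetical config contents; in practice this would be fetched from S3.
CONFIG_JSON = """
{
  "servers": [
    {"host": "web-1.example.internal", "role": "web", "product": "Heavy.com"},
    {"host": "db-1.example.internal", "role": "database",
     "product": "Heavy.com", "db_role": "master"},
    {"host": "db-2.example.internal", "role": "database",
     "product": "Heavy.com", "db_role": "slave"},
    {"host": "db-9.example.internal", "role": "database",
     "product": "HuskyMedia.com", "db_role": "master"}
  ]
}
"""


def database_hosts(config, product):
    """Return the master and slave database hosts for one product."""
    hosts = {"master": None, "slaves": []}
    for server in config["servers"]:
        if server["role"] != "database" or server["product"] != product:
            continue
        if server.get("db_role") == "master":
            hosts["master"] = server["host"]
        else:
            hosts["slaves"].append(server["host"])
    return hosts


config = json.loads(CONFIG_JSON)
heavy_dbs = database_hosts(config, "Heavy.com")
print(heavy_dbs)
```

Because every server re-reads the same file, changing one entry in it is enough to repoint the whole web tier.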
Although we had tested the EC2 servers to handle very high loads of traffic, we have the luxury of using Akamai’s Site Accelerator product in front of our web servers. This allows for easy page caching in front of our web servers, and actually handles about 90% of the hits to the site. Our web servers serve as the origin for Akamai’s proxy. Rather than fooling around with additional servers to handle load balancing and configuring proper failover between them, we simply use round-robin DNS. As it turns out, our load is very evenly distributed amongst the web servers.
Most people I’ve talked to about this setup want to know how we felt about hosting our database on EC2. The best answer I can give is, “nervous”. Since EC2 doesn’t have persistent storage on machine instances, yet, we were liberal with setting up replication and backup servers. A single master database is replicated to a slave (master candidate), and that slave replicates to a second slave (slave candidate). Scripts were written to handle automated failover if the master becomes inaccessible; the master candidate is automatically promoted to master, and our global configuration file is updated so that all of the web servers are aware of the change.
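The decision part of that failover can be sketched like this. It is not our actual script: the function name and host names are made up, the health check is passed in as a plain callable, and the MySQL replication commands and the write of the updated chain back to the global configuration file are deliberately left out.

```python
def fail_over(chain, is_alive):
    """Return the replication chain after any needed promotion.

    chain is ordered: master, master candidate, slave candidate.
    is_alive is a callable that reports whether a host is reachable.
    """
    master, *candidates = chain
    if is_alive(master):
        return list(chain)  # nothing to do
    for index, host in enumerate(candidates):
        if is_alive(host):
            # The first reachable candidate becomes the new master and
            # everything behind it moves up one position.
            return candidates[index:]
    raise RuntimeError("no healthy database available to promote")


# Hypothetical host names; the lambda pretends the master has failed.
chain = ["db-master", "db-candidate", "db-slave"]
new_chain = fail_over(chain, is_alive=lambda host: host != "db-master")
print(new_chain)
```

Once the new chain is published to the shared configuration file, the web servers pick up the new master the next time they read it.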
Furthermore, we run a second slave from the master database. This slave has a single role: dump snapshots of the database every 15 minutes and store them on S3. If all of our EC2 instances should ever disappear from EC2, we have recent copies of the database on S3.
In all, four database instances. We’re being careful.
Our images are designed to support a specific “role” for a machine, such as a database, web server, etc. Once started, we identify “products” that will run on each machine. These might be things like “HuskyMedia.com”, “Heavy.com”, “Video Encoding”, etc. Obviously, each of our products requires its own set of service configurations (Apache, MySQL), users and groups, and cron jobs.
We chose Puppet to roll out these configurations to new machines (hat tip to Justin Shepherd at RackSpace for this suggestion). If you’re not familiar with Puppet, you can create classes, or roles, for each of your servers. In the classes, you define configuration files, cron jobs, packages to install, etc. Finally, you identify the hosts that belong to each class. When a new machine is started up (plain old vanilla), it checks into the Puppet “master” server, and the master sends over the proper configs.
When Puppet works, it totally rocks. It has its drawbacks, however. There is (what I consider) a steep learning curve for its configuration language. It’s also still very much in development. When we upgraded to a new software version, the master server didn’t seem to play well with clients that were still on the old version. We jumped through some hoops to get all of our clients talking to the master server again.
On the upside, however, we must have gone through setting up over 100 machine instances using Puppet. And that would have taken hundreds of hours in server administration to get each machine configured. Additionally, Puppet can do package management, tying into whatever package manager you use in your Linux distribution. If we had started using Puppet earlier, we might have stuck with two baseline machine images, one for 32-bit, and one for 64-bit architectures. Then we could allow Puppet to handle all of the software installations.
Many thanks to the guys at Reductive Labs. Puppet is a very cool piece of software!
We use two pieces of software for monitoring: Munin and Nagios. The Munin server we use is the same that we use for our physical machines. The simple configuration needed for Munin nodes is built into our machine images, along with the properly installed plugins. The control panel we built for EC2 also updates our local Munin server configuration to listen for the new machines. As soon as we start a new EC2 instance, it begins to show up in our reports.
Our Nagios configuration is a work in progress. There are two installations, one that we use on our physical machines, and one that lives within EC2. The EC2-based installation is monitored by the one installed on our physical machines. It is not tied as tightly to our control panel, yet, but it seems likely for that to happen soon.
Not to be overlooked, availability zones allow you to distribute your EC2 instances across separate fault-tolerant groups. If one availability zone goes down, machines in other zones should theoretically be insulated from the same issue, i.e. separate power, separate network connectivity, etc.
We built a color-coded indicator in our control panel of the availability zone where each machine is running. This makes it easy for us to make sure that we balance our servers equally throughout all of the zones.
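The same balancing check can be done in code when picking a zone for the next instance. This is just a sketch: the function name is mine, the zone names are illustrative, and the list of running instances would really come from the EC2 API or the control panel.

```python
from collections import Counter


def least_loaded_zone(zones, instance_zones):
    """Pick the zone currently running the fewest instances.

    Ties are broken alphabetically so the choice is deterministic.
    """
    counts = Counter(instance_zones)
    return min(zones, key=lambda zone: (counts[zone], zone))


# Illustrative data: one entry per running instance.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
running = ["us-east-1a", "us-east-1a", "us-east-1b"]
next_zone = least_loaded_zone(zones, running)
print(next_zone)
```

Launching each new machine into the least-loaded zone keeps the pool balanced without anyone having to eyeball the colors.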
It’s always handy to have a physical backup, especially since EC2 is currently still in “beta”. Since our installation on EC2 uses essentially the same architecture as we use on our physical machines at RackSpace, it would be simple for us to move the entire site back to those servers. In fact, most of the configurations are already in place. We also use Neustar as our DNS provider, so we can keep very low TTLs on our hostnames. When we need to change the location of our origin servers, it’s done in a matter of seconds.
Here are some successes we took away from this project:
- Twenty-minute start up time. Hands down, this is the most impressive for me. We can spin up new machines and put them into production in under 20 minutes. This isn’t SkyNet, but it’s pretty darn cool.
- Loads of scripts and automation. We moved from mostly manual server administration, which we got used to by running only a few physical machines, to a much more automated process. This improves our general workflow for server administration, whether the servers are virtual machines or physical machines.
- Documentation of images. Using image builders based on RightScripts, we have a catalog of what software goes into each new server, cleanly spelled out in Bash. :)
- Fault tolerance. We don’t know what is going to happen with our virtual machines. We’ve seen some unexpected behavior from EC2, and have designed with that in consideration.
- Portable hosting. I didn’t want to build a hosting architecture just for Amazon Web Services. I wanted to build a fairly standard LAMP stack, but one that is redundant. We can take all of these learnings and re-apply them to the physical servers we still run at RackSpace.
While I researched and developed much of this project, I couldn’t have finished it off without the help of a couple of other guys at Heavy, Matt Spinks and Henry Cavillones. Matt led the database effort and all of the scripting involved for our automated failover. Henry took care of our monitoring needs, image maintenance, and helping me iron out some of the issues we were originally seeing with our CentOS configuration for EC2. Thanks, guys!
I also want to mention Scott Penberthy, our CTO, who kept us on track and was an excellent sounding board. Without Scott at the top of this project, it wouldn’t have come together. Thanks!
Finally, thanks for the clever work put together and discussed by the guys at RightScale and SmugMug, and for the countless other blog and forum postings I read during this project, which kept me headed in the right direction.
We have been looking at a number of services in our own hosting environment that are “spikey”, things like image manipulation, video encoding, marketing email blasts, etc. By offloading the spikes from our origin servers, we get better efficiency out of those machines, i.e. they handle consistent loads throughout the year. We push the spikes onto S3 and EC2. Amazon’s clients have a wide variety of needs, which helps to even out their loads. I might be driving high load today at EC2, but tomorrow my traffic might be sleepy during someone else’s peak.