SysAdmin Appreciation Day in NYC

Posted by Mike Brittain on July 29, 2010
etsy / 1 Comment

There is a meetup being planned in NYC (at The Gingerman) for SysAdmin Appreciation Day tomorrow night (July 30).  As it turns out, Etsy is picking up the tab.  This is one of the really wonderful things that I appreciate about Etsy.

So be sure to come out and join us for a drink!

EC2 Hosting Architecture, Two Years Later

Posted by Mike Brittain on July 12, 2010
Cloud Computing / No Comments

It’s been nearly two years to the day since I wrote my post about the hosting platform I set up on EC2.  The post still gets plenty of traffic, and this week I was asked if it was still valid info.  I think there are now some better ways of accomplishing what we set out to do back then, and here is a summary.

1. Instead of round-robin DNS for load balancing, you can now use Amazon’s Elastic Load Balancing (ELB) service.  The service allows for HTTP or TCP load balancing options.  I found that the HTTP 1.1 support in ELB is somewhat incomplete and “100 Continue” responses were not handled properly for large image uploads (a specific case I was using).

2. I chose Puppet two years ago for configuration management.  Since that time, Opscode has released Chef, which is (in my opinion) a friendlier way to manage your systems, and which we also happen to use at Etsy.

3. Our database layer was built on four instances for MySQL, in a fairly paranoid configuration.  We had strong concerns about instances failing and losing data.  There are a couple of new tools available to help with running MySQL on EC2.  You can use the Elastic Block Store (EBS) for more resilient disk storage, or choose the Relational Database Service (RDS) which is MySQL implemented as a native service in AWS.  Disclaimer: I haven’t deployed production databases using either of these tools.  These are only suggestions for possibilities that look better/easier than the setup we used.

4. When it comes to monitoring tools, Ganglia is terrific.  What I like about Munin is the ease of writing plug-ins and the layout of similar services on a single page for quick comparisons between machines.  Ganglia’s plug-ins are also dead-simple to write.  In the five months I’ve been at Etsy, I’ve written at least 15.  In the three years I was using Munin, I probably wrote a total of six plug-ins.

Additionally, Ganglia has some sweet aggregated graphs for like machines.  This graph looks like a couple hundred web servers (as stacked lines).

All of the points listed in the “successes” section of that original article should still be considered valid and are worth reading again.  But I’ll highlight the last two, specifically:

Fault tolerance: Over the last two years when I worked at CafeMom we ran a number of full-time services on EC2 (fewer than 10 instances).  While a handful had been running for years at the time I left (had been started before I arrived, actually), other instances failed much sooner.  I can’t stress enough the importance of automated configurations for EC2 instances, given that these things have a tendency to fail when you’re busy working on another deadline.  I believe that in technical circles they refer to this as Murphy’s Law.

Portable hosting: I’m a big believer in commodity services.  The more generic your vendor services, the easier it is to switch them out when your blowout preventer fails.  I’ve mentioned a few services in this article that are specific to Amazon Web Services (ELB, RDS, and EBS).  If you go the route of Elastic Load Balancer or Relational Database Service, you should strongly consider what services you would use if you had to move to another cloud vendor.


How do you use the word “millions” at work?

Posted by Mike Brittain on July 10, 2010
Misc / No Comments

I published my first blog post on Etsy’s engineering blog today: Batch Processing Millions and Millions of Images.


Etsy is Hiring, Meet us at Velocity Conf

Posted by Mike Brittain on June 22, 2010
Job Openings, etsy / Comments Off

We still have a number of open positions at Etsy that are located in both New York City and San Francisco.  John Allspaw and Mike Brittain will be at O’Reilly’s Velocity Conference this week.  If you’re attending Velocity, don’t miss out on the chance to talk to one of us about working at Etsy.

These positions include:

  • Senior Operations Engineer
  • Web Operations Engineer
  • Senior Engineers and Lead Engineers in various areas, including Search, Payments, Social Systems, Community, Internationalization, Internal Tools, Fraud and Risk Management, and Content Platforms
  • Test Automation Engineer
  • Release Manager
  • Technical Project Manager

Check out our recruiting site for more details about each of these positions, or email me directly.  I hate having to say it, but please, no recruiters.

And don’t miss out on John’s talk at Velocity: Ops Meta-Metrics, The Currency You Use to Pay for Change.


Zero-Day Deploys

Posted by Mike Brittain on June 10, 2010
etsy / 3 Comments

When you bring new engineers on-board, it’s important to make them productive as soon as possible.  Spending days (or weeks) in training sessions, under locked-down permissions, and/or under watchful eyes simply keeps your shiny, new developer from getting shit done.  (I’ll be honest and say that I’ve been one of those types in the past.)

At Etsy, every engineer has access to deploy code to our production site.  We use a tool called “Deployinator” to do this quickly and easily.  It’s one button.  A culture of unit and functional testing (know when your code done broke), transparency (know when releases are happening), operational metrics (know when your servers are crying), and personal accountability (know when it’s your fault) keeps the entire process under control.

Earlier this week, we did something brand new for us…

watching a new engineer who started at Etsy today push code RIGHT NOW. awesome.

~chaddickerson, Etsy’s CTO

That’s right. On his very first day, Jason got set up in his development environment, made a change to our code base (albeit a small one), tested, committed, and deployed to the production web servers.

I can just hear it: “Why would you ever let someone deploy code on their first day?” But if you’re asking this question, I have to assume that you’ve also found yourself wondering at times, “How long is it going to be before Joe Newbie is up to speed?”

Sure, it’s going to take time for every new hire to learn all of the ropes. But it’s better for them to be productive, confident and experienced in releasing code (even if it’s small changes!) while they’re learning all of the other details than it is for them to be sitting on their hands waiting for the coach to put them into the game.

And in case you’re wondering, Jason has already released code three times in his first week… possibly four by the time I finish writing this.

Want to get in on the action? We’re hiring in both Engineering and Operations.


Moving Billions of Objects Within S3

Posted by Mike Brittain on May 25, 2010
Misc / Comments Off

This is an interesting write-up on how to move enormous numbers of objects within S3.  I have a similar post coming up about large-scale batch processing that I’m looking forward to sharing in the next couple of weeks.  But give this a read.  It’s interesting!


Twitter’s Photo Storage (from the outside looking in)

Posted by Mike Brittain on May 24, 2010
Cloud Computing, WWW / 1 Comment

I’ve been working on some photo storage and serving problems at Etsy, which is exciting work given the number of photos we store for the items being sold on the site.  This sort of project makes you wonder how other sites are handling their photo storage and serving architectures.

Today I spent a few minutes looking at avatar photos from Twitter from the outside.  This is all based on inspecting URLs and HTTP headers, so treat it as completely unofficial and unvalidated assumptions.

Two things I found interesting today were (1) the rate of new avatar photos being added to Twitter, and (2) the architecture for storing and serving images.

Avatars

Avatar photos at Twitter have URLs that look something like the following:

http://a3.twimg.com/profile_images/689887435/my_photo.jpg

I’m assuming the numeric ID increments linearly with each photo that is uploaded… two images uploaded a few minutes apart showed a relatively small increase between these IDs.  I compared one of these IDs with the ID of an older avatar, along with the “Last-Modified” header that was included with its HTTP response headers:

Last-Modified Tue, 26 Feb 2008 03:15:46 GMT

Comparing these numbers shows that Twitter is currently ingesting somewhere over two million avatars per day.
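The arithmetic behind an estimate like this is just a difference in IDs divided by the time between two observations.  Here is a minimal sketch in Python, assuming the IDs really do increment by one per upload.  The function name, both IDs, and the second timestamp are made-up placeholders for illustration and are not values measured from Twitter; only the first timestamp is the Last-Modified value quoted above.

```python
from email.utils import parsedate_to_datetime


def ids_per_day(old_id, old_seen, new_id, new_seen):
    """Average number of IDs issued per day between two observations."""
    elapsed_days = (new_seen - old_seen).total_seconds() / 86400
    if elapsed_days <= 0:
        raise ValueError("new_seen must be later than old_seen")
    return (new_id - old_id) / elapsed_days


# Last-Modified header of the older avatar, as quoted in the post.
old_seen = parsedate_to_datetime("Tue, 26 Feb 2008 03:15:46 GMT")
# HYPOTHETICAL: a second observation exactly one day later.
new_seen = parsedate_to_datetime("Wed, 27 Feb 2008 03:15:46 GMT")

# HYPOTHETICAL IDs, chosen only to keep the arithmetic easy to follow.
print(ids_per_day(1_000, old_seen, 3_000, new_seen))  # prints 2000.0
```

The result is an average over the whole interval, so the further apart the two observations are, the less it says about the current upload rate.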

Stock, or library, avatars have different URLs, meaning they are not served or stored the same way as custom avatars.  This is good because you get the caching benefits of reusing the same avatar URL for multiple users.

http://s.twimg.com/a/1274144130/images/default_profile_6_normal.png

Storage and Hosting

Running a “host” lookup on the hostname of an avatar URL shows a CNAME to Akamai’s cache network:

$ host a3.twimg.com
a3.twimg.com is an alias for a3.twimg.com.edgesuite.net.
a3.twimg.com.edgesuite.net is an alias for a948.l.akamai.net.
a948.l.akamai.net has address 96.6.41.171
a948.l.akamai.net has address 96.6.41.170

If you’re familiar with Akamai’s network, you can dig into response headers that come from their cache servers.  I did a little of that, but the thing I found most interesting is that Akamai plucks avatar images from Amazon’s CloudFront service.

x-amz-id-2: NVloBPkil5u…
x-amz-request-id: 1EAA3DE5516E…
Server: AmazonS3
X-Amz-Cf-Id: 43e9fa481c3dcd79…
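Reading headers like these is a heuristic, not proof, but it can be done mechanically.  The sketch below is one rough way to do it in Python: it assumes that a "Server: AmazonS3" or "x-amz-request-id" header hints at S3 and that an "X-Amz-Cf-Id" header hints at CloudFront.  The function name is hypothetical, and the header values are the truncated ones shown above, left as-is.

```python
def guess_origin(headers):
    """Return a list of AWS services hinted at by response header names."""
    # Header names are case-insensitive, so normalize them first.
    names = {key.lower(): value for key, value in headers.items()}
    hints = []
    if names.get("server") == "AmazonS3" or "x-amz-request-id" in names:
        hints.append("S3")
    if "x-amz-cf-id" in names:
        hints.append("CloudFront")
    return hints


# The (truncated) headers from the avatar response shown above.
sample = {
    "x-amz-id-2": "NVloBPkil5u…",
    "x-amz-request-id": "1EAA3DE5516E…",
    "Server": "AmazonS3",
    "X-Amz-Cf-Id": "43e9fa481c3dcd79…",
}

print(guess_origin(sample))  # prints ['S3', 'CloudFront']
```

An intermediate cache can strip or rewrite any of these headers, so an empty result does not rule out either service.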

It’s not news that Twitter uses S3 for storing their images, but I hadn’t thought about using CloudFront (which is effectively a CDN) as an origin to another CDN.  The benefit here, aside from not pounding the crap out of S3, is that Akamai’s regional cache servers can pull avatars from CloudFront POPs that are relatively close, as opposed to reaching all the way back to a single S3 origin (such as the “US Standard Region”, which I believe has two locations in the US).  CloudFront doesn’t have nearly as many global POPs as Akamai. But using it does speed up image delivery by ensuring that Akamai’s cache servers in Asia are grabbing files from a CloudFront POP in Hong Kong or Singapore, rather than jumping across the Pacific to North America.

I suspect that Twitter racks up a reasonably large bill with Amazon by storing and serving so many files from S3 and CloudFront.  However, it takes away the burden of owning all of the hardware, bandwidth, and manpower required to serve millions upon millions of images… especially when that is not a core feature of their site.


Mobile Sites Proliferate by Category

Posted by Mike Brittain on April 21, 2010
Mobile / Comments Off

An article on RWW about mobile apps and browser-based sites highlights the different concentrations in categories between native apps and mobile sites.  Native apps are heavily weighted toward games and entertainment.  Mobile web sites are heavily weighted toward shopping and social categories:

… 19% of the mobile sites measured were Shopping & Services sites; compared to 3.6% in the same category in the App Store. Content in the ‘Social’ category also has a higher chance of being a browser-based mobile site, rather than an app (12.9% to 1.7%).

Additionally, Taptu estimates that mobile site growth far outpaces the growth of native apps on any other platform, including the iPhone and App Store.

My position for a long while has been that mobile sites have much better reach due to the ability to access them from any mobile device with a decent browser, without having to download an app.  This makes cross-platform development much easier for existing web teams.  As more mobile platforms take up the WebKit rendering engine, including this week’s report that BlackBerry 6.0 will include a touch browser backed by WebKit, the baseline for development across the myriad of mobile devices is actually much better than what we had with the first web browsers in the late ’90s.

Still, the perception by consumers that apps are hip, as well as aggressive app-centric marketing by carriers, sets a higher barrier for consumers to understand the wealth of mobile sites available to them on their existing handsets.  Visibility is still an issue here.


NBCOlympics.com Delivers Quality TV to Web Crossover

Posted by Mike Brittain on February 16, 2010
WWW / Comments Off

I write this because I expected it not to work. Tonight while watching coverage from the Olympics, an ad popped up promoting additional background video available online. I opened the site on my iPhone, browsed through the videos, and watched the video that was highlighted on the broadcast coverage.

Great experience. No Flash Required.


Batch Process Image Optimizations

Posted by Mike Brittain on February 14, 2010
WWW / 2 Comments

A couple weeks ago I wrote a post about a script I had put together for batch processing JPEGs with jpegtran.  This week I extended that script so that it handles processing GIF and PNG images as well.  It’s now a project on GitHub called “Wesley”.

Wesley is a single Perl script that you run from the command line, supplying the path to a single file or to a directory where you keep your site’s images.  If you work on a Linux or Mac development server, you can quickly run this script against all new images that you add to your site code.  Additionally, you could tie this into your build process or a pre-commit hook for your preferred source control.  I haven’t spent time on this yet, but expect to add a write-up on it soon.

The script strips metadata and comments from image files and tries to optimize images using lossless techniques.  You should be able to run Wesley on your images without any reduction in quality.

Wesley makes use of locally installed copies of ImageMagick, jpegtran, pngcrush, and gifsicle.  Some of these are probably already installed on your own machine (or shared hosting service).  If you are missing one or more of these packages, you can still run Wesley and it will use as many packages as you have available.
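If you want to see which of these packages are on your machine before running anything, a quick check is easy to write.  This is a small Python sketch and not Wesley's own code (Wesley is Perl).  It assumes ImageMagick is detected through its "convert" command, which is my assumption about the command name to look for; the function and dictionary names are also just illustrative.

```python
import shutil

# Package name -> command expected on the PATH.
# ASSUMPTION: ImageMagick is looked up via its "convert" command.
TOOLS = {
    "ImageMagick": "convert",
    "jpegtran": "jpegtran",
    "pngcrush": "pngcrush",
    "gifsicle": "gifsicle",
}


def available_tools(tools=TOOLS, which=shutil.which):
    """Map each package name to True if its command is found on the PATH."""
    return {name: which(command) is not None for name, command in tools.items()}


for name, found in available_tools().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

The "which" parameter exists only so the lookup can be swapped out in a test; in normal use the default, shutil.which, searches your PATH.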

Usage

   wesley.pl  /path/to/images/

Sample Output

  ----------------------------
    Summary
  ----------------------------

  Converting the following GIFs to PNG would save additional file size.
  Bytes saved: 19173 (orig 149404, saved 12.83%)

    ./hd-sm.gif
    ./top_navigation.gif
    ./logo.gif

  Inspected 226 JPEG files.
  Modified 190 files.
  Huffman table optimizations: 138
  Progressive JPEG optimizations: 52
  Bytes saved: 408508 (orig 2099658, saved 19.45%)

  Inspected 105 PNG files.
  Modified 99 files.
  Bytes saved: 84618 (orig 315056, saved 26.85%)

  Inspected 129 GIF files.
  Modified 70 files.
  Bytes saved: 57535 (orig 1393120, saved 4.12%)

  Total bytes saved: 550661 (orig 3807834, saved 14.46%)
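The totals in the sample output are easy to sanity-check by hand.  The Python sketch below adds up the per-format numbers from the summary and recomputes the percentages.  The percentages only match the output if they are truncated to two decimal places rather than rounded; that is an inference from the numbers shown here, not something taken from Wesley's source.  The function name is hypothetical.

```python
def percent_saved(saved, original):
    """Percentage saved, truncated (not rounded) to two decimal places."""
    # Integer math avoids floating-point surprises before the final division.
    return (saved * 10000 // original) / 100


# (bytes saved, original bytes) per format, copied from the sample output.
results = {
    "JPEG": (408508, 2099658),
    "PNG": (84618, 315056),
    "GIF": (57535, 1393120),
}

for kind, (saved, original) in results.items():
    print(f"{kind}: saved {percent_saved(saved, original):.2f}%")

total_saved = sum(saved for saved, _ in results.values())
total_orig = sum(original for _, original in results.values())
print(
    f"Total bytes saved: {total_saved} "
    f"(orig {total_orig}, saved {percent_saved(total_saved, total_orig):.2f}%)"
)
# Last line printed:
# Total bytes saved: 550661 (orig 3807834, saved 14.46%)
```

With ordinary rounding the GIF line would come out as 4.13% rather than the 4.12% shown in the summary, which is why the sketch truncates.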
