On Dedicated Performance Teams

Posted by Mike Brittain on May 01, 2012
Engineering, etsy / Comments Off on On Dedicated Performance Teams

Earlier this year, I gave another talk about Web Performance at Etsy for the NY Web Performance meetup. One of the points we stress on my team is that while we have a dedicated performance team, the primary focus is on building tools, setting goals, and creating a performance culture where everyone on the engineering team at Etsy is thinking about performance. We don’t believe in cleaning up other people’s messes; we believe in teaching others to “yearn for the vast and endless sea.”

I was looking back at John Rauser’s talk on “Creating Cultural Change” from Velocity 2010 and noted this quote, which shares exactly the same mindset:

In fact, at Amazon we have a performance team, but we explicitly seek to put ourselves out of business. We want the tools to be so good and the culture so strong that one day there won’t be a need for a performance team anymore.


Batch Process Image Optimizations

Posted by Mike Brittain on February 14, 2010
WWW / 2 Comments

A couple of weeks ago I wrote a post about a script I had put together for batch processing JPEGs with jpegtran.  This week I extended that script to handle GIF and PNG images as well.  It’s now a project on GitHub called “Wesley”.

Wesley is a single Perl script that you run from the command line, supplying the path to a single file name or a directory where you keep your site’s images.  If you work on a Linux or Mac development server, you can quickly run this script against all new images that you add to your site code.  Additionally, you could tie this into your build process or pre-commit hook for your preferred source control.  I haven’t spent time on this yet, but expect to add a write up on it soon.

The script strips meta data and comments from image files and tries to optimize images using lossless techniques.  You should be able to run Wesley on your images without any reduction in quality.

Wesley makes use of locally installed copies of ImageMagick, jpegtran, pngcrush, and gifsicle.  Some of these are probably already installed on your own machine (or shared hosting service).  If you are missing one or more of these packages, you can still run Wesley and it will use as many packages as you have available.

Usage  /path/to/images/

Sample Output


  Converting the following GIFs to PNG would save additional file size.
  Bytes saved: 19173 (orig 149404, saved 12.83%)


  Inspected 226 JPEG files.
  Modified 190 files.
  Huffman table optimizations: 138
  Progressive JPEG optimizations: 52
  Bytes saved: 408508 (orig 2099658, saved 19.45%)

  Inspected 105 PNG files.
  Modified 99 files.
  Bytes saved: 84618 (orig 315056, saved 26.85%)

  Inspected 129 GIF files.
  Modified 70 files.
  Bytes saved: 57535 (orig 1393120, saved 4.12%)

  Total bytes saved: 550661 (orig 3807834, saved 14.46%)


Batch Processing your JPEGs with jpegtran

Posted by Mike Brittain on January 27, 2010
WWW / 1 Comment

UPDATE: Please read my post about a new version of this image processing script.

Stoyan Stefanov wrote up a nice post last year about installing jpegtran on a Mac or Unix/Linux system so that you can run optimizations on your JPEG files.  His conclusion on jpegtran is that you can save about 10% on your JPEG file sizes for “about a minute of work, or less.”

Sounds great!  I looked it over and, indeed, jpegtran cuts some of the junk out of the JPEG files I tested.  The only holdup, however, is that at CafeMom we have a few thousand JPEG files in our site code, and that number grows every week.  The only reasonable solution was to automate this process.

The following Perl script should work right out of the box for you, assuming you already have jpegtran installed on your server or shared hosting account.


# Lossless optimization for all JPEG files in a directory
# This script uses techniques described in this article about the use
# of jpegtran:

use strict;
use File::Find;
use File::Copy;

# Read image dir from input
if (!$ARGV[0]) {
    print "Usage: $0 path_to_images\n";
    exit 1;
}

my @search_paths;
my $images_path = $ARGV[0];
if (!-e $images_path) {
    print "Invalid path specified.\n";
    exit 1;
} else {
    push @search_paths, $images_path;
}

# Compress JPEGs
my $count_jpegs = 0;
my $count_modified = 0;
my $count_optimize = 0;
my $count_progressive = 0;
my $bytes_saved = 0;
my $bytes_orig = 0;

find(\&jpegCompress, @search_paths);

# Write summary (guard against dividing by zero when nothing was modified)
my $pct = $bytes_orig ? (int($bytes_saved / $bytes_orig * 10000) / 100) : 0;
print "\n\n";
print "----------------------------\n";
print "  Summary\n";
print "----------------------------\n";
print "\n";
print "  Inspected $count_jpegs JPEG files.\n";
print "  Modified $count_modified files.\n";
print "  Huffman table optimizations: $count_optimize\n";
print "  Progressive JPEG optimizations: $count_progressive\n";
print "  Total bytes saved: $bytes_saved (orig $bytes_orig, saved $pct%)\n";
print "\n";

sub jpegCompress {
    if (m/\.jpg$/i) {
        $count_jpegs++;

        my $orig_size = -s $_;
        my $saved = 0;

        my $fullname = $File::Find::dir . '/' . $_;

        print "Inspecting $fullname\n";

        # Run Progressive JPEG and Huffman table optimizations, then inspect
        # which was best.
        `/usr/bin/jpegtran -copy none -optimize "$_" > "$_.opt"`;
        my $opt_size = -s "$_.opt";

        `/usr/bin/jpegtran -copy none -progressive "$_" > "$_.prog"`;
        my $prog_size = -s "$_.prog";

        if ($opt_size && $opt_size < $orig_size && $opt_size <= $prog_size) {
            move("$_.opt", "$_");
            $saved = $orig_size - $opt_size;
            $bytes_saved += $saved;
            $bytes_orig += $orig_size;
            $count_modified++;
            $count_optimize++;

            print " -- Huffman table optimization: "
                . "saved $saved bytes (orig $orig_size)\n";

        } elsif ($prog_size && $prog_size < $orig_size) {
            move("$_.prog", "$_");
            $saved = $orig_size - $prog_size;
            $bytes_saved += $saved;
            $bytes_orig += $orig_size;
            $count_modified++;
            $count_progressive++;

            print " -- Progressive JPEG optimization: "
                . "saved $saved bytes (orig $orig_size)\n";
        }

        # Cleanup temp files
        unlink "$_.prog" if -e "$_.prog";
        unlink "$_.opt"  if -e "$_.opt";
    }
}

How to use this script

For starters, copy this script into a text file and set it to be executable (chmod 755).

After the script is set up, pull the trigger…

$ ./  /path/to/your/images/dir

That’s it.  The output should look something like this:

Inspecting ./phpXkWlcW.jpg
 -- Progressive JPEG optimization: saved 1089 bytes (orig 13464)
Inspecting ./phpCnBRri.jpg
 -- Progressive JPEG optimization: saved 1155 bytes (orig 34790)
Inspecting ./phpx6G3lD.jpg
 -- Progressive JPEG optimization: saved 742 bytes (orig 11493)



  Inspected 21 JPEG files.
  Modified 21 files.
  Huffman table optimizations: 0
  Progressive JPEG optimizations: 21
  Total bytes saved: 63918

Wrap up

Many thanks to Stoyan for his post on jpegtran, and all of the other performance ideas he has been sharing on his blog.  This script was easy to write once I knew the right techniques to run on our images.  It took about a minute or so to run against our thousands of images and shaved a few megabytes from all of those files combined.  This will be a great savings for us.


Choosing a CDN

Posted by Mike Brittain on January 15, 2010
WWW / 1 Comment

I thought this was a good article covering 8 Things to Consider When Choosing a CDN.  It’s worth a read.

The only bit I would disagree with is that the article reads a bit too video-centric for me.  It felt like its main argument for using a CDN is that you don’t have enough bandwidth at your own data center to handle all of the requests being made to your servers.

I use CDNs for delivering static objects like images, CSS, and JavaScript.  An additional consideration I have is how fast cached objects will reach different locations across the country or around the world.  I’m dependent (as most sites are) on one central data center where my pages are being generated.  Every page view needs to make that trip over the network from the browser to my data center.  However, if the 20-100 successive objects can be requested from a regional CDN node, the performance in the end-user’s browser will be much better than if every request made a full trip across the country.


Leaner iPhone Interfaces with CSS Gradients

Posted by Mike Brittain on July 05, 2009
Mobile / Comments Off on Leaner iPhone Interfaces with CSS Gradients

I started playing around with Safari’s CSS gradients yesterday to see whether they would be usable on One tsp.’s mobile interface.  Looks like there has been support in WebKit for about a year now, but I don’t know specifics about how that translates to versions of Safari and other browsers built on top of WebKit.

The demos seemed to work for me in Safari 4 and in the latest version of mobile Safari built into the iPhone 3.0 OS.  I tested the 2.0 OS and it did not support gradients. I don’t know what support the Palm Pre browser has available.

This looked good enough for me, though.  Much of the interface for One tsp. already takes advantage of a few CSS extensions with varying support.  The interface looks its best on modern browsers (IE excluded) but is still totally usable everywhere else.

So what’s the difference?

I’ve only replaced one gradient background so far, but I’m stunned.  By defining the gradient in CSS, I’ve added just 92 bytes to my style sheet.  This allowed me to remove the background-image rule I had in place to load an image file, which was 50 bytes.  The image file that is no longer needed was pretty small (635 bytes) but also meant another external request that needed to be made.  When we’re talking about a mobile device, extra requests can have a high latency — worse than what we typically think of for the web.
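For reference, the change looks something like this in the style sheet.  The selector and colors below are made up for illustration (this isn’t One tsp.’s actual CSS), but the `-webkit-gradient()` syntax is the WebKit-only form that Safari supported at the time:

```css
.toolbar {
  /* Fallback for browsers without gradient support */
  background-color: #e8e8e8;

  /* Replaces the old rule: background-image: url(gradient.png); */
  background-image: -webkit-gradient(linear, left top, left bottom,
                                     from(#ffffff), to(#cccccc));
}
```

One rule like this adds a few dozen bytes of CSS and removes both an image file and the HTTP request that fetched it.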

These are pretty small numbers, I’ll admit.  But assuming I have six gradients defined per page, the net savings would be trading around 4 KB and six additional requests for about 260 bytes and no additional requests.  That’s pretty cool.

Faster Mobile Interfaces

Successful mobile web applications need to be super fast. Users trading a native app for a web app will expect it to be responsive. Speed can be improved through faster server responses, low mobile network latency (which we have little control over), fewer and smaller requests to the server, and cacheability on as much content as possible.

Rounded corners and background gradients are two frequently used interface styles that can now be achieved directly in the browser using CSS, eliminating the need for many additional image requests.


How to Improve JavaScript Latency in Mobile Browsers

Posted by Mike Brittain on January 20, 2009
Mobile / 2 Comments

Mobile browsers are really coming along.  Mobile Safari is built on top of WebKit and has just as much capability as the desktop version.  Same with Android’s browser.  Blackberry’s browser, I understand, has improved tremendously over previous versions.  The new offering from Palm centers application development around web technologies: HTML, CSS, and JavaScript.

As more applications and data grow to live in the cloud, then access to them via a browser must be easy and fast, which is often not the case with data on mobile devices.  A web site can take many seconds to several minutes to load all of the content required.  And at the heart of many sites these days lie some common elements — JavaScript libraries.

Personally, I have avoided heavy-weight libraries for mobile application development, because I know that they are a burden to the end-user.  This is less often the case for desktop users, who typically have broadband connections at home or at work.  So what do we do to improve this situation?

I propose that the mobile browser makers (or OS makers, in most cases) embed standard versions of common JavaScript libraries within their browsers.  Google already makes a number of these available as a hosted solution for web application developers: jQuery, YUI, Prototype, etc.  Other players, particularly in the CDN space, could also become involved in hosting these frameworks.  Nearly half of the libraries that Google hosts are larger than the 25 KB cache limit in mobile Safari (for example).  By embedding a handful of these libraries, mobile browsers could speed up some of the overhead of mobile applications that rely on Ajax or heavy DOM manipulation.

How would you do this?  Likely by inspecting HTTP requests by URL.  Google’s hosted libraries include version numbers, which allows developers to peg their work to a specific version, not having to worry about quirks in future versions that could upset their apps.  When an application makes use of one of these embedded libraries, the browser can simply execute the JavaScript library without having to make an external request.  If the application uses a newer version that is not embedded in the browser, the HTTP request would proceed as normal.  End users would get a slower experience than with an embedded framework, but that experience would be no worse than we have now.
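To make the idea concrete, here is the kind of versioned URL that Google’s hosted libraries already use (the version number shown is just an example).  A browser could match on this URL pattern and substitute its embedded copy without making a network request:

```html
<!-- The version segment in the path pegs the app to a known release,
     which is exactly what makes URL-based interception safe. -->
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
```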

I’m interested in hearing others’ thoughts about this idea.


Download Speeds from Amazon S3

Posted by Mike Brittain on September 25, 2008
Cloud Computing / Comments Off on Download Speeds from Amazon S3

I’ve been planning to post some details about download speeds that I’ve seen from S3, and why you shouldn’t necessarily use S3 as a CDN, yet.  Granted, Amazon recently announced that they will be providing a content delivery service in front of S3.  This post has nothing to do with that CDN service.

Scott posted the presentation he gave at the AWS Start-Up Tour.  It’s worth a read, and is a good summary of our business case for building the EC2/S3 hosting platform that I led when I worked at Heavy.

His post includes this graphic of how we measured S3 delivery speeds throughout the day.  Matt Spinks wrote the Munin plugin that generated this graph, and it’s now available at Google Code.

As we measured it, S3 is fairly variable in its delivery speeds.  Unfortunately, we didn’t measure latency for initial bits, which would be good to know as well.

My own impression is that it is not a good idea to host video directly from S3 if you run a medium to large web site.  The forthcoming CDN service will probably help with this.  If you’re a small to medium site, you might be happy with hosting video on S3.  Hosting images and other static content (say, CSS and JS files) might also be a good idea if you don’t have a lot of your own server capacity.

For my own use, I’m planning on using S3 to host images and static content for some other sites I run, which use a shared hosting provider for serving PHP.  On a low traffic site, I’d be happy to offload images to S3.  And when the CDN service becomes available, the user experience should be even snappier.

One thing to note if you plan to use S3 for image hosting: look into providing the correct Cache-Control headers on your objects in S3.  You need to set these headers when putting content onto S3; you can’t modify headers on existing content.  More on this in a future post.
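As a rough sketch, the header rides along on the initial PUT to S3’s REST API.  The bucket, key, dates, and max-age below are hypothetical, and the Authorization value is elided:

```http
PUT /my-bucket/images/logo.png HTTP/1.1
Host: s3.amazonaws.com
Date: Thu, 25 Sep 2008 12:00:00 GMT
Authorization: AWS AKIAEXAMPLE:signature
Cache-Control: public, max-age=31536000
Content-Type: image/png
Content-Length: 635
```

S3 stores the Cache-Control value with the object and echoes it back on every GET, which is what lets browsers (and any CDN in front) cache the object.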


Autoloading PHP Classes to Reduce CPU Usage

Posted by Mike Brittain on March 27, 2008
PHP / 3 Comments

At Heavy, we have been using a database ORM called Propel for some time now. I’m not a huge fan, to be honest, but it is what it is. One issue that we ran into is that Propel generates about five class files for every table that you model in PHP. With a few hundred tables, there are quite a few class files in our database code. We use Propel in an application layer behind our front-end web servers. Given that some of the controller scripts for the application layer might handle a variety of requests and deal with a large number of tables, you might imagine that a single script could end up requiring 50-200 class files at runtime.


Last fall, we took a close look at how to split up this load and compile only the truly necessary PHP class files when needed. We set up an autoload routine and saw dramatic results. We went from a typical 175% CPU usage on each 4-processor server down to around 50%. Much of the CPU seems to have been eaten up by constantly recompiling this huge number of class files. (Note: we also looked at using APC in the past to cache compiled PHP code, but ran into regular segfault issues on Apache when running it, which I believe are a fault of our own legacy configurations.)

Our autoloader looks something like this:

function __autoload ($class_name)
{
    if (!class_exists($class_name, false)) {
        $class_file_path = str_replace('_', '/', $class_name) . '.php';
        require_once $class_file_path;
    }
}

There is an assumption here that you’ve got your PHP classes in order: one class per file, using a PEAR-ish naming convention where the class Input_Validator is stored as Input/Validator.php in your include path.

If your class files are always located in one library directory, you can get some extra points by using an absolute path in your autoloader so that PHP doesn’t have to search your configured include path. For us, we’ve modeled our include path so that we can use custom classes, Zend Framework classes, and PEAR classes together. This ensures that our custom libraries override calls for similarly named classes in Zend or PEAR (which can be bad if you’re not careful!).

  1. Current working directory (.)
  2. Custom library path
  3. Zend Framework path
  4. PEAR path
  5. PHP standard path

Since we use this across our code base, we included the autoload function in a script that gets automatically loaded for every PHP script using auto_prepend_file. I can hear the groans already. The truth is that this works very well for our codebase. Your mileage may vary.
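For the curious, the directive itself is a one-liner in php.ini (the path shown here is hypothetical, not our actual layout):

```ini
; Prepend the autoloader to every PHP script on this server.
auto_prepend_file = /var/www/lib/autoload.php
```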

What does this gain us?

First of all, we improved the overall performance of our application, and I think pretty dramatically. Figures 1 and 2 (below) show Munin reports of our CPU usage on front-end servers and application servers. The drops in October and November coincide with our code promotions that included class autoloading.

Simplified development is another benefit. We never have to think about whether we’ve already included a class file in a script we’re writing, or whether it was conditionally included somewhere other than the top of the script file. If the class hasn’t been defined, the autoloader takes care of loading the file.

Finally, classes that fall out of use in our codebase automatically fall out of the runtime code. Unless we use a class, its class file is never included in the running script. When managing a lot of legacy code, possibly written by people who no longer work with you, this works great for culling out old classes.

CPU Usage for Web Server
Figure 1. CPU usage for front-end web server

CPU Usage for Application Server
Figure 2. CPU usage for back-end application server


Improve PHP Performance by Limiting Session Cookies

Posted by Mike Brittain on March 10, 2008
PHP / 2 Comments

I’ve been looking at PHP sessions again for a new project. At Heavy, we’re very conservative about using sessions at all on the site. So I have been thinking about the performance impact of mindlessly turning on session.auto_start.

Let’s start with the assumption that on many web sites, a small percentage of web traffic is actually from visitors who are logged in. In other words, many visitors arrive at your site, look at a page or two, maybe search for products or content, and then leave. Why bother setting a session for all of these page views if you’re not storing anything in it?

What happens when a PHP session starts up? A cookie is set for the visitor’s browser with the session identifier, “PHPSESSID”, by default. The session data, if available, is loaded from the session store, which is in a file — unless you’ve moved it to something faster. The data that is loaded is then unserialized into parameters in the $_SESSION global.

Additionally, at startup the PHP session handler rolls the dice to see if it’s time to do garbage collection on expired sessions. Unless you’ve changed the default settings for PHP, then there’s a 1% chance that PHP will have to sort through all of your available session files to find out if any are ripe for dismissal.
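For reference, those odds come straight from PHP’s stock ini defaults:

```ini
; A gc_probability-in-gc_divisor chance (1/100 = 1%) that session_start()
; sweeps session files idle longer than gc_maxlifetime.
session.gc_probability = 1
session.gc_divisor     = 100
session.gc_maxlifetime = 1440   ; seconds (24 minutes)
```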

This is just a summary. I’ll readily admit I don’t know all of the internals of session management in PHP. I also can’t speak to whether PHP re-saves your visitor’s session on every page view, whether or not the data has been changed. If anyone can answer that, I’d love to know.

Finally, one other thing that I’ve noticed is that once you have a PHP session running, some additional cache-busting HTTP headers seem to be added to the server response for a page view:

Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache

These headers make it impossible for your browser to cache a page, even if it’s a PHP script that returns virtually static content.
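If you genuinely need a session on a page that should stay cacheable, PHP lets you swap out that header set via the session cache limiter.  A sketch of the relevant ini directives (the values shown are illustrative; 180 is PHP’s default for cache_expire):

```ini
; Send cacheable headers instead of the anti-cache set above.
session.cache_limiter = public
session.cache_expire  = 180     ; minutes
```

The same setting can be made at runtime with session_cache_limiter(), as long as it’s called before session_start().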

So tonight I took an extra step after setting up a login class that uses sessions to store a visitor’s username and password credentials. The code conditionally checks whether a session handler needs to be fired up: session_start() is only called if the HTTP request includes a session cookie.

if (isset($_COOKIE['PHPSESSID'])) {
    session_start();
}

This code could go in one of two places, either in the constructor for a login class, or if you potentially need session data in more places in your code, maybe in a file that gets auto-loaded on every request.

Once you get to a sign-in page, the login class would be responsible for firing up the session if it does not yet exist. For me, this is in a method called “authenticate()”:

public function authenticate ()
{
    if (!isset($_SESSION)) {
        session_start();
    }

    // Do rest of user validation...
}

Note that we can use isset() to see if $_SESSION exists, which prevents E_NOTICE messages from being fired by session_start() if there is already a session in progress.

With these small changes, I can surf all over my site without having a session started up. The session is only initialized when I log into my site’s account. Furthermore, you could add the additional behavior of explicitly deleting the session cookie for the visitor once they have logged out of your site. While session_destroy() will delete the data within the session file, it doesn’t delete the cookie from your visitor’s browser.
