GDS Design Principles and “Digital Services”

Posted by Mike Brittain on May 02, 2013
Mobile, WWW / Comments Off on GDS Design Principles and “Digital Services”

The UK Government Digital Services has published their Design Principles last summer. It’s a great read. And if you’ve already read it, it’s still a great resource to reflect on. These are principles that would resonate with most of the technical teams I’ve worked on over the past years, though they often fail to appear in the products that are created.

Two principles stand out:

1. Start with (user) needs

8. Build digital services, not websites

The products we build need to solve real needs, rather than expose the organizational concerns or bureaucracies. This is more intuitive for new product start-ups, and less so for legacy institutions who are trying to increase their online presence.

And the products we build need to make sense within the current digital landscape—which is to say, outside (complementary?) services and the wide variety of devices in use by consumers of these services. Form factors—both existing and emerging—and usage context play roles in how we design products.

Twitter’s Photo Storage (from the outside looking in)

Posted by Mike Brittain on May 24, 2010
Cloud Computing, WWW / 1 Comment

I’ve been working on some photo storage and serving problems at Etsy, which is exciting work given the number of photos we store for the items being sold on the site.  This sort of project makes you wonder how other sites are handling their photo storage and serving architectures.

Today I spent a few minutes looking at avatar photos from Twitter from the outside.  This is all from inspection of URLs and HTTP headers, and completely unofficial and unvalidated assumptions.

Two things I found interesting today were (1) the rate of new avatar photos being added to Twitter, and (2) the architecture for storing and serving images.


Avatar photos at Twitter have URLs that look something like the following:


I’m assuming the numeric ID increments linearly with each photo that is uploaded… two images uploaded a few minutes apart showed a relatively small increase between these IDs.  I compared one of these IDs with the ID of an older avatar, along with the “Last-Modified” header that was included with its HTTP response headers:

Last-Modified Tue, 26 Feb 2008 03:15:46 GMT

Comparing these numbers shows that Twitter is currently ingesting somewhere over two million avatars per day.

Stock, or library, avatars have different URLs, meaning they are not served or stored the same way as custom avatars.  This is good because you get the caching benefits of reusing the same avatar URL for multiple users.


Storage and Hosting

Running a “host” look up on the hostname of an avatar URL shows a CNAME to Akamai’s cache network:

$ host a3.twimg.com
a3.twimg.com is an alias for a3.twimg.com.edgesuite.net.
a3.twimg.com.edgesuite.net is an alias for a948.l.akamai.net.
a948.l.akamai.net has address
a948.l.akamai.net has address

If you’re familiar with Akamai’s network, you can dig into response headers that come from their cache servers.  I did a little of that, but the thing I found most interesting is that Akamai plucks avatar images from Amazon’s CloudFront service.

x-amz-id-2: NVloBPkil5u…
x-amz-request-id: 1EAA3DE5516E…
Server: AmazonS3
X-Amz-Cf-Id: 43e9fa481c3dcd79…

It’s not news that Twitter uses S3 for storing their images, but I hadn’t thought about using CloudFront (which is effectively a CDN) as an origin to another CDN.  The benefit here, aside from not pounding the crap out of S3, is that Akamai’s regional cache servers can pull avatars from CloudFront POPs that are relatively close, as opposed to reaching all the way back to a single S3 origin (such as the “US Standard Region”, which I believe has two locations in the US).  CloudFront doesn’t have nearly as many global POPs as Akamai. But using it does speed up image delivery by ensuring that Akamai’s cache servers in Asia are grabbing files from a CloudFront POP in Hong Kong or Singapore, rather than jumping across the Pacific to North America.

I suspect that Twitter racks up a reasonably large bill with Amazon by storing and serving so many files from S3 and CloudFront.  However, it takes away the burden of owning all of the hardware, bandwidth, and man power required to serve millions upon millions of images… especially when that is not a core feature of their site.

Tags: , , , , , ,

NBCOlympics.com Delivers Quality TV to Web Crossover

Posted by Mike Brittain on February 16, 2010
WWW / Comments Off on NBCOlympics.com Delivers Quality TV to Web Crossover

I write this because I expected it not to work. Tonight while watching coverage from the Olympics, an ad popped up promotion additional background video available online. I opened the site on my iPhone, browsed through the videos, and watched the video that was highlighted on the broadcast coverage.

Great experience. No Flash Required.

Tags: , , , ,

Batch Process Image Optimizations

Posted by Mike Brittain on February 14, 2010
WWW / 2 Comments

A couple weeks ago I wrote a post about a script I had put together for batch processing JPEGs with jpegtran.  This week I extended that script so that it handles processing GIF and PNG images as well.  It’s now a project on GitHub called “Wesley“.

Wesley is a single Perl script that you run from the command line, supplying the path to a single file name or a directory where you keep your site’s images.  If you work on a Linux or Mac development server, you can quickly run this script against all new images that you add to your site code.  Additionally, you could tie this into your build process or pre-commit hook for your preferred source control.  I haven’t spent time on this yet, but expect to add a write up on it soon.

The script strips meta data and comments from image files and tries to optimize images using lossless techniques.  You should be able to run Wesley on your images without any reduction in quality.

Wesley makes use of locally installed copies of ImageMagick, jpegtran, pngcrush, and gifsicle.  Some of these are probably already installed on your own machine (or shared hosting service).  If you are missing one or more of these packages, you can still run Wesley and it will use as many packages as you have available.


   wesley.pl  /path/to/images/

Sample Output


  Converting the following GIFs to PNG would save additional file size.
  Bytes saved: 19173 (orig 149404, saved 12.83%)


  Inspected 226 JPEG files.
  Modified 190 files.
  Huffman table optimizations: 138
  Progressive JPEG optimizations: 52
  Bytes saved: 408508 (orig 2099658, saved 19.45%)

  Inspected 105 PNG files.
  Modified 99 files.
  Bytes saved: 84618 (orig 315056, saved 26.85%)

  Inspected 129 GIF files.
  Modified 70 files.
  Bytes saved: 57535 (orig 1393120, saved 4.12%)

  Total bytes saved: 550661 (orig 3807834, saved 14.46%)

Tags: , , ,

A Web Without Flash?

Posted by Mike Brittain on January 31, 2010
WWW / 1 Comment

Scoble has an interesting post asking Can Flash Be Saved? His central point is that developers will make use of whatever technologies that are widely available to their users which, of course, makes sense.  He draws an analogy between the growth of the iPhone OS, which does not include Flash, and Firefox, which does not include Microsoft-centric technologies like ActiveX.  Within the first years that Firefox began to steal share from Internet Explorer in the consumer market, developers changed the manner that they coded their sites by embracing Web Standards.

The interesting thing about that change was that Firefox’s growth was fairly slow outside of the developer community.  Many developers loved Firefox, but it took a few years for the general public to get hip to the browser.  I would have to say that the iPhone has far higher brand recognition and uptake.  You could attribute this to Apple’s strong marketing machine.

Mobile web browsing is still a small market compared to desktop browsers.  It’s a tiny, but growing market that can’t be ignored.  I believe that in just a few years mobile Internet services (including networked apps)  will be more important than the desktop Internet for communication, entertainment, and consuming information.  Devices like the iPhone and iPad are driving this forward.  Android is growing and finally seeing a wider number of devices offering Google’s mobile OS.  BlackBerry is a widely used platform, but it doesn’t contribute much to mobile web usage (yet).

If these (iPhone and Android) operating systems become the leading platforms in the mobile space, developers will build sites and apps based on the technologies available across them, namely HTML5 support that is part of the WebKit engine.  If Flash continues to be blockaded from the iPhone OS, developers will certainly look to other technologies to replace it.

Tags: , , , , ,

Batch Processing your JPEGs with jpegtran

Posted by Mike Brittain on January 27, 2010
WWW / 1 Comment

UPDATE: Please read my post about a new version of this image processing script.

Stoyan Stefanov wrote up a nice post last year about installing jpegtran on a Mac or Unix/Linux system so that you can run optimizations on your JPEG files.  His conclusion on jpegtran is that you can save about 10% on your JPEG file sizes for “about a minute of work, or less.”

Sounds great!  I looked it over and, indeed, jpegtran cuts some of the junk out of the JPEG files I tested.  The only holdup, however, is that at CafeMom we have a few thousand JPEG files in our site code, and that number grows every week.  The only reasonable solution was to automate this process.

The following Perl script should work right out of the box for you, assuming you already have jpegtran installed on your server or shared hosting account.



# Lossless optimization for all JPEG files in a directory
# This script uses techniques described in this article about the use
# of jpegtran: http://www.phpied.com/installing-jpegtran-mac-unix-linux/

use strict;
use File::Find;
use File::Copy;

# Read image dir from input
if (!$ARGV[0]) {
    print "Usage: $0 path_to_images\n";
    exit 1;
my @search_paths;
my $images_path = $ARGV[0];
if (!-e $images_path) {
    print "Invalid path specified.\n";
    exit 1;
} else {
    push @search_paths, $ARGV[0];

# Compress JPEGs
my $count_jpegs = 0;
my $count_modified = 0;
my $count_optimize = 0;
my $count_progressive = 0;
my $bytes_saved = 0;
my $bytes_orig = 0;

find(\&jpegCompress, @search_paths);

# Write summary
print "\n\n";
print "----------------------------\n";
print "  Sumary\n";
print "----------------------------\n";
print "\n";
print "  Inspected $count_jpegs JPEG files.\n";
print "  Modified $count_modified files.\n";
print "  Huffman table optimizations: $count_optimize\n";
print "  Progressive JPEG optimizations: $count_progressive\n";
print "  Total bytes saved: $bytes_saved (orig $bytes_orig, saved "
       . (int($bytes_saved/$bytes_orig*10000) / 100) . "%)\n";
print "\n";

sub jpegCompress()
    if (m/\.jpg$/i) {

        my $orig_size = -s $_;
        my $saved = 0;

        my $fullname = $File::Find::dir . '/' . $_;

        print "Inspecting $fullname\n";

        # Run Progressive JPEG and Huffman table optimizations, then inspect
		# which was best.
        `/usr/bin/jpegtran -copy none -optimize $_ > $_.opt`;
        my $opt_size = -s "$_.opt";

        `/usr/bin/jpegtran -copy none -progressive $_ > $_.prog`;
        my $prog_size = -s "$_.prog";

        if ($opt_size && $opt_size < $orig_size && $opt_size <= $prog_size) {
            move("$_.opt", "$_");
            $saved = $orig_size - $opt_size;
            $bytes_saved += $saved;
            $bytes_orig += $orig_size;

            print " -- Huffman table optimization: "
				. "saved $saved bytes (orig $orig_size)\n";

        } elsif ($prog_size && $prog_size < $orig_size) {
            move("$_.prog", "$_");
            $saved = $orig_size - $prog_size;
            $bytes_saved += $saved;
            $bytes_orig += $orig_size;

            print " -- Progressive JPEG optimization: "
				. "saved $saved bytes (orig $orig_size)\n";

        # Cleanup temp files
        if (-e "$_.prog") {
        if (-e "$_.opt") {

How to use this script

For starters, copy this script into a text file (such as optimize_jpegs.pl) and set it to be executable (chmod 755 optimize_jpegs.pl).

After the script is setup, pull the trigger…

$ ./optimize_jpegs.pl  /path/to/your/images/dir

That’s it.  The output should look something like this:

Inspecting ./phpXkWlcW.jpg
 -- Progressive JPEG optimization: saved 1089 bytes (orig 13464)
Inspecting ./phpCnBRri.jpg
 -- Progressive JPEG optimization: saved 1155 bytes (orig 34790)
Inspecting ./phpx6G3lD.jpg
 -- Progressive JPEG optimization: saved 742 bytes (orig 11493)



  Inspected 21 JPEG files.
  Modified 21 files.
  Huffman table optimizations: 0
  Progressive JPEG optimizations: 21
  Total bytes saved: 63918

Wrap up

Many thanks to Stoyan for his post on jpegtran, and all of the other performance ideas he has been sharing on his blog.  This script was easy to write, knowing the right techniques to be running on our images.

optimize_jpegs.pl took about a minute or so to run on our thousands of images and shaved a few megabytes from all of those files combined.  This will be a great savings for us.

Tags: , , , ,

Choosing a CDN

Posted by Mike Brittain on January 15, 2010
WWW / 1 Comment

I thought this was a good article covering 8 Things to Consider When Choosing a CDN.  It’s worth a read.

The only bit I would disagree with is that this reads a bit too video-centric for me.  It felt like the main argument for using a CDN is that you don’t have enough bandwidth at your own data center to handle all of the requests being made to your servers.

I use CDNs for delivering static objects like images, CSS, and JavaScript.  An additional consideration I have is how fast cached objects will reach different locations across the country or around the world.  I’m dependent (as most sites are) on one central data center where my pages are being generated.  Every page view needs to make that trip over the network from the browser to my data center.  However, if the 20-100 successive objects can be requested from a regional CDN node, the performance in the end-user’s browser will be much better than if every request made a full trip across the country.

Tags: , ,

Two Munin Graphs for Monitoring Our Code

Posted by Mike Brittain on December 17, 2009
PHP, WWW / Comments Off on Two Munin Graphs for Monitoring Our Code

We’re using a few custom Munin graphs at CafeMom to monitor our application code running on production servers.  I posted two samples of these to the WebOps Visualization group at Flickr.  The first graph measures the “uptime” of our code, a measure of how many minutes it’s been since our last deployment to prod (with a max of 1 day).  The second graph provides a picture of what’s going on in our PHP error logs, highlighting spikes in notices, warnings, and fatals, as well as DB-related errors that we throw of our own.

When used together, these graphs give us quick feedback on what sort of errors are occurring on our site and whether they are likely to be related to a recent code promotion, or are the effect of some other condition (bad hardware, third-party APIs failing, SEM gone awry, etc.).

I figured someone might find these useful, so I’m posting the code for both Munin plugins.

Code Uptime

When we deploy code to our servers, we add a timestamp file that is used to manage versioning of deployments to make things like rolling back super easy.  It’s also handy for measuring how long it’s been since the last deployment.  All this plugin does is reads how long ago that file was modified.

We run multiple applications on the same clusters of servers. I wrote our code deployment process in a manner that allows for independent deployment of each application. For example, one team working on our Facebook apps can safely deploy code without interfering with the deployment schedule another team is using for new features that will be released on the CafeMom web site.

Each of these applications is deployed to a separate directory under a root, let’s say “/var/www.”  This explains why the plugin is reading a list of files (directories) under APPS_ROOT.  Each app has it’s own reported uptime on the Munin graph.

# Munin plugin to monitor the relative uptime of each app
# running on the server.


# Cap the uptime at one day so as to maintain readable scale

# Configure list of apps
if [ "$1" = "config" ]; then
 echo 'graph_title Application uptimes'
 echo "graph_args --base 1000 --lower-limit 0 --upper-limit $MAX_MIN"
 echo 'graph_scale no'
 echo 'graph_category Applications'
 echo 'graph_info Monitors when each app was last deployed to the server.'
 echo 'graph_vlabel Minutes since last code push (max 1 day)'

 for file in `ls $APPS_ROOT`; do
 echo "$file.label $file"
 exit 0

# Fetch release times
now_sec=`date +%s`

for file in `ls $APPS_ROOT`; do
 release_sec=`date +%s  -r $APPS_ROOT/$file/prod/release_timestamp`
 diff_sec=$(( $now_sec - $release_sec ))
 diff_min=$(( $diff_sec/60 ))
 echo "$file.value $diff_min"

Error Logs

The second plugin uses grep to search for occurrences of specific error-related strings in our log files. In our case, the graph period was set to “minute” because that gives the best scale for us (thankfully, it’s not in errors per second!).

If you’re wondering about using grep five times against large error files every time Munin runs (every five minutes), I want to point out that we rotate our logs frequently which ensures that these calls are manageable. If you run this against very large error logs you may find gaps in your Munin graphs if the poller times out waiting for the plugin to return data points.

Even if you don’t care about PHP logs, you may find this to be a simple example of using Munin to examine any sort of log files that your application is creating.


# Collect stats for the contents of PHP error logs. Measures notice,
# warning, fatal, and parse level errors, as well as custom errors
# thrown from our DB connection class.



if [ "$#" -eq "1" ] && [ "$1" = "config" ]; then
	echo "graph_title Error Logs"
	echo "graph_category applications"
	echo "graph_info Data is pooled from all PHP error logs."
	echo "graph_vlabel entries per minute"
	echo "graph_period minute"
	echo "notice.label PHP Notice"
	echo "notice.type DERIVE"
	echo "notice.min 0"
	echo "notice.draw AREA"
	echo "warning.label PHP Warning"
	echo "warning.type DERIVE"
	echo "warning.min 0"
	echo "warning.draw STACK"
	echo "fatal.label PHP Fatal"
	echo "fatal.type DERIVE"
	echo "fatal.min 0"
	echo "fatal.draw STACK"
	echo "parse.label PHP Parse"
	echo "parse.type DERIVE"
	echo "parse.min 0"
	echo "parse.draw STACK"
	echo "db.label DB Error"
	echo "db.type DERIVE"
	echo "db.min 0"
	echo "db.draw STACK"


# The perl code at the end of each line takes a list of integers (counts) from grep (one per line)
# and outputs the sum.

echo "notice.value `grep --count \"PHP Notice\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"
echo "warning.value `grep --count \"PHP Warning\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"
echo "fatal.value `grep --count \"PHP Fatal error\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"
echo "parse.value `grep --count \"PHP Parse error\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"
echo "db.value `grep --count \"DB Error\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"

I’m open to comments and suggestions on how we use these, or how they were written.  Spout off below.

Tags: , , , , , , ,

Tips on Selecting a CDN Vendor

Posted by Mike Brittain on June 24, 2009
WWW / Comments Off on Tips on Selecting a CDN Vendor

There was interest after my talk at Velocity on which vendor is the best. Since I haven’t been exposed to many of the vendors on the market, it’s hard for me to answer that question.  I can, however, give some insight onto what you should look for when choosing a vendor for yourself.

1. Start with price. There is a pretty wide variance in price among vendors in the industry.  This mostly correlates to how big of a fish they want to go after.  If you can push a lot of traffic over a CDN (I want to say maybe hundreds of terabytes in a month), then you’re likely to find that many CDNs are going to give you great pricing.  If you’re a small fish with a small budget, however, you’ll immediately be able to exclude some of the larger vendors from your shopping list.

2. Setup a trial account. When you get down to three or four vendors, arrange a two week window for you to test their service.  You’ll need to have a little time to do basic configuration for the CDN, so plan for about half a day when you can work with vendors to iron out any unexpected issues with getting setup.

I give additional points to vendors who have a technical lead available to help setup your account and review your configuration.  Ask as many questions as possible and make sure you get answers, even if it means the sales team has to get their engineers involved.

3. Use “origin pull”. I touched on this briefly in my talk, but I believe that this is a better solution than CDN storage.  It should become obvious while setting up your account that slipping a CDN in front of your origin servers is supposed to be easy.  If you’re spending a lot of time getting your account setup at this point, imagine the frustration you’re going to face if you CDN has an outage and you need to switch in another provider.

4. Compare speeds from various locations. I used ab and http_load to do some basic speed tests on individual files being served from your CDN.  Run the same tests against your origin servers.  You should run these tests in a few different geographic locations. If you don’t have access to test locations of your own, talk to friends who may be able to run tests for you and send you results.

If you have access to tools like Keynote or Gomez, add those into your testing.  Let them monitor load speeds from the CDNs your testing.

5. Test cold cache speeds. Response for objects that are in cache are supposed to be very fast, and you’ll probably find no problems there.  What you really should be concerned about is how slow the latency is for objects not already in cache — i.e. a full round-trip from end-user to origin by way of the CDN.  There’s no reason latency should be exceptionally high here, but if you find any surprises escalate them to the vendor.  It could be a problem with your configuration, or it could be that they are someone you shouldn’t be working with.

6. Understand the support model. An outage is a really bad time to be figuring out how to reach support for the first time.  Find this out during the evaluation period.  Do you open trouble tickets with a phone call, an email, or a control panel?  Test this out.  Open a ticket to ask a few questions.  Become familiar with how tickets are resolved — do you get an immediate follow up, do you have visibility into the status of your ticket, or does your request go into a black hole?

Here’s a fun test for support: Intentionally set an invalid “cache-control” header for a test URL and tell support that you’re having trouble with caching that object.  Hopefully you get a quick, intelligent, and polite response.  If you’re missing any of those three characteristics, maybe you should reconsider whether you want to be working with this vendor.  Remember that the next time you’ll be opening a trouble ticket, you’re site may be on fire.

7. Look for transparency in status. I’ve wrote a bit about transparency in operations earlier this year.  I haven’t seen this with most of the vendors I’ve evaluated myself, but I think it’s worth extra points to vendors who get this right.

8. Look for the pretty pictures. Most vendors have pretty crappy reporting, but check it out an know what you’re going to get.  Run a light load against your test account for four or five days and then review the reports (reports without data aren’t very useful).

9. Get close with a technical lead. Find the smartest technical guy on your vendor’s team and talk with him/her regularly.  Tell them what you’re up to, how you’re using the CDN, what is working and what is not.  This person can teach you a lot about the more subtle features of caching after you get past the initial setup.  Ask for new features, no matter how absurd they may seem.  Vendors are looking to differentiate themselves in a space where most look like commodity services, so use this to your advantage.

It’s really important, as with most vendors, to avoid getting locked-in too tightly.  If you’re using a new video steaming service that is only available through one provider, then you’re going to have to select a niche vendor. On the other hand, if you are just serving typical site assets (images, CSS, JavaScript, etc.) then you should keep things flexible enough where you can move from CDN to CDN when necessary.

Hopefully these tips are useful to you.  Please post additional suggestions in the comments below.  I’m specifically interested in hearing about additional monitoring or testing tools that you have found useful in evaluating a CDN yourself.

Caching URLs with Query Strings

Posted by Mike Brittain on June 11, 2009
WWW / 1 Comment

A common method for breaking a cached object, or for specifying a version of an object, is to put a version number into a query string or some other part of the file name.  I’ve been wary of using query strings in URLs of static assets that I cache, either in browser cache or on a CDN.  I hadn’t seen much hard evidence, but generally felt query strings provide another area of uncertainty in a network path that you already have little control over.

Today, I just found Google’s recommendation for leveraging proxy caches, which states, “to enable proxy caching for these resources, remove query strings from references to static resources, and instead encode the parameters into the file names themselves.”

What is a good alternative?  One method I like is to use a version number that is part of the URL path:


The “12345” is prepended in your page templates using a variable.  The value can be taken from anywhere you want… it could be part of a timestamp, a revision number in your source control, or maybe even manually changed in a constants file anytime your are updating files in your “css” directory.  If you’re using Apache, this portion of the path can be easily stripped off the request using mod_rewrite.

  RewriteEngine On
  RewriteRule ^/\d{5,}(/.+)$ $1

With this method, you’re not literally using directories with version number in them.  Rather, you just use your regular old /htdocs/css directory. The same can be applied to JavaScript, images, or other directories with static assets.

What you gain is the ability to break your cached versions of files when necessary, but to also set long-term cache times on these assets (while they’re not changing) and expect cache servers between you and your users to properly store these objects.

More to come…

I’ll be speaking at Velocity in two weeks about how to optimize caching with your CDN.  I have a lot of content that I’d like to present, but only 20 minutes to do it.  Over the coming weeks I’ll be writing a number of posts covering topics that don’t make it into the presentation.

Tags: , , ,