GDS Design Principles and “Digital Services”

Posted by Mike Brittain on May 02, 2013
Mobile, WWW / Comments Off on GDS Design Principles and “Digital Services”

The UK Government Digital Service published its Design Principles last summer. It’s a great read, and if you’ve already read it, it’s still a great resource to reflect on. These are principles that would resonate with most of the technical teams I’ve worked on over the years, though they often fail to appear in the products those teams create.

Two principles stand out:

1. Start with (user) needs

8. Build digital services, not websites

The products we build need to solve real needs, rather than expose organizational concerns or bureaucracies. This is more intuitive for new product start-ups, and less so for legacy institutions that are trying to increase their online presence.

And the products we build need to make sense within the current digital landscape—which is to say, outside (complementary?) services and the wide variety of devices in use by consumers of these services. Form factors—both existing and emerging—and usage context play roles in how we design products.

How do we get to mobile web applications?

Posted by Mike Brittain on December 20, 2012
Mobile / Comments Off on How do we get to mobile web applications?

Anyone who has had a conversation with me for more than 3 minutes about mobile knows that I’ve been a big proponent of the mobile web for years. I believe that mobile web apps have been held back by the proliferation of native app marketplaces, which are highly attractive to developers due to:

– Solid SDKs, some standardized UI components
– Better UI performance, for now
– Marketplaces with integrated payment systems
– Ease of discovery, installation
– Heavy advertising and promotion by carriers and OS creators

After reading Fred Wilson’s post, The Mobile Web, yesterday, I finally wrote up a response in the comments. And since it’s surely lost amongst 285+ responses, I’m reposting it here. For context, be sure to read the post.

I’ll preface this response, however, by mentioning that these are pretty big hurdles and it’s going to be a while before we get to the place where the decision to build a mobile web app over a native app is obvious and accepted. There’s technical progress being made, but there is still a vicious consumer awareness and advertising cycle built to prop up the native app marketplaces. That will take more time to unwind.

* Developer Tools and Frameworks. There are a lot of individual pieces, but no single, obvious, and popular framework for building what feels more like an “app” rather than a collection of pages on the web (an earlier comment used the term “card”). We’re competing against the very obvious native SDKs. It *is* possible to write “installable,” offline web applications for mobile browsers, but you still have to cobble together much of the application framework yourself; it’s not yet a well-understood pattern. The HTML5 app cache is not the most wonderful thing to work with, either.

Latency is brought up as an advantage of native over web, but I think that’s a fallacy. The tools are just not here to help manage the perception of speed for users. An API call from a native app over HTTP is the same thing as requesting an HTML page; only the page refresh and (re-)loading of resources gives the perception that “the web” is a slow medium.

* Payments. Less a matter of payment at time of download (e.g. App Store) than it is having a single, trusted digital wallet that you can pay from. Tapping out a 16-digit card number, name, expiry, etc. on a handset is a huge barrier.

Try-before-you-buy would be far better to consumers than the proliferation of $0.99 and $1.99 apps that are so cheap that it doesn’t matter if you’ve wasted your money on junk a few times.

* Improved Platform (i.e. Browsers). You can argue that the browser is at a disadvantage from native apps because it is an abstraction from the OS. This is hampered further by lower CPU, memory, battery power — all of which will continue to improve over time. But while SDKs and native apps continue to rule and have an obvious, and working, payment model the innovation in mobile browsers will be slower.

Browsers will need API parity with native SDKs, i.e. access to cameras, audio, location, file storage, wallets/NFC, preferences, notifications, running in background, network detection, etc. I would expect you’ll see more innovation here within Chrome OS than on the more mainstream mobile OSes.

* Discovery. There’s not a one-stop-shop for native and web apps. The Chrome Web Store is actually decent and provides ties to in-app payments via Google Wallet. Same with Mozilla’s new store, though I’m unaware of any payment service there.

Consumers have been well conditioned by handset makers and carriers that they will get their apps from the major native app stores. We have yet to see marketing like that for mobile web apps. And most consumers will use what is put right in front of them, and what they’re hearing in commercials.


Mobile and Web Performance at Etsy

Posted by Mike Brittain on October 29, 2012
Misc / Comments Off on Mobile and Web Performance at Etsy

Over the past two years, I’ve been running a few teams in engineering that deal with Software Infrastructure — that layer of frameworks, services, and developer tools that “sits” right next to Operations. I mean that both figuratively and literally. These teams have grown and matured substantially. Last week we announced that Jason Wong would take over as director for those teams.

This change freed me up to focus on a few new things. I’ll be focusing heavily on Mobile Apps, Mobile Web, and Site Performance. [1]

Our Mobile Apps work is going strong, with a recently released update to our iOS app. Our Mobile Web work has been somewhat slow recently. Last year we did a fantastic job of rolling out the initial shopping experience for use on mobile devices. The progress since then, frankly, has been glacial, and I’m excited to be working on speeding up the pace. When I started working at Etsy in early 2010, the share of visits to the site coming from mobile devices was less than 6%. This group has grown significantly over the last two years. Mobile visitors now account for one quarter of visits to the site (which does not include usage of our iOS app).

There’s a natural fit between mobile and performance. There’s a high expectation from mobile internet users for high performance and low latency. Despite increases in cellular network speeds (3G, 4G, LTE), there are still significant hurdles for making a mobile web site (or mobile web app) feel quick and responsive. I frequently argue that the perception of responsiveness is one of the great advantages of building a native app — even if your back-end server response time is relatively slow.

Over the past year our web performance team, led by Seth Walker, has been publishing quarterly Site Performance Reports (e.g. June 2012 report). Expect to see us expanding these reports in the future to include performance numbers for Mobile Web access.

I’m incredibly excited about the challenges ahead and looking for talented mobile and performance engineers to join the team. If you’re interested in hearing more about what we’re up to, please contact me at


[1] I’m also continuing to work with a couple of other teams at Etsy, including our Marketing Communications and Product Quality teams. (Not to leave them out!)

On Dedicated Performance Teams

Posted by Mike Brittain on May 01, 2012
Engineering, etsy / Comments Off on On Dedicated Performance Teams

Earlier this year, I gave another talk about Web Performance at Etsy for the NY Web Performance meetup. One of the points we stress on my team is that while we have a dedicated performance team, the primary focus is on building tools, setting goals, and creating a performance culture where everyone on the engineering team at Etsy is thinking about performance. We don’t believe in cleaning up other people’s messes; we believe in teaching others to “yearn for the vast and endless sea.”

I was looking back at John Rauser’s talk on “Creating Cultural Change” from Velocity 2010 and noted this quote, which shares exactly the same mindset:

In fact, at Amazon we have a performance team, but we explicitly seek to put ourselves out of business. We want the tools to be so good and the culture so strong that one day there won’t be a need for a performance team anymore.


My Two Upcoming Talks at Velocity 2012

Posted by Mike Brittain on April 30, 2012
Talks and Conferences / Comments Off on My Two Upcoming Talks at Velocity 2012

I’m looking forward to speaking at Velocity Conf this summer on two very different topics:

Building Resilient User Interfaces

Large-scale sites are complex systems, where every page is likely generated with data pulled from multiple back-end data stores and services. Yet, not every piece of data on the page is critical to the user experience. The front-end of your site (i.e. front-end applications, templates, and user interface) needs to be designed to tolerate service outages. Features that your users will see need to have not just manual configuration flags, but also the ability to fail gracefully when a back-end service is unavailable. It’s important to note that in many cases, your site’s users will never realize the difference.

This talk is focused on getting front-end engineers, application developers, and product managers to think about their sites and products with an operational mindset. Topics covered in this talk will include:

  • Identifying data sources used throughout your site
  • Front-end design patterns for failure modes
  • De-coupling non-critical user interfaces
  • Effectively communicating failure to your community
  • Validating failure scenarios with “Game Days”
  • Configuration flags
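
To make the idea of flags plus graceful failure concrete, here is a minimal sketch in Python. It is not code from the talk or from Etsy: the flag names, `render_optional`, and the two fetch functions are hypothetical, and the outage is simulated. It only illustrates the pattern of treating some page modules as non-critical.

```python
from typing import Callable, Dict

# Hypothetical configuration flags; a real site would load these from its
# own configuration system rather than a module-level dict.
FLAGS: Dict[str, bool] = {"recommendations": True, "recent_reviews": True}


def render_optional(name: str, fetch: Callable[[], str], fallback: str = "") -> str:
    """Render a non-critical module, degrading to `fallback` on any failure."""
    if not FLAGS.get(name, False):
        return fallback  # manually switched off by an operator
    try:
        return fetch()
    except Exception:
        # The back-end is unavailable: hide the module instead of failing the page.
        return fallback


def fetch_recommendations() -> str:
    # Simulated outage of a (hypothetical) recommendation service.
    raise ConnectionError("recommendation service is down")


def fetch_reviews() -> str:
    return "<ul><li>Great shop!</li></ul>"


def render_listing_page(title: str) -> str:
    # The title is critical data, so it is rendered unconditionally.
    parts = [f"<h1>{title}</h1>"]
    parts.append(render_optional("recommendations", fetch_recommendations))
    parts.append(render_optional("recent_reviews", fetch_reviews))
    # Drop empty modules so a failed service leaves no visible gap.
    return "\n".join(part for part in parts if part)


if __name__ == "__main__":
    print(render_listing_page("Hand-thrown mug"))
```

With the simulated outage, the page still renders the title and the reviews; the recommendations module simply does not appear, which is the "users will never realize the difference" case described above.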

In this interview, I describe more of what this is all about.


A Picture is Worth a Thousand Logs

A wealth of performance and debugging data can be found in your web server logs. Graphs and trend lines are far better tools than a terminal and the find utility for analyzing what’s happening in your logs. In this talk, we’ll demonstrate some simple workflows for converting what’s in your logs into graphs.
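
As a rough illustration of that kind of workflow (not the one shown in the talk), the sketch below counts 5xx responses per minute from access log lines and prints them as a crude text graph. It assumes lines in Common Log Format; the function names and the sample lines are made up for the example.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# Captures the timestamp (to the minute) and the status code of a
# Common Log Format line. The log format itself is an assumption.
LINE_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]+\] "[^"]*" (\d{3}) '
)


def errors_per_minute(lines: Iterable[str]) -> Dict[str, int]:
    """Count 5xx responses, bucketed by minute."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group(2).startswith("5"):
            minute = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M")
            # ISO-style keys sort chronologically as plain strings.
            counts[minute.strftime("%Y-%m-%d %H:%M")] += 1
    return dict(counts)


def text_graph(series: Dict[str, int], width: int = 40) -> str:
    """Render one bar per minute, scaled so the busiest minute fills `width`."""
    peak = max(series.values(), default=0)
    rows = []
    for minute in sorted(series):
        count = series[minute]
        bar = "#" * max(1, round(count / peak * width))
        rows.append(f"{minute} {bar} {count}")
    return "\n".join(rows)


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [29/Oct/2012:10:01:07 -0400] "GET /listing/1 HTTP/1.1" 500 512',
        '1.2.3.4 - - [29/Oct/2012:10:01:09 -0400] "GET /listing/2 HTTP/1.1" 200 2326',
        '1.2.3.4 - - [29/Oct/2012:10:01:30 -0400] "GET /cart HTTP/1.1" 503 128',
        '1.2.3.4 - - [29/Oct/2012:10:02:02 -0400] "GET /cart HTTP/1.1" 502 128',
    ]
    print(text_graph(errors_per_minute(sample), width=10))
```

In practice you would send the per-minute numbers to a graphing system rather than print bars, but even this much makes a spike visible in a way that scrolling log lines do not.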


Attending Velocity Conf 2012

If you’re planning to attend Velocity this year, use the discount code “BRITTAIN” to get a 20% discount on your registration fee.

Metrics-Driven Engineering at Etsy

Posted by Mike Brittain on March 19, 2011
etsy / Comments Off on Metrics-Driven Engineering at Etsy

Last weekend Etsy put on a mini tech conference outside of SxSW, Moving Fast at Scale. Etsy has been growing fast. When I started last February, we had about 20 engineers. Today we have somewhere around 65. A fundamental goal for us is to keep the process of shipping code simple, as if we only had two or three people sitting around a single table working on the site. Rather than building massively complex rules around communication of changes, review processes, and release management, we’ve created tight feedback loops using continuous integration and rich application monitoring. I spoke on the latter at SxSW.

We collect an enormous number of graphs and metrics around our servers and application code. The number of graphs we generate from our application code is over 16,000. How do we collect so many metrics? We keep the process super simple.

Two of the tools I introduced in the talk are Logster and StatsD. We’ve open-sourced both of these projects on GitHub.

Logster is the tool we use for parsing logs every minute and aggregating interesting bits that we can shoot off to either Ganglia or Graphite. We log a lot of events from our application code, but it’s really hard to see trends in those logs as they’re shooting across your screen at hundreds of lines per second. That’s where Logster comes in.
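
The sketch below shows the general shape of that idea in Python; it is not Logster's actual code or parser interface. The `event=` log format, the function names, and the `app.events` prefix are assumptions for illustration. The output lines follow Graphite's plaintext protocol of metric path, value, and timestamp.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical application log format: "... event=<dotted.name> ..."
EVENT_RE = re.compile(r"\bevent=([\w.]+)")


def aggregate(lines: Iterable[str]) -> Counter:
    """Count how often each event appears in one batch of log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def to_graphite(counts: Counter, timestamp: int, prefix: str = "app.events") -> List[str]:
    """Format counts as Graphite plaintext lines: "<path> <value> <timestamp>"."""
    return [
        f"{prefix}.{name} {value} {timestamp}"
        for name, value in sorted(counts.items())
    ]


if __name__ == "__main__":
    # One minute's worth of (made-up) log lines.
    batch = [
        "2011-03-19T12:00:01 user=1 event=cart.add",
        "2011-03-19T12:00:04 user=7 event=login.failure",
        "2011-03-19T12:00:09 user=3 event=cart.add",
        "2011-03-19T12:00:12 nothing interesting here",
    ]
    for metric in to_graphite(aggregate(batch), timestamp=1300536000):
        print(metric)
```

Run once a minute over only the lines written since the previous run, something like this turns a stream of log lines into a handful of numbers per minute that can be graphed.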

StatsD is a network daemon that listens for metrics over UDP, then aggregates similar metrics and sends them over to Graphite. The client code we use for StatsD at Etsy is written in PHP, but there are already clients written in Python, Ruby, Java, etc. that are being contributed to the project. The key here is that collecting a new metric from our application code is one line of code.
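
To show why a new metric can be a single line, here is a minimal client sketch in Python. It is not the PHP client we use at Etsy; the class and metric names are hypothetical, and it assumes a StatsD daemon listening on its default UDP port, 8125. The wire format is the plain-text one StatsD accepts: `name:value|c` for counters and `name:value|ms` for timers.

```python
import socket


def format_counter(name: str, value: int = 1) -> str:
    """Build a StatsD counter packet, e.g. "logins.success:1|c"."""
    return f"{name}:{value}|c"


def format_timer(name: str, milliseconds: int) -> str:
    """Build a StatsD timer packet, e.g. "search.render:320|ms"."""
    return f"{name}:{milliseconds}|ms"


class StatsClient:
    """Fire-and-forget UDP client; host and port are assumptions."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125) -> None:
        self.address = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def increment(self, name: str, value: int = 1) -> None:
        self._send(format_counter(name, value))

    def timing(self, name: str, milliseconds: int) -> None:
        self._send(format_timer(name, milliseconds))

    def _send(self, payload: str) -> None:
        try:
            self.sock.sendto(payload.encode("ascii"), self.address)
        except OSError:
            # Metrics must never break the application, so errors are dropped.
            pass


if __name__ == "__main__":
    stats = StatsClient()
    stats.increment("logins.success")   # the one line added to application code
    stats.timing("search.render", 320)
```

Because UDP is fire-and-forget, the application does not wait on the metrics pipeline, and a missing or overloaded daemon costs at most a lost data point.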

We’re re-running all four of our talks at the Etsy office in Brooklyn, NY on Thursday, March 31 (sign up here). Come join us to hear all of the details that you can’t glean from just reading the slides.

If you can’t make it in person, we’ll be live streaming the event.


EC2 Hosting Architecture, Two Years Later

Posted by Mike Brittain on July 12, 2010
Cloud Computing / Comments Off on EC2 Hosting Architecture, Two Years Later

It’s been nearly two years to the day since I wrote my post about the hosting platform I set up on EC2.  The post still gets plenty of traffic and this week I was asked if it was still valid info.  I think there are some better ways of accomplishing what we set out to do back then, and here is a summary.

1. Instead of round-robin DNS for load balancing, you can now use Amazon’s Elastic Load Balancing (ELB) service.  The service allows for HTTP or TCP load balancing options.  I found that the HTTP 1.1 support in ELB is somewhat incomplete and “100 Continue” responses were not handled properly for large image uploads (a specific case I was using).

2. I chose Puppet two years ago for configuration management.  Since that time, OpsCode has released Chef, which is a friendlier way to manage your systems (in my opinion) that we also happen to use at Etsy.

3. Our database layer was built on four instances for MySQL, in a fairly paranoid configuration.  We had strong concerns about instances failing and losing data.  There are a couple of new tools available to help with running MySQL on EC2.  You can use the Elastic Block Store (EBS) for more resilient disk storage, or choose the Relational Database Service (RDS) which is MySQL implemented as a native service in AWS.  Disclaimer: I haven’t deployed production databases using either of these tools.  These are only suggestions for possibilities that look better/easier than the setup we used.

4. When it comes to monitoring tools, Ganglia is terrific.  What I like about Munin is the ease of writing plug-ins and the layout of similar services on a single page for quick comparisons between machines.  Ganglia’s plug-ins are also dead-simple to write.  In the five months I’ve been at Etsy, I’ve written at least 15.  In the three years I was using Munin, I probably wrote a total of six plug-ins.

Additionally, Ganglia has some sweet aggregated graphs for like machines.  This graph looks like a couple hundred web servers (as stacked lines).

All of the points listed in the “successes” section of that original article should still be considered valid and are worth reading again.  But I’ll highlight the last two, specifically:

Fault tolerance: Over the last two years when I worked at CafeMom we ran a number of full-time services on EC2 (fewer than 10 instances).  While a handful had been running for years at the time I left (had been started before I arrived, actually), other instances failed much sooner.  I can’t stress enough the importance of automated configurations for EC2 instances, given that these things have a tendency to fail when you’re busy working on another deadline.  I believe that in technical circles they refer to this as Murphy’s Law.

Portable hosting: I’m a big believer in commodity services.  The more generic your vendor services, the easier it is to switch them out when your blowout preventer fails.  I’ve mentioned a few services in this article that are specific to Amazon Web Services (ELB, RDS, and EBS).  If you go the route of Elastic Load Balancer or Relational Database Service, you should strongly consider what services you would use if you had to move to another cloud vendor.


How do you use the word “millions” at work?

Posted by Mike Brittain on July 10, 2010
Misc / Comments Off on How do you use the word “millions” at work?

I published my first blog post on Etsy’s engineering blog today: Batch Processing Millions and Millions of Images.


Zero-Day Deploys

Posted by Mike Brittain on June 10, 2010
etsy / 3 Comments

When you bring new engineers on board, it’s important to make them productive as soon as possible.  Spending days (or weeks) on training sessions, locked-down permissions, and/or watchful eyes simply constrains your shiny, new developer from getting shit done.  (I’ll be honest and say that I’ve been one of those types in the past.)

At Etsy, every engineer has access to deploy code to our production site.  We use a tool called “Deployinator” to do this quickly and easily.  It’s one button.  A culture of unit and functional testing (know when your code done broke), transparency (know when releases are happening), operational metrics (know when your servers are crying), and personal accountability (know when it’s your fault) keeps the entire process under control.

Earlier this week, we did something brand new for us…

watching a new engineer who started at Etsy today push code RIGHT NOW. awesome.

~chaddickerson, Etsy’s CTO

That’s right. On his very first day, Jason got set up in his development environment, made a change to our code base (albeit a small one), tested, committed, and deployed to the production web servers.

I can just hear it: “Why would you ever let someone deploy code on their first day?” But if you’re asking this question, I have to assume that you’ve also found yourself wondering at times, “How long is it going to be before Joe Newbie is going to be up to speed?”

Sure, it’s going to take time for every new hire to learn all of the ropes. But it’s better for them to be productive, confident and experienced in releasing code (even if it’s small changes!) while they’re learning all of the other details than it is for them to be sitting on their hands waiting for coach to put them into the game.

And in case you’re wondering, Jason has already released code three times in his first week… possibly four by the time I finish writing this.

Want to get in on the action? We’re hiring in both Engineering and Operations.


Moving Billions of Objects Within S3

Posted by Mike Brittain on May 25, 2010
Misc / Comments Off on Moving Billions of Objects Within S3

This is an interesting write-up on how to move enormous numbers of objects within S3.  I have a similar post coming up about large-scale batch processing that I’m looking forward to sharing in the next couple of weeks.  But give this a read.  It’s interesting!