Metrics-Driven Engineering at Etsy

Posted by Mike Brittain on March 19, 2011

Last weekend Etsy put on a mini tech conference outside of SxSW, Moving Fast at Scale. Etsy has been growing fast. When I started last February, we had about 20 engineers. Today we have somewhere around 65. A fundamental goal for us is to keep the process of shipping code simple, as if we only had two or three people sitting around a single table working on the site. Rather than building massively complex rules around communication of changes, review processes, and release management, we’ve created tight feedback loops using continuous integration and rich application monitoring. I spoke on the latter at SxSW.

We collect an enormous number of graphs and metrics around our servers and application code. Our application code alone generates over 16,000 graphs. How do we collect so many metrics? We keep the process super simple.

Two of the tools I introduced in the talk are Logster and StatsD. We’ve open-sourced both of these projects on GitHub.

Logster is the tool we use for parsing logs every minute and aggregating interesting bits that we can shoot off to either Ganglia or Graphite. We log a lot of events from our application code, but it’s really hard to see trends in those logs as they’re shooting across your screen at hundreds of lines per second. That’s where Logster comes in.
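To make the idea concrete, here is a minimal sketch of the kind of aggregation described above: take a batch of log lines and boil them down to a few counts that can be graphed. This is not Logster's actual code or API. The function name `summarize`, the metric names, and the Apache-style log format are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed log format: Apache-style access log lines, where the status code
# follows the quoted request, e.g. "GET / HTTP/1.1" 200 512
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def summarize(lines):
    """Count log lines per HTTP status class (2xx, 4xx, 5xx, ...).

    Hypothetical helper: returns a dict of metric name -> count for one
    batch of lines, e.g. everything logged during the last minute.
    """
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            # Use the first digit of the status code as the class.
            counts[f"http_{match.group(1)[0]}xx"] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [19/Mar/2011:10:00:01 -0400] "GET / HTTP/1.1" 200 512',
        '10.0.0.2 - - [19/Mar/2011:10:00:02 -0400] "GET /shop HTTP/1.1" 200 2048',
        '10.0.0.3 - - [19/Mar/2011:10:00:03 -0400] "POST /cart HTTP/1.1" 500 0',
    ]
    print(summarize(sample))
```

Run once a minute over the newest lines, a summary like this turns a scrolling wall of text into a handful of numbers that a graphing system can plot over time.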

StatsD is a network daemon that listens for metrics over UDP, then aggregates similar metrics and sends them over to Graphite. The client code we use for StatsD at Etsy is written in PHP, but there are already clients written in Python, Ruby, Java, etc. that are being contributed to the project. The key here is that collecting a new metric from our application code is one line of code.
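As a rough illustration of why one line is enough, here is a minimal sketch of a StatsD-style counter client in Python. It is not the PHP client we use at Etsy. The function names and the metric name are hypothetical; the `name:value|c` datagram format and port 8125 follow StatsD's conventions.

```python
import socket


def format_counter(name, delta=1, sample_rate=1.0):
    """Build a StatsD counter datagram such as 'site.logins:1|c'."""
    payload = f"{name}:{delta}|c"
    if sample_rate < 1.0:
        # Sampled counters carry the rate so the daemon can scale them up.
        payload += f"|@{sample_rate}"
    return payload


def increment(name, host="127.0.0.1", port=8125, delta=1):
    """Fire-and-forget: send one counter update over UDP.

    Hypothetical helper. UDP means no connection and no waiting for a
    reply, so a slow or missing daemon does not slow down the caller.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(name, delta).encode("ascii"), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # The "one line of code" at the call site (metric name is made up):
    increment("site.logins")
```

Because the send is a single UDP datagram, the call is cheap enough to sprinkle throughout application code without worrying about its effect on page load times.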

We’re re-running all four of our talks at the Etsy office in Brooklyn, NY on Thursday, March 31 (sign up here). Come join us to hear all of the details that you can’t glean from just reading the slides.

If you can’t make it in person, we’ll be live streaming the event.


How do you use the word “millions” at work?

Posted by Mike Brittain on July 10, 2010

I published my first blog post on Etsy’s engineering blog today: Batch Processing Millions and Millions of Images.


Zero-Day Deploys

Posted by Mike Brittain on June 10, 2010

When you bring new engineers on board, it’s important to make them productive as soon as possible. Spending days (or weeks) in training sessions, behind locked-down permissions, and/or under watchful eyes simply constrains your shiny, new developer from getting shit done. (I’ll be honest and say that I’ve been one of those types in the past.)

At Etsy, every engineer has access to deploy code to our production site.  We use a tool called “Deployinator” to do this quickly and easily.  It’s one button.  A culture of unit and functional testing (know when your code done broke), transparency (know when releases are happening), operational metrics (know when your servers are crying), and personal accountability (know when it’s your fault) keeps the entire process under control.

Earlier this week, we did something brand new for us…

watching a new engineer who started at Etsy today push code RIGHT NOW. awesome.

~chaddickerson, Etsy’s CTO

That’s right. On his very first day, Jason got set up in his development environment, made a change to our code base (albeit a small one), tested, committed, and deployed to the production web servers.

I can just hear it: “Why would you ever let someone deploy code on their first day?” But if you’re asking this question, I have to assume that you’ve also found yourself wondering at times, “How long is it going to be before Joe Newbie is up to speed?”

Sure, it’s going to take time for every new hire to learn all of the ropes. But it’s better for them to be productive, confident and experienced in releasing code (even if it’s small changes!) while they’re learning all of the other details than it is for them to sit on their hands waiting for coach to put them into the game.

And in case you’re wondering, Jason has already released code three times in his first week… possibly four by the time I finish writing this.

Want to get in on the action? We’re hiring in both Engineering and Operations.
