architecture

Twitter’s Photo Storage (from the outside looking in)

Posted by Mike Brittain on May 24, 2010
Cloud Computing, WWW

I’ve been working on some photo storage and serving problems at Etsy, which is exciting work given the number of photos we store for the items being sold on the site.  This sort of project makes you wonder how other sites are handling their photo storage and serving architectures.

Today I spent a few minutes looking at Twitter’s avatar photos from the outside.  Everything here comes from inspecting URLs and HTTP headers, so treat it as completely unofficial, unvalidated assumptions.

Two things I found interesting today were (1) the rate of new avatar photos being added to Twitter, and (2) the architecture for storing and serving images.

Avatars

Avatar photos at Twitter have URLs that look something like the following:

http://a3.twimg.com/profile_images/689887435/my_photo.jpg

I’m assuming the numeric ID increments linearly with each photo that is uploaded… two images uploaded a few minutes apart showed a relatively small increase between these IDs.  I compared one of these IDs with the ID of an older avatar, along with the “Last-Modified” header that was included with its HTTP response headers:

Last-Modified: Tue, 26 Feb 2008 03:15:46 GMT

Comparing the difference between the IDs with the time elapsed between the uploads suggests that Twitter is currently ingesting somewhere over two million avatars per day.
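The arithmetic behind that kind of estimate can be sketched in a few lines of Python.  This is only an illustration: it assumes the IDs really do increment by one per upload, the function names avatar_id and uploads_per_day are made up for this example, and the older avatar's ID is not given above, so it has to be supplied by whoever runs it.

```python
import re
from email.utils import parsedate_to_datetime


def avatar_id(url):
    """Pull the numeric ID out of a /profile_images/<id>/<file> URL."""
    match = re.search(r"/profile_images/(\d+)/", url)
    if match is None:
        raise ValueError(f"no avatar ID found in {url!r}")
    return int(match.group(1))


def uploads_per_day(old_id, old_modified, new_id, new_modified):
    """Average uploads per day between two (ID, Last-Modified) samples.

    Assumes one ID is consumed per uploaded photo, which is a guess
    made from the outside and not something Twitter has confirmed.
    """
    old_time = parsedate_to_datetime(old_modified)
    new_time = parsedate_to_datetime(new_modified)
    elapsed_days = (new_time - old_time).total_seconds() / 86400
    if elapsed_days <= 0:
        raise ValueError("the newer sample must be later than the older one")
    return (new_id - old_id) / elapsed_days
```

Note that this gives the average over the whole interval between the two samples.  Two avatars uploaded a few minutes apart give a better picture of the current rate than a comparison against a two-year-old image.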

Stock, or library, avatars have different URLs, meaning they are not served or stored the same way as custom avatars.  This is good because you get the caching benefits of reusing the same avatar URL for multiple users.

http://s.twimg.com/a/1274144130/images/default_profile_6_normal.png

Storage and Hosting

Running a “host” lookup on the hostname of an avatar URL shows a CNAME to Akamai’s cache network:

$ host a3.twimg.com
a3.twimg.com is an alias for a3.twimg.com.edgesuite.net.
a3.twimg.com.edgesuite.net is an alias for a948.l.akamai.net.
a948.l.akamai.net has address 96.6.41.171
a948.l.akamai.net has address 96.6.41.170
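If you want to repeat this check across several hostnames, the alias chain can be pulled out of the “host” output with a small Python helper.  This is a rough sketch that only understands the “is an alias for” lines shown above; the function names are invented for the example, and matching on the edgesuite.net and akamai.net suffixes is a heuristic based on this one lookup.

```python
def alias_chain(host_output):
    """Return the hostnames in CNAME order from the text printed by `host`."""
    chain = []
    for line in host_output.splitlines():
        parts = line.split(" is an alias for ")
        if len(parts) != 2:
            continue  # skip "has address" lines and anything unrecognized
        source = parts[0].strip()
        target = parts[1].strip().rstrip(".")  # drop the trailing root dot
        if not chain:
            chain.append(source)
        chain.append(target)
    return chain


def looks_like_akamai(chain):
    """Heuristic: does any name in the chain end in a known Akamai suffix?"""
    suffixes = (".edgesuite.net", ".akamai.net")
    return any(name.endswith(suffixes) for name in chain)
```

Feeding it the output above yields a3.twimg.com, then a3.twimg.com.edgesuite.net, then a948.l.akamai.net.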

If you’re familiar with Akamai’s network, you can dig into response headers that come from their cache servers.  I did a little of that, but the thing I found most interesting is that Akamai plucks avatar images from Amazon’s CloudFront service.

x-amz-id-2: NVloBPkil5u…
x-amz-request-id: 1EAA3DE5516E…
Server: AmazonS3
X-Amz-Cf-Id: 43e9fa481c3dcd79…
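The same inference can be written down as a small Python check over a dictionary of response headers.  It is a sketch of the reasoning above and nothing more: the function name origin_hints is made up, and the only signals it looks at are the header names that appear in this response.

```python
def origin_hints(headers):
    """Guess which Amazon services touched a response, from its headers.

    `headers` maps header names to values.  Header names are compared
    case-insensitively, since HTTP does not treat their case as significant.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    hints = []
    # "Server: AmazonS3" or an x-amz-request-id header points at S3.
    if lowered.get("server") == "AmazonS3" or "x-amz-request-id" in lowered:
        hints.append("S3")
    # X-Amz-Cf-Id is taken here as the sign that CloudFront handled the request.
    if "x-amz-cf-id" in lowered:
        hints.append("CloudFront")
    return hints
```

For the headers shown above this returns both “S3” and “CloudFront”, which matches the reading that CloudFront sits between Akamai and the S3 bucket.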

It’s not news that Twitter uses S3 for storing their images, but I hadn’t thought about using CloudFront (which is effectively a CDN) as an origin to another CDN.  The benefit here, aside from not pounding the crap out of S3, is that Akamai’s regional cache servers can pull avatars from CloudFront POPs that are relatively close, as opposed to reaching all the way back to a single S3 origin (such as the “US Standard Region”, which I believe has two locations in the US).  CloudFront doesn’t have nearly as many global POPs as Akamai. But using it does speed up image delivery by ensuring that Akamai’s cache servers in Asia are grabbing files from a CloudFront POP in Hong Kong or Singapore, rather than jumping across the Pacific to North America.

I suspect that Twitter racks up a reasonably large bill with Amazon by storing and serving so many files from S3 and CloudFront.  However, it takes away the burden of owning all of the hardware, bandwidth, and manpower required to serve millions upon millions of images… especially when that is not a core feature of their site.

High Traffic Sites on EC2

Posted by Mike Brittain on April 08, 2009
Cloud Computing

Grig Gheorghiu wrote up a nice article on handling high traffic sites on EC2.  It’s definitely worth a read for some high-level concepts about multi-tier architectures.  It doesn’t go deep into the details of EC2 (I would have liked to see something about availability zones for MySQL and load balancers).  One thing I really liked was the concept of using multiple load balancers with round-robin DNS pointing at them.  I’ve been considering this as an option and have already played around with HAProxy.  It’s likely a future step for our new image service at CafeMom.
