cloudfront

Twitter’s Photo Storage (from the outside looking in)

Posted by Mike Brittain on May 24, 2010
Cloud Computing, WWW / 1 Comment

I’ve been working on some photo storage and serving problems at Etsy, which is exciting work given the number of photos we store for the items being sold on the site.  This sort of project makes you wonder how other sites are handling their photo storage and serving architectures.

Today I spent a few minutes looking at avatar photos from Twitter from the outside.  This is all from inspection of URLs and HTTP headers, and completely unofficial and unvalidated assumptions.

Two things I found interesting today were (1) the rate of new avatar photos being added to Twitter, and (2) the architecture for storing and serving images.

Avatars

Avatar photos at Twitter have URLs that look something like the following:

http://a3.twimg.com/profile_images/689887435/my_photo.jpg

I’m assuming the numeric ID increments linearly with each photo that is uploaded… two images uploaded a few minutes apart showed a relatively small increase between these IDs.  I compared one of these IDs with the ID of an older avatar, along with the “Last-Modified” header that was included with its HTTP response headers:

Last-Modified Tue, 26 Feb 2008 03:15:46 GMT

Comparing these numbers shows that Twitter is currently ingesting somewhere over two million avatars per day.

Stock, or library, avatars have different URLs, meaning they are not served or stored the same way as custom avatars.  This is good because you get the caching benefits of reusing the same avatar URL for multiple users.

http://s.twimg.com/a/1274144130/images/default_profile_6_normal.png

Storage and Hosting

Running a “host” look up on the hostname of an avatar URL shows a CNAME to Akamai’s cache network:

$ host a3.twimg.com
a3.twimg.com is an alias for a3.twimg.com.edgesuite.net.
a3.twimg.com.edgesuite.net is an alias for a948.l.akamai.net.
a948.l.akamai.net has address 96.6.41.171
a948.l.akamai.net has address 96.6.41.170

If you’re familiar with Akamai’s network, you can dig into response headers that come from their cache servers.  I did a little of that, but the thing I found most interesting is that Akamai plucks avatar images from Amazon’s CloudFront service.

x-amz-id-2: NVloBPkil5u…
x-amz-request-id: 1EAA3DE5516E…
Server: AmazonS3
X-Amz-Cf-Id: 43e9fa481c3dcd79…

It’s not news that Twitter uses S3 for storing their images, but I hadn’t thought about using CloudFront (which is effectively a CDN) as an origin to another CDN.  The benefit here, aside from not pounding the crap out of S3, is that Akamai’s regional cache servers can pull avatars from CloudFront POPs that are relatively close, as opposed to reaching all the way back to a single S3 origin (such as the “US Standard Region”, which I believe has two locations in the US).  CloudFront doesn’t have nearly as many global POPs as Akamai. But using it does speed up image delivery by ensuring that Akamai’s cache servers in Asia are grabbing files from a CloudFront POP in Hong Kong or Singapore, rather than jumping across the Pacific to North America.

I suspect that Twitter racks up a reasonably large bill with Amazon by storing and serving so many files from S3 and CloudFront.  However, it takes away the burden of owning all of the hardware, bandwidth, and man power required to serve millions upon millions of images… especially when that is not a core feature of their site.

Tags: , , , , , ,

Cyberduck Support for Cloud Files and Amazon S3

Posted by Mike Brittain on January 21, 2009
Cloud Computing / 1 Comment

Cyberduck is a nice Mac FTP/SFTP GUI client that I’ve used in the past for moving files around between my desktop and some web servers.  Turns out they’ve added support for moving your files directly to Amazon S3 and Mosso (RackSpace) Cloud Files.  This means that you can use the same tool that you may previously have used for publishing content to your own web server to instead publish content directly to a self-service CDN.  Amazon uses it’s Cloud Front service to distribute files, and Mosso is supposed to be integrated with LimeLight networks for distributing content from the Cloud Files system.

Just wish I had these services available to me three years ago.  They would have saved me some serious cash on bandwidth commits for CDNs for those silly little projects I was working on.

Tags: , , , , , , ,