I’ve been working on some photo storage and serving problems at Etsy, which is exciting work given the number of photos we store for the items being sold on the site. This sort of project makes you wonder how other sites are handling their photo storage and serving architectures.
Today I spent a few minutes looking at Twitter’s avatar photos from the outside. Everything here comes from inspecting URLs and HTTP headers, so treat it as unofficial, unvalidated guesswork.
Two things I found interesting today were (1) the rate of new avatar photos being added to Twitter, and (2) the architecture for storing and serving images.
Avatar photos at Twitter have URLs that look something like the following:
I’m assuming the numeric ID increments linearly with each photo that is uploaded… two images uploaded a few minutes apart showed a relatively small increase between these IDs. I compared one of these IDs with the ID of an older avatar, along with the “Last-Modified” header that was included with its HTTP response headers:
Last-Modified: Tue, 26 Feb 2008 03:15:46 GMT
Comparing these numbers shows that Twitter is currently ingesting somewhere over two million avatars per day.
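The arithmetic behind that kind of estimate is simple enough to sketch. This is only a rough illustration, assuming (as above) that avatar IDs increase by one per upload; the function name and the sample IDs and dates in the usage lines are hypothetical, not the values I actually observed. Note that it gives an average over the whole interval, so it will understate the current rate if uploads have been growing.

```python
from email.utils import parsedate_to_datetime


def avatars_per_day(old_id, old_last_modified, new_id, new_last_modified):
    """Average uploads per day between two (ID, Last-Modified) observations.

    Assumes IDs increase by exactly one per uploaded avatar, which is a guess.
    """
    old_time = parsedate_to_datetime(old_last_modified)
    new_time = parsedate_to_datetime(new_last_modified)
    elapsed_days = (new_time - old_time).total_seconds() / 86400
    if elapsed_days <= 0:
        raise ValueError("the newer observation must come after the older one")
    return (new_id - old_id) / elapsed_days


# Hypothetical IDs and dates, ten days apart, purely to show the call.
rate = avatars_per_day(
    1000, "Tue, 26 Feb 2008 03:15:46 GMT",
    21000, "Fri, 07 Mar 2008 03:15:46 GMT",
)
print(rate)  # 2000.0
```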
Stock, or library, avatars have different URLs, meaning they are not served or stored the same way as custom avatars. This is good because you get the caching benefits of reusing the same avatar URL for multiple users.
Storage and Hosting
Running a “host” lookup on the hostname of an avatar URL shows a CNAME to Akamai’s cache network:
$ host a3.twimg.com
a3.twimg.com is an alias for a3.twimg.com.edgesuite.net.
a3.twimg.com.edgesuite.net is an alias for a948.l.akamai.net.
a948.l.akamai.net has address 18.104.22.168
a948.l.akamai.net has address 22.214.171.124
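If you want to repeat this check across several hostnames, the alias chain can be pulled out of the “host” output with a few lines of Python. This is a minimal sketch that only parses text in the format shown above; the function names are my own, and the Akamai suffix list is an assumption based on nothing more than the two domains that appear in this one lookup.

```python
# Domain suffixes seen in the lookup above; assumed (not verified) to indicate Akamai.
AKAMAI_SUFFIXES = (".edgesuite.net", ".akamai.net")


def cname_chain(host_output):
    """Return the hostnames along the alias chain reported by `host`."""
    chain = []
    for line in host_output.splitlines():
        name, sep, target = line.strip().partition(" is an alias for ")
        if not sep:
            continue  # not an alias line (for example, a "has address" line)
        if not chain:
            chain.append(name)
        chain.append(target.rstrip("."))
    return chain


def looks_like_akamai(chain):
    """True if any hostname in the chain ends with an assumed Akamai suffix."""
    return any(name.endswith(AKAMAI_SUFFIXES) for name in chain)


sample = """a3.twimg.com is an alias for a3.twimg.com.edgesuite.net.
a3.twimg.com.edgesuite.net is an alias for a948.l.akamai.net.
a948.l.akamai.net has address 18.104.22.168"""
print(cname_chain(sample))
print(looks_like_akamai(cname_chain(sample)))
```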
If you’re familiar with Akamai’s network, you can dig into response headers that come from their cache servers. I did a little of that, but the thing I found most interesting is that Akamai plucks avatar images from Amazon’s CloudFront service.
x-amz-id-2: NVloBPkil5u…
x-amz-request-id: 1EAA3DE5516E…
Server: AmazonS3
X-Amz-Cf-Id: 43e9fa481c3dcd79…
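The inference from those headers can be written down as a small check. This is a sketch of my own reasoning, not anything official: it assumes that a “Server: AmazonS3” header points to S3 and that the presence of an “X-Amz-Cf-Id” header points to CloudFront. The function name is hypothetical, and the truncated header values are left exactly as they appear above.

```python
def origin_hints(headers):
    """Guess which Amazon services handled a response, from its headers.

    The rules are assumptions drawn from the headers shown above.
    """
    # Header names are case-insensitive, so compare them in lowercase.
    normalized = {name.lower(): value for name, value in headers.items()}
    hints = []
    if normalized.get("server", "").lower() == "amazons3":
        hints.append("S3")
    if "x-amz-cf-id" in normalized:
        hints.append("CloudFront")
    return hints


observed = {
    "x-amz-id-2": "NVloBPkil5u…",
    "x-amz-request-id": "1EAA3DE5516E…",
    "Server": "AmazonS3",
    "X-Amz-Cf-Id": "43e9fa481c3dcd79…",
}
print(origin_hints(observed))  # ['S3', 'CloudFront']
```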
It’s not news that Twitter uses S3 for storing their images, but I hadn’t thought about using CloudFront (which is effectively a CDN) as an origin to another CDN. The benefit here, aside from not pounding the crap out of S3, is that Akamai’s regional cache servers can pull avatars from CloudFront POPs that are relatively close, as opposed to reaching all the way back to a single S3 origin (such as the “US Standard Region”, which I believe has two locations in the US). CloudFront doesn’t have nearly as many global POPs as Akamai. But using it does speed up image delivery by ensuring that Akamai’s cache servers in Asia are grabbing files from a CloudFront POP in Hong Kong or Singapore, rather than jumping across the Pacific to North America.
I suspect that Twitter racks up a reasonably large bill with Amazon by storing and serving so many files from S3 and CloudFront. However, it takes away the burden of owning all of the hardware, bandwidth, and manpower required to serve millions upon millions of images… especially when that is not a core feature of their site.