Case Against Using Symlinks For Code Promotion

Posted by Mike Brittain on May 12, 2009

I’ve read and argued a lot about the case for using symlinks to manage code promotion. I’ve discussed it at length with my pal, John Goulah, and he also described this approach in a post he wrote about deploying code.

My general approach goes something like this:

  1. Point your document root at a static location, such as /var/www/html/prod.
  2. Promote new code (from stage, dev, what have you) to a versioned directory on your prod server, such as /var/www/html/release_12345.
  3. Use a symlink, “prod” in this case, to point at a specific release (e.g. prod -> release_12345).
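
For illustration, here is a minimal sketch of the flip in step 3. It runs in a scratch directory rather than /var/www/html, and the promote_symlink function name and the release_12344 directory are made up for the example; this is not our actual build script.

```shell
#!/bin/sh
# Sketch of a symlink-based promotion, run in a throwaway directory.
# promote_symlink is a hypothetical helper, not part of any real tool.
set -eu

promote_symlink() {
    # $1 = web root holding the release directories
    # $2 = name of the release directory to make live
    root=$1
    release=$2
    # -s: create a symbolic link
    # -f: replace an existing "prod" link
    # -n: treat an existing "prod" link as the link itself,
    #     instead of creating the new link inside its target
    ln -sfn "$release" "$root/prod"
}

# Stand-in for /var/www/html (assumption for the demo).
WEBROOT=$(mktemp -d)
mkdir "$WEBROOT/release_12344" "$WEBROOT/release_12345"

promote_symlink "$WEBROOT" release_12344   # an older release goes live
promote_symlink "$WEBROOT" release_12345   # promote the new release
readlink "$WEBROOT/prod"                   # prints release_12345
```

Reverting is the same call with an older release name, which is what makes this approach attractive in the first place.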

The goal here is to keep a handful of versions on your prod servers so that reverting to a previous version is very quick. You can revert your code, right? (I’ll admit that it took me a number of years to get reverts working quickly and consistently.) Our build process is not too involved, but it takes about two minutes to get code out to all of our servers and then flip over the symlinks to make the new code live. The flip itself takes only one or two seconds, so all of the web servers we are deploying to stay consistent with one another.

We used a similar technique in our build process at CafeMom for nearly a year. I stress the word used, because we just got rid of the symlinks.

It turns out that we ran into an issue with the way that Apache handles symlinks while under load. When we flipped our symlinks to a new release version, files (especially PHP) from the previous release were still being executed or served by Apache. It seems that Apache caches the real path to a file after determining where the symlink is pointing, and doesn’t necessarily re-check the symlink. I don’t know how long this lasts, or whether it is specific to individual Apache children, but I do know that it has been a regular pain in the ass (PHP fatals, blank white pages, yuck).

To be clear about details, we also run APC on our servers to speed up our PHP processing. When we promote a new version of code we also flush the PHP cache, thinking that this would help clear out cached versions of files. Even so, the only thing that reliably cleared up this issue was to cycle Apache, which we would prefer not to do on every deployment.

What now?  Rather than repointing a symlink, we move entire directories.  Something like this:

  1. Sync code out to a new versioned directory on our prod servers, say release_12346.
  2. Switch release directories doing something like this: “mv prod release_12345; mv release_12346 prod”.  (We use a state file to keep track of where “prod” originally lived, in case you’re wondering where release_12345 came from.)
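
The two steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the swap_release function and the .current_release state file name are hypothetical (the real state file may look nothing like this), and the demo runs in a scratch directory instead of the real web root.

```shell
#!/bin/sh
# Sketch of promoting by renaming directories instead of repointing a symlink.
# swap_release and .current_release are hypothetical names for this example.
set -eu

swap_release() {
    # $1 = web root, $2 = name of the newly synced release directory
    root=$1
    new=$2
    state="$root/.current_release"    # records which release "prod" came from
    old=$(cat "$state")
    mv "$root/prod" "$root/$old"      # give the live code its release name back
    mv "$root/$new" "$root/prod"      # the new release becomes "prod"
    printf '%s\n' "$new" > "$state"   # remember the origin for the next swap
}

# Stand-in for the real web root (assumption for the demo).
WEBROOT=$(mktemp -d)
mkdir "$WEBROOT/prod" "$WEBROOT/release_12346"
echo release_12345 > "$WEBROOT/.current_release"
echo old > "$WEBROOT/prod/version.txt"
echo new > "$WEBROOT/release_12346/version.txt"

swap_release "$WEBROOT" release_12346
cat "$WEBROOT/prod/version.txt"       # prints new
```

Note that between the two mv calls there is a brief moment when no prod directory exists, so the two renames should run back to back.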

Renaming these directories takes about the same time as swapping the old symlink we were using. The Apache virtual host configurations (document roots) don’t need to be updated, either. Judging by our APC dashboard and the effect on our site, this approach seems to be working much more smoothly.

If you’ve got questions, alternative suggestions, or if you know what the root issue is with Apache caching our symlinks, sound off in the comments below!

4 Comments to Case Against Using Symlinks For Code Promotion

  • Actually I don’t use symbolic links for the document root. I programmatically generate the virtual host files on every build and bundle them with the code. These generated files point to the most current document root (the directory that is basically named by svn version you’re releasing). After code is pushed, I do use a symbolic link to point my httpd/conf.d/www-site.com.conf at the generated vhost file that I pushed out with my code. This way if you make a change to your vhost file between versions you can roll that back alongside the code.

  • Hi Mike,

    Why are you against restarting apache after deployment? It’s quick and harmless as far as I can tell.

  • We feed some content through a CDN. When the CDN makes a request that hits a server that is restarting, it has caused issues for users receiving blank pages.

    I agree that it *should* be harmless, especially if we were doing a graceful restart. But we have run into issues, which may be specific to our own setup.