<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mike Brittain &#187; munin</title>
	<atom:link href="http://www.mikebrittain.com/blog/tag/munin/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mikebrittain.com/blog</link>
	<description>Internet, mobile applications, skiing, snowboarding, food... you know, whatever comes to mind.</description>
	<lastBuildDate>Thu, 29 Jul 2010 22:54:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Two Munin Graphs for Monitoring Our Code</title>
		<link>http://www.mikebrittain.com/blog/2009/12/17/munin-plugins-code-deployment/</link>
		<comments>http://www.mikebrittain.com/blog/2009/12/17/munin-plugins-code-deployment/#comments</comments>
		<pubDate>Fri, 18 Dec 2009 01:19:35 +0000</pubDate>
		<dc:creator>Mike Brittain</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[WWW]]></category>
		<category><![CDATA[cafemom]]></category>
		<category><![CDATA[deployment]]></category>
		<category><![CDATA[errors]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[munin]]></category>
		<category><![CDATA[operations]]></category>
		<category><![CDATA[plugin]]></category>

		<guid isPermaLink="false">http://www.mikebrittain.com/blog/?p=398</guid>
		<description><![CDATA[We&#8217;re using a few custom Munin graphs at CafeMom to monitor our application code running on production servers.  I posted two samples of these to the WebOps Visualization group at Flickr.  The first graph measures the &#8220;uptime&#8221; of our code, a measure of how many minutes it&#8217;s been since our last deployment to prod (with [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re using a few custom Munin graphs at <a href="http://www.cafemom.com/">CafeMom</a> to monitor our application code running on production servers.  I posted two samples of these to the <a href="http://www.flickr.com/groups/webopsviz/pool/">WebOps Visualization</a> group at Flickr.  The <a href="http://www.flickr.com/photos/mikebrittain/4133501271/in/pool-webopsviz">first graph</a> tracks the <strong>&#8220;uptime&#8221; of our code</strong>, a measure of how many <em>minutes</em> it&#8217;s been since our last deployment to prod (with a max of 1 day).  The <a href="http://www.flickr.com/photos/mikebrittain/4171701079/in/pool-webopsviz/">second graph</a> provides a picture of what&#8217;s going on in our <strong>PHP error logs</strong>, highlighting spikes in <a href="http://us3.php.net/manual/en/errorfunc.constants.php">notices, warnings, and fatals</a>, as well as DB-related errors that we throw ourselves.</p>
<p>When used together, these graphs give us quick feedback on what sort of errors are occurring on our site and whether they are likely to be related to a recent code promotion, or are the effect of some other condition (bad hardware, third-party APIs failing, SEM gone awry, etc.).</p>
<p style="text-align: center;"><img class="aligncenter" title="Code uptime between deployments" src="http://farm3.static.flickr.com/2620/4133501271_9c2c72559f.jpg" alt="" width="400" height="245" /></p>
<p style="text-align: center;"><img class="aligncenter" title="Error Logs" src="http://farm3.static.flickr.com/2677/4171701079_bfcc73a72a.jpg" alt="" width="400" height="254" /></p>
<p>I figured someone might find these useful, so I&#8217;m posting the code for both Munin plugins.</p>
<h3>Code Uptime</h3>
<p>When we deploy code to our servers, we add a timestamp file that is used to manage versioning of deployments to make things like rolling back <em>super easy</em>.  It&#8217;s also handy for measuring how long it&#8217;s been since the last deployment.  All this plugin does is read how long ago that file was modified.</p>
<p>We run multiple <em>applications</em> on the same clusters of servers. I wrote our code deployment process in a manner that allows for independent deployment of each application. For example, one team working on our Facebook apps can safely deploy code without interfering with the deployment schedule another team is using for new features that will be released on the CafeMom web site.</p>
<p>Each of these applications is deployed to a separate directory under a root, let&#8217;s say &#8220;/var/www&#8221;.  This explains why the plugin is reading a list of files (directories) under APPS_ROOT.  Each app has its own reported uptime on the Munin graph.</p>
<pre>#!/bin/sh
#
# Munin plugin to monitor the relative uptime of each app
# running on the server.
#

APPS_ROOT="/var/www/"

# Cap the uptime at one day so as to maintain readable scale
MAX_MIN=1440

# Configure list of apps
if [ "$1" = "config" ]; then
 echo 'graph_title Application uptimes'
 echo "graph_args --base 1000 --lower-limit 0 --upper-limit $MAX_MIN"
 echo 'graph_scale no'
 echo 'graph_category Applications'
 echo 'graph_info Monitors when each app was last deployed to the server.'
 echo 'graph_vlabel Minutes since last code push (max 1 day)'

 for file in `ls $APPS_ROOT`; do
 echo "$file.label $file"
 done
 exit 0
fi

# Fetch release times
now_sec=`date +%s`

for file in `ls $APPS_ROOT`; do
 release_sec=`date +%s  -r $APPS_ROOT/$file/prod/release_timestamp`
 diff_sec=$(( $now_sec - $release_sec ))
 diff_min=$(( $diff_sec/60 ))
 if [ "$diff_min" -gt "$MAX_MIN" ]; then diff_min=$MAX_MIN; fi
 echo "$file.value $diff_min"
done</pre>
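<p>One caveat with using directory names as field names: as far as I know, Munin expects field names to contain only letters, digits, and underscores, and not to start with a digit. Here is a minimal sketch of a helper that cleans up a name, assuming your app directories might contain dashes or dots. The <code>field_name</code> function is hypothetical and not part of the plugin above.</p>
<pre>#!/bin/sh
# Hypothetical helper (not part of the original plugin): turn a directory
# name into a string that is safe to use as a Munin field name.
field_name() {
  # Replace every character outside A-Z, a-z, 0-9 and _ with an underscore
  name=$(printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_')
  # Field names should not begin with a digit, so prefix an underscore
  case $name in
    [0-9]*) name="_$name" ;;
  esac
  printf '%s\n' "$name"
}</pre>
<p>If you try this, use the same cleaned-up name in both the config and the value output, for example <code>echo "$(field_name "$file").label $file"</code>, so the two always match.</p>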
<h3>Error Logs</h3>
<p>The second plugin uses grep to search for occurrences of specific error-related strings in our log files. In our case, the graph period was set to &#8220;minute&#8221; because that gives the best scale for us (thankfully, it&#8217;s not in errors per second!).</p>
<p>If you&#8217;re wondering about using grep five times against large error files every time Munin runs (every five minutes), I want to point out that we rotate our logs frequently, which keeps these calls manageable. If you run this against very large error logs, you may find gaps in your Munin graphs if the poller times out waiting for the plugin to return data points.</p>
<p>Even if you don&#8217;t care about PHP logs, you may find this to be a simple example of using Munin to examine any sort of log files that your application is creating.</p>
<pre>#!/bin/bash

#
# Collect stats for the contents of PHP error logs. Measures notice,
# warning, fatal, and parse level errors, as well as custom errors
# thrown from our DB connection class.
#

logs="/var/log/httpd/*.error.log"

# CONFIG

if [ "$#" -eq "1" ] &amp;&amp; [ "$1" = "config" ]; then
	echo "graph_title Error Logs"
	echo "graph_category applications"
	echo "graph_info Data is pooled from all PHP error logs."
	echo "graph_vlabel entries per minute"
	echo "graph_period minute"
	echo "notice.label PHP Notice"
	echo "notice.type DERIVE"
	echo "notice.min 0"
	echo "notice.draw AREA"
	echo "warning.label PHP Warning"
	echo "warning.type DERIVE"
	echo "warning.min 0"
	echo "warning.draw STACK"
	echo "fatal.label PHP Fatal"
	echo "fatal.type DERIVE"
	echo "fatal.min 0"
	echo "fatal.draw STACK"
	echo "parse.label PHP Parse"
	echo "parse.type DERIVE"
	echo "parse.min 0"
	echo "parse.draw STACK"
	echo "db.label DB Error"
	echo "db.type DERIVE"
	echo "db.min 0"
	echo "db.draw STACK"
	exit
fi

# DATA

# The perl code at the end of each line takes a list of integers (counts) from grep (one per line)
# and outputs the sum.

echo "notice.value `grep --count \"PHP Notice\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"
echo "warning.value `grep --count \"PHP Warning\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"
echo "fatal.value `grep --count \"PHP Fatal error\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"
echo "parse.value `grep --count \"PHP Parse error\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"
echo "db.value `grep --count \"DB Error\" $logs | cut -d':' -f2 | perl -lne ' $x += $_; END { print $x; } ' `"</pre>
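<p>The five nearly identical lines above could be folded into one helper. This is only a sketch of an alternative, not what we run: it pipes all the logs through a single grep, so the count comes back as one total and the cut and perl steps are no longer needed. The <code>count_matches</code> name is made up for this example.</p>
<pre>#!/bin/sh
# Hypothetical helper: print the total number of lines matching a pattern
# across all the files given after it.
count_matches() {
  pattern=$1
  shift
  # Concatenating first means grep sees one stream and prints one total.
  # grep -c prints 0 but exits non-zero when nothing matches, so the
  # exit status is ignored.
  cat -- "$@" | grep -c -- "$pattern" || true
}</pre>
<p>Each data line would then look like <code>echo "notice.value $(count_matches 'PHP Notice' $logs)"</code>. One small difference is that this prints 0 rather than an empty value when nothing matches.</p>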
<p>I&#8217;m open to comments and suggestions on how we use these, or how they were written.  Spout off below.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikebrittain.com/blog/2009/12/17/munin-plugins-code-deployment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Munin Plugin for Testing S3 Speed</title>
		<link>http://www.mikebrittain.com/blog/2008/09/26/munin-plugin-for-testing-s3-speed/</link>
		<comments>http://www.mikebrittain.com/blog/2008/09/26/munin-plugin-for-testing-s3-speed/#comments</comments>
		<pubDate>Fri, 26 Sep 2008 23:58:59 +0000</pubDate>
		<dc:creator>Mike Brittain</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[amazon web services]]></category>
		<category><![CDATA[munin]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://www.mikebrittain.com/blog/?p=198</guid>
		<description><![CDATA[Matt Spinks put together a Munin plugin for monitoring S3 download speeds, which is now available at Google Code.  I mentioned this in a recent post and wanted to provide an update that the plugin is now published.
]]></description>
			<content:encoded><![CDATA[<p>Matt Spinks put together a <a href="http://munin.projects.linpro.no/">Munin</a> plugin for <a href="http://code.google.com/p/s3speed/">monitoring S3 download</a> speeds, which is now available at Google Code.  I mentioned this in a recent post and wanted to provide an update that the plugin is now published.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikebrittain.com/blog/2008/09/26/munin-plugin-for-testing-s3-speed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
