Using rrdcached with munin on Ubuntu

When munin follows many hosts, each with many tracked variables, the munin server will eventually be overwhelmed by disk IO: iowait values rise, and at times the server stalls under the load. rrdcached, the RRDtool data caching daemon, reduces this load by collecting updates in memory and writing them to disk in batches.

Version 1.4.7-1 of the rrdcached package did NOT install cleanly on my Ubuntu 12.04 server: the package couldn’t complete configuration because the daemon segfaulted when the package attempted to start it.

To get the package to install, I had to create its directories by hand:

mkdir -p /var/lib/rrdcached/db /var/lib/rrdcached/journal

after which the package installed and rrdcached started without problems.

However, the permissions were still wrong and communication with munin did not work: munin reported “Permission denied” from rrdcached when trying to write its rrd files, and munin-cgi-graph was unable to draw its graphs, leaving broken image icons in the munin pages.

This was fixed by killing rrdcached, modifying the init script to run the daemon under a different uid/gid and to change the permissions of the socket, and then starting rrdcached again. The modified init script:

#! /bin/bash
#
# rrdcached - start and stop the RRDtool data caching daemon
# http://oss.oetiker.ch/rrdtool/
#
# Based on the collectd init script.
#
# Copyright (C) 2005-2006 Florian Forster <octo@verplant.org>
# Copyright (C) 2006-2009 Sebastian Harl <tokkee@debian.org>
#
### BEGIN INIT INFO
# Provides: rrdcached
# Required-Start: $local_fs $remote_fs
# Required-Stop: $local_fs $remote_fs
# Should-Start: $network
# Should-Stop: $network
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: start the RRDtool data caching daemon
### END INIT INFO
set -e
PATH=/sbin:/bin:/usr/sbin:/usr/bin
DISABLE=0
DESC="RRDtool data caching daemon"
NAME=rrdcached
DAEMON=/usr/bin/rrdcached
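SOCKET=/var/run/rrdcached.sock
# -l: listen on this UNIX socket
OPTS="-l unix:$SOCKET"
# -j: journal directory; -F: flush all pending updates on shutdown
OPTS="$OPTS -j /var/lib/rrdcached/journal/ -F"
# -b: base directory for the RRD files
OPTS="$OPTS -b /var/lib/rrdcached/db/ "
# -m: socket mode (per rrdcached(1) it applies to -l options that follow it)
# -w: write interval, -z: random write delay, -f: flush interval (seconds)
OPTS="$OPTS -m 777 -w 1800 -z 1800 -f 3600 "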
PIDFILE=/var/run/rrdcached.pid
MAXWAIT=30
# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0
if [ -r /etc/default/$NAME ]; then
    . /etc/default/$NAME
fi
if test "$DISABLE" != 0 -a "$1" == "start"; then
    echo "$NAME has been disabled - see /etc/default/$NAME."
    exit 0
fi
if test "$ENABLE_COREFILES" == 1; then
    ulimit -c unlimited
fi
d_start() {
    if test "$DISABLE" != 0; then
        # we get here during restart
        echo -n " - disabled by /etc/default/$NAME"
        return 0
    fi
    # Modified: --chuid makes the daemon run as user munin, group www-data
    start-stop-daemon --chuid munin:www-data --start --quiet --oknodo \
        --pidfile "$PIDFILE" \
        --exec $DAEMON -- $OPTS -p "$PIDFILE"
    # Modified: limit socket access to the owner and group
    chmod 770 $SOCKET
}
still_running_warning="
WARNING: $NAME might still be running.
In large setups it might take some time to write all pending data to
the disk. You can adjust the waiting time in /etc/default/$NAME."
d_stop() {
    PID=$( cat "$PIDFILE" 2> /dev/null ) || true
    start-stop-daemon --stop --quiet --oknodo --pidfile "$PIDFILE"
    sleep 1
    if test -n "$PID" && kill -0 $PID 2> /dev/null; then
        i=0
        while kill -0 $PID 2> /dev/null; do
            i=$(( $i + 2 ))
            echo -n " ."
            if test $i -gt $MAXWAIT; then
                echo "$still_running_warning" >&2
                return 1
            fi
            sleep 2
        done
        return 0
    fi
}
d_status() {
    PID=$( cat "$PIDFILE" 2> /dev/null ) || true
    if test -n "$PID" && kill -0 $PID 2> /dev/null; then
        echo "$NAME ($PID) is running."
        exit 0
    else
        PID=$( pidof $NAME ) || true
        if test -n "$PID"; then
            echo "$NAME ($PID) is running."
            exit 0
        else
            echo "$NAME is stopped."
        fi
    fi
    exit 1
}
case "$1" in
    start)
        echo -n "Starting $DESC: $NAME"
        d_start
        echo "."
        ;;
    stop)
        echo -n "Stopping $DESC: $NAME"
        d_stop
        echo "."
        ;;
    status)
        d_status
        ;;
    restart|force-reload)
        echo -n "Restarting $DESC: $NAME"
        d_stop
        sleep 1
        d_start
        echo "."
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|force-reload|status}" >&2
        exit 1
        ;;
esac
exit 0
# vim: syntax=sh noexpandtab sw=4 ts=4 :
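
After restarting rrdcached with the modified script, it can be worth confirming that the socket really ended up with the intended mode and ownership. The following is only a sketch of how one might check that: check_perms is a hypothetical helper of my own naming (it is not part of rrdcached or munin), and it assumes GNU stat as shipped on Ubuntu.

```shell
#!/bin/bash
# Hypothetical helper, not part of the rrdcached package.
# Succeeds only if PATH exists and has the expected octal mode and owner:group.
# Usage: check_perms PATH MODE OWNER:GROUP
check_perms() {
    local path="$1" want_mode="$2" want_owner="$3"
    local actual
    # GNU stat format: %a = octal mode, %U = owner name, %G = group name
    actual=$(stat -c '%a %U:%G' "$path" 2>/dev/null) || {
        echo "missing: $path" >&2
        return 2
    }
    if [ "$actual" = "$want_mode $want_owner" ]; then
        echo "ok: $path is $actual"
    else
        echo "mismatch: $path is $actual, expected $want_mode $want_owner" >&2
        return 1
    fi
}
```

With the values used in the init script above, the call would be check_perms /var/run/rrdcached.sock 770 munin:www-data. A non-zero exit status means the socket is missing (2) or has a different mode or owner (1).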

Note that large files will be written to /var/lib/rrdcached/journal. rrdcached normally keeps two of them: the one it is actively writing to and the previous one (a new file is started every hour). Older files seem to be removed automatically. On my system, each file is more than 100 MB in size.
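
To keep an eye on how much space the journal takes, something along these lines may help. It is a minimal sketch, assuming bash and GNU stat; journal_report is a hypothetical function name, and the default directory is simply the one passed to -j in the init script above.

```shell
#!/bin/bash
# Hypothetical helper: list journal files newest first with their sizes in
# bytes, followed by the total. Defaults to the -j directory from the init script.
journal_report() {
    local dir="${1:-/var/lib/rrdcached/journal}"
    local total=0 size f
    # ls -t sorts by modification time, newest first
    while IFS= read -r f; do
        # skip anything that is not a regular file
        [ -f "$dir/$f" ] || continue
        size=$(stat -c '%s' "$dir/$f")
        total=$(( total + size ))
        printf '%12d  %s\n' "$size" "$f"
    done < <(ls -t "$dir")
    printf 'total bytes: %d\n' "$total"
}
```

Running journal_report with no argument reports on /var/lib/rrdcached/journal. On a setup like mine I would expect two files listed, but that depends on when rrdcached last rotated the journal.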
