Unexpected pitfall when building MySQL HA Servers

I have built quite a few HA setups for all kinds of software over the years. But what happened this week I had never seen … well … until now.

The setup is pretty basic: two servers mirroring a logical volume via DRBD, a MySQL master database on top, and the whole thing managed by Pacemaker on top of a Heartbeat communication stack. Strictly speaking it is Percona's build of MySQL, but that shouldn't matter here.
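For context, the resource stack looked roughly like this in crm shell syntax. This is a sketch, not the actual configuration: the resource name MysqlServer and the 53-second monitor interval match the log excerpts below, while the DRBD resource name, device, and filesystem type are assumptions.

primitive p_drbd ocf:linbit:drbd \
    params drbd_resource="mysql" \
    op monitor interval="15s"
ms ms_drbd p_drbd \
    meta master-max="1" clone-max="2" notify="true"
primitive p_fs ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/drbd" fstype="ext4"
primitive MysqlServer ocf:heartbeat:mysql \
    params config="/drbd/my.cnf" datadir="/drbd/data" \
    op monitor interval="53s"
group g_mysql p_fs MysqlServer
colocation c_mysql_on_drbd inf: g_mysql ms_drbd:Master
order o_drbd_before_mysql inf: ms_drbd:promote g_mysql:start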

So one day, just before lunch, the cluster monitoring told us it needed to restart MySQL and was in fact right in the middle of doing so. Yet the process list showed a mysqld that had been running for a couple of weeks, and right next to it the resource agent trying to start that very same resource:

~# ps -ef|grep mysql
root 1395 1 0 Jan23 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --defaults-file=/drbd/my.cnf --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --datadir=/drbd/data/ --user=mysql
mysql 1656 1395 0 Jan23 ? 10:36:28 /usr/sbin/mysqld --defaults-file=/drbd/my.cnf --basedir=/usr --datadir=/drbd/data/ --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306
root 1657 1395 0 Jan23 ? 00:00:00 logger -t mysqld -p daemon.error
root 17127 25225 0 13:27 ? 00:00:00 /bin/sh /usr/lib/ocf/resource.d//heartbeat/mysql start
root 18226 18015 0 13:30 pts/2 00:00:00 grep mysql

According to the logs, the cluster had tried to start the database even though it was already running, and it had not even shut it down before attempting the start. That is not something Pacemaker normally does, so either we had just discovered a critical bug or something else was going on. I started digging through the logs.
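The crmd's LRM events for the resource pointed straight at the culprit (the log path is an assumption; the cluster daemons on this system log via syslog):

~# grep 'process_lrm_event' /var/log/syslog | grep MysqlServer | tail -1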

Mar 29 13:27:00 server1 crmd: [25228]: info: process_lrm_event: LRM operation MysqlServer_monitor_53000 (call=263, rc=7, cib-update=16939, confirmed=false) not running

Why would the monitor operation return “not running” (rc=7 is the OCF return code OCF_NOT_RUNNING) when the process list clearly shows the database running?

When the cluster subsequently tried to start the database, MySQL itself also complained that it was already running; error 11 is EAGAIN, meaning the lock on ibdata1 was still held by the original mysqld:

Mar 29 13:27:33 server1 mysqld: InnoDB: Unable to lock ./ibdata1, error: 11
Mar 29 13:27:33 server1 mysqld: InnoDB: Check that you do not already have another mysqld process
Mar 29 13:27:33 server1 mysqld: InnoDB: using the same InnoDB data or log files.

So I looked at the resource agent to find out under which circumstances it would report “not running”.

mysql_status() {
    if [ ! -e $OCF_RESKEY_pid ]; then
        ocf_log debug "MySQL is not running"
        return $OCF_NOT_RUNNING;
    fi

    pid=`cat $OCF_RESKEY_pid`;
    if [ -d /proc -a -d /proc/1 ]; then
        [ "u$pid" != "u" -a -d /proc/$pid ]
    else
        kill -0 $pid >/dev/null 2>&1
    fi

    if [ $? -eq 0 ]; then
        return $OCF_SUCCESS;
    else
        ocf_log debug "MySQL not running: removing old PID file"
        rm -f $OCF_RESKEY_pid
        return $OCF_NOT_RUNNING;
    fi
}
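Note that the check trusts the pidfile unconditionally: no pidfile means “not running” as far as the agent is concerned, and the process table is never consulted in that case. A more paranoid variant could double-check before giving up; the following is a hypothetical hardening sketch, not what the stock agent does ($OCF_RESKEY_binary is the agent's parameter for the mysqld path):

    if [ ! -e $OCF_RESKEY_pid ]; then
        # Hypothetical extra check: the pidfile is gone, but is mysqld gone too?
        if pgrep -f "${OCF_RESKEY_binary:-/usr/sbin/mysqld}" >/dev/null 2>&1; then
            ocf_log err "mysqld appears to be running without its pidfile"
            return $OCF_ERR_GENERIC
        fi
        ocf_log debug "MySQL is not running"
        return $OCF_NOT_RUNNING
    fi

Pacemaker would still recover the resource on OCF_ERR_GENERIC, but at least the logs would point at the real problem instead of claiming the database was down.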

So it reports “not running” whenever there is no pidfile. But why would the pidfile be gone? Looking at the configuration, I found that the pidfile did not live on the DRBD mountpoint but in the default location, /var/run/mysqld. And that was about the moment it dawned on my co-worker…

He had needed to look up some MySQL parameter and had run “mysqld --help” on that server. Unfortunately, that spawned a mysqld process which read /etc/mysql/my.cnf, a leftover from installing the distribution package, and that file also had /var/run/mysqld configured as its pidfile path. So the short-lived mysqld saw a pidfile that did not belong to it, erased it, wrote its own pid there for the few seconds it took to print the help message, and then removed the pidfile again when it quit. It did the same with the socket, by the way…
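The sequence is easy to retrace (a sketch of the behaviour described above, using the pid from the process list; do not try this on a production cluster):

~# cat /var/run/mysqld/mysqld.pid    # pidfile of the cluster-managed mysqld
1656
~# mysqld --help >/dev/null          # short-lived mysqld reads /etc/mysql/my.cnf
~# ls /var/run/mysqld/               # pidfile and socket are gone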

From that point on there was no pidfile, so the resource agent told Pacemaker the database was not running, and Pacemaker duly tried to start it again. That start ran into the start operation's timeout, since the old mysqld still held the InnoDB lock, and the cluster then moved the database over to server2. From a cluster point of view, that was exactly the right thing to do.

So, what do we learn from this?

Never leave the pidfile and socket at their defaults in a MySQL HA configuration, or at the very least change the distribution's default configuration file so that it uses a different pidfile and socket than the cluster-managed instance.
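Concretely: give the cluster-managed instance pidfile and socket paths that the packaged default configuration will never touch, for example on the replicated volume itself. A sketch of the relevant part of /drbd/my.cnf (the /drbd/run directory is an arbitrary choice for illustration):

[mysqld]
datadir  = /drbd/data
pid-file = /drbd/run/mysqld.pid
socket   = /drbd/run/mysqld.sock

The resource agent then has to be told about those paths as well, via its pid and socket parameters (the same $OCF_RESKEY_pid the status function above reads):

crm configure primitive MysqlServer ocf:heartbeat:mysql \
    params config="/drbd/my.cnf" datadir="/drbd/data" \
        pid="/drbd/run/mysqld.pid" socket="/drbd/run/mysqld.sock" \
    op monitor interval="53s"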

P.S. At least I learned that this did not happen because of an old version of the resource agent … the current version would have done exactly the same thing.
