Thursday, June 16, 2011

Pgpool-II Online Recovery

I've spent quite a bit of time learning about Pgpool-II online recovery, and getting it working. To recover a broken database, run the recovery script which will sync up the database then attach it to pgpool. Unfortunately the examples in the Pgpool-II manual are not complete - they just give you the general idea. So I had to comb the internet to clear up some haziness, and modify examples to make them work for me. Below is what I've got. It's tested and works. Stress tested even - by bombarding pgpool with inserts before during and after recovery - the result is all databases are in sync. Some of this is borrowed code and I've lost who the author is. If it's your code let me know and I'll properly credit you.

###############################################
#copy_base_backup.sh - 1st stage of recovery
#XXX TODO: remove exit 0 - make exit out with error codes when any command fails
#!/bin/sh

PSQL=/usr/bin/psql
SCP=/usr/bin/scp
SSH=/usr/bin/ssh
LOGGER="log /var/log/postgresql/pgpool_copy_base_backup.log"
#"/usr/bin/logger -i -p local0.info -t pgpool"
RSYNC="/usr/bin/rsync --archive --quiet --compress --rsh=$SSH --delete"
BASENAME=`/usr/bin/basename $0`
HOSTNAME=`/bin/hostname`
ID=`/usr/bin/id -un`

# $1 = Database cluster path of a master node.
# $2 = Hostname of a recovery target node.
# $3 = Database cluster path of a recovery target node.

#PG_HOME=/var/lib/postgresql
SRC_ARCHIVE_LOG_DIR=/var/lib/postgresql/9.0/main2/archive_log

SRC_DATA=$1
DST_HOST=$2
DST_DATA=$3

SRC_DBNAME=postgres
SRC_PORT=5431

log()
{
echo $(date +"%D %T") $2 >> $1
}

$LOGGER "Executing $BASENAME as user $ID"

$LOGGER "Executing pg_start_backup"
$PSQL -p $SRC_PORT -d $SRC_DBNAME -c "select pg_start_backup('pgpool-recovery')"

$LOGGER "Creating file recovery.conf"
echo "restore_command = '$SCP $HOSTNAME:$SRC_ARCHIVE_LOG_DIR/%f %p'" > $SRC_DATA/recovery.conf

$LOGGER "Rsyncing directory base"
$RSYNC $SRC_DATA/base/ $DST_HOST:$DST_DATA/base/
$LOGGER "Rsyncing directory global"
$RSYNC $SRC_DATA/global/ $DST_HOST:$DST_DATA/global/
$LOGGER "Rsyncing directory pg_clog"
$RSYNC $SRC_DATA/pg_clog/ $DST_HOST:$DST_DATA/pg_clog/
$LOGGER "Rsyncing directory pg_multixact"
$RSYNC $SRC_DATA/pg_multixact/ $DST_HOST:$DST_DATA/pg_multixact/
$LOGGER "Rsyncing directory pg_subtrans"
$RSYNC $SRC_DATA/pg_subtrans/ $DST_HOST:$DST_DATA/pg_subtrans/
$LOGGER "Rsyncing directory pg_tblspc"
$RSYNC $SRC_DATA/pg_tblspc/ $DST_HOST:$DST_DATA/pg_tblspc/
$LOGGER "Rsyncing directory pg_twophase"
$RSYNC $SRC_DATA/pg_twophase/ $DST_HOST:$DST_DATA/pg_twophase/
$LOGGER "Rsyncing directory pg_xlog"
$RSYNC $SRC_DATA/pg_xlog/ $DST_HOST:$DST_DATA/pg_xlog/
$LOGGER "Rsyncing file recovery.conf (with source deletion)"
$RSYNC --remove-source-files $SRC_DATA/recovery.conf $DST_HOST:$DST_DATA/

$LOGGER "Executing pg_stop_backup"
$PSQL -p $SRC_PORT -d $SRC_DBNAME -c 'select pg_stop_backup()'

exit 0

###################################################
#pgpool_recovery_pitr.sh - 2nd stage of recovery
#XXX TODO: remove exit 0 - make exit out with error codes when any command fails
#SRC_PORT and SRC_DBNAME need to be modified according to your system
#!/bin/sh

DATADIR=$1
PSQL=/usr/bin/psql
LOGGER="log /var/log/postgresql/pgpool_recovery_pitr.log"
BASENAME=`/usr/bin/basename $0`
ID=`/usr/bin/id -un`
SRC_PORT=5431
SRC_DBNAME=postgres

log()
{
echo $(date +"%D %T") $2 >> $1
}

$LOGGER "Executing $BASENAME as user $ID"

$LOGGER "Flushing sequence values to xlog"
# Force to flush current value of sequences to xlog
psql -p $SRC_PORT -t -c 'SELECT datname FROM pg_database WHERE NOT datistemplate AND datallowconn' template1|
while read i
do
if [ "$i" != "" ];then
psql -p $SRC_PORT -c "SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = 'S'" $i
fi
done

$LOGGER "Executing pgpool_switch_xlog"
$PSQL -p $SRC_PORT -d $SRC_DBNAME -c "select pgpool_switch_xlog('$DATADIR/archive_log')"

exit 0

###################
#pgpool_remote_start - 3rd stage (note there is no "3rd stage" in pgpool.conf - it seems hardcoded to run this script by name
#XXX TODO: remove exit 0 - make exit out with error codes when any command fails
#DST_DB_START should be modified - it is the command that is run on the recovery node - maybe service postgreql start is fine, maybe you need to run pg_ctl, for example

#!/bin/sh

SSH=/usr/bin/ssh
LOGGER="log /var/log/postgresql/pgpool_remote_start.log"
DST_DB_START="service postgresql start"

BASENAME=`/usr/bin/basename $0`
ID=`/usr/bin/id -un`

DST_HOST=$1
DST_DIR=$2

log()
{
echo $(date +"%D %T") $2 >> $1
}

$LOGGER "Executing $BASENAME as user $ID"
$LOGGER "Starting remote PostgreSQL server"
$SSH -T $DST_HOST '$DST_DB_START' 2>/dev/null 1>/dev/null

exit 0

1 comment:

Николай Галушко said...

Hi!
Comment please why there are db postgres, not template1

From docs:
psql -p $port -c "SELECT pgpool_switch_xlog('$archdir')" template1