[ADMIN] Issues with log-shipping replication

Discussion:

(too old to reply)

Khusro Jaleel

2011-12-15 18:02:03 UTC

Hello, I'm trying out a simple example from the Postgresql 9
Administration Cookbook about File-based log shipping replication, but I
can't get it to work between 2 CentOS 5.7 VMs running Postgresql 9.1.
Here are my settings:

.bash_profile on master:
export PGARCHIVE=/var/lib/pgsql/pgarchive
export STANDBYNODE=172.16.1.56

.bash_profile on slave:
export PGARCHIVE=/var/lib/pgsql/pgarchive

postgresql.conf on master:
===================================================================
# - Settings -

wal_level = archive # minimal, archive, or hot_standby
# (change requires restart)
#fsync = on # turns forced synchronization
on or off
#synchronous_commit = on # synchronization level; on,
off, or local
#wal_sync_method = fsync # the default is the first option
# supported by the operating
system:
# open_datasync
# fdatasync (default on Linux)
# fsync
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
#wal_buffers = -1 # min 32kB, -1 sets based on
shared_buffers
# (change requires restart)
#wal_writer_delay = 200ms # 1-10000 milliseconds

#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000

# - Checkpoints -

#checkpoint_segments = 3 # in logfile segments, min 1,
16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration,
0.0 - 1.0
#checkpoint_warning = 30s # 0 disables

# - Archiving -

archive_mode = on # allows archiving to be done
# (change requires restart)
archive_command = 'scp %p $STANDBYNODE:$PGARCHIVE/%f'
archive_timeout = 30 # force a logfile segment switch after
this # number of seconds; 0 disables
===================================================================

recovery.conf on slave:
===================================================================
standby_mode = 'on'
restore_command = 'cp $PGARCHIVE/%f %p'
archive_cleanup_command = '/usr/pgsql-9.1/bin/pg_archivecleanup
$PGARCHIVE %r'
trigger_file = '/tmp/postgresql.trigger.5432
===================================================================

do_backup.sh script to make an initial copy of master onto slave:
===================================================================
#!/bin/bash

psql -c "select pg_start_backup('base backup for log shipping')"
rsync -cva --inplace --exclude=*pg_xlog* ${PGDATA}/ $STANDBYNODE:$PGDATA
psql -c "select pg_stop_backup(), current_timestamp"
===================================================================

The procedure I am using to test this is:

1. Clear out the data dir on the slave, put only the recovery.conf file
there
2. Run the 'do_backup.sh' script to copy the master data dir to the
slave, excluding pg_xlog
3. Create the pg_xlog dir on the slave
4. Make sure the $PGARCHIVE dir exists on the slave. Note that this is
completely outside the normal 'data' dir
5. Start up the master. As soon as I do this, I see some archive files
appear in the $PGARCHIVE dir on the slave. This means the 'scp'
'archive_command' I am using IS working.
6. Now start up the slave, which prints the following to the log in
9.1/data/pg_log:

========================================================================
LOG: startup process (PID 4771) exited with exit code 1
LOG: aborting startup due to startup process failure
LOG: database system was interrupted; last known up at 2011-12-08
19:45:10 UTC
LOG: creating missing WAL directory "pg_xlog/archive_status"
LOG: entering standby mode
LOG: restored log file "000000010000000000000027" from archive
LOG: redo starts at 0/27000078
LOG: consistent recovery state reached at 0/28000000
cp: cannot stat `/var/lib/pgsql/pgarchive/000000010000000000000028': No
such file or directory
cp: cannot stat `/var/lib/pgsql/pgarchive/000000010000000000000028': No
such file or directory
========================================================================

Looking at the master, the file '000000010000000000000028' does exist in
9.1/data/pg_xlog, but for some reason the master is NOT copying it over
to the slave. The slave has all the files prior to this, but not
anything after.

Anybody got any ideas why?

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Rural Hunter

2011-12-16 00:55:33 UTC

Permalink

well, is pgsql capable to parse shell variables in postgresql.conf?
anway you should check your master log. If it can not archive the wal,
there will be errors reported in it.

Post by Khusro Jaleel
Hello, I'm trying out a simple example from the Postgresql 9
Administration Cookbook about File-based log shipping replication, but
I can't get it to work between 2 CentOS 5.7 VMs running Postgresql
export PGARCHIVE=/var/lib/pgsql/pgarchive
export STANDBYNODE=172.16.1.56
export PGARCHIVE=/var/lib/pgsql/pgarchive
===================================================================
# - Settings -
wal_level = archive # minimal, archive, or hot_standby
# (change requires restart)
#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # synchronization level; on, off, or local
#wal_sync_method = fsync # the default is the first option
# open_datasync
# fdatasync (default on Linux)
# fsync
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers
# (change requires restart)
#wal_writer_delay = 200ms # 1-10000 milliseconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
# - Checkpoints -
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0
- 1.0
#checkpoint_warning = 30s # 0 disables
# - Archiving -
archive_mode = on # allows archiving to be done
# (change requires restart)
archive_command = 'scp %p $STANDBYNODE:$PGARCHIVE/%f'
archive_timeout = 30 # force a logfile segment switch after this #
number of seconds; 0 disables
===================================================================
===================================================================
standby_mode = 'on'
restore_command = 'cp $PGARCHIVE/%f %p'
archive_cleanup_command = '/usr/pgsql-9.1/bin/pg_archivecleanup
$PGARCHIVE %r'
trigger_file = '/tmp/postgresql.trigger.5432
===================================================================
===================================================================
#!/bin/bash
psql -c "select pg_start_backup('base backup for log shipping')"
rsync -cva --inplace --exclude=*pg_xlog* ${PGDATA}/ $STANDBYNODE:$PGDATA
psql -c "select pg_stop_backup(), current_timestamp"
===================================================================
1. Clear out the data dir on the slave, put only the recovery.conf
file there
2. Run the 'do_backup.sh' script to copy the master data dir to the
slave, excluding pg_xlog
3. Create the pg_xlog dir on the slave
4. Make sure the $PGARCHIVE dir exists on the slave. Note that this is
completely outside the normal 'data' dir
5. Start up the master. As soon as I do this, I see some archive files
appear in the $PGARCHIVE dir on the slave. This means the 'scp'
'archive_command' I am using IS working.
6. Now start up the slave, which prints the following to the log in
========================================================================
LOG: startup process (PID 4771) exited with exit code 1
LOG: aborting startup due to startup process failure
LOG: database system was interrupted; last known up at 2011-12-08
19:45:10 UTC
LOG: creating missing WAL directory "pg_xlog/archive_status"
LOG: entering standby mode
LOG: restored log file "000000010000000000000027" from archive
LOG: redo starts at 0/27000078
LOG: consistent recovery state reached at 0/28000000
No such file or directory
No such file or directory
========================================================================
Looking at the master, the file '000000010000000000000028' does exist
in 9.1/data/pg_xlog, but for some reason the master is NOT copying it
over to the slave. The slave has all the files prior to this, but not
anything after.
Anybody got any ideas why?

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Khusro Jaleel

2011-12-18 15:57:54 UTC

Permalink

Post by Rural Hunter
well, is pgsql capable to parse shell variables in postgresql.conf?
anway you should check your master log. If it can not archive the wal,
there will be errors reported in it.

Hi there,

I have removed the various shell variables from the postgresql.conf
file, cleaned out the data dir and started again. I can not see any
errors in the postgresql log files in "data/pg_log". They simply report
that Postgres was started up successfully, that's all. There are a few
files created in "data/pg_xlog", some of which are copied over to the
slave, but it seems some of them are not for some strange reason,
perhaps an oddity with using "scp" for this procedure?

You are referring to the "master" log file, is this the same log file I
am referring to, or is that is some different location?

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Ray Stell

2011-12-18 19:35:28 UTC

Permalink

You are referring to the "master" log file, is this the same log file I am
referring to, or is that is some different location?

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin