PG 9.1 Looking for old WAL when promoting from recovery to master

David Morton

2012-09-03 22:01:17 UTC

I'm implementing replica servers which will use a trigger file to promote
from hot standby to full read/write. I've configured streaming replication
as well as a recovery.conf which copies old WAL files from a repository if
required.
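For reference, a 9.1-style recovery.conf along the lines described might look like the sketch below. It is written out by a small shell function. The actual file isn't shown in this post, so several values are assumptions. The restore_command path is inferred from the `cp: cannot stat` errors in the log. The trigger_file path is taken from the "trigger file found" line. The primary_conninfo values and the function name write_recovery_conf are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: write a 9.1-style recovery.conf into a standby's data directory.
write_recovery_conf() {
    datadir=$1
    # Quoted 'EOF' keeps %f and %p literal so PostgreSQL can substitute them later.
    cat > "$datadir/recovery.conf" <<'EOF'
# Keep running as a standby instead of ending recovery when no more WAL is found.
standby_mode = 'on'
# Stream WAL from the primary (host/port/user are placeholders, not from the post).
primary_conninfo = 'host=primary.example port=5432 user=replicator'
# Fall back to the WAL repository (path inferred from the cp errors in the log).
restore_command = 'cp /NFS/current/wal/depot/%f %p'
# Creating this file ends recovery and promotes the standby to read/write.
trigger_file = '/home/depot/data/transition_to_master.trigger'
EOF
}
```

Promotion is then just a matter of creating the trigger file, e.g. `touch /home/depot/data/transition_to_master.trigger`.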

When I place the trigger file, the system assumes the read/write role
without issue, but it insists on looking for a very old WAL file. The log
below shows restoration from the previous night's full online backup
(rsync), followed by detection of the trigger file and then the attempts to
find the old WAL file.

Is this behavior normal? From what I can see, it's not writing any new WAL
files until it is satisfied with the state of this old one. If I create the
file it's expecting to see, it archives it off and then complains about the
next one in the series.

2012-08-28 23:30:33 UTC LOG: restored log file "000000010000002E00000030" from archive
2012-08-28 23:30:34 UTC LOG: restored log file "000000010000002E00000031" from archive
2012-08-28 23:30:36 UTC LOG: restored log file "000000010000002E00000032" from archive
2012-08-28 23:30:37 UTC LOG: restored log file "000000010000002E00000033" from archive
2012-08-28 23:30:39 UTC LOG: restored log file "000000010000002E00000034" from archive
2012-08-28 23:30:42 UTC LOG: restored log file "000000010000002E00000035" from archive
2012-08-28 23:30:44 UTC LOG: restored log file "000000010000002E00000036" from archive
2012-08-28 23:30:45 UTC LOG: restored log file "000000010000002E00000037" from archive
cp: cannot stat `/NFS/current/wal/depot/000000010000002E00000038': No such file or directory
2012-08-28 23:30:47 UTC LOG: streaming replication successfully connected to primary
2012-08-28 23:42:09 UTC LOG: trigger file found: /home/depot/data/transition_to_master.trigger
2012-08-28 23:42:09 UTC FATAL: terminating walreceiver process due to administrator command
cp: cannot stat `/NFS/current/wal/depot/000000010000002E00000039': No such file or directory
2012-08-28 23:42:09 UTC LOG: record with zero length at 2E/39079E00
cp: cannot stat `/NFS/current/wal/depot/000000010000002E00000039': No such file or directory
2012-08-28 23:42:09 UTC LOG: redo done at 2E/39079DC0
2012-08-28 23:42:09 UTC LOG: last completed transaction was at log time 2012-08-28 23:42:02.226546+00
cp: cannot stat `/NFS/current/wal/depot/000000010000002E00000039': No such file or directory
cp: cannot stat `/NFS/current/wal/depot/00000002.history': No such file or directory
2012-08-28 23:42:09 UTC LOG: selected new timeline ID: 2
cp: cannot stat `/NFS/current/wal/depot/00000001.history': No such file or directory
2012-08-28 23:42:10 UTC LOG: archive recovery complete
2012-08-28 23:42:10 UTC LOG: database system is ready to accept connections
2012-08-28 23:42:10 UTC LOG: autovacuum launcher started
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:42:10 UTC LOG: archive command failed with exit code 1
2012-08-28 23:42:10 UTC DETAIL: The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:42:11 UTC LOG: archive command failed with exit code 1
2012-08-28 23:42:11 UTC DETAIL: The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:42:13 UTC LOG: archive command failed with exit code 1
2012-08-28 23:42:13 UTC DETAIL: The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
2012-08-28 23:42:13 UTC WARNING: transaction log file "000000010000001D00000023" could not be archived: too many failures
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:43:13 UTC LOG: archive command failed with exit code 1
2012-08-28 23:43:13 UTC DETAIL: The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:43:14 UTC LOG: archive command failed with exit code 1
2012-08-28 23:43:14 UTC DETAIL: The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:43:15 UTC LOG: archive command failed with exit code 1
2012-08-28 23:43:15 UTC DETAIL: The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
2012-08-28 23:43:15 UTC WARNING: transaction log file "000000010000001D00000023" could not be archived: too many failures
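The archiver keeps retrying 000000010000001D00000023, which is not present in pg_xlog. In 9.x the archiver decides what to ship by scanning pg_xlog/archive_status for `<segment>.ready` marker files. Listing those markers therefore shows which segments it still considers pending. A minimal sketch, assuming the standard 9.x pg_xlog layout; the function name pending_archive_segments is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the WAL segment names that still have a .ready
# marker in <datadir>/pg_xlog/archive_status, i.e. segments awaiting archiving.
pending_archive_segments() {
    status_dir=$1/pg_xlog/archive_status
    for marker in "$status_dir"/*.ready; do
        [ -e "$marker" ] || continue   # glob matched nothing: no pending segments
        name=${marker##*/}             # strip the directory part
        echo "${name%.ready}"          # strip the .ready suffix
    done
}
```

Usage would be something like `pending_archive_segments /home/depot/data`, run as the database owner.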