Lonni J Friedman
2013-01-14 20:02:34 UTC
Your errors look somewhat similar to a problem I reported last week
(no replies thus far):
http://www.postgresql.org/message-id/CAP=oouE5niXgAO_34Q+FGq=***@mail.gmail.com
Except in my case no number of restarts helped. You didn't say: were
you explicitly copying $PGDATA, or using some other mechanism to
migrate the data elsewhere?
Also, which version of postgres are you using?
Hi Everyone,
So we had to fail over and do a full base backup to get our slave database
back online, and ran into an interesting scenario. After copying the data
directory, setting up recovery.conf, and starting the slave database,
the database crashed while replaying xlogs. However, on starting the
database again, it was able to replay xlogs farther than it
initially got, but ultimately failed again. After starting the
DB a third time, PostgreSQL replayed even further and caught up to the
master to start streaming replication. Is this common and/or acceptable?
base/16408/18967399 does not exist
1663/16408/22892842; iblk 658355, heap 1663/16408/18967399;
invalid pages
1663/16408/22892842; iblk 658355, heap 1663/16408/18967399;
terminated by signal 6: Aborted
base/16408/18967399 does not exist
1663/16408/22892841; iblk 1075350, heap 1663/16408/18967399;
invalid pages
1663/16408/22892841; iblk 1075350, heap 1663/16408/18967399;
terminated by signal 6: Aborted
Fortunately, these errors only pertain to indexes, which can be rebuilt.
But is this a sign that the data directory on the slave is corrupt?
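For reading the identifiers in the errors above, here is a minimal Python sketch. It assumes the slash-separated triples are tablespace OID / database OID / relfilenode, and that 1663 and 1664 are the built-in pg_default and pg_global tablespaces. The function name relfilenode_path is hypothetical, and user-defined tablespaces are deliberately not handled because their on-disk layout depends on the server version.

```python
# Sketch only: maps a "tablespace/database/relfilenode" triple from the
# log excerpts to a path relative to $PGDATA.
PG_DEFAULT_TABLESPACE_OID = 1663  # built-in pg_default tablespace
PG_GLOBAL_TABLESPACE_OID = 1664   # built-in pg_global tablespace


def relfilenode_path(triple: str) -> str:
    """Return the $PGDATA-relative path for a built-in tablespace."""
    spc, db, rel = (int(part) for part in triple.split("/"))
    if spc == PG_DEFAULT_TABLESPACE_OID:
        return f"base/{db}/{rel}"
    if spc == PG_GLOBAL_TABLESPACE_OID:
        return f"global/{rel}"
    # Layout for other tablespaces varies by version, so stop here.
    raise ValueError(f"tablespace {spc} is not handled by this sketch")


# Identifiers copied from the first error excerpt.
index_path = relfilenode_path("1663/16408/22892842")
heap_path = relfilenode_path("1663/16408/18967399")
print("index:", index_path)
print("heap: ", heap_path)
```

Under those assumptions, the file reported as missing (base/16408/18967399) lines up with the "heap" identifier in the message rather than with the index identifier, which may be worth checking before concluding that only indexes are affected.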
1. Data Directory Copy Finishes.
2. Recovery.conf Setup
recovery at 2013-01-12 00:14:06 UTC
incomplete startup packet
database system is starting up
unpigz: /mnt/db/wals/00000009.history does not exist -- skipping
"0000000900008E45000000B8" from archive
"0000000900008E450000008B" from archive
"pg_snapshots": No such file or directory
reached at 8E45/8B174840
database system is starting up
"0000000900008E450000008C" from archive
"0000000900008E450000008D" from archive
*SNIP*
"0000000900008E4800000066" from archive
"0000000900008E4800000067" from archive
base/16408/18967399 does not exist
1663/16408/22892842; iblk 658355, heap 1663/16408/18967399;
invalid pages
1663/16408/22892842; iblk 658355, heap 1663/16408/18967399;
terminated by signal 6: Aborted
server processes
4. PostgreSQL shuts down...
5. Debugging logs enabled in postgresql.conf.
while in recovery at log time 2013-01-11 18:05:31 UTC
once some data might be corrupted and you might need to choose an earlier
recovery target.
incomplete startup packet
database system is starting up
unpigz: /mnt/db/wals/00000009.history does not exist -- skipping
"0000000900008E45000000B8" from archive
8E45/B80AF650
8E45/8B173180; shutdown FALSE
0/552803703; next OID: 24427698
MultiXactOffset: 2442921
ID: 3104202601, in database 16408
956718952, limited by database with OID 16408
cleanup 1 init 0
"pg_snapshots": No such file or directory
"0000000900008E450000008B" from archive
"0000000900008E450000008C" from archive
*SNIP*
"0000000900008E4800000062" from archive
"0000000900008E4800000063" from archive
"0000000900008E4800000064" from archive
"0000000900008E4800000065" from archive
"0000000900008E4800000066" from archive
"0000000900008E4800000067" from archive
reached at 8E48/67AC4E28
accept read only connections
"0000000900008E4800000068" from archive
"0000000900008E4800000069" from archive
"0000000900008E480000006A" from archive
"0000000900008E480000006B" from archive
"0000000900008E480000006C" from archive
*SNIP*
"0000000900008E4F00000079" from archive
"0000000900008E4F0000007A" from archive
base/16408/18967399 does not exist
1663/16408/22892841; iblk 1075350, heap 1663/16408/18967399;
invalid pages
1663/16408/22892841; iblk 1075350, heap 1663/16408/18967399;
terminated by signal 6: Aborted
server processes
7. PostgreSQL shuts down...
while in recovery at log time 2013-01-11 19:50:31 UTC
once some data might be corrupted and you might need to choose an earlier
recovery target.
incomplete startup packet
database system is starting up
unpigz: /mnt/db/wals/00000009.history does not exist -- skipping
"0000000900008E4A00000039" from archive
8E4A/39CD4BA0
8E4A/19F0D210; shutdown FALSE
0/552859005; next OID: 24427698
MultiXactOffset: 2443321
ID: 3104202601, in database 16408
956718952, limited by database with OID 16408
cleanup 1 init 0
"pg_snapshots": No such file or directory
"0000000900008E4A00000019" from archive
"0000000900008E4A0000001A" from archive
*SNIP*
"0000000900008E4F00000077" from archive
"0000000900008E4F00000078" from archive
"0000000900008E4F00000079" from archive
"0000000900008E4F0000007A" from archive
reached at 8E4F/7A22BD08
accept read only connections
"0000000900008E4F0000007B" from archive
"0000000900008E4F0000007C" from archive
"0000000900008E4F0000007D" from archive
"0000000900008E4F0000007E" from archive
*SNIP*
"0000000900008E53000000D9" from archive
"0000000900008E53000000DA" from archive
"0000000900008E53000000DB" from archive
"0000000900008E53000000DC" from archive
"0000000900008E53000000DD" from archive
unpigz: /mnt/db/wals/0000000900008E53000000DE does not exist -- skipping
in log file 36435, segment 222, offset 0
unpigz: /mnt/db/wals/0000000900008E53000000DE does not exist -- skipping
successfully connected to primary
file=base/16408/22873432 time=2.538 msec
file=base/16408/18967506 time=12.054 msec
file=base/16408/18967506_fsm time=0.095 msec
file=base/16408/22873244 time=0.144 msec
file=base/16408/22892823 time=0.087 msec
9. Slave DB connected to streaming replication with Master DB and all
seems fine.
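The segment names and locations in the log excerpts can be cross-checked with a short Python sketch. It assumes the usual 24-hex-digit WAL file name (timeline, log file, segment) and the default 16 MB segment size; a server built with a different segment size would need a different constant. Both function names are hypothetical.

```python
# Sketch only: relates WAL segment file names to log/segment numbers and
# to "XXXX/YYYYYYYY" locations, assuming the default 16 MB segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default build setting


def parse_wal_segment_name(name: str) -> tuple[int, int, int]:
    """Split a 24-hex-digit segment name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log, seg


def segment_name_for_location(timeline: int, location: str) -> str:
    """Return the segment file name that contains a given WAL location."""
    log_hex, offset_hex = location.split("/")
    log = int(log_hex, 16)
    seg = int(offset_hex, 16) // WAL_SEGMENT_SIZE
    return f"{timeline:08X}{log:08X}{seg:08X}"


# The segment that was missing from the archive just before streaming began.
print(parse_wal_segment_name("0000000900008E53000000DE"))
# The segment holding the location 8E45/8B174840 on timeline 9.
print(segment_name_for_location(9, "8E45/8B174840"))
```

With these assumptions, 0000000900008E53000000DE decodes to timeline 9, log file 36435, segment 222, which agrees with the "log file 36435, segment 222, offset 0" line, and the location 8E45/8B174840 falls in segment 0000000900008E450000008B, one of the files restored from the archive.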
Any help would be appreciated. Thanks!
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin