Discussion:
corrupted indexes when using base backups generated from hot standby
Lonni J Friedman
2013-01-09 18:28:32 UTC
Greetings,
I'm running postgres-9.2.2 in a Linux-x86_64 cluster with 1 master and
several hot standby servers. Since upgrading to 9.2.2 from 9.1.x a
few months ago, I switched from generating a base backup on the
master, to generating it on a dedicated slave/standby (to reduce the
load on the master). The command that I've always used to generate
the base backup is:
pg_basebackup -v -D /tmp/bb0 -x -Ft -U postgres

However, I've noticed that whenever I use the base backup generated
from the standby to create a new standby server, many of the indexes
are corrupted. This was never the case when I was generating the
basebackup directly from the master. Now, I see errors similar to the
following when running queries against the tables that own the
indexes:
INDEX "debugger_2013_01_dacode_idx" contains unexpected zero page at block 12
HINT: Please REINDEX it.
INDEX "smoke32on64tests_2013_01_suiteid_idx" contains unexpected zero page at block 111
HINT: Please REINDEX it.

I've confirmed that the corruption doesn't exist on the server
that is generating the base backup (the same SQL query that fails on
the new standby runs successfully there). So it seems that I'm
potentially misunderstanding some part of the process. My setup
process is to simply untar the base backup in the $PGDATA directory
and copy all the WAL files into $PGDATA/pg_xlog.
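
For reference, the setup steps described above could be sketched roughly as follows. This is only an illustration, not the exact commands used in this thread: the default paths, the WAL_SRC directory, and the function name restore_base_backup are all assumptions. The one detail taken from pg_basebackup itself is that tar format (-Ft) writes the main data directory to a file named base.tar inside the target directory.

```shell
#!/bin/sh
# Sketch of the restore steps described above. All paths are assumptions.
PGDATA="${PGDATA:-/var/lib/pgsql/9.2/data}"    # assumed data directory of the new standby
BACKUP_TAR="${BACKUP_TAR:-/tmp/bb0/base.tar}"  # tar written by pg_basebackup -Ft -D /tmp/bb0
WAL_SRC="${WAL_SRC:-/tmp/wal}"                 # hypothetical directory holding the copied WAL files

restore_base_backup() {
    # Unpack the base backup into an (initially empty) data directory.
    mkdir -p "$PGDATA"
    tar -xf "$BACKUP_TAR" -C "$PGDATA"

    # Copy any separately collected WAL segments into pg_xlog.
    mkdir -p "$PGDATA/pg_xlog"
    for f in "$WAL_SRC"/*; do
        [ -e "$f" ] || continue   # skip the unexpanded glob when WAL_SRC is empty
        cp "$f" "$PGDATA/pg_xlog/"
    done

    # PostgreSQL refuses to start if the data directory is group/world accessible.
    chmod 700 "$PGDATA"
}
```

The function would be called once, as the postgres OS user, before the new standby is started for the first time.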

Thanks for any pointers.
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Heikki Linnakangas
2013-01-15 10:57:23 UTC
That process sounds correct. Since you're using pg_basebackup's -x
option, you don't even need to copy the WAL logs, although it shouldn't
do any harm either. The tar file should contain everything needed to
restore the backup.
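
One way to confirm that the tar file really does carry its own WAL is to list its contents and look for segment files under pg_xlog. The sketch below assumes the tar was produced with -x, and the helper name backup_has_wal is made up for illustration:

```shell
#!/bin/sh
# Succeeds if the given tar archive contains at least one WAL segment
# (a 24-character hexadecimal file name) under pg_xlog/.
backup_has_wal() {
    tar -tf "$1" | grep -E 'pg_xlog/[0-9A-F]{24}$' > /dev/null
}
```

For example, "backup_has_wal /tmp/bb0/base.tar && echo WAL included" would print the message only when segments are present in the archive.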

Can you provide more information? The log output would be nice. How
large is the database? What kind of activity is there in the master
while the backup is taken?

- Heikki
Lonni J Friedman
2013-01-25 23:28:06 UTC
Sorry for the delayed reply; I was out of the office.

The database is about 530GB uncompressed. The master is quite busy
all the time, with inserts, updates & deletes.

I've attached all the recent errors. I could send you the entire log
if you'd prefer; it's about 800KB compressed.
Heikki Linnakangas
2013-01-29 16:32:54 UTC
Thanks. I'm afraid I didn't get any wiser from the log output. Since
this is a test system, could you reduce the test case into something
smaller and self-contained?

- Heikki
Lonni J Friedman
2013-01-29 16:36:12 UTC
Sorry, I don't understand what you're requesting. How can I reduce a
test case when all I'm doing to generate the corrupted data is running
pg_basebackup on the standby? Do you mean using a smaller database?
Heikki Linnakangas
2013-01-29 16:38:18 UTC
Yes, a smaller database and a simpler schema with, e.g., only one table
and index. And only do inserts while the backup is running, for example.
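
A reduced test case along these lines might be sketched as below. The table name t, the index name t_val_idx, and the helper functions are hypothetical and chosen only for illustration; the functions just print SQL, which would then be piped into psql on the master.

```shell
#!/bin/sh
# Hypothetical minimal schema: one table and one secondary index.
schema_sql() {
    cat <<'EOF'
CREATE TABLE t (id serial PRIMARY KEY, val integer);
CREATE INDEX t_val_idx ON t (val);
EOF
}

# Emit a single INSERT that adds $1 rows with random values.
insert_sql() {
    printf 'INSERT INTO t (val) SELECT (random() * 1000000)::integer FROM generate_series(1, %s);\n' "$1"
}
```

Possible usage, assuming a scratch database named testdb: run "schema_sql | psql -U postgres testdb" once on the master, then keep running "insert_sql 1000 | psql -U postgres testdb" in a loop while pg_basebackup is taken from the standby, and finally query t through the index on a standby restored from that backup.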

- Heikki
Lonni J Friedman
2013-01-29 16:40:10 UTC
OK, I'll give that a try.