[ADMIN] replication failure with GIN index

Rural Hunter

2012-04-06 01:56:25 UTC

I'm trying to set up a standby server. Both the primary and standby
servers are on the latest version, 9.1.3, on Ubuntu Server 10.10. So far I
have tried to initialize the setup twice, but both attempts failed after
replication had been running for some time. What can I do to fix this?
The log on the standby is shown below:

2012-04-06 02:31:01 CST [@] LOG: restored log file "0000000200000E3C000000F1" from archive
2012-04-06 02:35:35 CST [@] LOG: restored log file "0000000200000E3C000000F2" from archive
2012-04-06 02:36:19 CST [@] LOG: restored log file "0000000200000E3C000000F3" from archive
2012-04-06 02:36:48 CST [@] LOG: restored log file "0000000200000E3C000000F4" from archive
2012-04-06 02:37:24 CST [@] LOG: restored log file "0000000200000E3C000000F5" from archive
2012-04-06 02:37:27 CST [@] PANIC: GIN metapage disappeared
2012-04-06 02:37:27 CST [@] CONTEXT: xlog redo Update metapage, node: 37547844/16405/83896882 blkno: 4294967295
2012-04-06 02:37:28 CST [@] LOG: startup process (PID 24912) was terminated by signal 6: Aborted
2012-04-06 02:37:28 CST [@] LOG: terminating any other active server processes

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Simon Riggs

2012-04-06 07:38:38 UTC

Post by Rural Hunter
2012-04-06 02:37:27 CST [@] PANIC: GIN metapage disappeared
2012-04-06 02:37:27 CST [@] CONTEXT: xlog redo Update metapage, node: 37547844/16405/83896882 blkno: 4294967295

The blkno is all wrong, so it looks like a clear bug to me.

Blkno has been set to -1.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Tom Lane

2012-04-06 14:01:09 UTC

Known bug, see
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=57b100fe0fb1d0d5803789d3113b89fa18a34fad

Post by Rural Hunter
CONTEXT: xlog redo Update metapage, node: 37547844/16405/83896882 blkno: 4294967295

Post by Simon Riggs
The blkno is all wrong, so it looks like a clear bug to me.

[ looks into that... ] The funny blkno is attributable to this
overly-cute code:

    case XLOG_GIN_UPDATE_META_PAGE:
        appendStringInfo(buf, "Update metapage, ");
        desc_node(buf, ((ginxlogUpdateMeta *) rec)->node, ((ginxlogUpdateMeta *) rec)->metadata.tail);
        break;

and we also have

    case XLOG_GIN_DELETE_LISTPAGE:
        appendStringInfo(buf, "Delete list pages (%d), ", ((ginxlogDeleteListPages *) rec)->ndeleted);
        desc_node(buf, ((ginxlogDeleteListPages *) rec)->node, ((ginxlogDeleteListPages *) rec)->metadata.head);
        break;

While there could be some point in printing the list head or tail
pointer, it's just confusing to print it with a label of "blkno".
I think we should just print the metapage block number here and
be done with it.

regards, tom lane
