Problems with Streaming Replication and PostgreSQL 9.2.2

Discussion:

(too old to reply)

JotaComm

2013-04-01 18:19:46 UTC

Hello, everybody

I have one problem and I need some help.

My environment: one master and one slave (PostgreSQL 9.2.2).

My cluster has about 160GB and pg_basebackup to syncronize them (master and
slave).

The syntax is:

pg_basebackup -h productionaddress -p productionport -U productionuser -D
datadirectory -P -v

My recovery.conf:

standby_mode = 'on'

primary_conninfo = 'host=productionaddress port=productionport
user=productionuser'

archive_cleanup_command = 'pg_archivecleanup /slave/transactionlogs %r'

My postgresql.conf: (master)

wal_level = hot_standby

checkpoint_segments = 10

archive_mode = on

archive_command = 'rsync -Crap %p ***@slaveaddress
:/slave/transactionlogs/%f'

max_wal_senders = 1

wal_keep_segments = 50

My postgresql.conf: (slave)

checkpoint_segments = 10

hot_standby = on

In my slave (logs) I have the following erros: My first attempt

2013-03-07 15:58:21 BRT [11817]: [1-1] user=,db= LOG: database system was
interrupted; last known up at 2013-03-07 15:55:43 BRT
2013-03-07 15:58:21 BRT [11817]: [2-1] user=,db= LOG: entering standby mode
2013-03-07 15:58:21 BRT [11818]: [1-1] user=,db= LOG: streaming
replication successfully connected to primary
2013-03-07 15:58:25 BRT [11817]: [3-1] user=,db= LOG: consistent recovery
state reached at 141/8FBB5F0
2013-03-07 15:58:25 BRT [11817]: [4-1] user=,db= LOG: redo starts at
141/2251F90
2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL: could not access
status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL: Could not read
from file "pg_clog/001D" at offset 49152: Success.
2013-03-07 15:58:25 BRT [11817]: [7-1] user=,db= CONTEXT: xlog redo
commit: 2013-03-07 15:55:40.673623-03
2013-03-07 15:58:25 BRT [11767]: [1-1] user=,db= LOG: startup process (PID
11817) exited with exit code 1
2013-03-07 15:58:25 BRT [11767]: [2-1] user=,db= LOG: terminating any
other active server processes

In my slave (logs) I have the following erros: My second attempt

2013-03-11 12:07:49 BRT [5862]: [1-1] user=,db= LOG: database system was
interrupted; last known up at 2013-03-11 12:06:31 BRT
2013-03-11 12:07:49 BRT [5862]: [2-1] user=,db= LOG: entering standby mode
2013-03-11 12:07:49 BRT [5864]: [1-1] user=,db= LOG: streaming replication
successfully connected to primary
2013-03-11 12:07:53 BRT [5862]: [3-1] user=,db= LOG: consistent recovery
state reached at 168/816AE10
2013-03-11 12:07:53 BRT [5862]: [4-1] user=,db= LOG: redo starts at
167/FEC3D828
2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL: could not access
status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL: Could not read
from file "pg_clog/0022" at offset 212992: Success.
2013-03-11 12:07:53 BRT [5862]: [7-1] user=,db= CONTEXT: xlog redo commit:
2013-03-11 12:05:35.069759-03
2013-03-11 12:07:53 BRT [5762]: [1-1] user=,db= LOG: startup process (PID
5862) exited with exit code 1
2013-03-11 12:07:53 BRT [5762]: [2-1] user=,db= LOG: terminating any other
active server processes

I had the same problem, but in different files (pg_clog):

First attempt:

2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL: could not access
status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL: Could not read
from file "pg_clog/001D" at offset 49152: Success.

Second attempt:

2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL: could not access
status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL: Could not read
from file "pg_clog/0022" at offset 212992: Success.

When I started the synchronism at the first time (no data), it was OK. Now,
my cluster has about 160GB and when the replication finishes I have these
problems. I noticed that pg_clog/current_file is the problem.

Any idea? Suggestions?

Thank you.

Best Regards

João Paulo

--
JotaComm
http://jotacomm.wordpress.com

Lonni J Friedman

2013-04-01 18:22:07 UTC

Permalink

If you're not running pg_basebackup with the -x option, how are you
getting the WAL files onto the slave?

Post by JotaComm
Hello, everybody
I have one problem and I need some help.
My environment: one master and one slave (PostgreSQL 9.2.2).
My cluster has about 160GB and pg_basebackup to syncronize them (master and
slave).
pg_basebackup -h productionaddress -p productionport -U productionuser -D
datadirectory -P -v
standby_mode = 'on'
primary_conninfo = 'host=productionaddress port=productionport
user=productionuser'
archive_cleanup_command = 'pg_archivecleanup /slave/transactionlogs %r'
My postgresql.conf: (master)
wal_level = hot_standby
checkpoint_segments = 10
archive_mode = on
archive_command = 'rsync -Crap %p
max_wal_senders = 1
wal_keep_segments = 50
My postgresql.conf: (slave)
checkpoint_segments = 10
hot_standby = on
In my slave (logs) I have the following erros: My first attempt
2013-03-07 15:58:21 BRT [11817]: [1-1] user=,db= LOG: database system was
interrupted; last known up at 2013-03-07 15:55:43 BRT
2013-03-07 15:58:21 BRT [11817]: [2-1] user=,db= LOG: entering standby mode
2013-03-07 15:58:21 BRT [11818]: [1-1] user=,db= LOG: streaming replication
successfully connected to primary
2013-03-07 15:58:25 BRT [11817]: [3-1] user=,db= LOG: consistent recovery
state reached at 141/8FBB5F0
2013-03-07 15:58:25 BRT [11817]: [4-1] user=,db= LOG: redo starts at
141/2251F90
2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL: could not access
status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL: Could not read
from file "pg_clog/001D" at offset 49152: Success.
2013-03-07 15:55:40.673623-03
2013-03-07 15:58:25 BRT [11767]: [1-1] user=,db= LOG: startup process (PID
11817) exited with exit code 1
2013-03-07 15:58:25 BRT [11767]: [2-1] user=,db= LOG: terminating any other
active server processes
In my slave (logs) I have the following erros: My second attempt
2013-03-11 12:07:49 BRT [5862]: [1-1] user=,db= LOG: database system was
interrupted; last known up at 2013-03-11 12:06:31 BRT
2013-03-11 12:07:49 BRT [5862]: [2-1] user=,db= LOG: entering standby mode
2013-03-11 12:07:49 BRT [5864]: [1-1] user=,db= LOG: streaming replication
successfully connected to primary
2013-03-11 12:07:53 BRT [5862]: [3-1] user=,db= LOG: consistent recovery
state reached at 168/816AE10
2013-03-11 12:07:53 BRT [5862]: [4-1] user=,db= LOG: redo starts at
167/FEC3D828
2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL: could not access
status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL: Could not read from
file "pg_clog/0022" at offset 212992: Success.
2013-03-11 12:05:35.069759-03
2013-03-11 12:07:53 BRT [5762]: [1-1] user=,db= LOG: startup process (PID
5862) exited with exit code 1
2013-03-11 12:07:53 BRT [5762]: [2-1] user=,db= LOG: terminating any other
active server processes
2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL: could not access
status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL: Could not read
from file "pg_clog/001D" at offset 49152: Success.
2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL: could not access
status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL: Could not read from
file "pg_clog/0022" at offset 212992: Success.
When I started the synchronism at the first time (no data), it was OK. Now,
my cluster has about 160GB and when the replication finishes I have these
problems. I noticed that pg_clog/current_file is the problem.
Any idea? Suggestions?
Thank you.
Best Regards
João Paulo
--
JotaComm
http://jotacomm.wordpress.com

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman ***@gmail.com
LlamaLand https://netllama.linux-sxs.org

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

JotaComm

2013-04-01 18:30:48 UTC

Permalink

Hello, Lonni

I use rsync to transfer the files (pg_xlog) from master to slave.

rsync -Crap %p ***@slaveaddress:/slavetransactionlogs/%f

In my recovery.conf (slave) I have the following statement:

restore_command = 'test -f $slavetransactionlogs/%f && cp
$slavetransactionlogs/%f %p'

Best Regards

Post by Lonni J Friedman
If you're not running pg_basebackup with the -x option, how are you
getting the WAL files onto the slave?

and

Post by JotaComm
slave).
pg_basebackup -h productionaddress -p productionport -U productionuser -D
datadirectory -P -v
standby_mode = 'on'
primary_conninfo = 'host=productionaddress port=productionport
user=productionuser'
archive_cleanup_command = 'pg_archivecleanup /slave/transactionlogs %r'
My postgresql.conf: (master)
wal_level = hot_standby
checkpoint_segments = 10
archive_mode = on
archive_command = 'rsync -Crap %p
max_wal_senders = 1
wal_keep_segments = 50
My postgresql.conf: (slave)
checkpoint_segments = 10
hot_standby = on
In my slave (logs) I have the following erros: My first attempt
2013-03-07 15:58:21 BRT [11817]: [1-1] user=,db= LOG: database system

was

Post by JotaComm
interrupted; last known up at 2013-03-07 15:55:43 BRT
2013-03-07 15:58:21 BRT [11817]: [2-1] user=,db= LOG: entering standby

mode

Post by JotaComm
2013-03-07 15:58:21 BRT [11818]: [1-1] user=,db= LOG: streaming

replication

Post by JotaComm
successfully connected to primary
2013-03-07 15:58:25 BRT [11817]: [3-1] user=,db= LOG: consistent

recovery

Post by JotaComm
state reached at 141/8FBB5F0
2013-03-07 15:58:25 BRT [11817]: [4-1] user=,db= LOG: redo starts at
141/2251F90
2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL: could not access
status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL: Could not read
from file "pg_clog/001D" at offset 49152: Success.
2013-03-07 15:58:25 BRT [11817]: [7-1] user=,db= CONTEXT: xlog redo
2013-03-07 15:55:40.673623-03
2013-03-07 15:58:25 BRT [11767]: [1-1] user=,db= LOG: startup process

(PID

Post by JotaComm
11817) exited with exit code 1
2013-03-07 15:58:25 BRT [11767]: [2-1] user=,db= LOG: terminating any

other

Post by JotaComm
active server processes
In my slave (logs) I have the following erros: My second attempt
2013-03-11 12:07:49 BRT [5862]: [1-1] user=,db= LOG: database system was
interrupted; last known up at 2013-03-11 12:06:31 BRT
2013-03-11 12:07:49 BRT [5862]: [2-1] user=,db= LOG: entering standby

mode

Post by JotaComm
2013-03-11 12:07:49 BRT [5864]: [1-1] user=,db= LOG: streaming

replication

Post by JotaComm
successfully connected to primary
2013-03-11 12:07:53 BRT [5862]: [3-1] user=,db= LOG: consistent recovery
state reached at 168/816AE10
2013-03-11 12:07:53 BRT [5862]: [4-1] user=,db= LOG: redo starts at
167/FEC3D828
2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL: could not access
status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL: Could not read

from

Post by JotaComm
file "pg_clog/0022" at offset 212992: Success.
2013-03-11 12:07:53 BRT [5862]: [7-1] user=,db= CONTEXT: xlog redo
2013-03-11 12:05:35.069759-03
2013-03-11 12:07:53 BRT [5762]: [1-1] user=,db= LOG: startup process

(PID

Post by JotaComm
5862) exited with exit code 1
2013-03-11 12:07:53 BRT [5762]: [2-1] user=,db= LOG: terminating any

other

Post by JotaComm
active server processes
2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL: could not access
status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL: Could not read
from file "pg_clog/001D" at offset 49152: Success.
2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL: could not access
status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL: Could not read

from

Post by JotaComm
file "pg_clog/0022" at offset 212992: Success.
When I started the synchronism at the first time (no data), it was OK.

Now,

Post by JotaComm
my cluster has about 160GB and when the replication finishes I have these
problems. I noticed that pg_clog/current_file is the problem.
Any idea? Suggestions?
Thank you.
Best Regards
João Paulo
--
JotaComm
http://jotacomm.wordpress.com

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LlamaLand https://netllama.linux-sxs.org

--
JotaComm
http://jotacomm.wordpress.com

Lonni J Friedman

2013-04-01 18:32:39 UTC

Permalink

I'd be curious whether your problem persisted if you ran pg_basebackup
with the -x option.

Post by JotaComm
Hello, Lonni
I use rsync to transfer the files (pg_xlog) from master to slave.
restore_command = 'test -f $slavetransactionlogs/%f && cp
$slavetransactionlogs/%f %p'
Best Regards

Post by Lonni J Friedman
If you're not running pg_basebackup with the -x option, how are you
getting the WAL files onto the slave?

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LlamaLand https://netllama.linux-sxs.org

--
JotaComm
http://jotacomm.wordpress.com

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman ***@gmail.com
LlamaLand https://netllama.linux-sxs.org

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin