[ADMIN] PG synchronous replication and unresponsive slave

Discussion:

Manoj Govindassamy

2012-01-16 18:51:23 UTC

Anyone with PG synchronous replication knowledge, please share your
views on the questions below.

thanks,
Manoj

Any help on this is much appreciated.
thanks,
Manoj

Hi,
I have a PG 9.1.2 Master <--> Slave setup with synchronous
replication, and it is working as expected. I have a case where I
want to flip the Master to non-replication mode whenever its slave is
not responding. I have set replication_timeout to 5s, and whenever the
slave does not respond for more than 5s, I see the master detecting
it. But the transactions on the master are stuck until the slave comes
back. To get past this, I reloaded the config on the master with
synchronous_commit = local. Further transactions on the master then go
through fine with local commit turned on.
1. The transaction that was stuck right when the slave went away never
went through, even after I reloaded the master's config with local
commit on. All new transactions on the master go through fine, except
the one that was stuck initially. How can I get this stuck transaction
to complete or return with an error?
2. Whenever there is a problem with the slave, I have to manually
reload the master's config with local commit turned on to let the
master go forward. Is there any automated way to reload this config
with local commit on when the slave is unresponsive? TCP connection
timeouts and replication timeouts all detect the failures, but I want
to run some corrective action when these failures are detected.

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Fujii Masao

2012-01-17 03:44:09 UTC

On Tue, Jan 17, 2012 at 3:51 AM, Manoj Govindassamy

1. The transaction that was stuck right when the slave went away never
went through, even after I reloaded the master's config with local
commit on. All new transactions on the master go through fine, except
the one that was stuck initially. How can I get this stuck transaction
to complete or return with an error?

Changing synchronous_commit doesn't affect such a transaction. Instead,
empty synchronous_standby_names and reload the configuration file to
resume that transaction.
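
A minimal sketch of that step, written in Python for brevity (the thread itself contains no code, so the language and the helper name `clear_sync_standby_names` are assumptions). It only rewrites the text of postgresql.conf so that synchronous_standby_names is empty; the reload is left to `pg_ctl reload` or `SELECT pg_reload_conf()` run as a superuser.

```python
import re

# Matches active (not commented-out) synchronous_standby_names lines.
# Parameter names in postgresql.conf are case-insensitive and the "=" is optional.
_ACTIVE = re.compile(
    r"^[ \t]*synchronous_standby_names\b.*$",
    re.MULTILINE | re.IGNORECASE,
)
NEW_LINE = "synchronous_standby_names = ''"


def clear_sync_standby_names(conf_text):
    """Return conf_text with every active synchronous_standby_names
    line replaced by an empty setting (hypothetical helper)."""
    new_text, count = _ACTIVE.subn(NEW_LINE, conf_text)
    if count == 0:
        # No active setting found: append one at the end of the file.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += NEW_LINE + "\n"
    return new_text
```

After writing the returned text back to postgresql.conf, the master has to reload its configuration before the waiting transaction is released, as described above.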

2. Whenever there is a problem with the slave, I have to manually
reload the master's config with local commit turned on to let the
master go forward. Is there any automated way to reload this config
with local commit on when the slave is unresponsive? TCP connection
timeouts and replication timeouts all detect the failures, but I want
to run some corrective action when these failures are detected.

PostgreSQL doesn't have such a capability, but pgpool-II might.
Can you ask about that on the pgpool-II mailing list?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Manoj Govindassamy

2012-01-17 21:37:51 UTC

Thanks for your views.

(1) I will try emptying synchronous_standby_names on replica failures
and verify whether the transactions proceed.

(2) We are not comfortable moving to PGPool just for an automatic
failback mode on hot-standby failure. Any suggestions on how to build
this failback mechanism for the master in PG 9.1.2? We are using the C
interface for PG. Is there any kind of health checking that we can do
on the master to detect the hot-standby problem and let the master
reload its config with an empty synchronous_standby_names?

Any help is much appreciated.

thanks,
Manoj

Fujii Masao

2012-01-18 01:04:47 UTC

On Wed, Jan 18, 2012 at 6:37 AM, Manoj Govindassamy

(2) We are not comfortable moving to PGPool just for automatic failback mode
on hot-standby failure.

Hmm.. my reply might have been misleading. What I meant was to use
pgpool-II as clusterware for PostgreSQL's built-in replication, not as
the replication mechanism itself. You can health-check, do failover if
necessary and manage the PostgreSQL replication by using pgpool-II.
AFAIK pgpool-II has such an operation mode. Are you still not
comfortable using pgpool-II in that way?

Regards,

Manoj Govindassamy

2012-01-18 01:54:04 UTC

I am aware of pgpool-II and its features. It is just that my
requirements are a little different. I have a System (PG runs on it)
which already has a Failover mechanism to another System, and I want
PG to be part of this cluster and not clustered on its own. That is,
PG has to be running on the Master system in synchronous replication
mode with another slave system, but the failover is driven from a
higher level and not just on PG's failure.

So, whenever PG's slave node is unresponsive, we would rather cut off
replication and run the master system independently. We therefore
need a better mechanism to detect when the Master PG's synchronous
replication is not working as expected or when the slave PG becomes
unresponsive. Otherwise, the master PG is held back by the slave PG
and the whole clustered system is stuck. I hope I am making some sense
here. Let me know if there are easy ways to detect that the Master
PG's replication is not working (via libpq would be preferable).

thanks,
Manoj

Fujii Masao

2012-01-18 02:12:26 UTC

On Wed, Jan 18, 2012 at 10:54 AM, Manoj Govindassamy

Let me know if there are easy ways to detect Master
PG's replication not working (via libpq would be more preferable).

You can detect that by checking whether information about the
synchronous standby is still in pg_stat_replication or not. But I don't
have a good idea for how to automatically run an action, such as
reloading the configuration file, when the failure is detected. Maybe
you need to implement that on your own...
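
One way such a self-built check might look, sketched in Python for brevity. The names `fetch_rows` and `on_standby_lost` are hypothetical: `fetch_rows` stands for whatever client call runs a query on the master and returns rows (for example a thin wrapper around libpq's PQexec), and `on_standby_lost` is the corrective action supplied by the caller. The sketch assumes the `sync_state` column of pg_stat_replication, which reports 'sync' for the current synchronous standby.

```python
# Query to run on the master; one row per connected standby.
SYNC_QUERY = "SELECT application_name, sync_state FROM pg_stat_replication"


def has_sync_standby(rows):
    """rows: iterable of (application_name, sync_state) tuples."""
    return any(sync_state == "sync" for _name, sync_state in rows)


def check_once(fetch_rows, on_standby_lost):
    """Run one health check (hypothetical helper).

    fetch_rows(query) must return the rows of the query result.
    on_standby_lost() is called when no synchronous standby is listed.
    Returns True if a synchronous standby is still connected.
    """
    rows = fetch_rows(SYNC_QUERY)
    if has_sync_standby(rows):
        return True
    on_standby_lost()
    return False
```

A caller would run `check_once` periodically. Note that the standby's row is only expected to disappear once the master has given up on the connection, for example after replication_timeout expires, so the check cannot react faster than that detection.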

Regards,

Tatsuo Ishii

2012-01-18 02:12:50 UTC

I'm not sure I fully understand your requirement but...

Since version 3.1, pgpool-II has a switch to not trigger failover, and
you can use it to avoid automatic failover of the master node. To
detect the case where replication is not working, you can use the
replication delay feature of pgpool-II. It monitors the replication
delay between master and standby: if the delay is greater than a
threshold, it stops sending read queries to the standby. In case of
standby failure (server down etc.) you can use automatic failover as
usual.
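
The idea of a delay threshold can be illustrated with a small Python sketch. This is not pgpool-II's implementation; the function names and the byte-based threshold are assumptions. It compares two WAL locations in the textual "hi/lo" hexadecimal form, such as the master's pg_current_xlog_location() and a standby's replay_location from pg_stat_replication.

```python
def lsn_to_bytes(lsn, logid_span=1 << 32):
    """Convert a WAL location like '0/3000060' to a byte position.

    logid_span is the number of bytes covered by one log id. The default
    of 2**32 is an assumption that matches newer releases; 9.1-era
    servers are understood to use a slightly smaller span, so results
    that cross a log-id boundary should be treated as approximate there.
    """
    hi, lo = lsn.split("/")
    return int(hi, 16) * logid_span + int(lo, 16)


def standby_is_lagging(master_lsn, standby_lsn, threshold_bytes,
                       logid_span=1 << 32):
    """True if the standby is more than threshold_bytes behind the master."""
    delay = (lsn_to_bytes(master_lsn, logid_span)
             - lsn_to_bytes(standby_lsn, logid_span))
    return delay > threshold_bytes
```

For example, `standby_is_lagging("0/3000060", "0/3000000", 64)` compares a 96-byte gap against a 64-byte threshold.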
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
