Escaping a blocked sendto() syscall without causing a restart

Tom Lane

2013-01-17 21:38:40 UTC

Post by Jerry Sievers
Does anyone know if one of the signals below can be sent to break out
of this state *without* the postmaster sensing a crashed backend?
Several times in the past, at other companies, I've seen backends that
would not respond to cancel or SIGTERM because of a syscall blocked
on I/O.
Quite often, though, the backend would apparently notice the broken
socket eventually, receive the signals, and exit cleanly.
I've got one that's been wedged like that for a couple days now.
I recall trying several in a similar situation a while ago and of
course one of them interrupted the syscall all right but it was an
abort and we got the customary spontaneous postmaster restart.

Offhand it looks to me like most signals would kick the backend off the
send() call ... but it would loop right back and try again. See
internal_flush() in pqcomm.c. (If you're using SSL, this diagnosis
may or may not apply.)
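
To make the "loop right back and try again" behavior concrete, here is a
minimal sketch of a retry-on-EINTR send loop. It is not the actual
internal_flush() code; flush_all() and demo_roundtrip() are hypothetical
names invented for this illustration, and the sketch assumes a plain
blocking socket with no SSL.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL's internal_flush): write the whole
 * buffer to a blocking socket.  If a signal interrupts send(), the call
 * fails with EINTR and we simply retry, so the signal alone never gets
 * us out of the loop.
 */
int flush_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    assert(buf != NULL || len == 0);

    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;          /* real failure, e.g. EPIPE or ECONNRESET */
        }
        sent += (size_t) n;     /* partial sends advance the offset */
    }
    return 0;
}

/* Self-check over a local socket pair: returns 0 if the bytes arrive intact. */
int demo_roundtrip(void)
{
    int sv[2];
    const char msg[] = "hello";
    char got[sizeof msg];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    ok = flush_all(sv[0], msg, sizeof msg) == 0
        && read(sv[1], got, sizeof got) == (ssize_t) sizeof got
        && memcmp(msg, got, sizeof msg) == 0;

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

With a loop of this shape, a signal whose handler merely sets a flag and
returns cannot unblock the sender for good. Only a hard error on the
socket, or a signal that terminates the process, ends the loop.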

We can't do anything except repeat the send attempt if the client
connection is to be kept in a sane state. It's possible that if the
interrupt was a SIGTERM (forced exit) we could mark the connection dead
and return early, but it would probably take some thought and
experimentation to get useful behavior that way. And I'm not at all
sure if we could get it to work in SSL mode ...

So the short answer is no, you probably can't kill the session without
causing a restart. Possibly we should add a TODO to make this better.

What you might consider instead, if this is a recurring problem, is
adjusting the postmaster-side TCP keepalive parameters so that dead
connections are noticed more quickly. The default connection timeout
according to the TCP standards is on the order of hours, but you can
reduce that quite a lot if your network environment is at all reliable.
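
As a rough illustration, the server-side keepalive settings live in
postgresql.conf. The parameter names below are the real ones; the values
are only assumptions picked for the example and should be tuned to your
network. A value of 0 means "use the operating system default", and not
every platform supports all three settings.

```
# postgresql.conf (illustrative values, not recommendations)
tcp_keepalives_idle = 60        # seconds of inactivity before the first probe
tcp_keepalives_interval = 10    # seconds between unanswered probes
tcp_keepalives_count = 6        # unanswered probes before the connection is considered dead
```

With these example numbers, a dead peer on an idle connection would be
detected after roughly 60 + 6 * 10 = 120 seconds, rather than hours.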

(But it's not clear to me why your stuck-for-a-couple-days case wouldn't
have timed out long since. Are you sure this isn't a client-side
problem, i.e., the client is wedged? If so, why not kill the client instead?)

regards, tom lane

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)