Discussion:
slave restarts with kill -9 coming from somewhere, or nowhere
Bert
2013-04-02 17:34:02 UTC
Hello,

I'm running the latest postgres version (9.2.3), and today for the first
time I encountered this:

12774 2013-04-02 18:13:10 CEST LOG: server process (PID 28463) was
terminated by signal 9: Killed
12774 2013-04-02 18:13:10 CEST DETAIL: Failed process was running:
BEGIN;declare "SQL_CUR0xff25e80" cursor for select distinct .... as
"Reservation_date___time" , "C_4F_TRANSACTION"."FTRA_PRICE_VAL
12774 2013-04-02 18:13:10 CEST LOG: terminating any other active server
processes
12774 2013-04-02 18:13:12 CEST LOG: all server processes terminated;
reinitializing
29113 2013-04-02 18:13:15 CEST LOG: database system was interrupted while
in recovery at log time 2013-04-02 18:02:21 CEST
29113 2013-04-02 18:13:15 CEST HINT: If this has occurred more than once
some data might be corrupted and you might need to choose an earlier
recovery target.
29113 2013-04-02 18:13:15 CEST LOG: entering standby mode
29113 2013-04-02 18:13:15 CEST LOG: redo starts at 6B0/DD0928A0
29113 2013-04-02 18:13:22 CEST LOG: consistent recovery state reached at
6B0/DE3831E8
12774 2013-04-02 18:13:22 CEST LOG: database system is ready to accept
read only connections
29113 2013-04-02 18:13:22 CEST LOG: invalid record length at 6B0/DE3859B8
29117 2013-04-02 18:13:22 CEST LOG: streaming replication successfully
connected to primary

As far as I know it happened twice today. I have no idea where these
kills are coming from. I only know these are not nice :)

Does anyone have an idea what happened exactly?

wkr,
Bert
--
Bert Desmet
0477/305361
Tom Lane
2013-04-02 18:06:08 UTC
Post by Bert
I'm running the latest postgres version (9.2.3), and today for the first
12774 2013-04-02 18:13:10 CEST LOG: server process (PID 28463) was
terminated by signal 9: Killed
AFAIK there are only two possible sources of signal 9: a manual kill,
or the Linux kernel's OOM killer. If it's the latter there should be
a concurrent entry in the kernel logfiles about this. If you find one,
suggest reading up on how to disable OOM kills, or at least reconfigure
your system to make them less probable.
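For instance, something like the following grep over the kernel log will turn up OOM-killer activity. This is a sketch: the exact message wording varies by kernel version, and the sample log lines below are made up for illustration (on a real system you would run the grep against `dmesg` or /var/log/kern.log instead).

```shell
# On a real box: dmesg | grep -i -E 'out of memory|killed process'
# Here a hypothetical kernel-log excerpt stands in for real dmesg output
# so the pattern can be demonstrated end to end.
cat > /tmp/kern_sample.log <<'EOF'
Apr  2 18:13:10 db1 kernel: Out of memory: Kill process 28463 (postgres) score 912 or sacrifice child
Apr  2 18:13:10 db1 kernel: Killed process 28463 (postgres) total-vm:104857600kB
EOF
grep -i -E 'out of memory|killed process' /tmp/kern_sample.log
```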

regards, tom lane
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Bert
2013-04-03 06:45:43 UTC
Hi Tom,

Thanks for the tip! It was indeed the OOM killer.

Is it wise to disable the OOM killer? Or will the server really go down
without postgres doing something about it?

For now I have already lowered the shared_buffers value a bit.

cheers,
Bert
Post by Tom Lane
Post by Bert
I'm running the latest postgres version (9.2.3), and today for the first
12774 2013-04-02 18:13:10 CEST LOG: server process (PID 28463) was
terminated by signal 9: Killed
AFAIK there are only two possible sources of signal 9: a manual kill,
or the Linux kernel's OOM killer. If it's the latter there should be
a concurrent entry in the kernel logfiles about this. If you find one,
suggest reading up on how to disable OOM kills, or at least reconfigure
your system to make them less probable.
regards, tom lane
--
Bert Desmet
0477/305361
Bert
2013-04-03 08:10:56 UTC
Hi all,

I have set vm.overcommit_memory to 1.

It's a pretty much dedicated machine anyway, except for some postgres
maintenance scripts I run in python / bash from the server.

We'll see how it goes.
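For reference, a minimal sketch of the sysctl knobs involved. Note that the PostgreSQL documentation actually suggests vm.overcommit_memory = 2 (strict accounting) rather than 1 (always overcommit) on dedicated database servers; the ratio value below is only an illustrative assumption, not a tuned recommendation.

```
# /etc/sysctl.conf -- sketch, illustrative values only
# 2 = strict accounting: allocations fail up front instead of the
#     OOM killer firing later with SIGKILL
vm.overcommit_memory = 2
# with mode 2, the commit limit is swap + overcommit_ratio% of RAM
vm.overcommit_ratio = 80
# apply without reboot: sysctl -p
```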

cheers,
Bert
Post by Bert
Hi Tom,
Thanks for the tip! It was indeed the OOM killer.
Is it wise to disable the OOM killer? Or will the server really go down
without postgres doing something about it?
For now I have already lowered the shared_buffers value a bit.
cheers,
Bert
--
Bert Desmet
0477/305361
Bert
2013-04-04 06:02:04 UTC
hi,

this is strange: one connection almost killed the server, so it was not a
combination of a lot of connections. I saw one connection growing to over
100GB. I then cancelled the connection before the OOM killer became active
again.

These are my memory settings:
shared_buffers = 20GB
temp_buffers = 1GB
max_prepared_transactions = 10
work_mem = 4GB
maintenance_work_mem = 1GB
max_stack_depth = 8MB
wal_buffers = 32MB
effective_cache_size = 88GB

The server has 128GB of RAM.

How is it possible that one connection (query) uses all the RAM? And how
can I avoid it?

PS: the database is a DWH. I don't need a lot of connections, but I do want
to process a lot of data fast.

cheers,
Bert
Post by Bert
Hi all,
I have set vm.overcommit_memory to 1.
It's a pretty much dedicated machine anyway, except for some postgres
maintenance scripts I run in python / bash from the server.
We'll see how it goes.
cheers,
Bert
--
Bert Desmet
0477/305361
Tom Lane
2013-04-04 06:17:58 UTC
Post by Bert
work_mem = 4GB
How is it possible that one connection (query) uses all the ram? And how
can I avoid it?
Uh ... don't do the above. work_mem is the allowed memory consumption
per query step, ie per hash or sort operation. A complex query can
easily use multiples of work_mem.
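A common pattern for a DWH with few connections is to keep the global setting modest and raise work_mem only in the session that runs the heavy query. The values below are illustrative assumptions, not tuned recommendations:

```sql
-- Sketch: in postgresql.conf, keep a conservative global default, e.g.
--   work_mem = 64MB
-- then raise it only for the session running the big reporting query:
SET work_mem = '1GB';   -- applies to this session only
-- ... run the large query here ...
RESET work_mem;         -- back to the configured default
```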

regards, tom lane
Bert
2013-04-04 06:39:10 UTC
Aha, OK. This was a setting pgtune suggested, but I can understand why that
is a bad idea.

wkr,
Bert
Post by Tom Lane
Post by Bert
work_mem = 4GB
How is it possible that one connection (query) uses all the ram? And how
can I avoid it?
Uh ... don't do the above. work_mem is the allowed memory consumption
per query step, ie per hash or sort operation. A complex query can
easily use multiples of work_mem.
regards, tom lane
--
Bert Desmet
0477/305361