Discussion:
[ADMIN] background writer being lazy?
Brian Fehrle
2011-11-01 19:14:59 UTC
version | PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled
by GCC gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291], 64-bit
bgwriter_delay | 50ms
bgwriter_lru_maxpages | 500
bgwriter_lru_multiplier | 4
checkpoint_segments | 100
checkpoint_timeout | 40min
checkpoint_warning | 30min
checkpoint_completion_target | 0.5
effective_cache_size | 16GB
effective_io_concurrency | 4
extra_float_digits | 3
max_connections | 2000
max_stack_depth | 7MB
shared_buffers | 16GB
synchronous_commit | off
temp_buffers | 8192
TimeZone | US/Pacific
wal_buffers | 8MB
work_mem | 64MB


Since our checkpoint_completion_target is 0.5, a checkpoint completes
around 20 minutes after it starts, and checkpoints occur like clockwork
every 40 minutes; no extras are forced.
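(For reference, one way to confirm that no extra checkpoints are being
forced is to compare timed vs. requested checkpoints in pg_stat_bgwriter;
a minimal sketch, assuming the stock 8.4 view:)

    -- timed checkpoints are driven by checkpoint_timeout; requested ones
    -- are forced early, e.g. by filling checkpoint_segments
    SELECT checkpoints_timed, checkpoints_req
      FROM pg_stat_bgwriter;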

Performance overall is fairly good, but of course we're trying to
squeeze as much out of it as we can. One main thing is trying to lower
'spiky' disk I/O so that performance is more consistent at any given time.

- Brian F
What problem are you trying to solve?
We're trying to offload dirty buffer writes from checkpoints and
from backends (not the bgwriter) to the background writer, I
believe with the hope that it's the same amount of disk IO, but
spread out evenly rather than just when a checkpoint is occurring.
What version of PostgreSQL are you using? Recent versions spread
out the checkpoint activity using the same process which does the
background writing, so there is no benefit from moving writes from
its background writing phase to its distributed checkpoint phase.
Depending on your setting of checkpoint_completion_target you are
probably spending as much or more time spreading the checkpoint as
doing background writing between checkpoints. Each of the last few
major releases has made this much better, so if you're spending time
tweaking something prior to 9.1, you'd probably be better served
putting that time into upgrading.
Yes. Writing dirty buffers when there are enough buffers
available to service requests would tend to increase overall disk
writes and degrade performance. You don't have a problem unless
you have a high percentage of writes from normal backends which
need to flush a buffer in order to get one.
This seems to be the case, as buffers_backend is between
checkpoint_buffers and clean_buffers from pg_stat_bgwriter.
checkpoint buffers: 622,519
clean_buffers: 65,879
clean_max_written: 56
backend_buffers: 460,471
Am I reading these right in wanting to reduce backend_buffers and
checkpoint_buffers?
Hmm. That is a higher percentage of backend writes than I would
like to see. What is your shared_memory setting? Actually, please
post the information described here:
http://wiki.postgresql.org/wiki/Server_Configuration
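(The query on that wiki page is roughly along these lines; a sketch that
lists only non-default settings, using nothing beyond the standard
pg_settings view:)

    -- show settings that have been changed from their defaults
    SELECT name, current_setting(name), source
      FROM pg_settings
     WHERE source NOT IN ('default', 'override');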
How is your performance?
-Kevin
Kevin Grittner
2011-11-01 19:47:44 UTC
Post by Brian Fehrle
PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu
Please upgrade to the latest bug fix release of PostgreSQL:

http://www.postgresql.org/support/versioning

To see what bug and security fixes you're missing, look at release
notes for 8.4.2 to 8.4.9 here:

http://www.postgresql.org/docs/8.4/static/release.html

There have been improvements in your areas of concern in 9.0 and
9.1, so you might want to start planning a major release upgrade.
That's not as painful as it used to be, with pg_upgrade.
Post by Brian Fehrle
bgwriter_lru_maxpages | 500
FWIW, we do set this to 1000.
Post by Brian Fehrle
max_connections | 2000
This is probably your biggest problem. Unless you've got 1000 CPUs
on this box, you should use a connection pooler which is
transaction-oriented, limits the number of database connections, and
queues requests for a new transaction when all connections are in
use. This will almost certainly improve throughput and limit
latency problems. You do not need 2000 connections to support 2000
concurrent users; such a setting will make it harder to provide 2000
concurrent users with decent and consistent performance.
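(A quick way to see how many of those 2000 slots are actually in use at
once, using only the standard pg_stat_activity view, is something like:)

    -- compare live sessions against the configured ceiling
    SELECT count(*) AS current_connections,
           current_setting('max_connections') AS max_connections
      FROM pg_stat_activity;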
Post by Brian Fehrle
effective_cache_size | 16GB
Given your other settings, this seems likely to be low. I normally
add the cache space reported by the OS to the shared_buffers
setting.
Post by Brian Fehrle
shared_buffers | 16GB
This is probably at least twice what it should be. If you are
having problems with backends writing too many buffers and problems
with clusters of I/O congestion, you might want to drop it to the
0.5 to 2.0 GB range.
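(If the optional pg_buffercache contrib module happens to be installed, a
rough way to see how much of that 16GB is dirty at any moment is a query
like this; purely a sketch, and the module is not assumed to be present:)

    -- clean (f), dirty (t), and unused (NULL) buffers in shared_buffers
    SELECT isdirty, count(*)
      FROM pg_buffercache
     GROUP BY isdirty;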
Post by Brian Fehrle
wal_buffers | 8MB
Might as well go to 16MB.
Post by Brian Fehrle
work_mem | 64MB
Each of your connections can allocate this much space, potentially
several times, at the same moment. Unless you really have a monster
machine, 64MB * 2000 connections is just asking for out of memory
failures at unpredictable peak load times.
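(Rough worst-case arithmetic: 2000 connections * 64MB is about 128GB of
potential sort/hash memory, and a single complex query can allocate
work_mem more than once, so the theoretical peak is higher still.)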
Post by Brian Fehrle
One main thing is trying to lower 'spiky' disk I/O so that
performance is more consistent at any given time.
The advice above should help with that.

-Kevin
Brian Fehrle
2011-11-01 18:19:13 UTC
So my main question is, where is the issue?
That's my question, too. What problem are you trying to solve?
We're trying to offload dirty buffer writes from checkpoints and from
backends (not the bgwriter) to the background writer, I believe with the
hope that it's the same amount of disk IO, but spread out evenly rather
than just when a checkpoint is occurring.
It doesn't seem (to me) that the background writer is having a
hard time keeping up, because there are simply tons of times where
it's doing nothing. So is it just not determining that it needs to
do anything because there are already enough 'clean buffers' ready
for use at any given time?
Yes. Writing dirty buffers when there are enough buffers available
to service requests would tend to increase overall disk writes and
degrade performance. You don't have a problem unless you have a
high percentage of writes from normal backends which need to flush a
buffer in order to get one.
This seems to be the case, as buffers_backend is between
checkpoint_buffers and clean_buffers from pg_stat_bgwriter.
For example, on 2011-10-19:
checkpoint buffers: 622,519
clean_buffers: 65,879
clean_max_written: 56
backend_buffers: 460,471

Am I reading these right in wanting to reduce backend_buffers and
checkpoint_buffers?
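(Those figures map onto the pg_stat_bgwriter counters; a sketch of the
sort of query that produces them, plus the share written by backends,
assuming the standard 8.4 column names:)

    -- buffers written at checkpoint time, by the bgwriter, and by backends,
    -- along with the backend share of all buffer writes
    SELECT buffers_checkpoint,
           buffers_clean,
           maxwritten_clean,
           buffers_backend,
           round(100.0 * buffers_backend
                 / (buffers_checkpoint + buffers_clean + buffers_backend), 1)
             AS backend_write_pct
      FROM pg_stat_bgwriter;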

- Brian F
Would increasing bgwriter_lru_multiplier to a higher value help
get more to be written by the bgwriter, and if so are there any
negative side effects I would need to consider for this?
Yes. If a buffer is written to again by PostgreSQL after it hits
disk because your background writer was overly aggressive, you will
hurt your overall throughput. Now, sometimes that's worth doing to
control latency, but you haven't described any such problem.
-Kevin
Greg Smith
2011-11-12 18:51:05 UTC
The main thing I am currently seeing is that there are 300X or more
buffers written by checkpoints than by the background writer.
Writing buffers at checkpoint time is more efficient than having the
background writer handle them. I think your efforts to space
checkpoints out may have backfired a bit on you. You're letting 40
minutes of dirty buffers accumulate before they're written out. Putting
checkpoint_timeout closer to its default of 5 minutes again may reduce
the spikes you're seeing.

The changes you've made to the background writer configuration are also
counterproductive, given that it's not really going to trigger anyway.
I would only recommend decreasing bgwriter_delay or increasing
bgwriter_lru_maxpages if you see the total_clean_max_written value get
incremented regularly. If that's not happening, making the background
writer run more often and try to do more work just adds overhead.
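(One way to watch for that, assuming total_clean_max_written corresponds
to the maxwritten_clean counter in pg_stat_bgwriter, is to sample it
periodically and see whether it climbs:)

    -- if this counter rises between samples, cleaning scans are being cut
    -- short and a higher bgwriter_lru_maxpages may be worth trying
    SELECT maxwritten_clean, now() AS sampled_at
      FROM pg_stat_bgwriter;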

Also: you've set shared_buffers to 16GB. That's beyond the point where
most people find increases stop being useful. I'd wager you'll get less
spiky performance just by lowering that a lot. The 256MB to 1GB range
is where I normally end up on servers where lower latency is prioritized
over maximum throughput.
--
Greg Smith 2ndQuadrant US ***@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us