Autoanalyze of the autovacuum daemon ...

Discussion:

(too old to reply)

Baptiste LHOSTE

2012-10-31 15:05:13 UTC

Hi All,

We are using postgreSQL since 2007 (now we use postgreSQL 8.4) and until recently we used to perform deletion of old data then vacuum and analyze tasks by ourself (every night at 0am CET).
Nevertheless we reached a point where these tasks are taking so much time that why we decide to perform deletion of old data every hour and let the autovacuum daemon manage vacuum and analyze tasks.

But we have some difficulties to understand how work the trigger of the autovacuum daemon for autoanalyze tasks.

Let me explain our database structure. We have two kind of tables :
- first one on which we perform a complete truncate and a copy to fill each (partitioned tables - 288 partitions, one per 5mn re-used every day)
- second one on which we insert some new data every five minutes (avg~200 rows) and delete old data about every 1 hour (avg~1000 rows).
For complete understanding, we need up-to-date stats for the second one because the recurrent deletion might take a long time, (~1mn for less than 1000 deleted rows because planer uses seq scan instead of index scan).

The autovacuum perform autoanalyze tasks on first kind as soon as the process (truncate + copy) is done.

But the autoanalyze is not that effective for second kind.

We tried to reduce autovacuum_analyze_threshold (50 => 10) and autovacuum_analyze_scale_factor (0.1 => 0.005) for the second kind of tables (ALTER TABLE ... SET PARAMETERS ...) without any conclusive effect.

We can not find where is stored the total number of tuples inserted or updated since the last ANALYZE. Could someone give us the answer ?

We take a look at the relfrozenxid but our tables do not have a big value of relfrozenxid (< 10000000).

Most of the time there is no autovacuum analyze query in the pg_stat_activity, althought we set the autovacuum_naptime to 15s to try to start new analyze task more often.

We do not understand why we can't obtain some improvments with previous changes. Did we do something wrong ?

Thank you all for your kind advices,

Regards, Baptiste.

---
Baptiste LHOSTE
***@alaloop.com

ALALOOP S.A.S. - Technopole Izarbel - 64210 Bidart
Téléphone : +33 (0) 5 59 41 51 10
www.alaloop.com

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Cédric Villemain

2012-11-05 10:32:46 UTC

Permalink

Hi,

Post by Baptiste LHOSTE
second one on which we insert some new data every five minutes (avg~200
rows) and delete old data about every 1 hour (avg~1000 rows). For complete
understanding, we need up-to-date stats for the second one because the
recurrent deletion might take a long time, (~1mn for less than 1000
deleted rows because planer uses seq scan instead of index scan).
The autovacuum perform autoanalyze tasks on first kind as soon as the
process (truncate + copy) is done.
But the autoanalyze is not that effective for second kind.
We tried to reduce autovacuum_analyze_threshold (50 => 10) and
autovacuum_analyze_scale_factor (0.1 => 0.005) for the second kind of
tables (ALTER TABLE ... SET PARAMETERS ...) without any conclusive effect.
We can not find where is stored the total number of tuples inserted or
updated since the last ANALYZE. Could someone give us the answer ?

in pg_stat_user_tables, not since the last time ANALYZE run, but you have the
number of reltuples from pg_class that is used to calculate the ratio.

Post by Baptiste LHOSTE
We take a look at the relfrozenxid but our tables do not have a big value
of relfrozenxid (< 10000000).
Most of the time there is no autovacuum analyze query in the
pg_stat_activity, althought we set the autovacuum_naptime to 15s to try to
start new analyze task more often.
We do not understand why we can't obtain some improvments with previous
changes. Did we do something wrong ?

Everything is relative to the size of the table, what is the content of
pg_class for the second kind of tables ? (relpages/reltuples)

--
CÃ©dric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - DÃ©veloppement, Expertise et Formation

Baptiste LHOSTE

2012-11-05 13:50:53 UTC

Permalink

Hi,

Post by CÃ©dric Villemain
in pg_stat_user_tables, not since the last time ANALYZE run, but you have the
number of reltuples from pg_class that is used to calculate the ratio.

So how does the autovacuum daemon to determine the number of obsolete row since its last work on a specific table ?

Post by CÃ©dric Villemain
Everything is relative to the size of the table, what is the content of
pg_class for the second kind of tables ? (relpages/reltuples)

relname | last_autoanalyse | reltuples | relpages | n_tup_ins + n_tup_upd + n_tup_del
| (from pg_stat_all_tables) | | | (from pg_stat_all_tables)
| | | |
table1 | 2012-10-15 21:12:04.362647+02 | 2.53296e+06 | 50789 | 28542483
table2 | 2012-10-15 18:27:30.181583+02 | 2.57144e+06 | 54833 | 31127737
table3 | 2012-10-15 13:13:09.569758+02 | 2.58646e+06 | 63208 | 27167944
table4 | 2012-10-15 12:05:39.094652+02 | 2.58655e+06 | 63181 | 27160409
table5 | 2012-10-15 12:05:32.347738+02 | 2.58611e+06 | 57272 | 27171661
table6 | 2012-10-15 10:43:40.558078+02 | 2.57925e+06 | 63145 | 27155875
table7 | 2012-10-15 10:40:25.469625+02 | 2.59551e+06 | 57254 | 27161876
table8 | 2012-10-15 10:25:10.776152+02 | 2.56987e+06 | 63147 | 27177907
table9 | 2012-10-15 09:30:17.407181+02 | 2.5923e+06 | 57231 | 27152953
table10 | 2012-10-15 08:29:25.427207+02 | 2.56237e+06 | 63085 | 31770581
table11 | 2012-10-15 08:26:14.550742+02 | 2.56503e+06 | 63065 | 31769609
table12 | 2012-10-15 06:39:01.737914+02 | 2.57833e+06 | 63023 | 31767543
table13 | 2012-10-15 06:35:36.43683+02 | 2.57217e+06 | 63047 | 31759954
table14 | 2012-10-15 05:08:02.858623+02 | 2.59068e+06 | 57119 | 27154555
table15 | 2012-10-15 04:40:48.695142+02 | 2.57449e+06 | 63002 | 31768034
table16 | 2012-10-10 07:51:49.625226+02 | 2.48584e+06 | 54014 | 22110223
table17 | 2012-10-09 10:03:36.649331+02 | 2.58047e+06 | 63742 | 23423257
table18 | 2012-10-09 10:02:13.549362+02 | 2.57854e+06 | 63585 | 23364035
table19 | 2012-10-09 09:18:55.447895+02 | 2.58851e+06 | 63516 | 23375180

Regards, Baptiste Lhoste

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Cédric Villemain

2012-11-05 15:12:42 UTC

Permalink

Post by Baptiste LHOSTE
relname | last_autoanalyse | reltuples | relpages |
n_tup_ins + n_tup_upd + n_tup_del
| (from pg_stat_all_tables) | | | (from
| pg_stat_all_tables)
table1 | 2012-10-15 21:12:04.362647+02 | 2.53296e+06 | 50789 | 28542483
table2 | 2012-10-15 18:27:30.181583+02 | 2.57144e+06 | 54833 | 31127737
table3 | 2012-10-15 13:13:09.569758+02 | 2.58646e+06 | 63208 | 27167944
table4 | 2012-10-15 12:05:39.094652+02 | 2.58655e+06 | 63181 | 27160409
table5 | 2012-10-15 12:05:32.347738+02 | 2.58611e+06 | 57272 | 27171661
table6 | 2012-10-15 10:43:40.558078+02 | 2.57925e+06 | 63145 | 27155875
table7 | 2012-10-15 10:40:25.469625+02 | 2.59551e+06 | 57254 | 27161876
table8 | 2012-10-15 10:25:10.776152+02 | 2.56987e+06 | 63147 | 27177907
table9 | 2012-10-15 09:30:17.407181+02 | 2.5923e+06 | 57231 | 27152953
table10 | 2012-10-15 08:29:25.427207+02 | 2.56237e+06 | 63085 | 31770581
table11 | 2012-10-15 08:26:14.550742+02 | 2.56503e+06 | 63065 | 31769609
table12 | 2012-10-15 06:39:01.737914+02 | 2.57833e+06 | 63023 | 31767543
table13 | 2012-10-15 06:35:36.43683+02 | 2.57217e+06 | 63047 | 31759954
table14 | 2012-10-15 05:08:02.858623+02 | 2.59068e+06 | 57119 | 27154555
table15 | 2012-10-15 04:40:48.695142+02 | 2.57449e+06 | 63002 | 31768034
table16 | 2012-10-10 07:51:49.625226+02 | 2.48584e+06 | 54014 | 22110223
table17 | 2012-10-09 10:03:36.649331+02 | 2.58047e+06 | 63742 | 23423257
table18 | 2012-10-09 10:02:13.549362+02 | 2.57854e+06 | 63585 | 23364035
table19 | 2012-10-09 09:18:55.447895+02 | 2.58851e+06 | 63516 | 23375180

So that an auto-ANALYZE should run each 2.5-3 hours.
If it does not proceed, check that you do not have some process still doing
manual maintenance or DDL on those tables, they might lock the table and kill
autovacuum
(check the pg_stat_activity, and set log_autovacuum_min_duration=0 to track
all job done by autovacuum)

--
CÃ©dric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - DÃ©veloppement, Expertise et Formation

Baptiste LHOSTE

2012-11-08 08:54:19 UTC

Permalink

Hi,

Post by CÃ©dric Villemain
So that an auto-ANALYZE should run each 2.5-3 hours.
If it does not proceed, check that you do not have some process still doing manual maintenance or DDL on those tables, they might
lock the table and kill autovacuum
(check the pg_stat_activity, and set log_autovacuum_min_duration=0 to track all job done by autovacuum)

Sorry for my late answer, but since two days I noticed some improvements so I decided to wait a little bit to confirm these.

Finally the autovacuum daemon seems to work correctly. It does not proceed an auto-ANALYZE each 3 hours, it is more about each 12 hours but it is much better than at the beginning.

The fact is the autovacuum daemon took about 10 days to become effective on both kind of tables, maybe it needs this period to catch up and be effective.

Thanks for your time, Regards, Baptiste.

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Baptiste LHOSTE

2012-11-09 10:13:45 UTC

Permalink

Hi,

Today I consulted the log of my PostgreSQL server and I saw that autovacuum tasks took to much time to do their work. I thought that ANALYZE was a fast operation ?

2012-11-09 10:52:33 CET LOG: automatic analyze of table "flows.public.agg_t20_outgoing_a41_src_net_and_dst_net_f5" system usage: CPU 0.20s/0.63u sec elapsed 137.12 sec
2012-11-09 10:52:58 CET LOG: automatic vacuum of table "flows.public.agg_t20_outgoing_a51_src_port_and_proto_f5": index scans: 0
pages: 0 removed, 36044 remain
tuples: 0 removed, 3495897 remain
system usage: CPU 0.53s/0.62u sec elapsed 307.28 sec
2012-11-09 10:53:33 CET LOG: automatic analyze of table "flows.public.agg_t387_incoming_a44_dst_port_and_proto_f5" system usage: CPU 0.24s/0.46u sec elapsed 174.73 sec
2012-11-09 10:55:50 CET LOG: automatic analyze of table "flows.public.agg_t20_outgoing_a51_src_port_and_proto_f5" system usage: CPU 0.20s/0.48u sec elapsed 152.16 sec
2012-11-09 10:55:55 CET LOG: automatic vacuum of table "flows.public.agg_t387_incoming_a41_src_net_and_dst_net_f5": index scans: 0
pages: 0 removed, 29849 remain
tuples: 0 removed, 1349327 remain
system usage: CPU 0.35s/0.28u sec elapsed 141.52 sec
2012-11-09 10:58:18 CET LOG: automatic vacuum of table "flows.public.agg_t205_incoming_a41_src_net_and_dst_net_f5": index scans: 0
pages: 0 removed, 34435 remain
tuples: 0 removed, 3339035 remain
system usage: CPU 0.66s/0.55u sec elapsed 340.06 sec
2012-11-09 10:59:32 CET LOG: automatic analyze of table "flows.public.agg_t387_incoming_a41_src_net_and_dst_net_f5" system usage: CPU 0.26s/0.54u sec elapsed 216.95 sec
2012-11-09 10:59:45 CET LOG: automatic vacuum of table "flows.public.agg_t387_incoming_a44_dst_port_and_proto_f5": index scans: 0
pages: 0 removed, 28733 remain
tuples: 0 removed, 1327333 remain
system usage: CPU 0.24s/0.23u sec elapsed 234.73 sec
2012-11-09 11:01:24 CET LOG: automatic analyze of table "flows.public.agg_t205_incoming_a41_src_net_and_dst_net_f5" system usage: CPU 0.22s/0.55u sec elapsed 186.27 sec
2012-11-09 11:01:50 CET LOG: automatic vacuum of table "flows.public.agg_t387_incoming_a41_src_net_and_dst_net_f5": index scans: 0
pages: 0 removed, 29849 remain
tuples: 0 removed, 1349527 remain
system usage: CPU 0.24s/0.23u sec elapsed 124.28 sec
2012-11-09 11:02:04 CET LOG: automatic vacuum of table "flows.public.agg_t205_incoming_a40_dst_net_f5": index scans: 0
pages: 0 removed, 19817 remain
tuples: 0 removed, 2008020 remain
system usage: CPU 0.33s/0.40u sec elapsed 151.76 sec
2012-11-09 11:03:22 CET LOG: automatic analyze of table "flows.public.agg_t205_incoming_a40_dst_net_f5" system usage: CPU 0.14s/0.37u sec elapsed 78.40 sec
2012-11-09 11:04:11 CET LOG: automatic vacuum of table "flows.public.agg_t387_incoming_a40_dst_net_f5": index scans: 0
pages: 0 removed, 26294 remain
tuples: 0 removed, 1316071 remain
system usage: CPU 0.25s/0.28u sec elapsed 140.99 sec
2012-11-09 11:05:08 CET LOG: automatic vacuum of table "flows.public.agg_t20_outgoing_a39_src_net_f5": index scans: 0
pages: 0 removed, 26821 remain
tuples: 0 removed, 2869526 remain
system usage: CPU 0.46s/0.42u sec elapsed 223.09 sec
2012-11-09 11:05:57 CET LOG: automatic analyze of table "flows.public.agg_t387_incoming_a40_dst_net_f5" system usage: CPU 0.28s/0.43u sec elapsed 105.64 sec
2012-11-09 11:06:39 CET LOG: automatic analyze of table "flows.public.agg_t20_outgoing_a39_src_net_f5" system usage: CPU 0.27s/0.44u sec elapsed 90.82 sec

Regards, Baptiste.

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Kevin Grittner

2012-11-09 14:31:39 UTC

Permalink

Post by Baptiste LHOSTE
Today I consulted the log of my PostgreSQL server and I saw that
autovacuum tasks took to much time to do their work. I thought that
ANALYZE was a fast operation ?

That depends on configuration settings and on whether the computer
(or VM) is so swamped that the autovacuum task is starved for cycles.
Also on any overrides of statistics targets for those tables.

Please show us the output from running this query:

http://wiki.postgresql.org/wiki/Server_Configuration

Have you overridden any statistics targets?

A description of the environment would also be good. Hardware? Load?

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Baptiste LHOSTE

2012-11-09 15:35:23 UTC

Permalink

Hi,

Post by Kevin Grittner
That depends on configuration settings and on whether the computer
(or VM) is so swamped that the autovacuum task is starved for cycles.
Also on any overrides of statistics targets for those tables.
http://wiki.postgresql.org/wiki/Server_Configuration

Post by Kevin Grittner
Have you overridden any statistics targets?

We have put some specific values to autovacuum parameters for tables which take so much time to be analyzed. To do that we used following kind of query :

ALTER TABLE tablename SET (autovacuum_vacuum_threshold=20, autovacuum_vacuum_scale_factor=0.01, autovacuum_analyze_threshold=10, autovacuum_analyze_scale_factor=0.005, autovacuum_vacuum_cost_delay=10, autovacuum_vacuum_cost_limit=1000);

These tables have two timestamp columns and a btree index on both timestamp column.
Will it be more efficient for us to configure the autovacuum daemon analyze task only on those columns ? If yes, how can we do that ?

Post by Kevin Grittner
A description of the environment would also be good. Hardware? Load?

Debian 6 64bits
8 Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
16 Go RAM - 4 Go Swap
4 * 300 Go Raid 0

Regards, Baptiste.

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Kevin Grittner

2012-11-09 15:59:57 UTC

Permalink

Post by Kevin Grittner
http://wiki.postgresql.org/wiki/Server_Configuration

[very reasonable settings except for a very large work_mem]

Make sure that work_mem setting isn't driving you into swapping or
near-zero caching. A shortage of cache space could explain this
because it looks like about 8.5 ms for each page read. About the only
other thing I can think to recommend is to decrease
autovacuum_cost_delay to 10ms and see if that helps.

These tables have two timestamp columns and a btree index on both
timestamp column. Will it be more efficient for us to configure the
autovacuum daemon analyze task only on those columns ?

No.

4 * 300 Go Raid 0

You do realize that if any of those four drives fail you will need to
use your backup, right?

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Baptiste LHOSTE

2012-11-09 16:37:01 UTC

Permalink

Post by Kevin Grittner
http://wiki.postgresql.org/wiki/Server_Configuration

[very reasonable settings except for a very large work_mem]
Make sure that work_mem setting isn't driving you into swapping or
near-zero caching. A shortage of cache space could explain this
because it looks like about 8.5 ms for each page read. About the only
other thing I can think to recommend is to decrease
autovacuum_cost_delay to 10ms and see if that helps.

When I run a command htop on my server I can not see any swap issue :

Mem 4Go / 16Go
Swp 175B / 4Go

I am sorry but I do not understand we you suggest to decrease autovacuum_cost_delay to 10 ms.
We already set autovacuum_vacuum_cost_delay to 10ms for those tables.

Post by Kevin Grittner
4 * 300 Go Raid 0

You do realize that if any of those four drives fail you will need to
use your backup, right?

Yes, we do. We already used our backup when we changed some old servers.

Regards, Baptiste.

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin