[ADMIN] db size growing out of control when using clustered Jackrabbit

Discussion:

(too old to reply)

Gary Webster

2012-07-23 21:13:59 UTC

Hello. I'm hoping someone has seen this before.

We are trying to use Postgres Plus v9.1.3 as the Persistence Manager in
Jackrabbit (Apache JCR) clustering (
http://wiki.apache.org/jackrabbit/Clustering).
Whenever the JCR is under load, the ws_bundle TOAST table in the
repository schema, grows out of control !

Some of my team members maintain that this problem doesn't occur with
MySQL, but I would rather stay with Postgres if possible...

Thanks.

Joshua D. Drake

2012-07-23 21:40:11 UTC

Permalink

Post by Gary Webster
Hello. I'm hoping someone has seen this before.
We are trying to use Postgres Plus v9.1.3 as the Persistence Manager in
Jackrabbit (Apache JCR) clustering
(http://wiki.apache.org/jackrabbit/Clustering).
Whenever the JCR is under load, the ws_bundle TOAST table in the
repository schema, grows out of control !
Some of my team members maintain that this problem doesn't occur with
MySQL, but I would rather stay with Postgres if possible...

I don't really know anything about jackrabbit but generally this problem
presents when you have a lot of transactions that are idle. Meaning, you
have transactions that are just open, doing nothing. This will present a
problem with routine maintenance.

Under load you can check your process list to see if you have long
running transactions that are idle ( idle in transaction ). If you do,
you have a code problem not a postgres problem and it is presenting
itself through bloat.

Note: IDLE is fine. It is specifically IDLE IN TRANSACTION that is a
problem.

Sincerely,

Joshua D. Drake

Post by Gary Webster
Thanks.

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Gary Webster

2012-07-24 15:58:53 UTC

Permalink

Hello.
Thanks for the response.

There are several 'idle in transaction' on this server/app, but to a
different db/schema.
The "repository" (JCR) schema has only a few 'idle', none 'in transaction' .

By "routine maintenance", do you mean autovacuum, or something else?
Autovacuum does appear to usually get 'auto-canceled' by a lock. However,
even when it runs successfully, it doesn't seem to help with this ws_bundle
Toast table size.
I am rather looking for a root cause here. Surely this table is not
supposed to grow so much (100s of GB). It is even bigger than the data
store!

I agree that this may not be an 'error' in Postgres, but somehow it is not
playing well with Jackrabbit clustering.

Post by Joshua D. Drake

Post by Gary Webster
Hello. I'm hoping someone has seen this before.
We are trying to use Postgres Plus v9.1.3 as the Persistence Manager in
Jackrabbit (Apache JCR) clustering
(http://wiki.apache.org/**jackrabbit/Clustering<http://wiki.apache.org/jackrabbit/Clustering>
).
Whenever the JCR is under load, the ws_bundle TOAST table in the
repository schema, grows out of control !
Some of my team members maintain that this problem doesn't occur with
MySQL, but I would rather stay with Postgres if possible...

I don't really know anything about jackrabbit but generally this problem
presents when you have a lot of transactions that are idle. Meaning, you
have transactions that are just open, doing nothing. This will present a
problem with routine maintenance.
Under load you can check your process list to see if you have long running
transactions that are idle ( idle in transaction ). If you do, you have a
code problem not a postgres problem and it is presenting itself through
bloat.
Note: IDLE is fine. It is specifically IDLE IN TRANSACTION that is a
problem.
Sincerely,
Joshua D. Drake

Post by Gary Webster
Thanks.

Tom Lane

2012-07-24 17:08:35 UTC

Permalink

Post by Gary Webster
By "routine maintenance", do you mean autovacuum, or something else?
Autovacuum does appear to usually get 'auto-canceled' by a lock.

That's bad and you should look into the reason why it happens. Ordinary
DML (CRUD) operations should not kick autovac off a table. If it's
happening, it's probably because something is fooling with the table's
schema, which doesn't seem like something you want to have happening
during routine operations; especially not on tables that are large
enough for this to be an issue in the first place. Or it might be
something doing a LOCK TABLE as a substitute for more fine-grained
locking; which again is bad for performance reasons that have nothing
to do with hobbling autovacuum.

regards, tom lane

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Joshua D. Drake

2012-07-24 17:41:41 UTC

Permalink

Post by Gary Webster
Hello.
Thanks for the response.
There are several 'idle in transaction' on this server/app, but to a
different db/schema.

This is a cluster issue, not a database issue. So if you have an idnle
in transaction, then it is affecting your JCR schema as well.

Post by Gary Webster
The "repository" (JCR) schema has only a few 'idle', none 'in transaction' .
By "routine maintenance", do you mean autovacuum, or something else?

I mean autovacuum.

Post by Gary Webster
Autovacuum does appear to usually get 'auto-canceled' by a lock.

That is a problem too.

Post by Gary Webster
However, even when it runs successfully, it doesn't seem to help with
this ws_bundle Toast table size.

It won't if you have the above idle in transactions, regardless of database.

Sincerely,

jD
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Gary Webster

2012-07-25 18:37:18 UTC

Permalink

Post by Gary Webster
Hello.
Thanks for the response.
There are several 'idle in transaction' on this server/app, but to a
different db/schema.

This is a cluster issue, not a database issue. So if you have an idnle in
transaction, then it is affecting your JCR schema as well.

OK, how do I track/debug/stop the "idle in transaction"s ?

The "repository" (JCR) schema has only a few 'idle', none 'in

Post by Gary Webster
transaction' .
By "routine maintenance", do you mean autovacuum, or something else?

I mean autovacuum.
I was hoping to find more of a 'root cause' (eg. jackrabbit config) for

this issue.
I can't believe that this table is supposed to be getting so big, to even
require much vacuuming.

Autovacuum does appear to usually get 'auto-canceled' by a lock.
That is a problem too.
OK, I am trying to find out why this is happening.
However, even when it runs successfully, it doesn't seem to help with

Post by Gary Webster
this ws_bundle Toast table size.

It won't if you have the above idle in transactions, regardless of database.
Sincerely,
jD
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579

Joshua D. Drake

2012-07-25 18:44:44 UTC

Permalink

Post by Joshua D. Drake
This is a cluster issue, not a database issue. So if you have an
idnle in transaction, then it is affecting your JCR schema as well.
OK, how do I track/debug/stop the "idle in transaction"s ?

Well idle in transaction is ALWAYS a code issue. You have code that is
executing that is starting a transaction, leaving the connection open
while not closing (committing/rollingback) the transaction.

You could turn on query logging and make sure pid and timestamp is in
the log_line_prefix. They you can see what pids are idle in transaction
and trace to what the last query was.
o you mean autovacuum, or something else?

Post by Joshua D. Drake
I mean autovacuum.
I was hoping to find more of a 'root cause' (eg. jackrabbit config) for
this issue.
I can't believe that this table is supposed to be getting so big, to
even require much vacuuming.

Any update/delete to that table is going to cause bloat, autovacuum
cleans that up. If it can. If it can't, it will just continously grow.

Sincerely,

jD
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Gary Webster

2012-07-26 19:31:48 UTC

Permalink

Post by Joshua D. Drake
This is a cluster issue, not a database issue. So if you have an

Post by Joshua D. Drake
idnle in transaction, then it is affecting your JCR schema as well.
OK, how do I track/debug/stop the "idle in transaction"s ?

Well idle in transaction is ALWAYS a code issue. You have code that is
executing that is starting a transaction, leaving the connection open while
not closing (committing/rollingback) the transaction.
You could turn on query logging and make sure pid and timestamp is in the
log_line_prefix. They you can see what pids are idle in transaction and
trace to what the last query was.
OK, I set "log_statement = "all""

The log grew to 1GB in ~minute! It is dominated by this one statement,
which occurs every ~1.4 sec:
"update WS_BUNDLE set BUNDLE_DATA = $1 where NODE_ID_HI = $2 and NODE_ID_LO
= $3"
parameter $1 is hex, over 6million characters long !! Surely this is the
root of my problem.

o you mean autovacuum, or something else?

Post by Joshua D. Drake

Any update/delete to that table is going to cause bloat, autovacuum cleans
that up. If it can. If it can't, it will just continously grow.
Sincerely,
jD
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579

Scott Marlowe

2012-07-26 20:05:19 UTC

Permalink

Post by Joshua D. Drake
OK, I set "log_statement = "all""
The log grew to 1GB in ~minute! It is dominated by this one statement,
"update WS_BUNDLE set BUNDLE_DATA = $1 where NODE_ID_HI = $2 and NODE_ID_LO
= $3"
parameter $1 is hex, over 6million characters long !! Surely this is the
root of my problem.

That's crazy big. Is that a normal payload size for this app?

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Joshua D. Drake

2012-07-27 18:11:34 UTC

Permalink

Post by Joshua D. Drake
OK, I set "log_statement = "all""
The log grew to 1GB in ~minute! It is dominated by this one statement,
"update WS_BUNDLE set BUNDLE_DATA = $1 where NODE_ID_HI = $2 and
NODE_ID_LO = $3"
parameter $1 is hex, over 6million characters long !! Surely this is
the root of my problem.

It definitely is. Every time you update, you are creating a dead tuple
(unless it is HOT capable). If autovacuum can't come in behind and clean
that up due to idle in transaction, you are going to have serious problems.

Sincerely,

jD
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579

--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Craig Ringer

2012-07-24 00:35:19 UTC

Permalink

Post by Gary Webster
Hello. I'm hoping someone has seen this before.
We are trying to use Postgres Plus v9.1.3 as the Persistence Manager
in Jackrabbit (Apache JCR) clustering
(http://wiki.apache.org/jackrabbit/Clustering).
Whenever the JCR is under load, the ws_bundle TOAST table in the
repository schema, grows out of control !

Has somebody disabled autovacuum or set it to barely run at all?

Try setting autovacuum to very aggressively vacuum the problem table(s).

--
Craig Ringer

Gary Webster

2012-07-24 15:34:57 UTC

Permalink

Hello.
Thanks for the response.

Autovacuum is set VERY aggressive.
However, it does not help with the ws_bundle Toast table.

A manual _full_ vacuum (not recommended?) does do the deed.
However, it often gives this error:
ERROR: missing chunk number 0 for toast value 639113 in pg_toast_533386

BTW, how bad is that? Corruption?

Post by Gary Webster
Hello. I'm hoping someone has seen this before.
We are trying to use Postgres Plus v9.1.3 as the Persistence Manager in
Jackrabbit (Apache JCR) clustering (
http://wiki.apache.org/jackrabbit/Clustering).
Whenever the JCR is under load, the ws_bundle TOAST table in the
repository schema, grows out of control !
Has somebody disabled autovacuum or set it to barely run at all?
Try setting autovacuum to very aggressively vacuum the problem table(s).
--
Craig Ringer