Discussion:
Database size stays constant but disk space keeps shrinking -- postgres 9.1
(too old to reply)
Greg Williamson
2012-09-27 22:05:59 UTC
Permalink
Dear list,

I have a postgres database, 9.1.3, which shows a fairly constant amount of space used by postgres, but total disk space kees shrinking.
If I restart postgres the space on my file system returns.

This cluster is replicated to another; that custer does not show this problem.

There are two main databases on this cluster, both using logging. One is used to accumulate some stats on our systen.

The other is a slimmed-down version of our production database, which gets recreated hourly by a shell script which pulls data from remote servers, does a pg_dump of the resulting 3 gig database, and then drops it.

I posted a message about ths a few days and got no responses. I am hoping for better luck this time as this is a serious issue,not relayed t vacuuming or routine admin tasks which may or may not be happening.

Thanks for any help / advice / things to look for.

Greg

a few details:
db11:5432=# select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit

Says one of our sysads:
"These are single disk systems. single no
n-raid sata. Using xfs as the filesystem like the rest of the dbs."
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Steve Crawford
2012-09-27 22:48:28 UTC
Permalink
Post by Greg Williamson
Dear list,
I have a postgres database, 9.1.3, which shows a fairly constant amount of space used by postgres, but total disk space kees shrinking.
If I restart postgres the space on my file system returns.
This cluster is replicated to another; that custer does not show this problem.
There are two main databases on this cluster, both using logging. One is used to accumulate some stats on our systen.
The other is a slimmed-down version of our production database, which gets recreated hourly by a shell script which pulls data from remote servers, does a pg_dump of the resulting 3 gig database, and then drops it.
I posted a message about ths a few days and got no responses. I am hoping for better luck this time as this is a serious issue,not relayed t vacuuming or routine admin tasks which may or may not be happening.
No answer but some questions.

How are you determining space used by PostgreSQL? From the OS (i.e. du
on PostgreSQL's directories or similar) or from PostgreSQL itself (pg_
system tables, \dt+, etc.)?

What method of replication are you using?

Do you have any indication if the issue is only related to which
database is causing the issue? Can you stop activity on one of them and
see if the problem persists?

Have you checked to see if there are any processes that have open
handles to deleted files (lsof -X | grep deleted). Deleted files won't
show up in du but won't release their disk space until the process
exits. Perhaps a script or scripts, even one of your hourly ones, that
terminate when the server restarts? You could save the output of lsof
and ps immediately before and after a restart and compare them.

Let us know what you find.

Cheers,
Steve
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Greg Williamson
2012-09-28 02:01:24 UTC
Permalink
Steve (and others who replied):


----- Original Message -----
Sent: Thursday, September 27, 2012 3:48 PM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
Post by Greg Williamson
Dear list,
I have a postgres database, 9.1.3, which shows a fairly constant amount of
space used by postgres, but total disk space kees shrinking.
Post by Greg Williamson
If I restart postgres the space on my file system returns.
This cluster is replicated to another; that custer does not show this
problem.
Post by Greg Williamson
There are two main databases on this cluster, both using logging. One is
used to accumulate some stats on our systen.
Post by Greg Williamson
The other is a slimmed-down version of our production database, which gets
recreated hourly by a shell script which pulls data from remote servers, does a
pg_dump of the resulting 3 gig database, and then drops it.
Post by Greg Williamson
I posted a message about ths a few days and got no responses. I am hoping
for better luck this time as this is a serious issue,not relayed t vacuuming or
routine admin tasks which may or may not be happening.
No answer but some questions.
How are you determining space used by PostgreSQL? From the OS (i.e. du on
PostgreSQL's directories or similar) or from PostgreSQL itself (pg_ system
tables, \dt+, etc.)?
 
df -h and the postgres internal functions that show size of relations.
What method of replication are you using?
streaming replication w/ synchronous commit
Do you have any indication if the issue is only related to which database is
causing the issue? Can you stop activity on one of them and see if the problem
persists?
If I suspend the build / drop database process the shrinking of available disk space slows dramatically.
Have you checked to see if there are any processes that have open handles to
deleted files (lsof -X | grep deleted). Deleted files won't show up in du
but won't release their disk space until the process exits. Perhaps a script
or scripts, even one of your hourly ones, that terminate when the server
restarts? You could save the output of lsof and ps immediately before and after
a restart and compare them.
 
lsof -X | grep deleted | wc -l

shows: 835 such files.

A couple:
postgres   2540 postgres   50u      REG                8,3     409600      93429 /var/lib/postgresql/9.1/main/base/2789
200/11816 (deleted)
postgres   2540 postgres   51u      REG                8,3   18112512   49694570 /var/lib/postgresql/9.1/main/base/2789
200/2791679 (deleted)
<...>

while ls shows:
***@db11:~# ls /var/lib/postgresql/9.1/main/base/ | more
1
11938
11946
1418400
16387
16392
16402
16424
16449
2047839
pgsql_tmp

We've a planned restart scheduled soon which will let me find any scripts that might be keeping things open, and I'll review my script that creates / populates / drops the database hourly.
 

This does not seem to be related to temp tables. Something in the script, perhaps ...

Further suggestions and  / or questions welcome.

Thanks !

Greg W.
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Tom Lane
2012-09-28 02:14:39 UTC
Permalink
Post by Greg Williamson
Have you checked to see if there are any processes that have open handles to
deleted files (lsof -X | grep deleted).
lsof -X | grep deleted | wc -l
shows: 835 such files.
postgres 2540 postgres 50u REG 8,3 409600 93429 /var/lib/postgresql/9.1/main/base/2789
200/11816 (deleted)
postgres 2540 postgres 51u REG 8,3 18112512 49694570 /var/lib/postgresql/9.1/main/base/2789
200/2791679 (deleted)
<...>
So, which processes are holding these open, and what are they doing
exactly? Let's see output from ps and pg_stat_activity, maybe even
attach to them with gdb and get stack traces.
Post by Greg Williamson
We've a planned restart scheduled soon which will let me find any
scripts that might be keeping things open,
A restart will destroy all the evidence, so let's not be in a hurry
to do that before we've identified what's happening.

regards, tom lane
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Greg Williamson
2012-09-28 02:23:54 UTC
Permalink
Tom --

----- Original Message -----
Sent: Thursday, September 27, 2012 7:14 PM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
Post by Greg Williamson
Post by Steve Crawford
Have you checked to see if there are any processes that have open
handles to
Post by Greg Williamson
Post by Steve Crawford
deleted files (lsof -X | grep deleted).
lsof -X | grep deleted | wc -l
shows: 835 such files.
postgres  2540 postgres  50u      REG                8,3    409600     
93429 /var/lib/postgresql/9.1/main/base/2789
Post by Greg Williamson
200/11816 (deleted)
postgres  2540 postgres  51u      REG                8,3  18112512 
49694570 /var/lib/postgresql/9.1/main/base/2789
Post by Greg Williamson
200/2791679 (deleted)
<...>
So, which processes are holding these open, and what are they doing
exactly?  Let's see output from ps and pg_stat_activity, maybe even
attach to them with gdb and get stack traces.
Post by Greg Williamson
We've a planned restart scheduled soon which will let me find any
scripts that might be keeping things open,
A restart will destroy all the evidence, so let's not be in a hurry
to do that before we've identified what's happening.
            regards, tom lane
Thanks for the suggestions -- I'll post back when I have more info. Many of these do not seem to have a link to any identifiable process that is still running, but some do and they have pointed me away from the hourly drop / rebuild, at least for now. Looks like the stats database may be the issue.

Greg W.
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Tom Lane
2012-09-28 02:55:52 UTC
Permalink
Post by Greg Williamson
postgres 2540 postgres 50u REG 8,3 409600 93429 /var/lib/postgresql/9.1/main/base/2789200/11816 (deleted)
postgres 2540 postgres 51u REG 8,3 18112512 49694570 /var/lib/postgresql/9.1/main/base/2789200/2791679 (deleted)
Thanks for the suggestions -- I'll post back when I have more info. Many of these do not seem to have a link to any identifiable process that is still running, but some do and they have pointed me away from the hourly drop / rebuild, at least for now. Looks like the stats database may be the issue.
BTW, looking at that again --- the filenames appear to be ordinary
tables in database 2789200, but there is something mighty odd about the
first one: 11816 is an OID that should only be handed out during initdb.
And in 9.1 what it would be handed out to is pg_shdescription. Now it's
not impossible that pg_shdescription's original table file would get
deleted: a VACUUM FULL or CLUSTER on that catalog would do it. But
AFAICS there is no situation in which that relfilenode number would
appear in a regular database --- it should be under the global/
subdirectory of $PGDATA. So unless you miscopied that filename, there
is something odd going on here above and beyond the problem of open
file descriptors not getting closed. Do you have any nonstandard
maintenance practices in this installation, such as doing database-wide
VACUUM FULL every so often?

regards, tom lane
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Greg Williamson
2012-09-28 05:28:26 UTC
Permalink
Tom --



----- Original Message -----
Sent: Thursday, September 27, 2012 7:55 PM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
Post by Greg Williamson
Post by Greg Williamson
postgres  2540 postgres  50u      REG                8,3   
409600      93429 /var/lib/postgresql/9.1/main/base/2789200/11816 (deleted)
Post by Greg Williamson
Post by Greg Williamson
postgres  2540 postgres  51u      REG                8,3 
18112512  49694570 /var/lib/postgresql/9.1/main/base/2789200/2791679 (deleted)
Post by Greg Williamson
Thanks for the suggestions -- I'll post back when I have more info.
Many of these do not seem to have a link to any identifiable process that is
still running, but some do and they have pointed me away from the hourly drop /
rebuild, at least for now. Looks like the stats database may be the issue.
BTW, looking at that again --- the filenames appear to be ordinary
tables in database 2789200, but there is something mighty odd about the
first one: 11816 is an OID that should only be handed out during initdb.
And in 9.1 what it would be handed out to is pg_shdescription.  Now it's
not impossible that pg_shdescription's original table file would get
deleted: a VACUUM FULL or CLUSTER on that catalog would do it.  But
AFAICS there is no situation in which that relfilenode number would
appear in a regular database --- it should be under the global/
subdirectory of $PGDATA.  So unless you miscopied that filename, there
is something odd going on here above and beyond the problem of open
file descriptors not getting closed.
Nope -- a sim-ple copy that seems to have gotten all of the data output.
                                                         Do you have any nonstandard
maintenance practices in this installation, such as doing database-wide
VACUUM FULL every so often?
None that I know of -- logs don't show any. and there are none on cronjobs. I'm asking the developers behind this app -- they may be doing something strange.  I'll post back as they answer.

Thanks hugely,

Greg W.
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Greg Williamson
2012-10-02 22:02:23 UTC
Permalink
I've done some more testing and the problem seems to be repmgr itself.

A few details below...


----- Original Message -----
Sent: Thursday, September 27, 2012 7:23 PM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
T om --
----- Original Message -----
Sent: Thursday, September 27, 2012 7:14 PM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps
shrinking -- postgres 9.1
  Have you checked to see if there are any processes that have open
handles to
  deleted files (lsof -X | grep deleted).
  lsof -X | grep deleted | wc -l
  shows: 835 such files.
  postgres   2540 postgres   50u      REG                8,3     409600 
   
93429 /var/lib/postgresql/9.1/main/base/2789
  200/11816 (deleted)
  postgres   2540 postgres   51u      REG                8,3   18112512 
49694570 /var/lib/postgresql/9.1/main/base/2789
  200/2791679 (deleted)
  <...>
So, which processes are holding these open, and what are they doing
exactly?  Let's see output from ps and pg_stat_activity, maybe even
attach to them with gdb and get stack traces.
  We've a planned restart scheduled soon which will let me find any
  scripts that might be keeping things open,
A restart will destroy all the evidence, so let's not be in a hurry
to do that before we've identified what's happening.
            regards, tom lane
Thanks for the suggestions -- I'll post back when I have more info. Many of
these do not seem to have a link to any identifiable process that is still
running, but some do and they have pointed me away from the hourly drop /
rebuild, at least for now. Looks like the stats database may be the issue.
Greg W.
I turned off the cronjob that did the hourly database create / drop and am still leaking disk space, but a but slower -- only lost 2 gigs overnight.

While running this process I see these data directories:
***@db11:~$ ls -lrt 9.1/main/base
total 200
drwx------ 2 postgres postgres     6 2012-09-21 16:36 pgsql_tmp
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 16387
drwx------ 2 postgres postgres 16384 2012-10-01 00:26 1418400
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 2047839
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 11946
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16449
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16392
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16402
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 11938
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 1
drwx------ 2 postgres postgres  8192 2012-10-01 08:17 16424
drwx------ 2 postgres postgres 32768 2012-10-01 19:20 3171846

When it is done (note the last directory is now gone):
***@db11:~$ ls -lrt 9.1/main/base
total 140
drwx------ 2 postgres postgres     6 2012-09-21 16:36 pgsql_tmp
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 16387
drwx------ 2 postgres postgres 16384 2012-10-01 00:26 1418400
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 2047839
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 11946
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16449
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16392
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16402
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 11938
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 1
drwx------ 2 postgres postgres  8192 2012-10-01 08:17 16424

When I run lsof -X and grep for deleted files I see these 4 new entries added since the last database create/drop:
ase/3167420/3169915 (deleted)
postgres  21116 postgres   66u      REG                8,3   19709952  136501576 /var/lib/postgresql/9.1/main/base/3171846/3174279 (deleted)
postgres  21116 postgres   67u      REG                8,3   15450112  136501574 /var/lib/postgresql/9.1/main/base/3171846/3174278 (deleted)
postgres  21116 postgres   68u      REG                8,3   28344320  136410873 /var/lib/postgresql/9.1/main/base/3171846/3172541 (deleted)
postgres  21116 postgres   69u      REG                8,3   82452480  144333458 /var/lib/postgresql/9.1/main/base/3171846/3174341 (deleted)
***@db11:~# 
***@db11:~# ps auxww | grep 21116
postgres 21116  0.0  0.1 100416 32332 ?        Ss   00:26   0:16 postgres: repmgr repmgr 199.9.xxx.yyy(45239) idle               
root     25755  0.0  0.0   6440   840 pts/2    S+   19:38   0:00 grep --color=auto 21116

======

With the database create/drop suspended we still see a steady accumulation of dead file descriptors, but at a slower rate.
< /dev/sda3              67G   28G   39G  42% /
---
/dev/sda3              67G   29G   38G  44% /
Other than abandoning repmgr I don't see a solution. I've posted this to the repmgr discussion group but have had zero responses (and, frankly, am not holding my breath).

If anyone has any suggestions I'm all ears.

Thanks for the bandwidth!

Greg W.
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Gabriele Bartolini
2012-10-03 11:41:32 UTC
Permalink
Hi Greg,

On Tue, 2 Oct 2012 15:02:23 -0700 (PDT), Greg Williamson
Post by Greg Williamson
Other than abandoning repmgr I don't see a solution. I've posted this
to the repmgr discussion group but have had zero responses (and,
frankly, am not holding my breath).
If you are 100% sure it is repmgr ... :)

I am not 100% sure it is ... Under normal circumstances (once the
standby has been cloned), repmgr simply controls the status of a standby
server communicating with a master through the streaming replication
protocol. As any other standby would do.

Cheers,
Gabriele
--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
***@2ndQuadrant.it - www.2ndQuadrant.it
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Jaime Casanova
2012-10-03 13:56:17 UTC
Permalink
On Wed, Oct 3, 2012 at 6:41 AM, Gabriele Bartolini
Post by Micky Gough
Hi Greg,
On Tue, 2 Oct 2012 15:02:23 -0700 (PDT), Greg Williamson
Post by Greg Williamson
Other than abandoning repmgr I don't see a solution. I've posted this
to the repmgr discussion group but have had zero responses (and,
frankly, am not holding my breath).
If you are 100% sure it is repmgr ... :)
I am not 100% sure it is ... Under normal circumstances (once the standby
has been cloned), repmgr simply controls the status of a standby server
communicating with a master through the streaming replication protocol. As
any other standby would do.
more to the point...
if this is happening on master (where you can create and drop
databases), then repmgrd has no reason to be running there... and very
limited subset of commands (master register, cluster show, cluster
cleanup) can be run on master
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Greg Williamson
2012-10-03 21:53:51 UTC
Permalink
Gabriele --


----- Original Message -----
Sent: Wednesday, October 3, 2012 4:41 AM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
Hi Greg,
On Tue, 2 Oct 2012 15:02:23 -0700 (PDT), Greg Williamson
Post by Greg Williamson
Other than abandoning repmgr I don't see a solution. I've posted
this
Post by Greg Williamson
to the repmgr discussion group but have had zero responses (and,
frankly, am not holding my breath).
If you are 100% sure it is repmgr ... :)
I am not 100% sure it is ... Under normal circumstances (once the standby has
been cloned), repmgr simply controls the status of a standby server
communicating with a master through the streaming replication protocol. As any
other standby would do.
Indeed -- stopping repmgr has lef to a continued accumulation of the dead but not gone file handles, but almost all are now owned by the application, with only a few held by WAL shipping and the like. So repmgr was just getting the blame, unfairly.

Am working on the code itself now.

Thanks for the response!

Greg W.
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Jaime Casanova
2012-10-17 17:05:09 UTC
Permalink
Post by Greg Williamson
Indeed -- stopping repmgr has lef to a continued accumulation of the dead but not gone file handles, but almost all are now owned by the application, with only a few held by WAL shipping and the like. So repmgr was just getting the blame, unfairly.
Am working on the code itself now.
Just in case, it seems like
http://archives.postgresql.org/pgsql-committers/2012-10/msg00194.php
is a fix to your problem. It was commited a few minutes ago by Tom
Lane so you can patch your instalation or wait until the next
postgresql minor version to be released
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Greg Williamson
2012-10-17 21:56:17 UTC
Permalink
fantastic !
thanks.
greg
________________________________
Sent: Wednesday, October 17, 2012 10:05 AM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
Post by Greg Williamson
Indeed -- stopping repmgr has lef to a continued accumulation of the dead but not gone file handles, but almost all are now owned by the application, with only a few held by WAL shipping and the like. So repmgr was just getting the blame, unfairly.
Am working on the code itself now.
Just in case, it seems like
http://archives.postgresql.org/pgsql-committers/2012-10/msg00194.php
is a fix to your problem. It was commited a few minutes ago by Tom
Lane so you can patch your instalation or wait until the next
postgresql minor version to be released
--
Jaime Casanova        www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566        Cell: +593 987171157
Steve Crawford
2012-09-28 15:59:56 UTC
Permalink
Post by Greg Williamson
...
The other is a slimmed-down version of our production database, which gets
Post by Greg Williamson
recreated hourly by a shell script which pulls data from remote servers, does a
pg_dump of the resulting 3 gig database, and then drops it.
...
Could you explain this process in more detail? Are you creating a new
database, reading in data, dumping then dropping the whole database or
just manipulating tables within an existing database?
Post by Greg Williamson
Post by Greg Williamson
Have you checked to see if there are any processes that have open handles to
deleted files (lsof -X | grep deleted). Deleted files won't show up in du
but won't release their disk space until the process exits. Perhaps a script
or scripts, even one of your hourly ones, that terminate when the server
restarts? You could save the output of lsof and ps immediately before and after
a restart and compare them.
lsof -X | grep deleted | wc -l
shows: 835 such files.
postgres 2540 postgres 50u REG 8,3 409600 93429 /var/lib/postgresql/9.1/main/base/2789
200/11816 (deleted)
postgres 2540 postgres 51u REG 8,3 18112512 49694570 /var/lib/postgresql/9.1/main/base/2789
200/2791679 (deleted)
<...>...
I'll leave it to you and Tom to puzzle over the the postgres-related
open files. Meanwhile, I'm a bit curious about the other 800+ and
whether they are associated with scripts or processes that are connected
to PostgreSQL.

First, what is the output of "select * from pg_stat_activity;"? Are
there connections you don't expect to see? If you force any of them
closed (after checking with anyone who may be impacted), do you see any
file handles released or disk-space freed?

Second, do any of the processes associated with the other
open-but-deleted files relate to programs or scripts that connect to
PostgreSQL? Next time you do a restart, do any of the processes exit or
do any of the deleted files get closed?

I'm wondering if you have processes that connect to PostgreSQL which
terminate and release their file-handles when PG is restarted.

Cheers,
Steve
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Dinesh Bhandary
2012-09-28 17:09:39 UTC
Permalink
We had a situation where external sort was creating a humungous temp
file and the space was reclaimed when the process was completed.

Thanks.
Dinesh
Post by Steve Crawford
Post by Greg Williamson
...
The other is a slimmed-down version of our production database, which gets
Post by Greg Williamson
recreated hourly by a shell script which pulls data from remote servers, does a
pg_dump of the resulting 3 gig database, and then drops it.
...
Could you explain this process in more detail? Are you creating a new
database, reading in data, dumping then dropping the whole database or
just manipulating tables within an existing database?
Post by Greg Williamson
Post by Greg Williamson
Have you checked to see if there are any processes that have open handles to
deleted files (lsof -X | grep deleted). Deleted files won't show up in du
but won't release their disk space until the process exits. Perhaps a script
or scripts, even one of your hourly ones, that terminate when the server
restarts? You could save the output of lsof and ps immediately before and after
a restart and compare them.
lsof -X | grep deleted | wc -l
shows: 835 such files.
postgres 2540 postgres 50u REG 8,3
409600 93429 /var/lib/postgresql/9.1/main/base/2789
200/11816 (deleted)
postgres 2540 postgres 51u REG 8,3 18112512
49694570 /var/lib/postgresql/9.1/main/base/2789
200/2791679 (deleted)
<...>...
I'll leave it to you and Tom to puzzle over the the postgres-related
open files. Meanwhile, I'm a bit curious about the other 800+ and
whether they are associated with scripts or processes that are
connected to PostgreSQL.
First, what is the output of "select * from pg_stat_activity;"? Are
there connections you don't expect to see? If you force any of them
closed (after checking with anyone who may be impacted), do you see
any file handles released or disk-space freed?
Second, do any of the processes associated with the other
open-but-deleted files relate to programs or scripts that connect to
PostgreSQL? Next time you do a restart, do any of the processes exit
or do any of the deleted files get closed?
I'm wondering if you have processes that connect to PostgreSQL which
terminate and release their file-handles when PG is restarted.
Cheers,
Steve
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Greg Williamson
2012-09-28 22:47:52 UTC
Permalink
Steve --

----- Original Message -----
Sent: Friday, September 28, 2012 8:59 AM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
...
  The other is a slimmed-down version of our production database, which
gets
Post by Greg Williamson
recreated hourly by a shell script which pulls data from remote
servers, does a
Post by Greg Williamson
pg_dump of the resulting 3 gig database, and then drops it.
...
Could you explain this process in more detail? Are you creating a new database,
reading in data, dumping then dropping the whole database or just manipulating
tables within an existing database?
 
We create the database, and then use shell scripts to get a pg_dump -s of a production server and use that to create tables, indexes, etc.

Then a sequence of psql commands retrieves data from production, usually a subset of whatever table is being grabbed based on a slimmed down set of userids of interest, recency, etc. Mostly these get loaded as is i nto the tables in the new database; occasionally we build a temp table and do some simple joins to get the final results.

When all these commands are done, we pg_dump the new database, compress it and make a tar file, and finally issue a DROP DATABASE command.

Takes about 20 minutes and runs once an hour.
Post by Greg Williamson
Have you checked to see if there are any processes that have open
handles to
Post by Greg Williamson
deleted files (lsof -X | grep deleted). Deleted files won't show up
in du
Post by Greg Williamson
but won't release their disk space until the process exits. Perhaps
a script
Post by Greg Williamson
or scripts, even one of your hourly ones, that terminate when the
server
Post by Greg Williamson
restarts? You could save the output of lsof and ps immediately before
and after
Post by Greg Williamson
a restart and compare them.
 
lsof -X | grep deleted | wc -l
shows: 835 such files.
postgres  2540 postgres  50u      REG                8,3    409600     
93429 /var/lib/postgresql/9.1/main/base/2789
200/11816 (deleted)
postgres  2540 postgres  51u      REG                8,3  18112512 
49694570 /var/lib/postgresql/9.1/main/base/2789
200/2791679 (deleted)
<...>...
I'll leave it to you and Tom to puzzle over the the postgres-related open
files. Meanwhile, I'm a bit curious about the other 800+ and whether they
are associated with scripts or processes that are connected to PostgreSQL.
 
These all seem to be from two places -- repmgr (transient ) and this stats application.
First, what is the output of "select * from pg_stat_activity;"? Are
there connections you don't expect to see? If you force any of them closed
(after checking with anyone who may be impacted), do you see any file handles
released or disk-space freed?
Nothing unexpected, lots of IDLE connections (20-30, depending). We just tried a round of closing half of the stat application connections and it didn't seem to make a big difference.
Second, do any of the processes associated with the other open-but-deleted files
relate to programs or scripts that connect to PostgreSQL? Next time you do a
restart, do any of the processes exit or do any of the deleted files get closed?
 
repmgr is now also a suspect, although from what I can see it keeps things in that state for nly a short while. But we are investigating further.

Everythng else connects via pgbouncer, so we are also wondering if the tcp_keepalive we added might be hurting us.
I'm wondering if you have processes that connect to PostgreSQL which
terminate and release their file-handles when PG is restarted.
Cheers,
Steve
Thanks for questions and the time -- still digging into this.

Greg
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Tom Lane
2012-09-28 23:49:09 UTC
Permalink
Post by Greg Williamson
I'll leave it to you and Tom to puzzle over the the postgres-related open
files. Meanwhile, I'm a bit curious about the other 800+ and whether they
are associated with scripts or processes that are connected to PostgreSQL.
These all seem to be from two places -- repmgr (transient ) and this stats application.
Do you have any characterization yet of which deleted files are being
held open by which processes? In particular I'm wondering if the
held-open deleted files are in a recently-dropped database, and whether
they are being held open by regular backends or one of the background
processes such as bgwriter, and if the former what are those backends
doing exactly.

It's entirely expected that recently-deleted files might be held open
for a little while, but there are mechanisms that are supposed to
prevent them from being held open indefinitely. I'm guessing that your
usage pattern might be tripping over some gap in those mechanisms, but
we don't have enough info yet to speculate about what.

BTW, I now think that my question about the unexpected OID value shown
for one deleted file may have been a red herring --- it seems not
implausible that lsof was just lying to you. It has to do some
guesswork to reconstruct file paths for deleted files, and I don't think
it's always right about that.

regards, tom lane
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Dinesh Bhandary
2012-09-28 02:19:51 UTC
Permalink
I'd also keep track of tmp directory while building. I hope it is not
creating a huge external sort file. By building you meant to say
pg_restore, I assume.
How big is your maintenance_work_mem? Index creation pools memory from
this buffer. 
Post by Greg Williamson
----- Original Message -----
Sent: Thursday, September 27, 2012 3:48 PM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps
shrinking -- postgres 9.1
Post by Greg Williamson
Dear list,
I have a postgres database, 9.1.3, which shows a fairly constant amount of
space used by postgres, but total disk space kees shrinking.
Post by Greg Williamson
If I restart postgres the space on my file system returns.
This cluster is replicated to another; that custer does not show this
problem.
Post by Greg Williamson
There are two main databases on this cluster, both using logging. One is
used to accumulate some stats on our systen.
Post by Greg Williamson
The other is a slimmed-down version of our production database, which gets
recreated hourly by a shell script which pulls data from remote servers, does a
pg_dump of the resulting 3 gig database, and then drops it.
Post by Greg Williamson
I posted a message about ths a few days and got no responses. I am hoping
for better luck this time as this is a serious issue,not relayed t vacuuming or
routine admin tasks which may or may not be happening.
No answer but some questions.
How are you determining space used by PostgreSQL? From the OS (i.e. du on
PostgreSQL's directories or similar) or from PostgreSQL itself (pg_ system
tables, \dt+, etc.)?
 
df -h and the postgres internal functions that show size of relations.
What method of replication are you using?
streaming replication w/ synchronous commit
Do you have any indication if the issue is only related to which database is
causing the issue? Can you stop activity on one of them and see if the problem
persists?
If I suspend the build / drop database process the shrinking of
available disk space slows dramatically.
Have you checked to see if there are any processes that have open handles to
deleted files (lsof -X | grep deleted). Deleted files won't show up in du
but won't release their disk space until the process exits. Perhaps a script
or scripts, even one of your hourly ones, that terminate when the server
restarts? You could save the output of lsof and ps immediately before and after
a restart and compare them.
 
lsof -X | grep deleted | wc -l
shows: 835 such files.
postgres   2540 postgres   50u      REG                8,3     409600  
   93429 /var/lib/postgresql/9.1/main/base/2789
200/11816 (deleted)
postgres   2540 postgres   51u      REG                8,3   18112512  
49694570 /var/lib/postgresql/9.1/main/base/2789
200/2791679 (deleted)
<...>
1
11938
11946
1418400
16387
16392
16402
16424
16449
2047839
pgsql_tmp
We've a planned restart scheduled soon which will let me find any
scripts that might be keeping things open, and I'll review my script
that creates / populates / drops the database hourly.
 
This does not seem to be related to temp tables. Something in the script, perhaps ...
Further suggestions and  / or questions welcome.
Thanks !
Greg W.
--
http://www.postgresql.org/mailpref/pgsql-admin
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Greg Williamson
2012-10-01 21:37:20 UTC
Permalink
Dinesh --


----- Original Message -----
Sent: Thursday, September 27, 2012 7:19 PM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
I'd also keep track of tmp directory while building. I hope it is not
creating a huge external sort file. By building you meant to say
pg_restore, I assume.
Actually everything is done a single SQL commands; the DDL from the source gets run on the new database, and then we do a series of SQL commands. The tmp space doesn't seem to get hit as there are only a few sorts being run as we reduce a list of "users from" and "users to" to a single list of unique user ids.
How big is your maintenance_work_mem? Index creation pools memory from
this buffer. 
maintenance_work_mem = 1500MB# min 1MB

Which seems to be enough for this task set.

***@db11:~/9.1/main$ ls -lrt base/pgsql_tmp
total 0

Something malign in one of our applications, perhaps interacting with this hourly drop. Still working with the developers to see what they are doing "under the hood."

Thanks for the suggestions!

Greg
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Micky Gough
2012-09-27 22:48:37 UTC
Permalink
Hi Greg,
Post by Greg Williamson
I have a postgres database, 9.1.3, which shows a fairly constant amount of
space used by postgres, but total disk space kees shrinking.
If I restart postgres the space on my file system returns.
It sounds like (maybe?) Postgres is writing to a file that has been
deleted. You won't be able to see the file using ls, so it's hard to track.
Restarting postgres will release the lock on the inode and the filesystem
flushes it.

You can check for deleted files that are being held open using lsof |grep
deleted.

Micky
Dinesh Bhandary
2012-09-27 23:11:27 UTC
Permalink
You might also want to check to see if temp files are growing out of bounds.

Thanks.
Dinesh
Post by Greg Williamson
Dear list,
I have a postgres database, 9.1.3, which shows a fairly constant amount of space used by postgres, but total disk space kees shrinking.
If I restart postgres the space on my file system returns.
This cluster is replicated to another; that custer does not show this problem.
There are two main databases on this cluster, both using logging. One is used to accumulate some stats on our systen.
The other is a slimmed-down version of our production database, which gets recreated hourly by a shell script which pulls data from remote servers, does a pg_dump of the resulting 3 gig database, and then drops it.
I posted a message about ths a few days and got no responses. I am hoping for better luck this time as this is a serious issue,not relayed t vacuuming or routine admin tasks which may or may not be happening.
Thanks for any help / advice / things to look for.
Greg
db11:5432=# select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
"These are single disk systems. single no
n-raid sata. Using xfs as the filesystem like the rest of the dbs."
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Loading...