Discussion:
[ADMIN] Giant Postgresql Database Cluster Folder
Kiruba suthan
2012-04-10 05:51:54 UTC
We are using a PostgreSQL database server to host the DB for an analytical tool I
am working on. Basically this DB holds metrics about files. A few tables have
row counts of more than 18 million, but the tables contain only basic data
types such as numbers, dates & strings. No binary data is stored. The size
of the DB is around 1 GB when a full dump is taken.



We query the DB through complex views to get reports, and most of these
queries return quite large result sets (row counts in the hundreds of
thousands or millions).



The size of the DB cluster folder varies between 400 GB and 600 GB, which is
unreasonably large for the actual data, and it is eating up all the disk
space on the server. When I create a fresh DB from the dump on a new server,
the size of the DB cluster folder is around 2.3 GB, which seems very
reasonable to me.


Experts,

Could you help me clean up the DB cluster folder and reclaim disk space,
please? Could you also give me some insight into how data is organized in
the DB cluster, and what I should do to avoid this happening again?



The sizes of the directories under the DB cluster folder are listed below:

[***@server DB_CLUSER_DATA]$ du -ksh *
407G base
316K global
49M pg_clog
4.0K pg_hba.conf
4.0K pg_ident.conf
120K pg_multixact
12K pg_notify
32K pg_stat_tmp
88K pg_subtrans
4.0K pg_tblspc
4.0K pg_twophase
4.0K PG_VERSION
129M pg_xlog
20K postgresql.conf
4.0K postmaster.opts
4.0K postmaster.pid
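
To break the 407G in base/ down further, a query along these lines can list
the largest relations (a sketch only; "mydb" below is a placeholder for the
actual database name):

```shell
# Sketch: list the ten largest relations in a database by total on-disk
# size (table + indexes + TOAST). "mydb" is a placeholder database name.
SQL="SELECT relname,
            pg_size_pretty(pg_total_relation_size(oid)) AS total_size
     FROM pg_class
     WHERE relkind = 'r'
     ORDER BY pg_total_relation_size(oid) DESC
     LIMIT 10;"
echo "$SQL"
# To run it against a live server:
#   psql -d mydb -c "$SQL"
```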


Thank you very much!



Regards,

Kiruba
raghu ram
2012-04-10 06:14:08 UTC
Post by Kiruba suthan
The size of the DB Cluster Folder varies between 400GB to 600GB which is
unreasonably huge for the actual data. [...]
Could you help me how to clean up DB Cluster folder and reclaim disk space
please? And please give me some insight into how data is organized in DB
Cluster and what should I do to avoid this happening again?
Perform a VACUUM FULL on the entire cluster to reclaim the space. This
operation takes an exclusive lock on each table it processes, so please run
it during off-peak hours.

Once that maintenance is complete, schedule a manual VACUUM ANALYZE once a
day, so that the space held by dead rows is reused when new records are
inserted into the tables.
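
In shell form, the two steps above might look like this (a sketch, with
"mydb" as a placeholder database name; adjust paths and schedule to taste):

```shell
# One-off cleanup -- VACUUM FULL takes an exclusive lock per table,
# so run it in a maintenance window:
#   psql -d mydb -c "VACUUM FULL ANALYZE;"
#
# Recurring maintenance -- a daily plain VACUUM ANALYZE does not take
# exclusive locks; a crontab entry for 02:00 could look like:
CRON_LINE='0 2 * * * /usr/bin/vacuumdb --all --analyze'
echo "$CRON_LINE"
```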

--Raghu
robin
2012-04-10 07:19:58 UTC

Given that your data set is small enough for a dump/restore cycle to be
painless, you should also consider upgrading to the latest stable release.

I agree with Raghu that you need to vacuum your database periodically
to keep its footprint under control, but rather than issuing the VACUUM
commands as a scheduled activity you should use the autovacuum daemon
unless you have specific needs. You probably do need to do the first
vacuum by hand with no users connected (or do a dump/restore cycle, which
is perhaps quicker with a system as bloated as yours).
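
A dump/restore cycle might look roughly like this (a sketch only; "mydb" is
a placeholder database name, and the commands assume a superuser shell on
the server):

```shell
# Sketch of a dump/restore cycle to rebuild a bloated database.
# "mydb" is a placeholder; take a backup before dropping anything.
DB=mydb
DUMP=/tmp/${DB}.dump
# pg_dump -Fc "$DB" > "$DUMP"        # custom-format dump of the data
# dropdb "$DB" && createdb "$DB"     # recreate the database empty
# pg_restore -d "$DB" "$DUMP"        # reload; tables come back without bloat
echo "restore plan: $DB from $DUMP"
```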

http://www.postgresql.org/docs/8.2/static/routine-vacuuming.html#AUTOVACUUM
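
For reference, the 8.2 docs describe autovacuum as off by default; enabling
it takes a few postgresql.conf settings (a sketch based on the 8.2 defaults,
worth verifying against the version actually in use):

```
# postgresql.conf -- settings needed for autovacuum on 8.2
stats_start_collector = on   # required by autovacuum (default: on)
stats_row_level = on         # required by autovacuum (default: off in 8.2)
autovacuum = on              # off by default in 8.2
```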

Given that you missed the need to vacuum, I would read the whole routine
maintenance section in case there are other activities you should be
performing to keep your data safe and available:

http://www.postgresql.org/docs/8.2/static/maintenance.html

Hope this helps,

Robin
Post by Kiruba suthan
Could you help me how to clean up DB Cluster folder and reclaim disk
space please? And please give me some insight into how data is
organized in DB Cluster and what should I do to avoid this happening
again?
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin