Discussion:
Experience with large number of tables in single PostgreSQL instance
Vedran Krivokuca
2013-04-10 07:07:10 UTC
Hello all,


We are exploring possible strategies for deploying PostgreSQL with an
application that will store a fairly large amount of data (the current
implementation holds around 345 GB, and the database is expected to grow
to up to 10 times that).

Now, without going into too much detail, the key fact is that the data
sets needed for manipulation at any given point in time (let's call them
operational data sets) will each be around 1/10,000th of the whole data
set. There can be up to 100 different operational data sets being worked
on concurrently. Operational data sets are strictly defined in terms of
what they contain, but they could contain any piece of the complete data
set, so we have two obvious strategies to go with:

1) We can go with separate instances of the PostgreSQL service, let's say
(purely in theory) 10 of them on the same HA cluster setup. Each instance
would hold roughly 1/10th of that big record set and around 3,000 database
tables (which apparently shouldn't be a problem for PostgreSQL, see below).

2) We can go with a single instance of the PostgreSQL service, which would
then contain 30,000 database tables.

So, consider this a purely theoretical discussion - does anyone here have
experience running a PostgreSQL service with a large number of database
tables (a couple of thousand up to a couple of tens of thousands)?

This wiki entry indicates that PostgreSQL in general has no problem with
"thousands of tables" - how about tens of thousands?
http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#Faster_table.27s_list


Best regards,
V.

--
Pozdrav/Greetings,
Vedran Krivokuća
Vedran Krivokuca
2013-04-10 07:10:59 UTC
Post by Vedran Krivokuca
1) We can go with separate instances of the PostgreSQL service, let's say
(purely in theory) 10 of them on the same HA cluster setup. Each instance
would hold roughly 1/10th of that big record set and around 3,000 database
tables (which apparently shouldn't be a problem for PostgreSQL, see below).
I am talking nonsense here, obviously - excuse me. We would go with, for
example, 10 or 100 databases in the same PostgreSQL instance. Buzzwording
got the better of me. :) The rest of my initial e-mail still applies; I
will appreciate any input from experience on this subject.


V.

--
Pozdrav/Greetings,
Vedran Krivokuća
Vasilis Ventirozos
2013-04-10 07:24:50 UTC
Post by Vedran Krivokuca
Post by Vedran Krivokuca
1) We can go with separate instances of the PostgreSQL service, let's say
(purely in theory) 10 of them on the same HA cluster setup. Each instance
would hold roughly 1/10th of that big record set and around 3,000 database
tables (which apparently shouldn't be a problem for PostgreSQL, see below).
I am talking nonsense here, obviously - excuse me. We would go with, for
example, 10 or 100 databases in the same PostgreSQL instance. Buzzwording
got the better of me. :) The rest of my initial e-mail still applies; I
will appreciate any input from experience on this subject.
V.
I think the limitation would be at the OS level (filesystem and kernel),
because each table is stored as a file (or more) under a per-database
directory. I haven't got experience with how modern Linux systems cope
with that, but I think it would be easy to test. Database-wise you
shouldn't have a problem.
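
For instance, you can see the file-per-relation layout from psql (only a
sketch; 'some_table' is a placeholder name):

  -- file backing a given table, relative to the data directory
  SELECT pg_relation_filepath('some_table');

  -- relations (tables, indexes, TOAST tables, ...) in the current database,
  -- a rough proxy for the number of files in its directory
  SELECT count(*) FROM pg_class;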
Dale Betts
2013-04-10 08:43:10 UTC
I'd agree, certainly in my experience.

You need to ensure OS parameters such as the max open files limit
(fs.file-max if we're talking Linux) are set appropriately. Bear in mind
each user will have an open file on each underlying data file for the
databases they're connected to.
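
As a very rough illustration (a sketch only - the real arithmetic depends
on your workload), PostgreSQL also has its own per-backend cap that you
can check from psql:

  SHOW max_files_per_process;   -- per-backend soft limit on open files (default 1000)
  SHOW max_connections;
  -- a crude upper bound on simultaneously open data files is roughly
  -- max_connections * max_files_per_process; fs.file-max should comfortably exceed it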

Dale
Dale Betts
Technical Specialist

T 01623 48 98 22
F 01623 48 98 20
e ***@hssnet.com
Sales www.hssnet.com

Healthcare Software Systems
3rd Floor i2 Mansfield
Hamilton Court
Oakham Business Park
Mansfield
NG18 5FB
On Wed, Apr 10, 2013 at 10:10 AM, Vedran Krivokuca wrote:
On Wed, Apr 10, 2013 at 9:07 AM, Vedran Krivokuca wrote:
Post by Vedran Krivokuca
1) We can go with separate instances of the PostgreSQL service, let's say
(purely in theory) 10 of them on the same HA cluster setup. Each instance
would hold roughly 1/10th of that big record set and around 3,000 database
tables (which apparently shouldn't be a problem for PostgreSQL, see below).
I am talking nonsense here, obviously - excuse me. We would go with, for
example, 10 or 100 databases in the same PostgreSQL instance. Buzzwording
got the better of me. :) The rest of my initial e-mail still applies; I
will appreciate any input from experience on this subject.
V.
I think the limitation would be at the OS level (filesystem and kernel),
because each table is stored as a file (or more) under a per-database
directory. I haven't got experience with how modern Linux systems cope
with that, but I think it would be easy to test. Database-wise you
shouldn't have a problem.
Vedran Krivokuca
2013-04-10 13:05:20 UTC
Post by Dale Betts
I'd agree, certainly in my experience.
You need to ensure OS parameters such as the max open files limit
(fs.file-max if we're talking Linux) are set appropriately. Bear in mind
each user will have an open file on each underlying data file for the
databases they're connected to.
OK, to reply to myself and other interested parties - further insights
are welcome!

I've been digging today for information on this subject and found:

http://postgresql.1045698.n5.nabble.com/GENERAL-Maximum-number-of-tables-per-database-and-slowness-td1853836.html

Basically it says: any number of tables is acceptable (certainly within
the couple of tens of thousands we might end up with), but bear in mind
that pg_catalog can grow considerably with that many tables, and a huge
pg_catalog tends to slow down the query planner. That thread is from 2005,
but I doubt those facts have changed substantially since.
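
For instance, a quick sketch-level check of the catalog footprint in a
given database:

  -- total on-disk size of the system catalogs in the current database
  SELECT pg_size_pretty(sum(pg_total_relation_size(c.oid))::bigint)
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'pg_catalog';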

I'll probably do some testing, but I can hardly replicate the production
volume of data. It seems that, given the predictable nature of the queries
we run against this system, we'll go with multiple databases, each
containing a couple of thousand tables at most. We're still weighing the
pros and cons of each approach.
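
For the table-count side of such a test, something along these lines
should do (a throwaway sketch; the schema and column names are made up):

  -- create 30,000 small empty tables in a scratch schema
  CREATE SCHEMA scale_test;
  DO $$
  BEGIN
    FOR i IN 1..30000 LOOP
      EXECUTE format('CREATE TABLE scale_test.t_%s (id int PRIMARY KEY, payload text)', i);
    END LOOP;
  END $$;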



--
Pozdrav/Greetings,
Vedran Krivokuća
Prashanth Ranjalkar
2013-04-10 13:09:27 UTC
Every table or index is stored as one or more OS files, so the max open
files limit needs to be set appropriately in order to support a larger
table count. There is no hard limit on the number of tables you can create
in PostgreSQL; performance is the major criterion, as the catalogs can get
overburdened.

Thanks & Regards,
Prashanth Ranjalkar
Database Consultant & Architect
Skype: prashanth.ranjalkar
www.postgresdba.net
Post by Dale Betts
I'd agree, certainly in my experience.
You need to ensure OS parameters such as the max open files limit
(fs.file-max if we're talking Linux) are set appropriately. Bear in mind
each user will have an open file on each underlying data file for the
databases they're connected to.
Dale
Post by Vedran Krivokuca
Post by Vedran Krivokuca
1) We can go with separate instances of the PostgreSQL service, let's say
(purely in theory) 10 of them on the same HA cluster setup. Each instance
would hold roughly 1/10th of that big record set and around 3,000 database
tables (which apparently shouldn't be a problem for PostgreSQL, see below).
I am talking nonsense here, obviously - excuse me. We would go with, for
example, 10 or 100 databases in the same PostgreSQL instance. Buzzwording
got the better of me. :) The rest of my initial e-mail still applies; I
will appreciate any input from experience on this subject.
V.
I think the limitation would be at the OS level (filesystem and kernel),
because each table is stored as a file (or more) under a per-database
directory. I haven't got experience with how modern Linux systems cope
with that, but I think it would be easy to test. Database-wise you
shouldn't have a problem.
Marc Mamin
2013-04-11 08:14:09 UTC
Post by Vedran Krivokuca
2) We can go with a single instance of the PostgreSQL service, which would
then contain 30,000 database tables.
So, consider this a purely theoretical discussion - does anyone here have
experience running a PostgreSQL service with a large number of database
tables (a couple of thousand up to a couple of tens of thousands)?
Hello,

We are handling data for several customers, each of them having its own schema.
This allows us to easily bulk-move data to other servers when space gets scarce.
The data are comparable to data warehouse data and are used by a reporting application.
A majority of the tables are seldom or never used, which may moderate possible issues with a very large pg_catalog.
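
Schematically (the customer and table names below are made up), the layout
is simply one schema per customer, and a whole customer can then be moved
with pg_dump's --schema option:

  -- one schema per customer
  CREATE SCHEMA customer_acme;
  CREATE TABLE customer_acme.report_facts (
      report_day date NOT NULL,
      metric     text NOT NULL,
      value      numeric
  );
  -- the whole customer can later be moved with:  pg_dump -n customer_acme ...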

We are overall impressed by how well Postgres handles this :-)
On the other hand, we've spent a *LOT* of work on table and query design, and we do have some problems with parser performance:

We log all statements whose duration exceeds 1 second. About 1 % of these are "parse" statements,
and about 90 % of those long parse statements take less than 10 seconds.
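
(For reference, that logging is driven by log_min_duration_statement - the
value shown here is only ours; adjust to taste:)

  -- in postgresql.conf:  log_min_duration_statement = 1000   (milliseconds)
  SHOW log_min_duration_statement;   -- verify the active value from psql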

Here are the figures for the server hosting our largest number of tables;
I expect them to be somewhat above the maximum that we should allow.

DB size: 3500 GB

pg_catalog size: 2 GB

select count(*) from pg_tables;
120,884

select count(*) from pg_indexes;
219,082

select count(*) from pg_attribute;
2,779,199

Server: 48 GB RAM & 12 cores


regards,

Marc Mamin