Discussion:
[ADMIN] rsync and streaming replication
(too old to reply)
Jean-Armel Luce
2011-11-12 14:15:56 UTC
Permalink
Hi all,


I am using Postgres 9.0.3 in production with Slony 2.0.6 for replication.
I am currently testing the streaming replication with 9.0.3. My database
contains 100 GBytes of data (6000 tables).

I have 1 master node and 3 slave nodes. The 1st slave node is in the same
datacenter as the master node. The 2 other slave nodes are in a 2nd distant
datacenter.

Say node 1 is the master node, node 2 is the local slave node, and nodes 3
and 4 are the slave nodes in the distant datacenter.

After installation of the streaming replication, I have checked that all
the files have been copied from the master to the slaves and that files
have the same size and modification date on each node.
When I send update requests, the streaming replication works perfectly; I
am able to replicate many thousands of updates per second.
I have updated, inserted and deleted nearly 70% of the rows in all the
tables (mainly update requests), and replication is working perfectly.

However; when I want to promote node 3 as the master, I need to rsync all
the logs and databases from node 3 to node 4, node 1 and node 2.

Even if I use rsync -a (as suggested in the wiki
http://wiki.postgresql.org/wiki/Streaming_Replication), it takes a long
time (nearly 2 hours for a distant slave).
It looks that all database files do not have the same modification date in
the master node and in the slave nodes, so the rsync copies quite all the
database from the new master to the slaves.


At first, I was suspecting vacuum process for modifying the files not
simultaneously in the master and the slaves. When I check the dates last
autoanalyze and last_autovacuum fields in pg_stat_user_tables, they are
diffent than the mtime of the files; so it looks autovacuum is not
responsible of that.

Please, could you help me to understand why it is so long to rsync the data
from the new master to the other slaves ?
Did I miss anything ?
Any idea ?


Jean-Armel
Kevin Grittner
2011-11-12 16:11:56 UTC
Permalink
Post by Jean-Armel Luce
Please, could you help me to understand why it is so long to rsync
the data from the new master to the other slaves ?
Your post is a little light on details. I think the most useful
information would be the output of:

rysnc --version

and the exact rsync command you are using. If you are using any sort
of remote mount point, that would be important to know.

-Kevin
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Jerry Sievers
2011-11-12 19:21:22 UTC
Permalink
Post by Jean-Armel Luce
Hi all,
I am using Postgres 9.0.3 in production with Slony 2.0.6 for replication.
I am currently testing the streaming replication with 9.0.3.? My database contains 100 GBytes of data (6000 tables).
I have 1 master node and 3 slave nodes. The 1st slave node is in the same datacenter as the master node. The 2 other
slave nodes are in a 2nd distant datacenter.
Say node 1 is the master node, node 2 is the local slave node, and nodes 3 and 4 are the slave nodes in the distant
datacenter.
After installation of the streaming replication, I have checked that all the files have been copied from the master to
the slaves and that files have the same size and modification date on each node.
When I send update requests, the streaming replication works perfectly; I am able to replicate many thousands of
updates per second.
I have updated, inserted and deleted nearly 70% of the rows in all the tables (mainly update requests), and
replication is working perfectly.
However; when I want to promote node 3 as the master, I need to rsync all the logs and databases from node 3 to node
4, node 1 and node 2.
That is a popular misconception.

Assuming you've got your WAL repositories well organized and your
standbys are configure to follow latest timeline...

You should be able to repoint the existing standbys and even the
demoted master just by creating or changing recovery.conf as needed
and restarting all standbys.

I did a talk about that at Pg-East 2010 based on Pg 8.4. It all still
seems to work now with streaming replication and hot-standby as well.

HTH
Post by Jean-Armel Luce
Even if I use rsync -a (as suggested in the wiki http://wiki.postgresql.org/wiki/Streaming_Replication), it takes a
long time (nearly 2 hours for a distant slave).
It looks that all database files do not have the same modification date in the master node and in the slave nodes, so
the rsync copies quite all the database from the new master to the slaves.
At first, I was suspecting vacuum process for modifying the files not simultaneously in the master and the slaves.
When I check the dates? last autoanalyze and last_autovacuum fields in pg_stat_user_tables, they are diffent than the
mtime of the files; so it looks autovacuum is not responsible of that.
Please, could you help me to understand why it is so long to rsync the data from the new master to the other slaves ?
Did I miss anything ?
Any idea ?
Jean-Armel
--
Jerry Sievers
Postgres DBA/Development Consulting
e: ***@comcast.net
p: 305.321.1144
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Jean-Armel Luce
2011-11-13 08:44:46 UTC
Permalink
Hi Jerry and Kevin,

Thanks for your answers.

Jerry, I tried as you said with the parameter recovery_target_timeline =
'latest' and it works.

I tried on a smaller test database (only 15MB) with PG9.1.1 and only 1
slave.

My switchover procedure was :

Step 1 : stop the old master
/usr/local/pgsql/bin/pg_ctl stop -m immediate -D
/usr/local/pgsql91/server1/data

Step 2 : promote slave as master :
touch /usr/local/pgsql91/server2/data/trigger_file

Step 3 : declare the old master as a standby server
Step 3.1 : vi /usr/local/pgsql91/server1/data/postgresql.conf
Add hot_standby = on in the postgresql.conf

Step 3.2 Set recovery.conf for old master server (including
recovery_target_timeline = 'latest')
cp /usr/local/pgsql91/server1/data/recovery.bkp
/usr/local/pgsql91/server1/data/recovery.conf

Step 4 : start old master
/usr/local/pgsql/bin/pg_ctl start -D /usr/local/pgsql91/server1/data &


The old master is now a hot_standby of the new master. Replication works
without rsyncing all data from new master to new slave.


Tomorrow, I shall try with PG9.0.3, 3 slaves and a primary database with
100 GB.

Thanks.

Jal
Cédric Villemain
2011-11-14 17:01:13 UTC
Permalink
Post by Jean-Armel Luce
Hi Jerry and Kevin,
Thanks for your answers.
Jerry, I tried as you said with the parameter recovery_target_timeline =
'latest' and it works.
I tried on a smaller test database (only 15MB) with PG9.1.1 and only 1
slave.
Step 1 : stop the old master
/usr/local/pgsql/bin/pg_ctl stop -m immediate -D
/usr/local/pgsql91/server1/data
touch /usr/local/pgsql91/server2/data/trigger_file
Step 3 : declare the old master as a standby server
Step 3.1 : vi /usr/local/pgsql91/server1/data/postgresql.conf
Add hot_standby = on in the postgresql.conf
Step 3.2 Set recovery.conf for old master server (including
recovery_target_timeline = 'latest')
cp /usr/local/pgsql91/server1/data/recovery.bkp
/usr/local/pgsql91/server1/data/recovery.conf
Step 4 : start old master
/usr/local/pgsql/bin/pg_ctl start -D /usr/local/pgsql91/server1/data &
The old master is now a hot_standby of the new master. Replication works
without rsyncing all data from new master to new slave.
Tomorrow, I shall try with PG9.0.3, 3 slaves and a primary database with 100
GB.
Thanks.
Jal
just for the value : rsync --checksum is the option to use to prevent
copying of identical files (it computes checksum on both side before
sending)
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Jean-Armel Luce
2011-11-14 17:59:22 UTC
Permalink
Hi,

I tried many times with different options of rsync :
with rsync -a as explained in the wiki (
http://www.postgresql.org/docs/9.0/interactive/warm-standby.html#STANDBY-SERVER-SETUP),
it takes 1h40 for each distant slave and quite all data files are
transferred
with -c (checksum) or -z (compress) , it takes more time, probably due to
checksum time or compress/uncompress time

For example, the rsync commands I am using are for data and logs are (I
have 2 databases, 1 tablespace per database):
rsync -a /var/opt/hosting/db/slony/pg_xlog/*
slave-db01.profiles.bench.pns-si.s1.p.fti.net:
/var/opt/hosting/db/slony/pg_xlog/
rsync -a /var/opt/hosting/db/profiles/bench/*
slave-db01.profiles.bench.pns-si.s1.p.fti.net:
/var/opt/hosting/db/profiles/bench/
rsync -a /var/opt/hosting/db/profiles/profiles/*
slave-db01.profiles.bench.pns-si.s1.p.fti.net:
/var/opt/hosting/db/profiles/profiles/


More information about version of rsync :
***@master-db01:~$ rsync --version
rsync version 3.0.7 protocol version 30

Jal
Post by Cédric Villemain
Post by Jean-Armel Luce
Hi Jerry and Kevin,
Thanks for your answers.
Jerry, I tried as you said with the parameter recovery_target_timeline =
'latest' and it works.
I tried on a smaller test database (only 15MB) with PG9.1.1 and only 1
slave.
Step 1 : stop the old master
/usr/local/pgsql/bin/pg_ctl stop -m immediate -D
/usr/local/pgsql91/server1/data
touch /usr/local/pgsql91/server2/data/trigger_file
Step 3 : declare the old master as a standby server
Step 3.1 : vi /usr/local/pgsql91/server1/data/postgresql.conf
Add hot_standby = on in the postgresql.conf
Step 3.2 Set recovery.conf for old master server (including
recovery_target_timeline = 'latest')
cp /usr/local/pgsql91/server1/data/recovery.bkp
/usr/local/pgsql91/server1/data/recovery.conf
Step 4 : start old master
/usr/local/pgsql/bin/pg_ctl start -D /usr/local/pgsql91/server1/data &
The old master is now a hot_standby of the new master. Replication works
without rsyncing all data from new master to new slave.
Tomorrow, I shall try with PG9.0.3, 3 slaves and a primary database with
100
Post by Jean-Armel Luce
GB.
Thanks.
Jal
just for the value : rsync --checksum is the option to use to prevent
copying of identical files (it computes checksum on both side before
sending)
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Scott Ribe
2011-11-14 18:08:49 UTC
Permalink
Post by Cédric Villemain
just for the value : rsync --checksum is the option to use to prevent
copying of identical files
No, that's not what it's used for. It already avoids sending identical blocks by using checksums. --checksum forces a checksum on files that have identical sizes & mod times, thus catching files that have different contents despite having the same mod times & sizes.
--
Scott Ribe
***@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Cédric Villemain
2011-11-15 10:02:50 UTC
Permalink
Post by Scott Ribe
Post by Cédric Villemain
just for the value : rsync --checksum is the option to use to prevent
copying of identical files
No, that's not what it's used for. It already avoids sending identical blocks by using checksums. --checksum forces a checksum on files that have identical sizes & mod times, thus catching files that have different contents despite having the same mod times & sizes.
no, you are wrong.
-c, --checksum
"This changes the way rsync checks if the files have been changed and
are in need of a transfer. Without this option, rsync uses a "quick
check" that (by default) checks if each file's size and time of last
modification match between the sender and receiver. This option
changes this to compare a 128-bit checksum for each file that has a
matching size. Generating the checksums means that both sides will
expend a lot of disk I/O reading all the data in the files in the
transfer (and this is prior to any reading that will be done to
transfer changed files), so this can slow things down significantly. "
...
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Scott Ribe
2011-11-15 14:08:08 UTC
Permalink
Post by Cédric Villemain
no, you are wrong.
-c, --checksum
"This changes the way rsync checks if the files have been changed and
are in need of a transfer. Without this option, rsync uses a "quick
check" that (by default) checks if each file's size and time of last
modification match between the sender and receiver. This option
changes this to compare a 128-bit checksum for each file that has a
matching size. Generating the checksums means that both sides will
expend a lot of disk I/O reading all the data in the files in the
transfer (and this is prior to any reading that will be done to
transfer changed files), so this can slow things down significantly. "
Seriously, read that and what I said. They are the same, except that the documentation provides more detail.
--
Scott Ribe
***@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Cédric Villemain
2011-11-15 15:30:58 UTC
Permalink
Post by Scott Ribe
Post by Cédric Villemain
no, you are wrong.
-c, --checksum
"This changes the way rsync checks if the files have been changed and
are in need of a transfer. Without this option, rsync uses a "quick
check" that (by default) checks if each file's size and time of last
modification match between the sender and receiver. This option
changes this to compare a 128-bit checksum for each file that has a
matching size. Generating the checksums means that both sides will
expend a lot of disk I/O reading all the data in the files in the
transfer (and this is prior to any reading that will be done to
transfer changed files), so this can slow things down significantly. "
Seriously, read that and what I said. They are the same, except that the documentation provides more detail.
Seriously, I did. Is my post "just for the value : rsync --checksum is
the option to use to prevent copying of **identical files**" incorrect
?

OP contains "It looks that all database files do not have the same
modification date in the master node and in the slave nodes, so the
rsync copies quite all the database from the new master to the
slaves."

One benefit is when files are in fact identical on both side, so that
rsync does not have to process checksum for each blocks on source and
destination. (when there are few changes, we expect rsync to copy only
those few changes, with or without --checksum).
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Scott Ribe
2011-11-15 15:59:10 UTC
Permalink
Post by Cédric Villemain
Seriously, I did. Is my post "just for the value : rsync --checksum is
the option to use to prevent copying of **identical files**" incorrect
?
It's at least incomplete and somewhat misleading. But I guess you could say the same about my post; we seem to be focusing on 2 different aspects of its behavior ;-)
Post by Cédric Villemain
OP contains "It looks that all database files do not have the same
modification date in the master node and in the slave nodes, so the
rsync copies quite all the database from the new master to the
slaves."
Yes, and rsync should only be copying changed blocks in that case, of which you are aware, but which OP did not seem to realize.
Post by Cédric Villemain
One benefit is when files are in fact identical on both side, so that
rsync does not have to process checksum for each blocks on source and
destination. (when there are few changes, we expect rsync to copy only
those few changes, with or without --checksum).
Well, but it does calculate checksums on the entire contents of both files (which takes as much I/O and about as much CPU as calculating checksums for each block), even when timestamps & sizes are identical.

For the OP's case, identical files with differing timestamps, the only potential savings is from not exchanging checksums over the network, which is not likely to offer any meaningful improvement in performance, which still leaves open the question as to why rsync is so slow in that, when we know it is usually relatively fast to sync two servers with few differences.

Would be nice to actually hear from OP regarding file sizes/counts & network bandwidth & disks & and so on ;-)
--
Scott Ribe
***@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Jean-Armel Luce
2011-11-15 18:39:52 UTC
Permalink
Sorry to be so long to answer :-(


2011/11/14 Scott Ribe <***@elevated-dev.com>

- How large is the db? By which I mean how much disk space does the
data directory occupy?
20 GB

- What's the bandwidth of the network connection to the distant slave?
during rsync, I see narly 125Mbits/sec

- What's the CPU & disk on each end?
disk : 6 * 146GB 15Krpm RAID10
CPU : 16 cores Xeon(R) CPU L5630 @ 2.13GHz cache size : 12288 KB

Only Postgres is running on my servers (no HTTP server, nothing else, ...).
CPU usage is very low.

This afternoon, I have again sent some updates requests, which were
replicated to the sslaves.
:
- I am looking modification of modification dates and checksums of 2 tables
among my 6000 tables :
For each file, the checksum is the same on all the slaves, but different
from the checksum of the master.
For each file, the modification time is different on each node. (see below)

So if I want to promote one slave as the master, it will not need to copy
data from the new master to the previous slaves with rsync, but it will
copy all the files from the new master to the old master (which is now a
slave).

I shall try tomorrow topromote again a slave, and I shall rsync withh
--checksum.
I don't think that it is -a is very useful withh --checksum (no need to
preserve modification times). Do you agree ?



On the master :
***@master-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867807
-rw------- 1 postgres postgres 5914624 Nov 15 14:31 107867807
***@master-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
md5sum 107867807
23c28c6432c073d370e7e9624fd04e3b 107867807

***@master-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867867
-rw------- 1 postgres postgres 5996544 Nov 15 14:31 107867867
***@master-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
md5sum 107867867
1f3398e18e22bfeb31ca2db82d8517f2 107867867
***@master-db01
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$


On slave 1 :
***@master-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867807
-rw------- 1 postgres postgres 5914624 Nov 15 14:16 107867807
***@master-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
md5sum 107867807
f79d93ca00ac400f709a028c597ca4ab 107867807
***@master-db01
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
***@master-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867867
-rw------- 1 postgres postgres 5996544 Nov 15 14:18 107867867
***@master-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
md5sum 107867867
3d5534f855168378f5cb2e30d622c0aa 107867867


On slave 2 :
***@slave-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867807
-rw------- 1 postgres postgres 5914624 Nov 15 14:24 107867807
***@slave-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
md5sum 107867807
f79d93ca00ac400f709a028c597ca4ab 107867807
***@slave-db01
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
***@slave-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867867
-rw------- 1 postgres postgres 5996544 Nov 15 14:25 107867867
***@slave-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
md5sum 107867867
3d5534f855168378f5cb2e30d622c0aa 107867867



On slave 3 :
***@slave-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867807
-rw------- 1 postgres postgres 5914624 Nov 15 14:16 107867807
***@slave-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
md5sum 107867807
f79d93ca00ac400f709a028c597ca4ab 107867807
***@slave-db01
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
***@slave-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867867
-rw------- 1 postgres postgres 5996544 Nov 15 14:18 107867867
***@slave-db01:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
md5sum 107867867
3d5534f855168378f5cb2e30d622c0aa 107867867
***@slave-db01
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
***@slave-db01
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$

Jal
Post by Kevin Grittner
--
http://www.postgresql.org/mailpref/pgsql-admin
Jean-Armel Luce
2011-11-15 20:11:37 UTC
Permalink
Humm sorry, I did a mistake
Post by Jean-Armel Luce
Sorry to be so long to answer :-(
- How large is the db? By which I mean how much disk space does the
data directory occupy?
100 GB
- How large is the db? By which I mean how much disk space does the
data directory occupy?
20 GB
- What's the bandwidth of the network connection to the distant slave?
during rsync, I see narly 125Mbits/sec
- What's the CPU & disk on each end?
disk : 6 * 146GB 15Krpm RAID10
Only Postgres is running on my servers (no HTTP server, nothing else,
...). CPU usage is very low.
This afternoon, I have again sent some updates requests, which were
replicated to the sslaves.
- I am looking modification of modification dates and checksums of 2
For each file, the checksum is the same on all the slaves, but different
from the checksum of the master.
For each file, the modification time is different on each node. (see below)
So if I want to promote one slave as the master, it will not need to copy
data from the new master to the previous slaves with rsync, but it will
copy all the files from the new master to the old master (which is now a
slave).
I shall try tomorrow topromote again a slave, and I shall rsync withh
--checksum.
I don't think that it is -a is very useful withh --checksum (no need to
preserve modification times). Do you agree ?
ls -l 107867807
-rw------- 1 postgres postgres 5914624 Nov 15 14:31 107867807
md5sum 107867807
23c28c6432c073d370e7e9624fd04e3b 107867807
ls -l 107867867
-rw------- 1 postgres postgres 5996544 Nov 15 14:31 107867867
md5sum 107867867
1f3398e18e22bfeb31ca2db82d8517f2 107867867
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867807
-rw------- 1 postgres postgres 5914624 Nov 15 14:16 107867807
md5sum 107867807
f79d93ca00ac400f709a028c597ca4ab 107867807
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867867
-rw------- 1 postgres postgres 5996544 Nov 15 14:18 107867867
md5sum 107867867
3d5534f855168378f5cb2e30d622c0aa 107867867
ls -l 107867807
-rw------- 1 postgres postgres 5914624 Nov 15 14:24 107867807
md5sum 107867807
f79d93ca00ac400f709a028c597ca4ab 107867807
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867867
-rw------- 1 postgres postgres 5996544 Nov 15 14:25 107867867
md5sum 107867867
3d5534f855168378f5cb2e30d622c0aa 107867867
ls -l 107867807
-rw------- 1 postgres postgres 5914624 Nov 15 14:16 107867807
md5sum 107867807
f79d93ca00ac400f709a028c597ca4ab 107867807
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
ls -l 107867867
-rw------- 1 postgres postgres 5996544 Nov 15 14:18 107867867
md5sum 107867867
3d5534f855168378f5cb2e30d622c0aa 107867867
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
:/var/opt/hosting/db/profiles/bench/PG_9.0_201008051/29797$
Jal
Post by Kevin Grittner
--
http://www.postgresql.org/mailpref/pgsql-admin
Cédric Villemain
2011-11-15 20:31:38 UTC
Permalink
Post by Jean-Armel Luce
This afternoon, I have again sent some updates requests, which were
replicated to the sslaves.
- I am looking modification of modification dates and checksums of 2 tables
For each file, the checksum is the same on all the slaves, but different
from the checksum of the master.
For each file, the modification time is different on each node. (see below)
you are probably hit by "hint bits": they are not WAL-logged with 9.0
so the files can be different just because of "select" you issued on
master and/or standby.
Post by Jean-Armel Luce
So if I want to promote one slave as the master, it will not need to copy
data from the new master to the previous slaves with rsync, but it will copy
all the files from the new master to the old master (which is now a slave).
I shall try tomorrow topromote again a slave, and I shall rsync withh
--checksum.
I don't think that it is -a is very useful withh --checksum (no need to
preserve modification times). Do you agree ?
-a is a good shortcut, chaging the modtime is not a real cost. So
despite you don't need to keep the mtime, there is no benefit in not
keeping it :)
Well, after re-reading rsync manual, and taking into account Scott answers:
use --ignore-time will make all files rsynced (thus it will check each
block and copy only the blocks which differ)
use --checksum will make all files to be read and checksumed in both
side before trying to rsynced them (and check each block and copy them
if required). Obvisouly when the files do not have the same size, they
are rsynced without a 'global' checksum.

It is safer (someone can say paranoid, which is correct) to use one of
those in the PostgreSQL case where we have a size limit and where
files can be modified in both side without affecting their size. So
there is a hight risk to have the same size on source and destination
and a very low risk to have the same modification time when the
content is changed. I admit the risk is very low and in practice it
should not happen. As many things should not happen...

If you want to reduce the re-rsync step, you may want to try to have
similar files in both places by using vacuum freeze before initial
rsync, or something like that (so hint bits are set before rsync).
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Jean-Armel Luce
2011-11-16 16:37:19 UTC
Permalink
Hi,

Today I tried to promote a slave as master using rsync --checksum (without
doing vacuum freeze) instead of rsync --all.

It takes only 30 minutes with rsync --checksum. Only a few tables are
rsynced. Most of the time is consumed by checksum.
With rsync --all, it takes 1h40 min.

So, rsync --checksum looks better than rsync --all

Jal
Kevin Grittner
2011-11-16 16:50:43 UTC
Permalink
Post by Jean-Armel Luce
So, rsync --checksum looks better than rsync --all
I've never heard of an --all option for rsync. What does that do?

-Kevin
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Scott Ribe
2011-11-16 16:42:34 UTC
Permalink
Post by Jean-Armel Luce
So, rsync --checksum looks better than rsync --all
--all??? What the heck is that and why were you using it?
--
Scott Ribe
***@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Cédric Villemain
2011-11-16 16:50:56 UTC
Permalink
Post by Jean-Armel Luce
Hi,
Today I tried to promote a slave as master using rsync --checksum (without
doing vacuum freeze) instead of rsync --all.
It takes only 30 minutes with rsync --checksum.  Only a few tables are
rsynced. Most of the time is consumed by checksum.
With rsync --all, it takes 1h40 min.
So, rsync --checksum looks better than rsync --all
rsync --all does not exist, you mean --archive I bet (the longopt for
-a, a shorthand for the options below plus -Dt for device/special and
time).
Also you probably want to keep --owner --group --perms --recursive
(or you do file by file with a find or something like that ?)
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Jean-Armel Luce
2011-11-16 17:11:01 UTC
Permalink
You are right. I used -a, and I was wanting to be more meaningful so I
wrote --all in my post.
Please read --archive insterad of --all

I kept --recursive.
I didn't use --owner , --group, --perms (permissions, group and owner are
the same on each side).
I rsynced all the directory (without /*) as Scott explained.

Sorry for the mistake.

Jal
Scott Ribe
2011-11-16 17:23:28 UTC
Permalink
You are right. I used -a, and I was wanting to be more meaningful so I wrote --all in my post.
Please read --archive insterad of --all
Oh, OK. Still seems odd that it took so much longer. Granted, for the files with different timestamps but identical contents, it then syncs them. But it does so by checksumming blocks, comparing checksums, and sending only blocks that are different over the network. Granted, it has to send some checksums over the network, but that's pretty minor traffic. I believe you said you'd seen 125Mb/s over the network? Is that actually accurate? Does the network connection have high latency?

Also, I believe you said -z seemed to slow it down? That has not been my experience at all with rsync'ing pg databases. Between all the values that are stored as plain text, and the redundancies in indexes, I usually see a good speed increase from compressing the data in transit.

I'm certainly glad that you've got a 3x speed increase--that's significant progress. But still, something seems odd about the performance you've reported, so I'm left wondering about what could cause that. Disk performance glitch at one end or the other, network performance, CPU load???

One thing worth doing I think is to use --stats on every test, so you can see every time how many files and how much data is actually transferred. Also, if you're sitting there watching, sometimes --progress can be informative...
--
Scott Ribe
***@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Scott Ribe
2011-11-14 18:15:10 UTC
Permalink
I think there have been two similar threads recently, and I want to be sure I'm not confusing them. So:

- How large is the db? By which I mean how much disk space does the data directory occupy?

- What's the bandwidth of the network connection to the distant slave?

- What's the CPU & disk on each end?
rsync -a /var/opt/hosting/db/slony/pg_xlog/* slave-db01.profiles.bench.pns-si.s1.p.fti.net:/var/opt/hosting/db/slony/pg_xlog/
rsync -a /var/opt/hosting/db/profiles/bench/* slave-db01.profiles.bench.pns-si.s1.p.fti.net:/var/opt/hosting/db/profiles/bench/
rsync -a /var/opt/hosting/db/profiles/profiles/* slave-db01.profiles.bench.pns-si.s1.p.fti.net:/var/opt/hosting/db/profiles/profiles/
Well, there's one error. Your command is rsync'ing each file individually, so of course each file is sync'd. Sync the directories instead--in other words leave off the * (but not the /) and let rsync decide which files need sync'ing.
--
Scott Ribe
***@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Loading...