Discussion:
Postgres base Backup fails to recover all logs fails from archive
(too old to reply)
Lars Aksel Opsahl
2013-01-05 21:30:17 UTC
Permalink
Hi


This is postgreSQL 9.2.1


By accident I was running the the following command

«delete from org_oks.oks_kartobjekt_flate where id = 53»

on the wrong database.


- I first picked up last basebackup from the backup server. The base backup was from 22/11 and It was made made by using «24.3.3. Making a Base Backup Using the Low Level API» with SELECT pg_start_backup('label'); and then a file system backup and then SELECT pg_stop_backup();


The row I deleted was added after the base backup was created.

- Her is the cat from 0000000100000023000000F4.00000020.backup

START WAL LOCATION: 23/F4000020 (file 0000000100000023000000F4)

STOP WAL LOCATION: 23/F47303F0 (file 0000000100000023000000F4)

CHECKPOINT LOCATION: 23/F4000020

BACKUP METHOD: pg_start_backup

BACKUP FROM: master

START TIME: 2012-11-22 09:57:03 CET

LABEL: /backup/db01/base_backup_2012_11_22

STOP TIME: 2012-11-22 10:38:44 CET

- I create the recovery.conf with the following content (I have also tested without recovery_target_time but it's the same behavior)

restore_command ='cp /data01/new_archive_files/%f %p'

recovery_target_time = '2013-01-04 04:00:00'

- Here is the log from startup

LOG: database system was interrupted; last known up at 2012-11-22 10:22:36 CET

LOG: starting archive recovery

LOG: restored log file "0000000100000023000000F4" from archive

LOG: redo starts at 23/F4000080

LOG: consistent recovery state reached at 23/F5000000

LOG: restored log file "0000000100000023000000F5" from archive

LOG: restored log file "0000000100000023000000F6" from archive

LOG: restored log file "0000000100000023000000F7" from archive

LOG: contrecord is requested by 23/F7000020

LOG: redo done at 23/F6000080

LOG: restored log file "0000000100000023000000F6" from archive

cp: cannot stat `/data01/new_archive_files/00000002.history': No such file or directory

LOG: selected new timeline ID: 2

cp: cannot stat `/data01/new_archive_files/00000001.history': No such file or directory

LOG: archive recovery complete

LOG: database system is ready to accept connections

LOG: autovacuum launcher started


- Here is the list some the archive files.

-rwx------ 1 postgres postgres 16777216 Nov 24 07:11 000000010000002400000008

-rwx------ 1 postgres postgres 16777216 Nov 24 07:10 000000010000002400000007

-rwx------ 1 postgres postgres 16777216 Nov 24 07:00 000000010000002400000006

-rwx------ 1 postgres postgres 16777216 Nov 24 06:50 000000010000002400000005

-rwx------ 1 postgres postgres 16777216 Nov 24 06:45 000000010000002400000004

-rwx------ 1 postgres postgres 16777216 Nov 24 06:40 000000010000002400000003

-rwx------ 1 postgres postgres 16777216 Nov 24 06:32 000000010000002400000002

-rwx------ 1 postgres postgres 16777216 Nov 24 06:26 000000010000002400000001

-rwx------ 1 postgres postgres 16777216 Nov 24 06:16 000000010000002400000000

-rwx------ 1 postgres postgres 16777216 Nov 24 06:08 0000000100000023000000FE

-rwx------ 1 postgres postgres 16777216 Nov 24 06:02 0000000100000023000000FD

-rwx------ 1 postgres postgres 16777216 Nov 24 05:57 0000000100000023000000FC

-rwx------ 1 postgres postgres 16777216 Nov 24 05:39 0000000100000023000000FB

-rwx------ 1 postgres postgres 16777216 Nov 24 05:32 0000000100000023000000FA

-rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F9

-rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F8

-rwx------ 1 postgres postgres 16777216 Nov 24 05:27 0000000100000023000000F7

-rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F6

-rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F5

-rwx------ 1 postgres postgres 16777216 Nov 22 10:38 0000000100000023000000F4

-rwx------ 1 postgres postgres 325 Nov 22 10:38 0000000100000023000000F4.00000020.backup


I have two questions

1. Can anybody see what the problem is here and what I am doing wrong ? (I have done this many times before without any problems.)

2. Is it possible to extract sql insert's from the archive files abouve and in that way find the content of the row I deleted ? (The table name oks_kartobjekt_flate only occurs in about 10 archive files)

Thanks in advance

Lars
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Magnus Hagander
2013-01-05 21:45:07 UTC
Permalink
On Sat, Jan 5, 2013 at 10:30 PM, Lars Aksel Opsahl
Post by Lars Aksel Opsahl
Hi
This is postgreSQL 9.2.1
By accident I was running the the following command
«delete from org_oks.oks_kartobjekt_flate where id = 53»
on the wrong database.
- I first picked up last basebackup from the backup server. The base backup was from 22/11 and It was made made by using «24.3.3. Making a Base Backup Using the Low Level API» with SELECT pg_start_backup('label'); and then a file system backup and then SELECT pg_stop_backup();
The row I deleted was added after the base backup was created.
- Her is the cat from 0000000100000023000000F4.00000020.backup
START WAL LOCATION: 23/F4000020 (file 0000000100000023000000F4)
STOP WAL LOCATION: 23/F47303F0 (file 0000000100000023000000F4)
CHECKPOINT LOCATION: 23/F4000020
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-11-22 09:57:03 CET
LABEL: /backup/db01/base_backup_2012_11_22
STOP TIME: 2012-11-22 10:38:44 CET
- I create the recovery.conf with the following content (I have also tested without recovery_target_time but it's the same behavior)
restore_command ='cp /data01/new_archive_files/%f %p'
recovery_target_time = '2013-01-04 04:00:00'
- Here is the log from startup
LOG: database system was interrupted; last known up at 2012-11-22 10:22:36 CET
LOG: starting archive recovery
LOG: restored log file "0000000100000023000000F4" from archive
LOG: redo starts at 23/F4000080
LOG: consistent recovery state reached at 23/F5000000
LOG: restored log file "0000000100000023000000F5" from archive
LOG: restored log file "0000000100000023000000F6" from archive
LOG: restored log file "0000000100000023000000F7" from archive
LOG: contrecord is requested by 23/F7000020
LOG: redo done at 23/F6000080
LOG: restored log file "0000000100000023000000F6" from archive
cp: cannot stat `/data01/new_archive_files/00000002.history': No such file or directory
LOG: selected new timeline ID: 2
cp: cannot stat `/data01/new_archive_files/00000001.history': No such file or directory
LOG: archive recovery complete
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
- Here is the list some the archive files.
-rwx------ 1 postgres postgres 16777216 Nov 24 07:11 000000010000002400000008
-rwx------ 1 postgres postgres 16777216 Nov 24 07:10 000000010000002400000007
-rwx------ 1 postgres postgres 16777216 Nov 24 07:00 000000010000002400000006
-rwx------ 1 postgres postgres 16777216 Nov 24 06:50 000000010000002400000005
-rwx------ 1 postgres postgres 16777216 Nov 24 06:45 000000010000002400000004
-rwx------ 1 postgres postgres 16777216 Nov 24 06:40 000000010000002400000003
-rwx------ 1 postgres postgres 16777216 Nov 24 06:32 000000010000002400000002
-rwx------ 1 postgres postgres 16777216 Nov 24 06:26 000000010000002400000001
-rwx------ 1 postgres postgres 16777216 Nov 24 06:16 000000010000002400000000
-rwx------ 1 postgres postgres 16777216 Nov 24 06:08 0000000100000023000000FE
-rwx------ 1 postgres postgres 16777216 Nov 24 06:02 0000000100000023000000FD
-rwx------ 1 postgres postgres 16777216 Nov 24 05:57 0000000100000023000000FC
-rwx------ 1 postgres postgres 16777216 Nov 24 05:39 0000000100000023000000FB
-rwx------ 1 postgres postgres 16777216 Nov 24 05:32 0000000100000023000000FA
-rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F9
-rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F8
-rwx------ 1 postgres postgres 16777216 Nov 24 05:27 0000000100000023000000F7
-rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F6
-rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F5
-rwx------ 1 postgres postgres 16777216 Nov 22 10:38 0000000100000023000000F4
-rwx------ 1 postgres postgres 325 Nov 22 10:38 0000000100000023000000F4.00000020.backup
I have two questions
1. Can anybody see what the problem is here and what I am doing wrong ? (I have done this many times before without any problems.)
What actually went wrong? Because the log looks like a perfectly
normal restore? (Though you should really add timestamps to your log)
Post by Lars Aksel Opsahl
2. Is it possible to extract sql insert's from the archive files abouve and in that way find the content of the row I deleted ? (The table name oks_kartobjekt_flate only occurs in about 10 archive files)
No, the archive files only contain the binary changes to the disk
blocks, not the SQL statements or data.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Lars Aksel Opsahl
2013-01-05 22:30:52 UTC
Permalink
Hi

Thanks for your help.

The problem is that it stops to restore the log files after reading log file 0000000100000023000000F7.

The recovery does not continue to read log file 0000000100000023000000F8 and the rest of the log files.

I don't under stand why it stops to read the rest of the log files.

Lars
________________________________________
Fra: Magnus Hagander [***@hagander.net]
Sendt: 5. januar 2013 22:45
Til: Lars Aksel Opsahl
Kopi: pgsql-***@postgresql.org
Emne: Re: [ADMIN] Postgres base Backup fails to recover all logs fails from archive

On Sat, Jan 5, 2013 at 10:30 PM, Lars Aksel Opsahl
Post by Lars Aksel Opsahl
Hi
This is postgreSQL 9.2.1
By accident I was running the the following command
«delete from org_oks.oks_kartobjekt_flate where id = 53»
on the wrong database.
- I first picked up last basebackup from the backup server. The base backup was from 22/11 and It was made made by using «24.3.3. Making a Base Backup Using the Low Level API» with SELECT pg_start_backup('label'); and then a file system backup and then SELECT pg_stop_backup();
The row I deleted was added after the base backup was created.
- Her is the cat from 0000000100000023000000F4.00000020.backup
START WAL LOCATION: 23/F4000020 (file 0000000100000023000000F4)
STOP WAL LOCATION: 23/F47303F0 (file 0000000100000023000000F4)
CHECKPOINT LOCATION: 23/F4000020
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-11-22 09:57:03 CET
LABEL: /backup/db01/base_backup_2012_11_22
STOP TIME: 2012-11-22 10:38:44 CET
- I create the recovery.conf with the following content (I have also tested without recovery_target_time but it's the same behavior)
restore_command ='cp /data01/new_archive_files/%f %p'
recovery_target_time = '2013-01-04 04:00:00'
- Here is the log from startup
LOG: database system was interrupted; last known up at 2012-11-22 10:22:36 CET
LOG: starting archive recovery
LOG: restored log file "0000000100000023000000F4" from archive
LOG: redo starts at 23/F4000080
LOG: consistent recovery state reached at 23/F5000000
LOG: restored log file "0000000100000023000000F5" from archive
LOG: restored log file "0000000100000023000000F6" from archive
LOG: restored log file "0000000100000023000000F7" from archive
LOG: contrecord is requested by 23/F7000020
LOG: redo done at 23/F6000080
LOG: restored log file "0000000100000023000000F6" from archive
cp: cannot stat `/data01/new_archive_files/00000002.history': No such file or directory
LOG: selected new timeline ID: 2
cp: cannot stat `/data01/new_archive_files/00000001.history': No such file or directory
LOG: archive recovery complete
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
- Here is the list some the archive files.
-rwx------ 1 postgres postgres 16777216 Nov 24 07:11 000000010000002400000008
-rwx------ 1 postgres postgres 16777216 Nov 24 07:10 000000010000002400000007
-rwx------ 1 postgres postgres 16777216 Nov 24 07:00 000000010000002400000006
-rwx------ 1 postgres postgres 16777216 Nov 24 06:50 000000010000002400000005
-rwx------ 1 postgres postgres 16777216 Nov 24 06:45 000000010000002400000004
-rwx------ 1 postgres postgres 16777216 Nov 24 06:40 000000010000002400000003
-rwx------ 1 postgres postgres 16777216 Nov 24 06:32 000000010000002400000002
-rwx------ 1 postgres postgres 16777216 Nov 24 06:26 000000010000002400000001
-rwx------ 1 postgres postgres 16777216 Nov 24 06:16 000000010000002400000000
-rwx------ 1 postgres postgres 16777216 Nov 24 06:08 0000000100000023000000FE
-rwx------ 1 postgres postgres 16777216 Nov 24 06:02 0000000100000023000000FD
-rwx------ 1 postgres postgres 16777216 Nov 24 05:57 0000000100000023000000FC
-rwx------ 1 postgres postgres 16777216 Nov 24 05:39 0000000100000023000000FB
-rwx------ 1 postgres postgres 16777216 Nov 24 05:32 0000000100000023000000FA
-rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F9
-rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F8
-rwx------ 1 postgres postgres 16777216 Nov 24 05:27 0000000100000023000000F7
-rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F6
-rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F5
-rwx------ 1 postgres postgres 16777216 Nov 22 10:38 0000000100000023000000F4
-rwx------ 1 postgres postgres 325 Nov 22 10:38 0000000100000023000000F4.00000020.backup
I have two questions
1. Can anybody see what the problem is here and what I am doing wrong ? (I have done this many times before without any problems.)
What actually went wrong? Because the log looks like a perfectly
normal restore? (Though you should really add timestamps to your log)
Post by Lars Aksel Opsahl
2. Is it possible to extract sql insert's from the archive files abouve and in that way find the content of the row I deleted ? (The table name oks_kartobjekt_flate only occurs in about 10 archive files)
No, the archive files only contain the binary changes to the disk
blocks, not the SQL statements or data.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Magnus Hagander
2013-01-07 20:47:37 UTC
Permalink
On Sat, Jan 5, 2013 at 11:30 PM, Lars Aksel Opsahl
Post by Lars Aksel Opsahl
Hi
Thanks for your help.
The problem is that it stops to restore the log files after reading log file 0000000100000023000000F7.
The recovery does not continue to read log file 0000000100000023000000F8 and the rest of the log files.
I don't under stand why it stops to read the rest of the log files.
Ah, I see that now.

Well, it's claiming that the F7 logfile is corrupt.

There's also an interesting jump in timestamps - the F7 logfile is
from *much* later than F6. Almost two whole days. Is that actually
normal, or did something possibly go wrong with your archiving?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Lars Aksel Opsahl
2013-01-08 10:00:07 UTC
Permalink
Hi again

I have tried to check different /var/log/messages on the servers but they only go back 1 month. I also had quick glance at Nagios and Munin logs but I could not find anything there.

This database is mainly used for reading data as it is now, but it's a couple of applications which also store data. The 2 days with out activity in the archive logs is strange when I checked back on the activity reports for applications. I will set the archive_timeout, but the problem is that we have small updates and the archive files has the same size and is not depending on the amount of changed data.

We use NFS to copy archive files to the backup server because that is simple, but maybe we should have used rsyn or something else, but independent off how we do it, we need a method to check that the archive file is OK after it's copy.

If it's not OK the files must be transfers once more from the database server. I assume the archive always will be OK when it's created on the database server.

Is the the way off doing this to make a script that copies the archive file to backup server cheeks that the files are equal before before it is deleted on the database server ?

Is there any simple utility program that can check if the archive file is OK or do I have to do a restore to check if the file is OK ?

Thanks Lars
________________________________________
Fra: Magnus Hagander [***@hagander.net]
Sendt: 7. januar 2013 21:47
Til: Lars Aksel Opsahl
Kopi: pgsql-***@postgresql.org
Emne: Re: [ADMIN] Postgres base Backup fails to recover all logs fails from archive

On Sat, Jan 5, 2013 at 11:30 PM, Lars Aksel Opsahl
Post by Lars Aksel Opsahl
Hi
Thanks for your help.
The problem is that it stops to restore the log files after reading log file 0000000100000023000000F7.
The recovery does not continue to read log file 0000000100000023000000F8 and the rest of the log files.
I don't under stand why it stops to read the rest of the log files.
Ah, I see that now.

Well, it's claiming that the F7 logfile is corrupt.

There's also an interesting jump in timestamps - the F7 logfile is
from *much* later than F6. Almost two whole days. Is that actually
normal, or did something possibly go wrong with your archiving?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Magnus Hagander
2013-01-13 13:30:56 UTC
Permalink
On Tue, Jan 8, 2013 at 11:00 AM, Lars Aksel Opsahl
Post by Lars Aksel Opsahl
Hi again
I have tried to check different /var/log/messages on the servers but they only go back 1 month. I also had quick glance at Nagios and Munin logs but I could not find anything there.
This database is mainly used for reading data as it is now, but it's a couple of applications which also store data. The 2 days with out activity in the archive logs is strange when I checked back on the activity reports for applications. I will set the archive_timeout, but the problem is that we have small updates and the archive files has the same size and is not depending on the amount of changed data.
Yeah, that is indeed a problem. pg_receivexlog can help you arund that
issue, but then your receiving end needs to be a "real box" and not
just an NFS share.
Post by Lars Aksel Opsahl
We use NFS to copy archive files to the backup server because that is simple, but maybe we should have used rsyn or something else, but independent off how we do it, we need a method to check that the archive file is OK after it's copy.
I would generally recommend using somehting that writes the file
atomically. For example, rsync. But you can use rsync with your target
being the NFS share - you don't need an rsync server. Or just a manual
copy+rename. As long as your NFS isn't mounted asynchronously, that
should be safe.
Post by Lars Aksel Opsahl
If it's not OK the files must be transfers once more from the database server. I assume the archive always will be OK when it's created on the database server.
Is the the way off doing this to make a script that copies the archive file to backup server cheeks that the files are equal before before it is deleted on the database server ?
Copy+rename should take care of it for you - I doubt it's an actual
failure in contents wrong, it's more that something crashed in the
middle of a file and therefor it ended up being corrupt. If that was
written to a temp file and then renamed into place, that should not
happen.
Post by Lars Aksel Opsahl
Is there any simple utility program that can check if the archive file is OK or do I have to do a restore to check if the file is OK ?
I don't believe there is such a tool. You could make some very most
basic verification manually, but to verify the full contents you
really need to fully parse them - and the tool to do that is
"postgres".

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Loading...